Compilers are hard

I’ve often heard that writing a compiler is hard, and different than writing other kinds of software. Some recent experience hsa provided me insight as to why this is the case and proved quite interesting!

I recently completed work on a new big feature in ShipReq. I’d been working on it for ~2 months and it ended up being the hardest code that I’d ever written in my life. I’ve been coding for decades and have worked on so many different projects, and for some very big organisations; I tell you this for context. When I say this is the hardest code I’ve ever written, it had a lot of competition. Concurrent systems usually has the reputation of being very difficult and I’ve designed and written a massively concurrent project from the ground up in my OO days, included writing all the scheduling and concurrency primitives, and managing all the thread-safety myself (those were different times). I found this piece of work an order of magnitude harder than that.

I say in the ShipReq About page that ShipReq is like a compiler. The big feature that I completed recently was the most compiler-like feature yet, and to my surprise it’s specifically that property that made it so deceptively hard. What is the feature? Basically when you have a bunch of requirements / issues / design documents / whatever, this feature adds automation to populate certain fields according to their relationships and related fields so that users don’t have to manually edit or maintain it all themselves, which is both tedious and error-prone. An example would be if someone raises a bug which requires a few dev tasks to fix, one could setup their project so that when all those dev marks are marked as “complete” the bug itself automatically changes to “ready for testing”. This feature doesn’t have any knowledge of “completion” or “testing”, but allows users to define whatever states and rules they want. That might not sound like it relates to writing a compiler. (It probably doesn’t even sound hard.) Nor do requirements, bug reports, project documents seem very relatable to code. But there are surprising similarities. Let’s take a look…

Similarity #1: Relationships

Nearly all data relationships in ShipReq are many-to-many / DAGs. Some views in ShipReq present data as flat lists but under-the-hood it’s very rare that anything is stored as a flat list, and there’s nearly always a way to view the same data as a graph or tree.

If you squint a bit, source code is kind of like that too. Packages have multiple classes / objects which have multiple methods / fields / inner classes that have multiple statements that have multiple clauses, etc. In terms of declaration, it’s more one-to-many than many-to-many but it’s still all a big DAG. When we start considering relationships like imports and methods calling other methods, then it becomes many-to-many too and we get a cyclic graph.

Similarity #2: Interactions

Secondly, this new feature in ShipReq is very configurable by users. It’s a major goal of ShipReq to avoid hardcoding implicit assumptions and find appropriate and configurable abstractions. Obviously having users need to reconfigure everything all the time would be a terrible burden so we provide pre-configured complete setups with best-practices which users can (optionally) use, and can reconfigure without restriction to best serve their needs. What that means from a coding point of view, is that we never know the automation rules at compile-time. The code needs to understand and apply all the rules at runtime, and it needs to be able to recognise and handle data and logic combinations that invalidate invariants. That might not sound like a big deal but there are many invariants in a complex system and invariants don’t stop at their immediate domain boundary - they propagate though all relevant graphs and compose, with various levels of complexity, with other invariants.

For example you may have 10 simple invariants in 10 isolated domains, but when they’re all combined the number of states to consider becomes 10,000,000,000 and there’s going to be a very significant percentage of those states that you’d consider illegal (i.e. bugs).

This too is like writing a compiler. For example, the Java compiler doesn’t hardcode the privacy level of your classes; it allows you to set it yourself. Where as ShipReq might have a UI page for config, in Java you have modifier keywords such that public class Hello is different than private class Hello. From the perspective of writing a compiler, these are runtime values with runtime rules. When we consider additional keywords like final and abstract we start getting into what I meant about combinational explosions. The Java compiler, at runtime, has to ensure all possible combinations are valid. For example, it will reject the following code:

final public class Hello {
  public abstract void doSomething(); // abstract method in final class not allowed
}

and

abstract class Hello {
  private abstract void doSomething(); // private + abstract not allowed
}

Imagine you were writing your own Java compiler. It’s entirely up to you to think about those illegal combinations and then write the code to catch them. I think we all take for granted things like marking a method as private and then knowing it’s impossible to call externally, but under-the-hood that it’s not impossible at all. You’re literally relying on someone’s if statement in the right place. Just like my ShipReq example, it’s all value inspection at runtime, with no guarantees about how all the features will interact. The class/method privacy feature interacting with final and abstract features is mentally easy enough but every single feature has the ability to interact badly with something else and it’s not always easy to identify those combinations.

I recently saw a PR to Scala that’s a great demonstration of exactly this. The private and sealed keywords have been in Scala forever and someone only just recently came up with the realisation that private classes are effectively sealed. Scala’s been in existence for 16 years! When you have hundreds of features it doesn’t seem feasible to manually consider each possible combination. 100 features yields 4950 unique pairs. There’s nothing special about pairs though, some issues may only manifest when three specific features intersect. For 100 features there are:

4950 unique sets of 2
161,700 unique sets of 3
3,921,225 unique sets of 4
…