Code correctly by conforming to the real world

I recently wrote about why it makes sense to use events in many programming situations. There's more to say about the example and why events are correct there: the lesson applies more generally and will help you make good decisions about more than just event modeling.

We are modeling a soccer game and we have:

  1. The basic apparatus of the game: the field, the ball, players, and teams.
  2. An assistant referee, which in the primitive state of this hypothetical application is just a mechanism for determining whether the ball is out of bounds or not.
  3. The notion of a throw-in when a ball is out of bounds.

There are, broadly, a few ways to handle the mechanics of awarding a throw-in:

  1. The assistant referee, upon determining that the ball is out of bounds, examines the recent state of the game (which team touched the ball last) and instigates the throw-in (to the other team) directly.
  2. Teams (or players) will attempt to take the throw-in, and the assistant referee will permit or deny this.
  3. The assistant referee will raise an event. This event will be processed by some central entity (you can think of this as a Referee object or a more general message bus).
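The third option can be sketched in a few lines. This is a minimal, hypothetical illustration (the names `MessageBus`, `BallOutOfBounds`, and so on are my own, not from any particular library): the assistant referee only announces what it observed, and the central entity decides what follows.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BallOutOfBounds:
    last_touched_by: str  # "home" or "away"


class MessageBus:
    """A central broker: handlers subscribe to event types."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event):
        for handler in self._handlers[type(event)]:
            handler(event)


class AssistantReferee:
    """Knows nothing about throw-ins; it only reports what it saw."""

    def __init__(self, bus):
        self._bus = bus

    def ball_crossed_line(self, last_touched_by):
        self._bus.publish(BallOutOfBounds(last_touched_by))


class Referee:
    """Consults game state and awards the throw-in to the other team."""

    def __init__(self, bus):
        bus.subscribe(BallOutOfBounds, self.award_throw_in)
        self.throw_ins = []

    def award_throw_in(self, event):
        other = "away" if event.last_touched_by == "home" else "home"
        self.throw_ins.append(other)


bus = MessageBus()
referee = Referee(bus)
assistant = AssistantReferee(bus)
assistant.ball_crossed_line(last_touched_by="home")
assert referee.throw_ins == ["away"]
```

Note what the assistant referee does not know here: it never touches game history, players, or teams. That separation is the point of the argument that follows.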

This sort of decision comes up all the time, and decisions like it are some of the most important you make in programming. Important, common decisions are the best ones to study!

In that post I argued for the last option. Here are two further notes about making this kind of decision:

Study dependency problems as dependency problems

First, notice that this is a dependency problem. It can look like, and is often treated as, a purely mechanical problem about imports or an ergonomic problem about the convenience of methods you'll be implementing. There are import mechanics and ergonomics here, but those are manifestations of the fact that dependencies are being created. Worrying about dependencies as dependencies helps you resolve these other issues; trying to think about the issues independently of the fact that dependencies are involved leads to confusion.

What are some good heuristics for determining dependency directions?

  1. Think about operations you need to do now, and choose the direction that will accommodate them most easily.
  2. Try to anticipate possible future operations, and choose the direction that will accommodate those operations most easily.
  3. Try to anticipate possible future modifications, and choose the direction that will facilitate those.

Here, by the way, are some bad but commonly used heuristics for making this choice:

  1. Think about the order data will flow through the system, and make the later objects depend on the earlier ones. (John Ousterhout diagnoses "temporal decomposition" errors very well.)
  2. Think about your preferred database and choose the direction that most intuitively fits it.

Here, the first two options are poor because they introduce inappropriate dependencies. The AssistantReferee object should not be rifling through game history, communicating with players, and doing whatever else would need doing if it were doing more than simply raising events. Nor should several players (or teams, or scorekeepers) have to issue separate requests to an assistant referee in order to determine what is happening. It might not sound as bad at first, but remember that not only the assistant referee but also the head referee, the head coach, and even the scoreboard or weather (is the game over? has lightning struck?) might change the result.

I find I make better decisions when I take the "because" in the previous paragraph very literally: bad decisions here are bad because they get the dependencies wrong. When you find yourself doing unnecessary quadratic work, debugging circular dependencies, or duplicating information because the alternative is too cumbersome, inappropriate dependencies are usually involved. Consider the case of quadratic work: this often happens because something is querying something else, and there are O(n) of each of those things. That, in turn, is often caused by information being stored with the latter thing instead of with the former thing. And that, in turn, is usually caused by a bad dependency.

Another hypothetical example in the spirit of something I've seen dozens of times: a library application is slow because every search requires a linear scan over all books, which in turn is because books are storing information about which users can check them out when. If the information were stored with the user or with some sort of neutral arbiter, this wouldn't be necessary.
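A compressed version of that library example, with invented data (the titles and user names are placeholders): in the bad design, eligibility lives on each book, so every search scans the whole catalog; moving the mapping to a neutral arbiter makes the lookup direct.

```python
# Bad: eligibility stored with each book forces a linear scan per search.
books = [
    {"title": "Dune", "eligible_users": {"ann"}},
    {"title": "Emma", "eligible_users": {"ann", "bo"}},
]

def searchable_slow(user):
    return [b["title"] for b in books if user in b["eligible_users"]]

# Better: a neutral arbiter owns the user-to-books mapping,
# so answering "what can this user check out?" is a direct lookup.
permissions = {"ann": ["Dune", "Emma"], "bo": ["Emma"]}

def searchable_fast(user):
    return permissions.get(user, [])

assert searchable_slow("bo") == searchable_fast("bo") == ["Emma"]
```

The two functions return the same answers; only the direction of the dependency (and therefore the cost of the query) has changed.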

The second note about the soccer-events example is more controversial:

Worry more about what the things you're modeling really are

The previous argument attempted to establish that many problems are not only correlated with but caused by dependency problems. These issues become clearer when you also recognize that dependency problems (and other problems) are caused by a bad match between your code and reality. That is, your bad dependency is a bad dependency because it fails to conform to the world. In the soccer example, when we introduce events, we conform more closely to reality: the job of the head referee really is to be, among other things, an event broker.

What about the heuristics from before? Going in turn:

  1. The operations you need to do now are very likely to be operations that have some real-world analogue. The more faithfully you model reality, the more elegantly you'll be able to implement those operations.
  2. Same with the operations you don't need to support now but do need to support later.
  3. And when things need to change more fundamentally in your code, it's often because something changed in the real world. The ways things change in the real world are governed by their relationships in the real world. So, for example, if X persists even though some related Y changes, that's a good sign that X was more fundamental than Y and should have been modeled that way. Another common headache here is needing to do a large migration to, e.g., change a value from being treated as boolean to being treated as a member of a more robust enumeration. Almost always, the thing in the world that the value is representing never really had boolean structure. Rather, for contingent reasons there was no need to express more than two states for that value, and a bad assumption that only two states were possible was baked into the interface.
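The boolean-to-enum migration in that last heuristic looks something like this hypothetical sketch (a "published" flag is my invented example): the domain always had more than two states, even while only two were ever used.

```python
from enum import Enum


class Status(Enum):
    DRAFT = "draft"
    PUBLISHED = "published"
    ARCHIVED = "archived"  # the state the boolean could never express


def migrate(is_published: bool) -> Status:
    # Old records only knew two states; map them into the richer enum.
    return Status.PUBLISHED if is_published else Status.DRAFT


assert migrate(True) is Status.PUBLISHED
assert migrate(False) is Status.DRAFT
```

Had the value been modeled as an enumeration from the start, adding ARCHIVED would be a one-line change instead of a migration.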

Each of those heuristics works because it keeps us in touch with reality. And this is where I find myself in deep agreement with the domain-driven design folks. I find the famous books in the tradition very difficult to read (except for Architecture Patterns with Python, which was a pleasure), and there tends to be a bit too much jargon and indoctrination there. But the core of DDD, as I understand it, is that you have to know which real-world domains you're working with, separate them cleanly, and model them correctly. This insight is valuable enough that it's worth wading through a lot of finger-wagging about CQRS.

That is, each of those three heuristics (and pretty much any other reasonable heuristic you can think of) is a manifestation of a more fundamental principle: your code will ultimately be accountable to the real world in some way, and its success and failure will depend largely on how faithfully it models the world and how nimbly it adapts to changes in it.
