Nate Meyvis

Observability first

If software engineering is a war between the forces of order and those of chaos, AI-assisted engineering is delivering new and thunderously powerful tools, in huge quantities, to both sides of the struggle. What can we do to help the forces of order win the war?

Some things that appear constructive might not be. Testing, for example, is great, but just adding more tests is often counterproductive. The tests might be the wrong ones; they might be brittle; they might be expensive; they might have gaps; and so on. Other pro-order interventions (human-review policies, other administrative interventions, devising better prompts...) are like this too. They're often worth doing, but they come with tradeoffs and can be hard to get right.

One exception I've found is doing lots of observability work early in the development cycle. I mean "observability" in the broad sense: not just metrics and alarms, but also admin panels, database tables with streams of events, extra logs, test environments, and so on. Tackling these early is particularly apt in an AI-first environment:

  1. These tools are particularly valuable when you don't have the sense of the codebase you would have gained from writing it line by line.
  2. They're not hard to implement, at least on average. If you're governing your project with the AWS CDK,1 you can just tell it to make a test environment, alarm on X, emit metrics for Y and Z, and so on. These can, of course, sometimes go wrong, but they're unusually amenable to stabilization and de-risking as you iterate on your AI-assisted development processes. (Here's how I do it.)
  3. The marginal cost of the extra observability is usually very small, especially relative to the labor costs when things go wrong.
  4. The benefits you get usually have only mild tradeoffs, and are less likely than other interventions to cause big problems.2
  5. Different aspects of observability--the logs, the metrics, the admin panel, and so on--tend to complement each other better than, say, just adding lots of extra tests or requiring more and more human code review.
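That complementarity is easy to see in code. Here's a minimal sketch (my illustration, not anything from a specific project) of "observability in the broad sense": a single append-only event stream that serves as structured logs, as a source for ad-hoc metrics, and as the data an admin panel would read. All names here are hypothetical.

```python
import json
import time
from collections import Counter

class EventLog:
    """Append-only structured event stream, one JSON object per event.

    Sketch only: in production the backing store might be a database
    table or a log sink rather than an in-memory list.
    """

    def __init__(self):
        self.events = []  # stand-in for a DB table or log file

    def emit(self, kind, **fields):
        # Each event is a self-describing record: usable as a log line,
        # queryable later, and renderable in an admin view.
        event = {"ts": time.time(), "kind": kind, **fields}
        self.events.append(event)
        return json.dumps(event)  # a line suitable for any log sink

    def counts(self):
        # A cheap metric derived from the same stream the logs use --
        # the point is that one early investment feeds several tools.
        return Counter(e["kind"] for e in self.events)

log = EventLog()
log.emit("order.created", order_id=1)
log.emit("order.created", order_id=2)
log.emit("order.failed", order_id=3, reason="timeout")
print(log.counts()["order.created"])  # 2
```

Because the metrics, logs, and admin view all read from one stream, adding any one of them makes the others cheaper, which is the complementarity the list above describes.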

Those last two points are particularly important. Planning observability work often makes me think about investing, where it's so often essential to worry not just about a trade's expected value but its variance, its duration, and its correlation with other trades. I find that getting observability in place early gives steady, useful information, for a long time, at low cost, and in a way that doesn't interfere with other goals of the project.3

Observability work traditionally comes near the end of development, but the reasons for that are mostly obsolete, and in some cases inverted. We need more observability, earlier, than we ever have before. Meanwhile, our AI tools know all the fiddly details that have traditionally slowed us down in implementing observability tooling, and they make us much better at using the outputs of that tooling. Here as elsewhere, using AI effectively doesn't mean doing the same things in the same order but more quickly; rather, it requires structural changes. Many such changes will become necessary, and this is one that most of us can do, and benefit from, right away.


  1. ...but my claims here don't depend on specifics of the AWS ecosystem except insofar as observability is well-supported there.

  2. I imagine some readers will disagree with this. A full argument would require a different, and possibly longer, post (which I hope to write soon!). For now, I'll just note that it's generally easy to disable unhelpful alarms, to keep logs fast and helpful, and so on--especially given that you have AI helping you here also. The biggest exception is maintaining a test environment, which adds time and the possibility of spurious failures to your deployments, and which one might reasonably choose not to implement at all until later in a development process.

  3. To be clear, I am not a finance whiz and spend effectively no time planning skill-intensive financial moves (I'm too busy programming). If I'm getting this metaphor all wrong, I'd be grateful for correction.

#future of work #generative AI #observability #psychology of software #software