More on cognitive debt and AI
Addy Osmani's recent article on cognitive debt¹ is being cited widely and, as far as I can tell, approvingly. Osmani holds the common view that cognitive debt "applies most to agentic engineering" and is an important threat to software systems.
My view is still what I laid out here, where my main claims were that:
- When you control for the size and scope of the project, cognitive debt tends to be at least as bad in pre-AI codebases as in AI codebases, and usually worse;
- This fact is obscured because so much of what is normalized as traditional engineering work is in fact either managing crippling cognitive debt or avoiding it at enormous cost;
- The best users of AI are already pretty good at avoiding cognitive debt, and we're only going to get better at it.
Rather than repeat those arguments, I'll extract and comment on some claims from the "comprehension debt" piece (all blockquotes below are from that piece):
> Unlike technical debt, which announces itself through mounting friction - slow builds, tangled dependencies, the creeping dread every time you touch that one module - comprehension debt breeds false confidence. The codebase looks clean. The tests are green.
Traditional technical debt also tends to coexist with passing tests. I can't speak to every agentic developer's experience, but I strongly suspect that the "creeping dread" also happens as technical debt mounts in codebases built with AI. It has certainly happened to me - just much, much faster.
> [T]he human review process has always been a bottleneck - but a productive and educational one. Reading their PR forces comprehension...
I doubt that human code review has been so effective: again, once you correct for the amount of code being reviewed, traditionally engineered codebases also had as much "cognitive debt." This felt normal when the code had taken months or years to write, but stuns us when the process happens two orders of magnitude faster.
Another way to see this: if you took an AI-generated codebase and told senior engineers to spend as much time studying and improving it as they would have over years of code review, I suspect the "cognitive debt" would go away.
> [Deterministic verification] helps. It has a hard ceiling...
Osmani's discussion here focuses on tests and only briefly mentions other aspects of deterministic verification. I'd emphasize that:
- If you really do use AI to make good narrow- and wide-scope tests, and enforce a variety of reasonable static analysis tools, your harness is vastly better than if you just check in whatever unit tests the AI gives you on its first pass.
- If you combine this with lots of observability tools and new AI-specific kinds of tooling, you can do even better.
- Again, if you correct for the size and maturity of the codebase, our understanding of traditional codebases was vastly inferior to what is being implied here.
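For concreteness, here is a minimal sketch of the narrow-/wide-scope distinction, using a hypothetical `slugify()` helper (the function, its invariants, and all names here are my own illustration, not from either article). A narrow-scope test pins down specific, human-reviewed input/output pairs; a wide-scope test checks invariants over many generated inputs, so the harness exercises cases nobody wrote down on a first pass. In a real harness, both would run in CI alongside static analysis such as a type checker and a linter.

```python
import random
import re
import string


def slugify(text: str) -> str:
    """Lowercase the text, keep alphanumeric runs, join them with hyphens."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "-".join(words)


# Narrow-scope: specific, reviewed examples that pin down agreed-on behavior.
def test_slugify_examples() -> None:
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  already-sluggy  ") == "already-sluggy"


# Wide-scope: invariants checked over many random inputs.
def test_slugify_invariants() -> None:
    alphabet = string.ascii_letters + string.digits + " ,.!-_"
    rng = random.Random(0)  # seeded for reproducibility
    for _ in range(1000):
        text = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 40)))
        slug = slugify(text)
        # Invariant 1: output contains only lowercase alphanumerics and hyphens.
        assert re.fullmatch(r"[a-z0-9-]*", slug)
        # Invariant 2: idempotence -- slugifying a slug changes nothing.
        assert slugify(slug) == slug


test_slugify_examples()
test_slugify_invariants()
print("all checks passed")
```

The point is not this particular helper but the division of labor: reviewed examples catch regressions in behavior someone actually vetted, while invariant checks sweep the input space no reviewer enumerated.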
> A common proposed solution: write a detailed natural language spec first. Include it in the PR. Review the spec, not the code. Trust that the AI faithfully translated intent into implementation.
I have not seen many people argue that we won't need to worry about the translation between specification and code (really, I can't think of anyone who has said that, but I don't doubt that people have). Learning the hidden problems of a specification, and what was non-obviously underspecified, has always been one of the hard parts of software development. Getting from specification to implementation faster helps with that: when the source of the problem is that some things are irreducibly empirical, it's great to get the empirical evidence faster.
To be clear: this is only great if you have the discipline to examine the evidence and fix what needs fixing, and if your code is modular and intelligible enough to be fixed. Both traditional and agentic practices often fall short of that standard. I suspect that agentic engineering is positioned at least as well, and probably better, to address problems with specifications. Moreover, it ought to do even better as the field matures.
> What doesn’t change is the need for someone with deep system context to maintain coherent understanding of what the codebase is actually doing and why.
At scale, it has never been necessary for a single person to understand a full codebase, because it has never been possible for a single person to understand a full codebase. Much modern software engineering can be viewed as attempts to address the problems that arise when nobody understands a full system.
At more modest sizes, it is possible for a single person to understand a whole codebase. Here I'm optimistic that AI is already making it easier, not harder, for a single person to maintain an effective understanding of a full system.
Finally, a meta-level note: I had trouble synthesizing an argument from the piece, and I suspect this is because the piece was largely generated with AI. It feels to me like LLM output, and Pangram agrees ("79% AI-generated"; "We are confident that this document is a mix of AI-generated, and human-written content."). I've done my best to interpret claims reasonably, but the more carefully I read and reread it, the more I found it to resist a strong, synthetic interpretation. This is further evidence that it is in fact primarily an AI output. If I'm wrong about this, I'd love to be corrected.
That's not to say that the piece is bad, or that people are wrong to agree with it. I don't publish such things, but I'm open-minded about what is valuable and what is possible. But if you're wondering why I haven't said more about the overall argument of the piece, it's because I don't think there's one to be found. (Consistency of mood is not the same thing as coherence of argument.)
¹ It's called "Comprehension debt," but the first paragraph gives the more familiar "cognitive debt" as a synonym.↩