Amusing mistakes and what they teach us
AI is getting better and better, but it is still very bad at time estimates for software development. Its estimates tend to be one to two orders of magnitude too long, at least if you're a strong agentic programmer. This is true whether it offers the estimate unprompted or responds to a direct question. Yesterday Codex optimistically mentioned that a project would take only one to two days; it was done 20 minutes later.
The cause is obvious on the surface but subtle in its implications: the AI has only the shallowest understanding that you are using AI.1 If you know how to interpret the time estimates, the mistakes don't matter too much. The cost to you will probably be just a few extra approvals or PR reviews when the AI has overestimated the scope of what it's done.2
This phenomenon has more costly manifestations, though:
- Getting this sort of estimate wrong can lead it to undertake the wrong category of project. A migration that is actually fast and clean should be treated as a routine code change; a complicated multi-stage cutover process adds risk and complexity for little benefit.
- It might recommend the wrong tool. The AI probably doesn't think you know Rust--or, at least, its implicit judgment of your Rust abilities doesn't much account for the fact that you have its help. The idea to use Rust will probably have to come from you.3
- Code comments from AI seem designed to be human-readable, which is understandable given the history and design of code commenting systems. But most AI-generated comments will be consumed by AI, not by humans. I'm nearly certain that much more effective systems of commenting should and will develop. (I plan to write more about this.)
We will, I think, look back on this era as one in which human and AI inputs were comically undifferentiated. Time estimates are only an obvious example of that comedy. AI is choosing system architectures, languages, explanations, and more; even if it's obvious when AI made something, the really striking thing is that this is all being done as if AI and humans will be doing approximately the same things indefinitely. Imagine learning that the instruction mnemonics in assembly language had been devised to remind you of your favorite foods. This would be absurd, despite the fact that some humans, some of the time, need to be able to work with assembly language.
I won't pretend to know where the equilibria of this system are, but I'm quite sure we're not at any of them. Here as elsewhere, we laugh at things when they indicate absurdity, instability, or disorder. We don't have to know exactly what the stable, ordered future state will be to let our sense of humor help get us there.
1. There are subtleties here. Is this to be understood just as an expression of the training data? Does it matter that many people getting information from AI are not AI-first users, so that the estimates will be more accurate for them? Could and should we try to correct this in a system prompt or in other guidance? However important these questions are, they're largely independent of my discussion here.
2. To be clear: I am all for small PRs and deliberate progress. The less constructive, more comical side of this is when the AI decides to create a whole phase of a project, with its own documentation or feature flags or whatever, for something that is 90% likely to be done over a cup of coffee.
3. I'm sympathetic to the idea that AI will cause a lot more of our systems to be written in languages like Rust. That's a complicated sociological-economic question, though, and the point I'm making doesn't depend on it.