Why aren't we fine-tuning more?
Last year I thought that fine-tuning models would be common by now, but it isn't. Anthropic doesn't even have a generally available fine-tuning API.
Here are some (overlapping) conjectures about why I got this wrong:
- A good prompt can do a lot of the work that fine-tuning does, more easily and more cheaply.
- Models are so good that you don't need to fine-tune to get the results that you want.
- The surrounding software (e.g., Claude Code) has improved, and lets you combine non-fine-tuned models with domain-specific tooling in ways that make fine-tuning unnecessary.
- The extra development overhead of curating fine-tuning examples, fine-tuning new models as they become available, and doing all the other peripheral engineering work (e.g., updating code to point to new fine-tuned models as you make them) isn't worth it.
- Fine-tuning is particularly good for the problem of making good flashcards, which gave me an inflated sense of how valuable the technique is generally.
My current views are:
- I was wrong and overestimated how important it would be for the average engineer to know about fine-tuning.
- But it's still worth knowing about...
- ...and even if you don't want to fine-tune, there are other situations where you'll want to curate a set of reference inputs and outputs.
- I don't plan to stop fine-tuning models for Zippyflash.
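To make the "curate reference inputs and outputs" point concrete, here's a minimal sketch of what such a curated set might look like, serialized to the JSONL chat format that common fine-tuning APIs accept. The flashcard examples and the `to_jsonl` helper are invented for illustration; they're not Zippyflash's actual data or code.

```python
import json

# Hypothetical curated pairs: each maps a source passage (input) to a
# flashcard (output). The content is invented for illustration.
examples = [
    {
        "input": "The mitochondrion is the site of aerobic respiration in eukaryotic cells.",
        "output": "Q: Which organelle is the site of aerobic respiration?\nA: The mitochondrion.",
    },
    {
        "input": "Python's GIL allows only one thread to execute bytecode at a time.",
        "output": "Q: What does Python's GIL restrict?\nA: Only one thread may execute bytecode at a time.",
    },
]

def to_jsonl(pairs):
    """Serialize input/output pairs as one JSON chat record per line."""
    lines = []
    for pair in pairs:
        record = {
            "messages": [
                {"role": "user", "content": pair["input"]},
                {"role": "assistant", "content": pair["output"]},
            ]
        }
        lines.append(json.dumps(record))
    return "\n".join(lines)

jsonl = to_jsonl(examples)
```

Even if this file never reaches a fine-tuning endpoint, the same pairs double as few-shot prompt examples or as an eval set, which is the point of the bullet above: the curation work pays off either way.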