Why aren't we fine-tuning more?
Last year I thought that fine-tuning models would be common by now, but it isn't. Anthropic doesn't even have a generally available fine-tuning API.
Here are some (overlapping) conjectures about why I got this wrong:
- A good prompt can do a lot of the work that fine-tuning does, more easily and more cheaply.
- Models are so good that you don't need to fine-tune to get the results that you want.
- The surrounding software (e.g., Claude Code) has improved, and lets you combine non-fine-tuned models with domain-specific tooling in ways that make fine-tuning unnecessary.
- The extra development overhead of curating fine-tuning examples, fine-tuning new models as they become available, and doing all the other peripheral engineering work (e.g., updating code to point to new fine-tuned models as you make them) isn't worth it.
- Fine-tuning is particularly good for the problem of making good flashcards, which gave me an inflated sense of how valuable the technique is generally.
My current views are:
- I was wrong and overestimated how important it would be for the average engineer to know about fine-tuning.
- But it's still worth knowing about...
- ...and even if you don't want to fine-tune, there are other situations where you'll want to curate a set of reference inputs and outputs.
- I don't plan to stop fine-tuning models for Zippyflash.
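To make the "curate reference inputs and outputs" point concrete, here's a minimal sketch of what such a curated set might look like, serialized to the JSONL chat format that common fine-tuning APIs accept. The flashcard examples and the `to_jsonl` helper are invented for illustration; they're not Zippyflash's actual data or code.

```python
import json

# Hypothetical curated pairs: each maps a source passage (input) to a
# flashcard (output). The content is invented for illustration.
examples = [
    {
        "input": "The mitochondrion is the site of aerobic respiration in eukaryotic cells.",
        "output": "Q: Which organelle is the site of aerobic respiration?\nA: The mitochondrion.",
    },
    {
        "input": "Python's GIL allows only one thread to execute bytecode at a time.",
        "output": "Q: What does Python's GIL restrict?\nA: Only one thread may execute bytecode at a time.",
    },
]

def to_jsonl(pairs):
    """Serialize input/output pairs as one JSON chat record per line."""
    lines = []
    for pair in pairs:
        record = {
            "messages": [
                {"role": "user", "content": pair["input"]},
                {"role": "assistant", "content": pair["output"]},
            ]
        }
        lines.append(json.dumps(record))
    return "\n".join(lines)

jsonl = to_jsonl(examples)
```

Even if this file never reaches a fine-tuning endpoint, the same pairs double as few-shot prompt examples or as an eval set, which is the point of the bullet above: the curation work pays off either way.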