Nate Meyvis

Why aren't we fine-tuning more?

Last year I thought that fine-tuning models would be common by now, but it's not. Anthropic doesn't even have a generally available fine-tuning API.

Here are some (overlapping) conjectures about why I got this wrong:

  1. A good prompt can do a lot of the work that fine-tuning does, more easily and more cheaply.
  2. Models are so good that you don't need to fine-tune to get the results that you want.
  3. The surrounding software (e.g., Claude Code) has improved, and lets you combine non-fine-tuned models with domain-specific tooling in ways that make fine-tuning unnecessary.
  4. The extra development overhead of curating fine-tuning examples, fine-tuning new models as they become available, and doing all the other peripheral engineering work (e.g., updating code to point to new fine-tuned models as you make them) isn't worth it.
  5. Fine-tuning is particularly good for the problem of making good flashcards, which gave me an inflated sense of how valuable the technique is generally.
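Conjecture 1 can be made concrete with few-shot prompting: instead of baking examples into the weights, you paste a few curated input/output pairs into the prompt itself. A minimal sketch, using a made-up flashcard task and made-up examples (nothing here is Zippyflash's actual data or prompt):

```python
# Few-shot prompting as a stand-in for fine-tuning: the curated
# examples live in the prompt rather than in the model's weights.
# Task and examples are purely illustrative.
examples = [
    ("The mitochondrion is the site of ATP synthesis.",
     "Q: Where is ATP synthesized?\nA: In the mitochondrion."),
    ("Paris is the capital of France.",
     "Q: What is the capital of France?\nA: Paris."),
]

def build_prompt(new_input: str) -> str:
    # Assemble instruction + worked examples + the new case.
    parts = ["Turn each fact into a flashcard.\n"]
    for source, card in examples:
        parts.append(f"Fact: {source}\nCard: {card}\n")
    parts.append(f"Fact: {new_input}\nCard:")
    return "\n".join(parts)

prompt = build_prompt("Water boils at 100 \u00b0C at sea level.")
print(prompt)
```

The resulting string goes to the model as-is; swapping examples is a file edit rather than a training run, which is much of why this route is "easier and cheaper."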

My current views are:

  1. I was wrong and overestimated how important it would be for the average engineer to know about fine-tuning;
  2. But it's still worth knowing about...
  3. ...and even if you don't want to fine-tune, there are other situations where you'll want to curate a set of reference inputs and outputs.
  4. I don't plan to stop fine-tuning models for Zippyflash.
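Point 3 is worth a sketch: the same curated pairs work as a fine-tuning training file, a few-shot prompt, or an eval set. One JSON object per line (JSONL) is the shape most fine-tuning APIs expect; the field names and examples below are illustrative assumptions, not any particular provider's schema:

```python
import json

# Hypothetical reference input/output pairs for a flashcard task.
# Field names ("input"/"output") and contents are illustrative.
pairs = [
    {"input": "The mitochondrion is the site of ATP synthesis.",
     "output": "Q: Where is ATP synthesized?\nA: In the mitochondrion."},
    {"input": "Paris is the capital of France.",
     "output": "Q: What is the capital of France?\nA: Paris."},
]

# Serialize as JSONL: one JSON object per line. json.dumps escapes the
# embedded newlines, so each record stays on a single line.
jsonl = "\n".join(json.dumps(p) for p in pairs)

# Reading it back line by line confirms each record is independently
# parseable -- the property fine-tuning APIs rely on. In practice this
# string would be written to a .jsonl file.
loaded = [json.loads(line) for line in jsonl.splitlines()]
print(len(loaded))
```

Whether or not the file ever reaches a fine-tuning endpoint, maintaining it forces you to decide what good output looks like, which pays off on its own.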

#I was wrong #generative AI #software