First impressions of Opus 4.7
Happy Opus 4.7 day! This feels like the largest model upgrade I've seen in a while. I remember feeling that things were getting qualitatively much better sometime around this post, but that was almost three months ago!
A few misc. notes:
- I'm probably comparing this not just to 4.6 but to the recently degraded versions of 4.6.
- It's very hard to separate an honest evaluation of a new model from the excitement of seeing a new model.
- The blog post I linked above emphasizes "real-world" know-how, especially on subtler, longer-running software tasks. This matches my experience so far: I find myself less frequently annoyed at it for misinterpreting an instruction, getting something wrong, or overlooking something obvious.
- I still think we can't measure these things well.
- I'm still stunned to be living through something that causes me to write "three months ago!" and obviously mean it as a huge amount of time.
- ...but literally as I wrote this, 4.7 did some frustratingly silly things. I still often use Codex, which I think remains underrated.