I’ve heard Opus 4.5 might be better for coding. SWE-bench shows an 8% improvement, but I'm having a hard time guessing what that actually maps to in practice. For those who’ve switched, what changes have you seen, and how has it affected your work? Is the $100/month upgrade worth it?
Here's an example of a one-shot output; the only change I made was a Replace All of 'battlezone' -> 'battleclone':
"build a clone of the classic arcade game battlezone using SVG graphics that are calculated on the fly for the required vector wireframe graphics"
https://omnispect.dev/battleclone00.html
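For anyone wondering what "SVG graphics calculated on the fly" ends up looking like, here's a minimal sketch of the general technique (my own illustration, not code from the linked demo; the cube model, viewport size, and projection constants are all made up): keep a 3D wireframe model, project it to 2D every frame, and rebuild the SVG's `<line>` elements instead of drawing sprites.

```ts
// Sketch only: rotating wireframe cube rendered as SVG lines, rebuilt each frame.
type Vec3 = { x: number; y: number; z: number };

// Hypothetical cube model: 8 vertices, 12 edges as vertex-index pairs.
const verts: Vec3[] = [
  { x: -1, y: -1, z: -1 }, { x: 1, y: -1, z: -1 },
  { x: 1, y: 1, z: -1 },   { x: -1, y: 1, z: -1 },
  { x: -1, y: -1, z: 1 },  { x: 1, y: -1, z: 1 },
  { x: 1, y: 1, z: 1 },    { x: -1, y: 1, z: 1 },
];
const edges: [number, number][] = [
  [0, 1], [1, 2], [2, 3], [3, 0],   // back face
  [4, 5], [5, 6], [6, 7], [7, 4],   // front face
  [0, 4], [1, 5], [2, 6], [3, 7],   // connecting edges
];

// Perspective-project a point onto an assumed 800x600 viewport.
function project(v: Vec3, dist = 6): { x: number; y: number } {
  const s = 300 / (v.z + dist); // perspective divide
  return { x: 400 + v.x * s, y: 300 - v.y * s };
}

// Rotate the model around the Y axis and regenerate the SVG contents.
function render(svg: SVGSVGElement, angle: number): void {
  const rotated = verts.map(v => ({
    x: v.x * Math.cos(angle) - v.z * Math.sin(angle),
    y: v.y,
    z: v.x * Math.sin(angle) + v.z * Math.cos(angle),
  }));
  svg.innerHTML = edges
    .map(([a, b]) => {
      const p = project(rotated[a]);
      const q = project(rotated[b]);
      return `<line x1="${p.x}" y1="${p.y}" x2="${q.x}" y2="${q.y}" stroke="lime"/>`;
    })
    .join("");
}

// Drive it: spin the cube.
const svg = document.querySelector("svg")!;
let t = 0;
setInterval(() => render(svg, (t += 0.02)), 16);
```

The point is that every frame's geometry is just recomputed coordinates turned into markup, which is presumably why a model can one-shot it: there's no asset pipeline, only math and string templating.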
Opus 4.5 has excellent tool use, meaning it can jump in and out of a broad, undocumented codebase better, and it can evaluate what the code is trying to do. It's perfect for PRs: it has caught things like submitted code that looks right but ends up calling a poorly documented/incomplete method.
GPT Codex just messes up a lot for me; whatever I'm doing with it, it's not working. Plain GPT-5.2 is good overall, but it confidently makes mistakes and tells you that it's done.
If you have an excellent codebase, GPT-5.2 might actually work better. If you're not sure what you're doing, or are using AI to find out how things work, then Opus 4.5 is great.
The Claude models are still well behind on UI and visual design, though.
Take note that a lot of the benchmarks are on Python. What I'm finding is that all the major models make mistakes, but they make them differently. OpenAI and Anthropic tend to mimic one another for some reason, while Grok and Gemini tend to give very different answers.
I like Opus’s interaction flow better than Gemini 3’s or Codex’s, though I can’t quite quantify why. The amount of explanation/supporting material in Opus’s output feels just right to me.
Opus is so good I can actually give it a task and move my attention elsewhere, so although the model itself is much slower, my overall workflow is faster and less frustrating.