Making Sonnet 4.6 available for non-paying users seems like a big deal, given all the normies who are using free versions of ChatGPT and Gemini and Grok and having bad, hallucination-filled experiences.
I'd like to see a true comparison of the best free models to know which one to recommend to my AI-skeptical brother who wants to try a bit but doesn't want to pay yet.
Also, good post, but a post like this really needs to CLEARLY distinguish vanilla chatbot use -- when there's no cost and almost no latency penalty for choosing Opus over Sonnet -- versus anything API-ish where cost matters. Apples and oranges, no point mixing them together.
Arguably for most chatting use, frontier models are no longer needed. I'll happily use Gemini Flash and get instant responses for simple stuff, I'll use (two) frontier models (usually GPT 5.2 and Gemini Pro) if it's really important.
It's really coding (and loosely related things like STEM) that benefits from SOTA models now. A cheap, good enough coding model would be nice, but Sonnet clearly is not cheap enough by quite a distance.
That's nice, but 98% of people don't understand the quality difference of a flash/instant model vs. a reasoning model -- and if there's one default they should use, it's reasoning models -- so your posture of "I'm happy to use Flash and I know when to upgrade" is enlightened, but far from where the mainstream still is
It's interesting to see how flexibility in language models like Claude Sonnet 4.6 can enhance our understanding of nuanced concepts. This reminds me of my piece on how words navigate meaning in "Text Has Curvature," which you might find relevant: https://theuncomfortableidea.substack.com/p/words-bend-through-space-and-we-just.
ngl one of my favourite parts of this blog is seeing in real time the offense diffusion/Levels of Friction reduction in the narrow domain of "talking one's book" spam via automated blog comments. Substack gotta step up with the defense sooner rather than later. It's trivially convenient to read on this platform, but if you're gonna continue mirroring to WordPress and LW...
The sub-agent framing is the one that changed my setup. I was running everything on Opus until I hit weekly rate limits and realized most of my agent's work is execution, not reasoning. Switched to a tiered model: Haiku for 95% of tasks (scraping, file ops, API calls), Sonnet for user-facing work, Opus only for synthesis. Cost dropped 59% with zero quality loss on the automated side.
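The tiered setup described above can be sketched as a simple router. The task categories and model labels here are illustrative assumptions drawn from the comment, not exact Anthropic model IDs or the commenter's actual code:

```python
# Sketch of a tiered model router: cheap model for execution work,
# mid tier for user-facing output, frontier model only for synthesis.
# Tier names and model labels are assumptions for illustration.

TIERS = {
    "execution": "haiku",      # scraping, file ops, API calls (~95% of tasks)
    "user_facing": "sonnet",   # anything a user reads directly
    "synthesis": "opus",       # cross-task reasoning and summarization
}

def pick_model(task_kind: str) -> str:
    """Route a task to the cheapest adequate tier.

    Unknown task kinds fall back to the mid tier rather than the
    frontier model, so the default path stays cheap.
    """
    return TIERS.get(task_kind, TIERS["user_facing"])
```

The design choice worth noting is the fallback: defaulting unknown work to the mid tier (not the frontier model) is what keeps the cost curve flat as new task types appear.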
Your "overcaffeinated" characterization made me laugh. Seen exactly this - Sonnet 4.6 jumping to call tools before thinking through the sequence. Still, for structured agent tasks it's remarkably capable. The speed improvement alone justifies the upgrade.
Great post, eerily like an office full of accountants arguing about their favourite biros. Brododaktylos eos (rosy-fingered dawn), tho, new world.
Follow-up to my tiered Haiku/Sonnet/Opus comment above, with the full model switching experiment written up here: https://thoughts.jock.pl/p/claude-model-optimization-opus-haiku-ai-agent-costs-2026