> r1 is good if you need it where it got better fine tuning like creative writing
R1 got better fine tuning for writing style only in the sense that it got *less* RLHF. It’s more base-model-like, less polished, less “refined”.
While Anthropic talks about racing to AGI to get a persistent advantage, it’s possible that their (and other American labs’) heavy emphasis on safety makes them perpetually behind riskier labs when it comes to style. I know there is a niche that prefers Claude’s style, but most people find it, along with ChatGPT’s and Gemini’s, stilted.
Notice how o1-mini and o3-mini are stylistic regressions from GPT-4, which itself is a regression from davinci. Notice how in the demo of OpenAI’s latest release, Deep Research, they don’t even bother reading an excerpt of the 10-page report that it generates. Certainly, nobody wants to read those reports except to extract their actionable utility.
As of February 2025, it is still a deep mystery how to improve the writing style of AI models. The persona sculptors — e.g., Amanda Askell, Roon — are doing their best, but they are on the wrong side of the Bitter Lesson.
I wonder how much of Claude's "quality without a name" is really "compliments you constantly, which you find consciously annoying but less consciously attractive".
Iesus, do you ever rest?
Podcast episode for this post:
https://open.substack.com/pub/dwatvpodcast/p/o3-mini-early-days
Such a wonderful and thorough piece, thank you!
I mean, I find it annoying, and I assume that for me it subtracts from that quality on net; perhaps others like it.
I always start with "be less obsequious" and it obeys, and I still like it a lot.
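For anyone who wants to bake that "be less obsequious" instruction in rather than retyping it each chat, a minimal sketch of assembling such a request as a plain dict (the model id and exact system-prompt wording here are illustrative assumptions, not taken from the thread):

```python
# Sketch: steering a chat model's tone via a standing system prompt.
# Model id and prompt text are illustrative, not prescriptive.

def build_request(user_message: str) -> dict:
    """Assemble a chat request whose system prompt curbs flattery."""
    return {
        "model": "claude-3-5-sonnet-latest",  # hypothetical model id
        "max_tokens": 1024,
        "system": "Be less obsequious. Skip compliments and answer directly.",
        "messages": [{"role": "user", "content": user_message}],
    }

request = build_request("Critique this paragraph honestly.")
print(request["system"])
```

Keeping the instruction in the system slot rather than the first user turn means it persists across the whole conversation instead of fading as the context grows.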
Many Thanks for all of the detailed reporting and analysis! And yes, also Many Thanks for the pointer to my benchmark-ette. :-)
Tried o3-mini in Cursor and it just stopped generating output without warning multiple times. I'm back to Claude for a few days at least.