9 Comments

Jesus, do you ever rest?

Such a wonderful and thorough piece, thank you!

> r1 is good if you need it where it got better fine tuning like creative writing

R1 got better fine tuning for writing style only in the sense that it got *less* RLHF. It’s more base-model-like, less polished, less “refined”.

While Anthropic talks about racing to AGI to get a persistent advantage, it’s possible that their (and other American labs’) heavy emphasis on safety makes them perpetually behind riskier labs when it comes to style. I know there is a niche that prefers Claude’s style, but most people find it, along with ChatGPT’s and Gemini’s, to be stilted.

Notice how o1-mini and o3-mini are stylistically regressions from GPT-4, which itself is a regression from DaVinci. Notice how in the demo of OpenAI’s latest release, Deep Research, they don’t even bother reading an excerpt of the 10-page report that it generates. Certainly, nobody wants to read those reports except to extract their actionable utility.

As of February 2025, it is still a deep mystery how to improve the writing style of AI models. The persona sculptors (e.g., Amanda Askell, Roon) are doing their best, but they are on the wrong side of the Bitter Lesson.

I wonder how much of Claude's "quality without a name" is really "compliments you constantly, which you find consciously annoying but less consciously attractive".

I mean, I find it annoying, and for me I assume it subtracts from that quality on net; perhaps others like it.

I always start with "be less obsequious" and it obeys, and I still like it a lot.

Many Thanks for all of the detailed reporting and analysis! And yes, also Many Thanks for the pointer to my benchmark-ette. :-)

Tried o3-mini in Cursor and it just stopped generating output without warning multiple times. I'm back to Claude for a few days at least.
