Discussion about this post

Charles Foster:

Correction to this:

> The one catch is that with r1 you do have to pay for the <thinking> tokens. I asked r1 to estimate what percentage of tokens are in the CoT, and it estimated 60%-80%, with more complex tasks using relatively more CoT tokens, in an answer that was roughly 75% within the CoT. If you only care about the final output, then that means this is more like 10 times cheaper than o1 rather than 30 times cheaper. So it depends on whether you’re making use of the CoT tokens.

30x cheaper is correct, because you have to pay for o1's <thinking> tokens too. If you read the fine print on OpenAI's pricing page (https://openai.com/api/pricing/), they say:

> Output tokens include internal reasoning tokens generated by the model that are not visible in API responses.
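
Spelled out, the arithmetic looks like this (a minimal sketch; the prices are arbitrary placeholder units, and the ~75% CoT fraction is just the estimate from the quoted passage):

```python
# Minimal sketch of the cost arithmetic. Prices are hypothetical
# placeholders in arbitrary units, not real list prices; the 30x ratio
# and ~75% CoT fraction come from the thread above.

O1_PRICE_PER_TOKEN = 30.0   # hypothetical o1 output-token price
R1_PRICE_PER_TOKEN = 1.0    # 30x cheaper per billed token
COT_FRACTION = 0.75         # ~75% of billed output tokens are chain-of-thought

def cost_per_visible_token(price_per_token: float, cot_fraction: float) -> float:
    """Effective price per token you actually keep, given that billed
    output includes reasoning tokens (hidden for o1, visible for r1)."""
    visible_fraction = 1.0 - cot_fraction
    return price_per_token / visible_fraction

# Both models bill their reasoning tokens, so the CoT multiplier cancels
# and the full 30x ratio survives:
o1_effective = cost_per_visible_token(O1_PRICE_PER_TOKEN, COT_FRACTION)
r1_effective = cost_per_visible_token(R1_PRICE_PER_TOKEN, COT_FRACTION)
print(o1_effective / r1_effective)  # -> 30.0
```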

Coagulopath:

It's definitely a cool model. Gives me hope that RL can help with mode collapse and lost creativity (the single big unsolved issue with AI right now, in my view).

I want to see how it does on "pick a random number"-style tests that expose an LLM's creativity/lack thereof.
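
For what it's worth, a minimal sketch of what such a test could look like. `query_model` is a hypothetical stand-in, here simulated as a mode-collapsed model so the script runs as-is; in practice you'd swap in a real API call.

```python
# Sketch of a "pick a random number" creativity test. `query_model` is a
# hypothetical stand-in (not a real client) that simulates a
# mode-collapsed model, so the script is runnable end-to-end.
import math
import random
from collections import Counter

def query_model(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM API call. Simulates a
    mode-collapsed model that overwhelmingly answers "42"."""
    return "42" if random.random() < 0.8 else str(random.randint(1, 100))

def answer_entropy(n_trials: int = 500) -> float:
    """Ask for a random number many times and compute the Shannon entropy
    of the answer distribution. A uniform pick from 1-100 gives
    log2(100) ~= 6.64 bits; a collapsed model scores far lower."""
    counts = Counter(
        query_model("Pick a random number between 1 and 100. "
                    "Reply with only the number.")
        for _ in range(n_trials)
    )
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

print(f"entropy: {answer_entropy():.2f} bits (uniform would be ~6.64)")
```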

> I’m with Eliezer here. That’s still slop.

I agree with both of you, but it still feels like progress in a sense.

I heard someone say "r1 has horrible taste, while other models have no taste at all," which feels directionally pointed at the truth. And that would be better than what we had before. If you have the capacity for bad taste, you might someday have good taste...

