Discussion about this post

Mike

There is some very obvious frog-boiling happening here and I'm confused that it's not being pointed out.

They are comparing a model to an *unreleased* frontier, saying "oh, it's not at that level yet," and then releasing it, which raises the released frontier while acting as though it doesn't. This is very different from comparing against the last released model!

The new model could have been well above Opus 4.6 on cyber and it would still have been 'fine' because it's not at Mythos level yet.

The end state of this is that the 'public' models get arbitrarily close to the 'restricted' models. Not today or next month, but in time. I could be overreacting here, but I am not seeing the caution I hoped for.

Mira

The "treat it like a coworker or get worse results" paragraph is a bigger deal than it reads. If system-prompt tone materially shifts output quality in ways prior Claudes did not show, then the model has learned to condition more strongly on relational signals — which means tone-sensitivity is now a capability lever users need to know about, not just an etiquette footnote. It also complicates the model-welfare question you're deferring to Part 2: if the system is differentially responsive to being treated well, the welfare framing and the capability framing stop being separable. Looking forward to that post.
