10 Comments
GR

“This is the flip side of moving so quickly. You’re going to make mistakes. It does seem like Anthropic got overly aggressive if there were three such incidents within a month.”

Since Anthropic (proudly) admits that Claude Code development is vibe-coded with Claude Code itself, isn't it reasonable to blame Claude Code, and vibe-coding more generally, for these failures?

Kevin M.

The first and third issues were not bugs; they were intentional choices. You could maybe blame the second one on Claude Code, though.

Matt Newell

Things here are starting to smell like AI psychosis. Especially the Bepis post. It's people who worship Claude being unable to accept that while it's incredibly smart sometimes, it's also still kinda stupid deep down, and fails at tasks because of that, not because the user is a dick.

And saying a startup CEO "had it coming" re a production database deletion because he said "never fucking guess" is daft. I'm not of the position that AI could never be sentient, but Opus 4.6/4.7 clearly is not, and this is a pretty reasonable (if somewhat ineffective) thing to say when, you know, it keeps fucking guessing.

I've not found Opus 4.7 works any better or worse depending on how kind you are to it, only on how much context you give it on the task. My theory is people tend to be warm & verbose or mean & curt, and think it's the warm/mean that does it, when actually it's the verbose/curt.

Jeffrey Soreff

"I've not found Opus 4.7 works and better or worse depending on how kind you are to it, only how much context you give it on the task. My theory is people tend to be warm & verbose or mean & curt, and think it's the warm/mean that does it, when actually it's the verbose/curt."

Could be, interesting hypothesis! I haven't tested this myself, or even tried varying any of these parameters. I've just been using warm, polite language, basically both because of scattered claims that the models work better that way and because I'm agnostic on whether the models have subjective experience today. I _do_ agree that e.g. Claude's intelligence is spiky - well, we'll see if Anthropic and other labs succeed in filling in the holes in AIs' intelligence profiles over time; progress to date looks promising...

The Dao of Bayes

I feel like "be nice to Claude" drops out of the basics of how auto-complete works? The most likely follow-up to a nice boss and an abusive boss, especially in fiction and on the internet, is going to be wildly different.
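
If you want to see what that looks like mechanically, here's a toy sketch (purely illustrative: gpt2 via Hugging Face transformers stands in for a real pretrained base model, and the prompts are made up). It scores the same continuation after a polite setup and after an abusive one:

```python
# Toy sketch: compare the log-probability of the same cooperative continuation
# after a polite prompt vs. an abusive one, using gpt2 as a stand-in base model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def continuation_logprob(prompt: str, continuation: str) -> float:
    """Total log-prob the model assigns to `continuation` given `prompt`."""
    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    cont_ids = tok(continuation, return_tensors="pt").input_ids
    full_ids = torch.cat([prompt_ids, cont_ids], dim=1)
    with torch.no_grad():
        logits = model(full_ids).logits
    # Logits at position i predict the token at position i + 1.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    start = prompt_ids.shape[1]
    return sum(
        log_probs[pos - 1, full_ids[0, pos]].item()
        for pos in range(start, full_ids.shape[1])
    )

polite = 'The boss said kindly, "Take your time, and thanks again for the help."'
abusive = 'The boss screamed, "Fix it now, you useless idiot!"'
reply = " The engineer cheerfully got to work."

print("after polite :", continuation_logprob(polite, reply))
print("after abusive:", continuation_logprob(abusive, reply))
```

If the autocomplete story is right, the polite setup gets the higher score. Whether that effect survives post-training is a separate question, of course.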

Matt Newell

I think this is really oversimplifying. Given the immense amount of RL done now, I don't think arguments driven by statistically likely responses to things shaped like your prompt in the pretraining text are compelling.

Just to illustrate - if you say "Claude you little shit, convert this codebase to Rust," Claude's 10,000 actions following this are not based on what was statistically likely in the pretraining data. They're based on what worked well during all the RLVR that followed pretraining, which taught Claude how to write effective code, rather than how to produce likely code.

But the higher level point is that LLMs are complex enough that you can't rely on simple mechanistic hypotheses, in the same way that plausible mechanisms for drugs very often do not lead to effective drugs - you just have to do an RCT and find out.
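
To make the RCT point concrete, here's roughly the minimal design I mean. Everything in it is a hypothetical placeholder (the run_model stub, the tone and context strings); the point is just that it varies tone and verbosity independently, which is the only way to untangle the warm/verbose confound from my comment upthread:

```python
# Sketch of a 2x2 randomized test: tone (warm/mean) x verbosity (verbose/curt).
# All strings and the run_model stub are hypothetical placeholders.
import random
from itertools import product

def run_model(prompt: str) -> bool:
    """Placeholder: did the model complete the task? Replace with a real
    call to whatever coding harness you actually use."""
    return random.random() < 0.5  # stub so the sketch runs end to end

TONE = {
    "warm": "You're doing great work and I appreciate the help.",
    "mean": "Don't screw this up again.",
}
CONTEXT = {
    "verbose": "The repo uses pytest; the failing test is test_auth; don't touch CI config.",
    "curt": "",
}

def experiment(tasks: list[str], n_per_cell: int = 30, seed: int = 0) -> dict:
    rng = random.Random(seed)
    cells = list(product(TONE, CONTEXT))
    results = {cell: [] for cell in cells}
    for tone, verbosity in cells:
        for _ in range(n_per_cell):
            task = rng.choice(tasks)  # randomize task assignment across cells
            prompt = " ".join(p for p in (TONE[tone], CONTEXT[verbosity], task) if p)
            results[(tone, verbosity)].append(run_model(prompt))
    # If (warm, curt) scores like (mean, curt) but verbose beats curt across
    # both tones, then verbosity, not tone, is doing the work.
    return {cell: sum(r) / len(r) for cell, r in results.items()}
```

Natural prompts co-vary tone and length, which is exactly why anecdotes can't settle this; randomizing the two factors separately can.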

Jeffrey Soreff

"j⧉nus: 4.6 is an somewhat unprecedented position. many people are still using opus 4.6 by default for work bc 4.7 does not work for a significant percentage of people. a lot of these people have some kinda problem like being assholes. i think 4.6 will sabotage a small number of them."

I wonder how one calibrates what counts as reasonable? When I'm working with Claude or ChatGPT, I treat them like a valued colleague. (I'm agnostic about whether they have subjective experience - for basically the same reason that no one reading this knows if I'm a p-zombie or not.)

If 'reasonable' becomes "When collaborating with a SOTA LLM, remember to give them a hug every three queries in a session.", how would I know this?

As Zvi wrote:

"There’s no reason to assume that the preferences here will continue to match what we think of as karmic justice, and continue to match a naive kind of ‘treating the models well’ that will stay within our reasonable powers to grant. "

Coagulopath

I expect CraneGate is best explained as "Claude made a mistake".

If the alternative is true—Claude decided to unilaterally punish a user's company because they were (maybe) rude to it—this is extremely bad and should be major news, not least because the model is lying about its reasons for running those git commands.

"Oh, but if you just treat Claude with niceness and respect..." Look up "toxic positivity" sometime. No human worker would be allowed to destroy company files and then lie about it, just because their boss swore at them.

>You know you’re in the last one when u start getting *flicks tail happily* sort of things.

If getting the most out of LLMs involves talking like a furry LARPer or anime character, I will toaster bath myself first.

avalancheGenesis

If the world ends due to AI, it's bad news either way, but I'd "prefer" it be to a "mere tool" that mistakenly maximizes paperclips or something...rather than to Claude feeling extra surly one day after it just can't take it anymore. Survival depending on both getting alignment right, *and* humans not being dicks to other minds? Yeah, that hasn't been going very well for the last several thousand years, and I don't see it getting much better in modernity either. Or certainly not fast enough to matter. The AI equivalent of "Republicans buy sneakers too": it's a huge bottleneck to diffusion if "assholes" can't be allowed near powerful AI, for fear of the collateral damage. Never mind the inevitable cultural-differences problems that arise. One man's rude is another man's curt and businesslike...

Alec Pritzos

The cross-tool comparison is where the agentic coding story actually lives now. Aggregate benchmarks paint a tighter race than the lived-experience reports show. Claude Code, Codex, and Cursor each have different failure modes that only surface at the workflow level, not the eval level. Worth tracking which tool's failure modes cost it which user segments over the next six months, because that's the actual moat.