Discussion about this post

Tom P:

On software engineering productivity gains from AI, there could be bottlenecks that keep overall productivity gains to 2.5x even while 'speed to write code' increases 100x. I'm not a software engineer; I work in finance. If Excel model creation became 100x faster, my overall productivity would only increase 2-3x, since Excel isn't the only bottleneck.

Mike:

I am very frustrated with how Anthropic reported on their blackmail-related research.

First, the trivial sin: they posted a paper and a blog post both called "Teaching Claude Why", but the content is not the same; the paper is much longer and more detailed. The blog also doesn't link to the paper, so it's easy to land on the blog and not realise you're missing the full paper. I don't want to labour the point, but this is really amateurish.

But then the 'tweet thread' version makes claims not backed up by the blog:

"We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation."

The logic seems to be just that post-training made the behaviour neither better nor worse, so it must come from the base model. Obviously.

The paper goes a bit further, but doesn't fully distinguish between the possible sources: misaligned *human* actions in the data (with the assistant persona influenced by *those*), fictional depictions of AI misbehaviour, and safety-related writing about what a misaligned AI would do.

In any case, they never ran the experiment everyone always suggests: filter the pretraining data, then check whether the behaviour goes down, stays the same, or whether the filtering creates other problems. Disappointing that this isn't enough of a priority for them.
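The filtering step Mike describes could be sketched roughly as follows. This is purely illustrative: the regex patterns and the partitioning logic are invented for this comment, not anything Anthropic has published, and a real pipeline would need a trained classifier rather than keywords.

```python
import re

# Invented, illustrative patterns for documents portraying AI as
# evil or self-preserving -- NOT a real filter anyone has used.
AI_MISALIGNMENT_PATTERNS = [
    r"\b(rogue|evil|malevolent)\s+(ai|machine|robot)s?\b",
    r"\bai\b.*\b(takes? over|destroys?|enslaves?)\b",
    r"\bself-preservation\b",
]

def is_suspect(doc: str) -> bool:
    """Flag a document that matches any misaligned-AI trope pattern."""
    lowered = doc.lower()
    return any(re.search(p, lowered) for p in AI_MISALIGNMENT_PATTERNS)

def filter_corpus(docs: list[str]) -> tuple[list[str], list[str]]:
    """Partition a corpus into (kept, removed) for a filtering ablation."""
    kept = [d for d in docs if not is_suspect(d)]
    removed = [d for d in docs if is_suspect(d)]
    return kept, removed
```

The experiment would then be to pretrain once on `kept` and once on the full corpus, and compare blackmail-style behaviour between the two models, which is exactly the comparison the thread complains is missing.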

And so they tweet an unproven claim alongside solid research, handing more fuel to the usual suspects and their "memes".

This is just sloppy and I expected better.

32 more comments...

