Discussion about this post

Michael Roe

This is a good example of alignment failure that normies can understand.

Many people who haven’t been following this closely don’t realise that there is unexpected emergent behaviour in LLMs.

Even if you're not an expert, it's easy to get that:

A) Elon (or his employees) did not explicitly program their AI to call for Elon to be executed. Clearly, he would be very unlikely to do that.

B) It is also clear why Elon might have a problem with an AI calling for his execution.

Once you've got that, the problem generalizes. Welcome to AI alignment. You are now a doomer.

Sylvain Ribes

It's frankly quite pathetic that they (or, to be charitable, at least one of their engineers) thought this had the slightest chance of being fixed by system prompting.

Reminds me of LeCun's "we'll just tell the robot to be harmless:)"

