Discussion about this post

Dave92f1:

The models behave much the same as humans would under similar circumstances.

We're going to have to start treating them better.

angmoh:

This example touches on another reason why "alignment" is going to be difficult to implement: ethics aren't real, they're consensus.

If you're deploying powerful singular agents, there will always be some disagreement with their actions and goals. Even aside from whether they can act unethically when the circumstances "justify" it, the more powerful this stuff gets, the more consequential disagreement over what's ethical will be.

Humans are already quite poor at reaching genuine agreement.
