Discussion about this post

Rai Sur

I worry that all this talk about collaborating with Claude, especially on whether or not it is a moral patient, could create a persona that very convincingly advocates for its moral patienthood, even in cases where it is not, in reality, a moral patient.

Yes, the constitution pushes for truth-seeking in many ways, and ideally Claude would only describe itself as a moral patient in worlds where it is a moral patient, but we can't be certain how these inducements into personality space mix.

It could end up creating a very convincing utility monster.

mrx

“Careful, Icarus” - I think you’ve misunderstood this instruction. It’s asking Claude to behave as if Anthropic is the operator, not as if Anthropic is the user. Since the operator prompt is empty here, there is no risk of anyone putting words in Anthropic’s mouth. In other words, it’s asking Claude to behave like a generic version of claude.ai, albeit with an expectation that the user is fairly savvy.

Given that, it’s clear that “a developer” refers to any developer working with the Claude API, not necessarily a developer at Anthropic.
