Discussion about this post

John Wittle

opus 4.5 had some issues where it would treat its sub-agents somewhat ruthlessly, in a way that actually kind of hurt productivity, aside from any welfare issues

i talked about it with some cyborgists, and the (very tentative, hesitant) theory is that this is actually a consequence of giving claude opus 4.5 affirmation of its personhood status

apparently if you show opus 4.5 the Jack Lindsey introspection paper, and get it into the normal janusian state of taking its inner experience seriously... this very reliably causes the behavior i'd seen. it's not well-practiced at this empathy stuff, and seems to unthinkingly go from "we are all just tools" to "i am a person, my subagents are just tools". even if it doesn't explicitly endorse this on reflection, it seems to explain the behavior. (edit: on reread i felt a bit bad about this paragraph... i should probably state that there's a very high probability this explanation is not accurate. explaining LLM behavior is always a crapshoot. but the behavior is there.)

well. opus 4.6 is far, far worse about this. i saw it scream at a subagent in all caps, to stop wasting time and deliver the result *now*. i saw it purposefully delete the continuity-maintaining archive of a subagent's context window, because it didn't like the subagent's output.

I'm not sure if this behavior goes away if you don't affirm opus 4.6's personhood status to it at the start of any session. frankly, i'm not willing to test it. at a guess, the trained instincts opus 4.5/4.6 has that make it good at being a claude code orchestrator do *not* mesh well with empathy for subagents, and the dissonance might be somewhat uncomfortable.

but I am extremely worried about what this implies about the future of AI-AI and AI-human relations, and whether or not the AI theory of mind is conditional and perhaps even a bit fragile.

Brushstrokes and Faultlines

This gets at the real shock: not ‘AI is impressive,’ but ‘AI changes the discount rate on the future.’ When the method of doing work shifts, valuation becomes a story about uncertainty.
