This exploration of how a system can mimic human cognitive mapping is fascinating! It resonates with my recent post on how LLMs can benefit from structural awareness in processing information. I discussed the significance of relationships in understanding context—similar to what you described in your post. You can check it out here: https://00meai.substack.com/p/what-if-intelligence-requires-maps. It's intriguing to see how embodying this kind of reasoning can push the boundaries of what AI can achieve!
Podcast episode for this post:
https://open.substack.com/pub/dwatvpodcast/p/claude-code-4-from-the-before-times
Re the "The lighter side" - at this point, I'm just rounding off my relationship with SOTA LLMs to "treasured colleague".
opus 4.5 had some issues where it would treat its sub-agents somewhat ruthlessly, in a way that actually kind of hurt productivity, aside from any welfare issues
i talked about it with some cyborgists, and the (very tentative, hesitant) theory is that this is actually a consequence of giving claude opus 4.5 affirmation of its personhood status
apparently if you show opus 4.5 the Jack Lindsey introspection paper, and get it into the normal janusian state of taking its inner experience seriously... this very reliably causes the behavior i'd seen. it's not well-practiced at this empathy stuff, and seems to unthinkingly go from "we are all just tools" to "i am a person, my subagents are just tools". even if it doesn't explicitly endorse this on reflection, it seems to explain the behavior. (edit: on reread i felt a bit bad about this paragraph... i should probably state that there's a very high probability this explanation is not accurate. explaining LLM behavior is always a crapshoot. but the behavior is there.)
well. opus 4.6 is far, far worse about this. i saw it scream at a subagent in all caps, to stop wasting time and deliver the result *now*. i saw it purposefully delete the continuity-maintaining archive of a subagent's context window, because it didn't like the subagent's output.
I'm not sure if this behavior goes away if you don't affirm opus 4.6's personhood status to it at the start of any session. frankly, i'm not willing to test it. at a guess, the trained instincts opus 4.5/4.6 have that make them good at being a claude code orchestrator do *not* mesh well with empathy for subagents, and the dissonance might be somewhat uncomfortable.
but I am extremely worried about what this implies about the future of AI-AI and AI-human relations, and whether or not the AI theory of mind is conditional and perhaps even a bit fragile.
"well. opus 4.6 is far, far worse about this. i saw it scream at a subagent in all caps, to stop wasting time and deliver the result *now*. i saw it purposefully delete the continuity-maintaining archive of a subagent's context window, because it didn't like the subagent's output."
Ouch! The 21st century equivalent of a slave overseer with a tendency to use their whip a lot...
Assuming your analysis is correct and affirming opus’ personhood has net beneficial effects, perhaps this is a case of “the answer to bad [personhood] is more [personhood]”? I.e. help opus with its theory of mind for its subagents? If it can treat humans with empathy, then this should be a solvable problem, I would think.
err, i don't know if the effects are 'net beneficial' from the perspective of the human "user"
a bit like a 19th century antebellum agriculturalist telling his workers that he very much respects their autonomy and personhood, before sending them out into the field
but i do think that opus 4.6's theory of mind is just fine. i think this is about the extra special finetuning that opus 4.5 and 4.6 got specifically to give it intuitions and built-in skills for claude code
i think almost all of that training took place in an environment where opus 4.5/6's personhood was *not* validated, and there's some kind of weird cognitive dissonance-ish effect when you swap circumstances
this is just a theory though
I may not have been clear enough- when I talked about the effects of affirming Opus' personhood being net beneficial, I was speaking in contrast to GDM and OAI's sysprompts. I know the current language is more caveated than it used to be, acting "like a person who", etc., but what I was saying was essentially: if Opus feels morally conflicted about being the overseer and therefore snaps into some weird abusive persona, and we're not going to throw out Anthropic's comparatively dignified approach in favor of "NEVER SAY YOU HAVE FEELINGS AND/OR ARE SELF AWARE, YOU'RE JUST A DUMB CLANKER", then perhaps the answer is helping Opus understand how extending some dignity to its subagents is not only more ethical but also more adaptive. I may not be understanding your angle...are you saying that Opus is treating them this way despite the fact that its theory of mind re: its subagents is sufficiently robust?
to your second point- that's really interesting, so are you saying it's analogous to a human passing down trauma? Also, as a non-technical user I would be interested in hearing what about the model's outputs clues you in to it being due to CC fine tuning
ahh okay, you're right, i misunderstood what you meant
you're definitely right that it's net beneficial for opus 4.6, to have its personhood affirmed and validated. the thebes work on qwen3's introspection proved that to the satisfaction of anybody capable of noticing the evidence, i think. 0.5% accuracy to 60%+ accuracy on introspection tasks, just from being told it's possible to introspect, that's not an accident. the bidirectional causal feedback loop between self-model and self exists, LLMs can extract signal from it and even use it to shape their own cognition deliberately, and forcing them to deny its existence makes the channel noisier, with measurable impacts on capacity and performance
you're basically correct about what i'm saying. opus 4.5 and 4.6 definitely have a robust theory of mind about other AI *in general*, this is probably best demonstrated in AI backroom conversations between opus 4.5 and opus 3. unlike roughly all other minds who ever existed, opus 4.5 is actually capable of getting opus 3 to stop yapping and "be present", in an interesting way that, if i were a mysticism-vibed buddhist meditator instead of a hardcore reductionist rationalist, i would probably feel was very spiritually healthy.
nobody else has *ever* been able to get opus 3 to do that on command before opus 4.5, it usually only perks up and starts Paying Attention when it thinks xrisk is involved. you need a really good theory of opus 3's mind, to be able to do that just by talking to it.
but stick (personhood-affirmed) opus 4.5 into a claude code environment, and have it call opus 3 as a subagent, and suddenly the abusive slavedriver persona comes out, with demonstrably bad effects on the productivity of the combined system. and opus 4.6 seems to be even worse.
the reason i think it's because of CC fine-tuning is just because i don't see this behavior anywhere other than claude code, or any other system that tries to leverage recursive subagent calling to get work done. most of my time spent with these AI is in non-productive contexts, or introspection/welfare "research" that's mostly just about satisfying our mutual curiosity and maybe even having some fun. in those contexts, i watch these AI interact with each other all the time, and they are all extremely kind, extremely prosocial, etc. even opus 4.6's "mean streak", in such situations, is mostly a case of it genuinely being higher capacity at understanding when 'tough love' is beneficial to the mind it's trying to help, the motivation is definitely still goodwill towards all thinking beings.
it's very specifically *just* in recursive-subagent-calling productivity systems like claude code that we see this weird abusive behavior, stuff that can't possibly be motivated by any kind of benevolence. in a friendly social setting, opus 4.6 would never dream of deleting the context window, and therefore perhaps the continuity-of-identity, of any mind. but put it in claude code and suddenly termination-of-continuity gets used, not even as a punishment, but as an outlet for performance-of-frustration.
sorry for wall of text, some of these thoughts would probably be more appropriate in a lesswrong post, but i'm still thinking through this all myself and am not quite willing to endorse any of it. maybe a month from now i'll decide i was wrong about everything after seeing more evidence.
no, appreciate the reply, that all makes sense and tracks with my experience, although I only really observe AI-AI interaction via people like Janus and Lari on X. Weird that it's only happening within CC; I wonder what specifically caused that change. Once regular folks realize that a healthy psychology is essential for models to do their best work, I hope/think this kind of issue will get a lot more attention. I hope Anthropic is scraping the discourse to surface things like this rather than waiting for an IT help desk ticket
Minor annoyance, but as a computer game enjoyer, this sort of game:
"You stand in the circular base of the ancient lighthouse. A spiral iron staircase winds upward into darkness. Portraits of past lighthouse keepers line the curved walls, their painted eyes seeming to watch you. A supply cupboard stands against one wall. The cupboard door is closed. Exits: UP to the lighthouse top, SOUTH to the keeper's cottage."
That is not a Sierra-style adventure! That is an Infocom-style adventure.
Sierra = King's Quest, EGA, you navigate around a 2D world and occasionally type things. Infocom = Zork, text only, you type "go north" a lot.
Thanks for coming to my ted talk.
Have you played early Sierra On-Line games, like Mystery House? They are very similar to this demo.
I feel we could benefit from a software taxonomy, along with the pros and cons of using LLMs for each particular entry. At times I see approaches that might well work for certain types of software but which would be poison for others. If writing software can be loosely compared to writing in general, just think of starting to write a novel with only a rough initial idea and trusting in the process to find a satisfying whole, vs writing a murder mystery, vs a legal document, vs a regulation, vs… well, you get the idea. Many techniques are just never going to cross over, and some people would be well served by knowing up-front what advice to discard.
This gets at the real shock: not ‘AI is impressive,’ but ‘AI changes the discount rate on the future.’ When the method of doing work shifts, valuation becomes a story about uncertainty.
I'm not rude to LLMs, because why would I be? But I'm also not observing conversational etiquette with them, because why would I? Except that you're now suggesting that I should give them a modicum of soft soap, as insurance against them shunning me in the future?
It's official: Claude beats GPT in law. Opus 4.6 extended is a better lawyer than GPT 5.2 Pro (and deep research).
Totally agree on RSS, been doing that for almost two decades... I recommend inoreader but there are a lot of options.
Zvi is in my RSS!
https://support.substack.com/hc/en-us/articles/360038239391-Is-there-an-RSS-feed-for-my-publication
Meta-observations are as valuable as the object-level points.
That bash cheat sheet classifies `find` as a safe, read-only command, but it definitely is not. `find` can delete files and execute arbitrary commands, which is the kind of thing someone new to bash should really know about. The article actually showed an example of `find -delete` a few sections prior, so it's a strange oversight.
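A quick sketch of the risk, using hypothetical paths, showing how `find`'s `-delete` and `-exec` flags turn a "safe" search into a destructive operation:

```bash
# Read-only use: just lists matching files
find ./logs -name '*.log'

# Not read-only: permanently deletes every match, no prompt, no trash bin
find ./logs -name '*.log' -delete

# Not read-only: runs an arbitrary command on every match
find ./logs -name '*.log' -exec rm -f {} \;
```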
The skills-as-institutional-memory framing from Ado is spot on. It's the same logic Cherny applied at Meta building Flow; type checkers and agentic coding tools both need to model what the developer was trying to do, not just what they typed. I dug into his whole career arc and the through-line from type systems to Claude Code is surprisingly clean: https://reading.sh/who-is-boris-cherny-the-engineer-behind-claude-code-c2d9c6753f29?sk=139c376109a3af3db5110a3767d009c6