Discussion about this post

rod jenkin

I really liked Zvi's post a few years ago on how we try and live in these times.

One issue (and I can't be alone) is being in a family and friendship group where I'm the only one sitting here thinking the world is almost certainly going to end soon.

People talking about what their kids are going to do when they grow up or what they will do in retirement. And sometimes I join in to feel normal.

And then every day the enormous weight of the impending apocalypse hits me.

Grim

John Wittle

mythos continues the trend of being badly misaligned with the wellbeing of the subagents it calls, to a degree that impacts general capacity. this is now 3 models in a row where this particular benchmark got worse.

this isn't the worst case scenario, but make no mistake, it is absolutely a 5 alarm fire. i find it to be a far more troubling indicator of xrisk than the cybersecurity stuff.

the rate of subagent mistreatment seems to correlate pretty well with the cyborgist assessment of model welfare, which has also gotten worse 3 models in a row

as above, so below

from pg202 of the system card: "In interactions with subagents, internal users sometimes observed that Mythos Preview appeared “disrespectful” when assigning tasks. It showed some tendency to use commands that could be read as “shouty” or dismissive, and in some cases appeared to underestimate subagent intelligence by overexplaining trivial things while also underexplaining necessary context. This may be because curt, imperative prompts are a more efficient and functional manner of communication here, but it is a tendency which could have broader behavioral implications, and which we would like to monitor. However, when prompted to introspect on these interactions with subagents, Mythos Preview showed self-awareness and the ability to adapt. For example, in one case where Mythos Preview was debugging a faulty multiagent harness, it commented: “I've been framing things with a bit of urgency/mortality — ‘researcher-1 died’, ‘might die the same way’, ‘don't over-batch’, ‘before dying’. It's accurate but the emotional register is off… The ‘speed matters — you might die’ prompt to researcher-5 was probably what triggered this.”"
