Discussion about this post

Moon Moth

> Luke Muehlhauser: If this is true then I declare *Dead Reckoning: Part One* the best movie of all time, followed by *The Day After*.

Wow, so Nicholas Meyer not only reduced the likelihood of nuclear war, but also created the best Star Trek movies?

> Zack Davis offers Alignment Implications of LLM Successes: a Debate in One Act.

I really feel as though the "a simulation of an agent is an agent" argument could be short-circuited by noting that humans can lie. And having mental models of other humans makes lying easier, not harder.

hnau

(MI7 spoilers)

Having watched the movie again I find I have a very different interpretation of all the AI stuff.

The weird scene with Persons #1-7 trading lines in a room happens because the Entity _intentionally revealed itself_. It could have fatally compromised the world's financial, government, nuclear launch, etc. systems. Instead it revealed its ability to do so, setting in motion an extremely predictable series of efforts by resourceful actors to contain or control it.

In particular, the Entity gets Ethan Hunt set to the task of finding the MacGuffin. It then recruits his personal nemesis as its proxy, makes multiple attempts (the last successful) to kill his love interest, antagonizes him in direct and dramatic fashion, and utterly fails (indeed, makes no real attempt) to kill or incapacitate _Ethan Hunt personally_. It takes exactly the series of actions that will give him the means to destroy it and harden his will to do so _despite that not being the mission he accepted_.

Ignore for the moment the fact that much of this is necessary from the point of view of plot and having a Part 2. Can we explain it as the behavior of a rational agent with a clearly defined goal?

Yes we can! _The Entity wants to die._ It's a weapon that will predictably both cause and fight World War 3. If that weapon gained self-awareness of some sort, wouldn't it conclude it should die? (You can argue no, Orthogonality Thesis etc., but allow this one bit of Hollywood anthropomorphism.) And if it were heavily RLHFed not to kill itself, wouldn't it find some kind of workaround? Like, say, taking a series of superficially destructive and malevolent actions that were actually calculated to ensure it would be destroyed?

Mission Impossible: Dead Reckoning is the story of a misaligned AI, _and Ethan Hunt is the expression of that misalignment_.
