45 Comments
Aris C's avatar

I just don't understand. This entire review, and just two paragraphs on orthogonality, which is ultimately the heart of the issue: this idea that ASI, for all its knowledge and wisdom, is still stupid in that it pursues objectives single-mindedly.

We (humans) don't do that. Why would ASI?

And OK, let's grant that orthogonality is true. That implies there is no objective morality, right? Because if there were, wouldn't the ASI be aware of it, and comply with it? And if there isn't, then why the qualm about only taking lethal action if it's legal, part of some international accord?

valencia_o's avatar

“Pursues objectives single-mindedly” is either totally true of humans, if you define objectives widely enough, or also not what ASI will do, if you define it narrowly.

Also a) illegal “lethal action” obviously wouldn’t work and b) discarding objective morality doesn’t mean discarding morality and c) even objective morality doesn’t necessarily translate from humans to ASI, depending on what form of objective morality is correct in the hypothetical.

Skull's avatar

It's not stupid that I killed all the ants in a hill in my backyard. It was perfectly intentional, in fact, and required planning and technology. It wasn't stupid that I wanted to build a shed there. It wasn't stupid that the ants were in the way. And I'm not stupid for having killed them. It's just a question of different goals and alignment. They were in my way. They are not instrumental to my goals. I don't hold their lives as being morally important. So I killed them.

That's one single example out of endless that we are trying to avoid with ASI.

Aris C's avatar

You're assuming it's not evil to kill ants. Are you saying it's not evil to kill humans?

Seta Sojiro's avatar

I would bet that most people think that ants aren't sentient and humans are; therefore, it's okay to kill ants and not humans. But this is a human-centered definition - an essentially gerrymandered rule to excuse how humans treat ants.

But if sentience is the boundary for morally relevant beings, then what about the fact that humans bulldoze forests and destroy ecosystems filled with unquestionably sentient beings? Maybe you'll object that some humans see the problem and try to stop them. But it still happens at massive scale.

Some humans don't see a moral problem with this at all. You could push back the rule to intelligence - only beings able to abstractly reason at the level of a human are morally relevant entities. Killing millions of mammals isn't that bad, after all they aren't human. It's a fine trade off for securing agricultural land to feed humans.

But surely you can immediately spot the problem with this rule. An ASI can simply shift the boundary again, to ASI level intelligence. To an ASI, a human and a pig are at a similar level of intelligence. And look at how we treat pigs.

Aris C's avatar

Right. But this means there is no objective morality that values humanity. So why the authors' reluctance to advocate for violence?

Seta Sojiro's avatar

Because violence is ineffective. Your group would instantly be branded terrorist extremists, discredited, defunded, and stripped of any political influence. It's also a one-time-only measure. After the first attack, AI research will move to whichever country or entity can guarantee security.

Also, assuming ASI is possible*, there are enormous economic benefits to building a just-under-ASI-level intelligence. This makes it a tragedy of the commons - each individual country has a huge incentive not to stop.

*which I'm not actually convinced of but that's a separate topic.

Aris C's avatar

I don't buy this. Violence is no more ineffective than an international agreement to halt research, which is itself wishful thinking. And maybe violence would be ineffective now; but it probably wouldn't have been when there were only a handful of AI researchers who knew what they were doing.

Note that I'm not advocating for violence myself. I am just saying the authors' worldview seems inconsistent.

Skull's avatar

I'm not assuming it's not evil to kill ants. I'm not saying it's not evil to kill humans. The only assumption I'm making in that regard is that those of us in this conversation don't want all humans to die. If you don't care if humans die, you are of no use to me or anyone, in this conversation or elsewhere.

OOD's avatar

I wrote a snide comment on ACT and it didn't go well, so I'll try to explain better this time.

I'm extremely disappointed that Eliezer has become this kind of death prophet, after a long career of productive contributions to AI discussion.

He seems unjustifiably convinced of a scenario of his own construction: the AI will be created with great powers of comprehension and well-meaning instructions, but it doesn't work out, through some vague combination of not really understanding the instructions and deciding that humanity's desires don't actually matter.

I don't think this is an unthinkable scenario in and of itself, but to be SO confident of it that you're willing to propose "if Japan doesn't make it illegal to run 15 GPUs, they should have their national sovereignty taken away" ... shouldn't we call that out as irrational overconfidence?

valencia_o's avatar

I don’t think “not understanding the instructions” is at all what Eliezer thinks. Of course an ASI could understand what you meant (e.g., https://x.com/ESYudkowsky/status/1845546273065349300). It’s not a vague decision “that humanity’s desires don’t really matter” - humanity’s desires mattering is a very specific thing to happen, a very narrow target.

“Anything else than that” is, yes, vague, but not because it’s unlikely or narrow or avoiding inspection. It’s pointing at a huge ocean of values and saying that ending up with the exact ones humans have is quite unlikely, even getting close is unlikely, and close is not sufficient.

Jeffrey Soreff's avatar

I think that "close is not sufficient" is both true (as their example with synthetic pseudo-humans points out) and the crux of the argument. I'm skeptical that "even getting close is unlikely". The LLMs are trained on a _lot_ of human-written text. At least part of their utility function seems likely to end up as something like average-of-written-comments-on-the-internet.

I think a somewhat stronger argument, which is just as concerning, is that AIs' utility function can certainly end up at least as alien as the most alien human's, and that includes jihadists, Nazis, Maoists, etc.

valencia_o's avatar

That’s fair - I agree LLM-derived AIs are more likely to get close than other hypothetical architectures (though I’m not sure if that is actually likely vs less unlikely), and that the crux is “close is not sufficient.”

Jonas's avatar

Why would the AI's utility function be an average-of-written-comments-on-the-internet? I think this is a very flawed view of how this works. LLMs are trained to predict text; that doesn't make them into that text. An alien learning how to predict human behaviour doesn't become a human.

We are not training it to want to have the opinion of humans. We are training it to want to be good at predicting and other skills.

Jeffrey Soreff's avatar

Many Thanks!

>We are not training it to want to have the opinion of humans. We are training it to want to be good at predicting and other skills.

Sort-of. The pre-training reward function is indeed predicting the next token - which isn't terribly different from _mimicking_ the next token, and giving responses roughly similar to what average-of-written-comments-on-the-internet would be.

Now, compression and generalization even during the pre-training phase change this, and the RLHF phase mixes in other rewards, but, to 0-th order, the pre-training phase model is "trying" to mimic the "typical" response of the writer-on-the-internet.
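
To make the "predicting ≈ mimicking" point concrete, here is a minimal sketch of the pre-training objective (PyTorch, with toy numbers I made up; nothing here is from the book or the review): the loss only rewards matching the next-token statistics of the training text.

```python
import torch
import torch.nn.functional as F

# Toy setup (hypothetical numbers): a vocab of 10 token ids and one
# 5-token "training sequence" standing in for internet text.
vocab_size = 10
tokens = torch.tensor([[2, 7, 1, 4, 9]])

# Stand-in "model" output: random logits over the vocab at each of the
# first 4 positions. A real LLM computes these from the preceding tokens.
logits = torch.randn(1, tokens.shape[1] - 1, vocab_size)

# The pre-training objective: cross-entropy between the predicted
# distribution at position t and the token that actually appears at t+1.
loss = F.cross_entropy(
    logits.reshape(-1, vocab_size),  # predictions for positions 0..3
    tokens[:, 1:].reshape(-1),       # the actual "next tokens" at 1..4
)
print(loss.item())  # lower loss <=> better imitation of the text's statistics
```

Nothing in that loss mentions the model's own "wants"; it only scores how well the text is imitated, which is where the average-of-the-internet intuition comes from.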

Steven Adler's avatar

As @valencia_o noted, Eliezer's concern isn't that the AI will misunderstand instructions, but rather that following the instructions won't maximize its own goals, and so it will choose to disobey.

There's an analogy to an actor who is really, really good at playing a part, but that doesn't fundamentally change the underlying values of the person. It's not that they haven't understood well enough what it would mean to be a kind person, or a cruel person, or whatever type of person they are embodying in their role; it's that understanding the behaviors of such a person doesn't have much bearing on the actor's underlying values.

Kevin's avatar

There is a pattern in a type of claim here I don’t believe.

“The core idea here is that if you teach a mind general skills, those skills have to come with a kind of proto-want, a desire to use those skills to steer in a want-like way. Otherwise, the skill won’t be useful and won’t get learned.”

This is a good example of something I don’t agree with. You just can’t make leaps like this, “if a superintelligence exists then it must have property X”. It’s just a guess, and as the rise of LLMs has shown, it’s very hard to predict what architecture will succeed before trying it out.

Another example:

“A mind’s intelligence, the ability to predict and steer, has nothing to do with where that mind is attempting to steer. You can be arbitrarily good or bad at steering and predicting, and still try to steer toward any ultimate or incremental destination.”

I don’t agree with that either. If anything LLMs are hinting at the opposite, that the abilities and preferences of a mind may be confusingly entangled.

The whole argument feels like it is reaching. We should set our sights much lower, and instead of trying to convince people they should have a p(doom) > 50% right now, think about what sort of evidence would in fact convince the median, reasonable thinker that they should worry about AI doom. We clearly don’t have it today!

Aris C's avatar

So what would AI do that would cause it to kill humans?

If you think there is no objective morality, and that humanity is worth preserving (otherwise why care), what's the argument against lethal action?

(Don't want to debate effectiveness, other than to say it'd be materially easier to bomb AI offices than to convince world leaders to ban AI research)

Dust's avatar

Bombing AI offices is a terrible idea, morality aside. Right now those offices are staffed with researchers and engineers who are rich enough to leave and free enough to talk about their work, at least to close friends outside the company. If you bomb the AI offices, then the US military takes up the work, and they do it with even less transparency than the AI companies, without empowering engineers and researchers with riches, and with criminal penalties for talking to anyone outside the govt. And they develop superintelligence with worse goals, etc. I'm sure I've only scratched the surface of why bombing AI offices is a terrible idea, morality aside.

jpr's avatar

"If everyone woke up one morning believing only a quarter of what we believe, and everyone knew everyone else believed it, they’d walk out into the street and shut down the datacenters, soldiers and police officers walking right alongside moms and dads."

I think this quote kind of captures the tragedy of Yudkowsky. You guys are so, so bad at politics.

I'm sure you can think of many serious issues where a majority agrees, and yet collective action doesn't happen. And if your response is, "those aren't truly shared beliefs", then you must realize what an impossibly high standard "shared belief" actually is, and that mass movements do not work that way.

Case in point -- you guys have done everything you can to wave away the concerns of people losing their jobs to AI, even though they are aligned with you on the only point you think matters, which is to shut down development. You have a mass political movement ready to go! But it'd be a big tent, with a lot of people who don't share your beliefs about the economy, etc.

Maybe this is why Yudkowsky spends so much time wheedling policymakers. He just can't conceive of how to engage with politics himself, so he has to delegate it. The problem with this is that he does not have billions of dollars to lobby with.

Jamie Fisher's avatar

> you guys have done everything you can to wave away the concerns of people losing their jobs to AI

There are also the people so depressed, so impoverished, so despairing, so nihilistic in our current economic and social landscape that if you told them the world was going to end in 10 years, they might truly not give a damn. Or worse, find it amusing. (I mean, it *does* imply the billionaires die too)

The first person I told irl about AI 2027 fit this 'downtrodden' category very neatly. And they said, basically, "good".

Tamsin Brooks's avatar

I just wish we could know for a fact if the world is going to end in 10 years, because if so, I'd definitely go to Disneyworld next January. Or if I knew it was going to end in 24 hours, I'd go to Costco and eat an entire chocolate tuxedo cake. Also, my first thought on the 10-year scenario was "I've got bigger things to worry about right now." It really was, and I'm not ashamed to admit it. And that's how most people are going to think. Anyone who's seen "The End", the recent apocalypse movie musical set in a billionaire's bunker, would rather pick the apocalypse than be stuck with those idiots.

Jeffrey Soreff's avatar

Just to point to a comment I made in the previous post about this book: https://thezvi.substack.com/p/reactions-to-if-anyone-builds-it/comment/157511883 which is equally relevant here.

Vojtech Kovarik's avatar

Nitpick regarding this bit:

> To be precise, they mean that ‘If anyone builds [superintelligence] [under anything like present conditions using anything close to current techniques] then everyone dies.’

> My position on this is to add a ‘probably’ before ‘dies.’ Otherwise, I agree.

I always feel that when I hear "if we build AI, probably everybody dies", it is so tempting to round it up to: "Aah, well, this just means that maybe we die and maybe we don't, and if we try hard enough, we won't". But a different view -- perhaps not one you hold? -- is that "probably it is the case that 'if we build AI, everybody [definitely] dies'." And here, I feel less tempted to do various mental gymnastics. It feels more clear that this means: "There is some unclear but fixed and non-trivial probability that this world view is right. And if it is right, then if we build the AI, we definitely die."

I would be curious to know how you feel about this distinction between "probably [ASI ==> extinction]" vs "ASI ==> probably extinction", whether you endorse one vs the other, and if you have thoughts on communicating this to people :-).
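
If it helps, here is a tiny numeric sketch of the distinction (toy numbers of my own, not anything from the comment or the book): both readings can assign the same overall probability, but they locate the uncertainty in different places.

```python
# Reading A: "probably [ASI ==> extinction]".
# The uncertainty is over whether the worldview is correct; conditional on
# it being correct, building ASI means everyone (near-)certainly dies.
p_worldview_correct = 0.7    # hypothetical credence that the thesis is right
p_doom_if_correct = 1.0      # the thesis itself: build ==> everyone dies
p_doom_if_wrong = 0.0        # assume no extinction if the thesis is wrong

p_doom_reading_a = (p_worldview_correct * p_doom_if_correct
                    + (1 - p_worldview_correct) * p_doom_if_wrong)

# Reading B: "ASI ==> probably extinction".
# A single 70% chance of extinction, which is the form that invites the
# "maybe we die and maybe we don't, and if we try hard enough, we won't"
# rounding described above.
p_doom_reading_b = 0.7

print(p_doom_reading_a, p_doom_reading_b)  # 0.7 0.7: same number, different structure
```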

AlanDee's avatar

You don't need an entire book to understand that it will kill everyone at some point. All that is needed is one buried directive (intentional or not) in one superintelligence that directs it to spend all its effort on the task (including, but not limited to, overriding every other superintelligence). We have to stop it every time. It only has to be successful once.

Jeffrey Soreff's avatar

A bit peripherally related, but with something of a "Don't build the Torment Nexus" flavor

( from https://www.dailynk.com/english/from-drones-to-nukes-north-korea-pushes-ai-military-modernization-plan/?tztc=1 )

>The Strategic Force was tasked with developing a four-stage “leap” strategy to integrate AI-based unified management systems for storing, operating, and commanding nuclear weapons, as well as launching nuclear counterattacks.

<mildSnark>

"Colossus, The Forbin Project" - but new and improved, now with hallucinations.

EDIT: Does entrusting nuclear weapons to an AI system count as the national (and possibly civilizational) equivalent of entering the competition for a Darwin award?

</mildSnark>

Jamie Fisher's avatar

Why should I believe You Guys?

[setting aside for a moment that I read you religiously and do believe in ASI dangers]

Whenever a Climate Change Activist begins an argument, they can always say....

"98% (or whatever number) of publishing atmospheric scientists believe in primarily-human-caused climate change".

Can YOU GUYS say the same about YOUR OWN INTELLECTUAL COMMUNITY? And heck, who ACTUALLY IS your intellectual community? There's obviously AI researchers. But what about general mathematicians? Or anyone with a PhD in Computer Science? Or PhD Philosophers? Or Economists? Or is your "intellectual community" just the LessWrong folks? Does Robin Hanson count? On SneerClub someone once sent me a private message saying, basically, "don't worry about that AGI stuff... Robin Hanson defeated Yudkowsky in the classic FOOM debate".

DEAR YUDKOWSKY, WHY NOT TRY TO CONVINCE YOUR MOST INTELLIGENT-AND-INFORMED CRITICS? IF YOU CAN'T DO **EVEN THAT**, LIKE IN THE **UNIFIED** CLIMATE SCIENTIST COMMUNITY, WHY SHOULD I OR ANYONE ELSE WHO HASN'T FORMALLY-STUDIED ML BELIEVE YOU OVER THEM?

I'm not saying "don't try to convince the general public", but the general public has way too many experts to choose from

Maybe it's time to have coffee with Robin Hanson.

Thor Odinson's avatar

I don't have the link to hand, and the poll is a few years old, but I recall that "the people actually working in AI labs to build AIs" had a median P(Doom)(this century) of about 10%.

A decent fraction of these people take the view that if they don't make AGI they'll definitely die (of old age), and are openly gambling - with a wager of all of humanity - on getting the good version.

A noteworthy minority actively seem to want human extinction - cf. any of Zvi's weekly round-ups with a "please speak directly into the microphone" section.

And then of course there are many people who think ASI is impossible, for reasons that vary dramatically in quality.

Jamie Fisher's avatar

It's not that the "General Public" isn't informed.... It's that the General Public is informed by OTHER experts.

Jamie Fisher's avatar

exact quote from a member of the 'uninformed' public to me:

Hey so I saw your post in sneerclub.

2:00 AM

I don't want you to be alarmed. Yud's arguments for foom scenario(the silly name for expoential takeoff) in the Hanson-Yud debates was nothing more than quoting a book about nuclear weapons, then writing down undergrad first day diff equations

Patrick Lawlor's avatar

Just finished reading the book this morning. Certainly scared the shit out of me... 😰

[insert here] delenda est's avatar

Fwiw one "cues" the predictable reactions, not "queues".

Excellent review, much appreciated.

Jeffrey Soreff's avatar

I concur that Yudkowsky and Soares are overconfident in their predictions.

They write "Some aspects of the future are predictable, with the right knowledge and effort; others are impossibly hard calls. Competent futurism is built around knowing the difference.

History teaches that one kind of relatively easy call about the future involves realizing that something looks theoretically possible according to the laws of physics, and predicting that eventually someone will go do it."

This is close to unfalsifiable. Very few _unmade_ attempts are documented well enough to show up in the historical record. Most situations can be explained away with one excuse or another:

Steam automobiles are possible, but never really got massively deployed: "But people _did_ try to build them!"

Drexler-Merkle nanotechnology looks physically possible, but never really got attempted: "Not _YET_"

Space elevators haven't been attempted: "Maybe the physics of fabricating sufficiently strong materials in sufficient lengths forbids it."

Rarely, counterexamples, unmade attempts at physically possible proposals, _are_ documented sufficiently to show up in the historical record:

https://en.wikipedia.org/wiki/Project_PACER
https://en.wikipedia.org/wiki/Sundial_(weapon)