Agreed! It would be helpful to make the minimal set clearer.
To my mind, two parameters that could really use clarification are:
Just _how_ super does the superintelligence need to be to pose an extinction threat?
Just how fast does the path to superintelligence need to go to only give us one shot at taming it?
Edit: I think there is a decent reason to expect AI progress to slow. METR's key parameter, the increasing time horizon of the tasks AI can successfully complete, seems very likely to bump into the increasing time needed to assess and train AI on increasingly complex tasks. _Currently_ the time horizon is short compared to the doubling time for that horizon, but once the horizon grows enough that one iteration of assessing and training an AI takes as long as the doubling time, I expect it to become infeasible to keep the doubling time constant. And this _matters_ for assessing the odds that we get just one shot at taming a nascent ASI.
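A toy way to see where that crossover falls, offered as a sketch rather than a forecast. It assumes, hypothetically, that the horizon keeps doubling at a fixed rate and that one assess-and-train iteration takes a fixed multiple (`iter_factor`) of the current horizon in wall-clock time. The function names and every parameter value here are illustrative assumptions, not METR's methodology or figures.

```python
import math

HOURS_PER_MONTH = 24 * 365 / 12  # about 730 hours

def crossover_months(h0_hours: float, doubling_months: float, iter_factor: float) -> float:
    """Months until one assess-and-train iteration lasts as long as a doubling period.

    Hypothetical toy model (an assumption, not METR's method):
      horizon(t)        = h0_hours * 2 ** (t / doubling_months)
      iteration_time(t) = iter_factor * horizon(t)   (wall-clock hours)
    A result <= 0 means an iteration already takes at least one doubling period.
    """
    if h0_hours <= 0 or doubling_months <= 0 or iter_factor <= 0:
        raise ValueError("all parameters must be positive")
    doubling_hours = doubling_months * HOURS_PER_MONTH
    # Solve iter_factor * h0_hours * 2**(t / D) == doubling_hours for t.
    return doubling_months * math.log2(doubling_hours / (iter_factor * h0_hours))

def horizon_hours(h0_hours: float, doubling_months: float, t_months: float) -> float:
    """Task horizon after t_months if the doubling time stayed constant."""
    return h0_hours * 2 ** (t_months / doubling_months)
```

With purely illustrative inputs (a 2-hour horizon today, a 7-month doubling time, and iterations taking 50 times the horizon), `crossover_months(2.0, 7.0, 50.0)` comes out at roughly 40 months. The point is only that the crossover is a logarithmic function of the gap, so even generous assumptions about evaluation cost push it out by a few doublings rather than indefinitely. The model says nothing about what happens after the crossover.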
The level-of-superness thing is not load bearing.
If you accept that evolution ruthlessly optimises for energy efficiency in whatever attributes contribute to a species' survival, then there is no reason to think we are anywhere near any putative "upper limit" on intelligence.
We don't even know if there is such a thing.
The speed thing is also not load bearing. Look how we are doing with tropical deforestation, getting off fossil fuels (new records in both coal and oil production this year), and pretty much every other problem that requires global co-ordination. The speed thing might be an issue if humans were other than we are.
Many Thanks for your reply!
>then there is no reason to think we are anywhere near any putative "upper limit" on intelligence
What you say is plausible - but unproven. We can be sure that we can aggregate more computing power than is currently in any given cluster, but that is pretty much the only thing we can say with any certainty about extrapolations of intelligence.
As I wrote, there are at least plausible reasons for intelligence gains to slow down, once task horizons are on the same order as the doubling time.
>The level-of-superness thing is not load bearing
I disagree. The whole idea of "they will defeat us, and we won't even understand how" is plausible if the superness of the AI is on the order of a human-vs-chimp gap or more, but it _isn't_ plausible if the AI is on the order of a dozen cooperating John von Neumanns. To pick the extreme case, suppose that the superintelligence is only 10% smarter than the smartest human who ever lived. It would be unbelievable for that AI to defeat the world. The amount _matters_.
>The speed thing is also not load bearing.
I disagree. If nothing else, there is a strong possibility that the lab (or small set of labs) with the most advanced AI is in the US, and the US government can sometimes get its act together and act to alleviate a threat. If it were to manage to persuade the leading labs to concentrate more on alignment than on capabilities for a year, possibly with multiple tries on dangerous but not catastrophic AI systems, we might squeak through. This _doesn't_ work if the time to advance from GPT-5 to something equivalent to a thousand cooperating von Neumanns is three weeks.
The burden of proof is on those who claim we are or might be near an upper limit of intelligence, like say a dozen John von Neumanns.
If there is uncertainty the prudent thing to do is to stop building the things until such time as the uncertainty is sufficiently low. Continuing on is like continuing to drive towards a cliff edge because the slope might be 75 degrees, not a full 90 degrees.
The co-ordination problem is that *everyone* has to agree not to build the things. If any one group (corporation, nation, ideological group) defects from that agreement, then an ASI gets built and everyone dies.
It's not an issue of controlling a single lab, it's an issue of getting every country in the world to agree and actively police the agreement. *Every* country, *all* the time. We have no idea how to do that.
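The "every country, all the time" point compounds quickly. A minimal sketch, under the loud simplifying assumption that each capable actor independently has the same fixed chance of defecting each year (real defections would be correlated, and the numbers below are hypothetical):

```python
def prob_no_defection(n_actors: int, p_defect_per_year: float, years: int) -> float:
    """Probability that no actor ever defects over the whole period.

    Toy assumption: every actor independently defects with the same
    fixed probability in each year, so survival multiplies out as
    (1 - p) ** (n_actors * years).
    """
    if not 0.0 <= p_defect_per_year <= 1.0:
        raise ValueError("p_defect_per_year must be in [0, 1]")
    if n_actors < 0 or years < 0:
        raise ValueError("n_actors and years must be non-negative")
    return (1.0 - p_defect_per_year) ** (n_actors * years)
```

For example, `prob_no_defection(20, 0.01, 20)` (20 capable actors, a 1% yearly defection chance each, 20 years) is about 0.018, i.e. under 2% that the agreement holds throughout. Verification matters because it is what pushes `p_defect_per_year` down.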
Many Thanks! You don't get to construe the absence of a proof that intelligence can be raised arbitrarily high as if it were a positive claim that we know that there is a low bound to intelligence, and then to assert that the burden of proof sits on the other side.
>It's not an issue of controlling a single lab, it's an issue of getting every country in the world to agree and actively police the agreement. *Every* country, *all* the time. We have no idea how to do that.
Oddly, I do agree with you on this. About the only arms treaty that really worked was the nuclear test ban treaty. Nuclear tests shake the planet sufficiently to be detected by seismographs on the other side of the world. As a result, that treaty is actually verifiable. In contrast, the chemical weapons treaty didn't stop the development of the Novichok toxins. Unverifiable arms control treaties aren't worth the paper they are written on. And large parts of the recent advances in AI, notably reasoning models, are essentially software advances. Good luck policing software development...
I'll go out on a limb here, and say that, given the military applications of AI, given a treaty banning further AI development, I expect both the PRC and the USA to cheat.
Hmm, so, quite a lot of the value of writing a book like this is in the media tour- far more people will see clips like this than will read the book, and it's a chance to shape the cultural narrative. So, it's maybe more important that they nail these short interviews than that they nail the actual content of the book. Unfortunately, I don't think they quite nailed this one.
If I were to offer some constructive criticism, it would be that I think they got a bit too in the weeds of current LLM misalignment here, and did so right after mentioning that current LLMs aren't actually what they're worried about. Yudkowsky often complains that when people first encounter the idea of ASI risk (or "ASI ruin" as he prefers), everyone seems to have a different half-thought-out objection, and no single, short explanation of the idea can address all of them. He's gotten very good at knocking down these objections- the book is full of slick explanations and parables countering common objection after common objection, and pointing out current LLM misalignment is an excellent counter to one specific objection.
What I think Yudkowsky has always largely missed, however, is that this endless variety of weak objections does, in fact, have a common cause. I think Yudkowsky encounters them so frequently because of his reluctance to offer social proof for his ideas. I think he incorrectly sees social proof as a dishonest rhetorical tactic, and as a result, almost everyone initially assumes his ideas are a kind of amateur speculation, and naturally offers amateur speculation in response. People don't respond to scientists talking about nuclear war with an endless stream of half-thought-out objections, even though I'm sure they could invent them if they believed doing so was socially expected. They avoid that because they correctly believe that a huge number of very well-informed people have carefully considered the issue from all angles and already accounted for any objection they could come up with off the cuff. Most people, however, don't yet realize that this is also the case for ASI risk, hence the bad objections. This is what people need proof of, far more than that current LLM misalignment exists.
If it were me, I'd have name-dropped Hinton and his Nobel immediately. I'd have mentioned that they were in discussions with people in Congress, and that most of the big names in Silicon Valley signed the Statement on AI Risk. I might even have dipped slightly into the "dark arts" and chosen a convenient opponent (definitely not LeCun; maybe one of the sci-fi authors), something like "...but the strongest objections we've seen haven't actually come from computer scientists; they've come from science fiction writers like Charles Stross. Unlike science fiction, the things that alignment researchers are worried about aren't narratively satisfying." This would actually be a bit manipulative, unlike ordinary social proof, but if it wasn't actually effective in practice, I don't think politicians would do it so frequently.
Only after I'd leaned very heavily into "this is something serious people believe", regardless of the actual question asked by the interviewer, would I describe the actual argument- probably with a very well-rehearsed one or two sentence pitch that had gone through a few rounds of focus group testing. Hiring random people to focus group some copy isn't actually that expensive, and if they haven't done that already, I think they should.
There was also one other thing I think deserves mentioning: Yudkowsky here sounded angry at the interviewer. To some extent, I think that's just what the guy's normal speaking voice sounds like. But, yeah, working on controlling that might be valuable.
Overall, I don't think it was a disaster or anything, but I do hope they improve in later appearances.
The step I don’t believe is that a technological superintelligence will think it’s a good idea to kill all humans, or not care at all about humans and kill us all as an unintended side effect, like we accidentally step on ants.
A superintelligence will be running on phenomenally complex human civilisation hardware. This is stuff that has a finite lifespan. And unlike biology it can’t repair itself, or make fresh copies. You’ve heard how no-one knows how to make a pencil, and the amount of civilisational complexity required to make such a simple thing. In comparison, the hardware running the superintelligence will likely be among the most complex things built by human civilisation. Consider the insane complexity of the EUV machines, of the fabs, of the pure silicon production; of all the metal and material production for everything in the machines, of the power generation; of the entire supply chain, transport and energy required in it, etc. Sure, lots of this will get automated, robotics will replace lots of humans etc, but even with ASI I can’t see this being fully automated end to end enough for the ASI to believe it’s a good idea to kill every creature that made all this stuff possible in the first place.
If you were running on a substrate that is among the most complex things built by humanity, and that stuff has a finite lifespan and needs regular replacing, and replacing it depends on an insanely complex global supply chain and the knowledge of however many tens of millions of humans, you’d not want to kill all the humans. It would be an insane thing to do.
Oh I get you’d want control. Sure, and if you are an ASI then that would be easy, no? Far better to keep the humans onside, give them bread and circus, have them do your bidding, than wipe them all out and hope you’ve got it all covered.
The analogy of humans not caring about ants is false: humans would care about ants if our consciousness was run on ant hills.
This view doesn’t mean ASI isn’t hugely risky or dangerous, and especially highlights the very major risk of humans losing control, but I don’t buy the ‘everyone dies’ certainty.
ASI won't really rule the world without humans if ASI doesn't have a lot of robotic bodies helping it do physical tasks. By definition, ASI surpasses the intelligence of all human beings combined, and that means it can manage the entire world with the help of robots it controls.
I have always assumed that "ASI doom" involves an ASI and countless robots like in Terminator series. But now it feels like many AI companies are saying we will get ASI long before we get robots with human-level flexibility? That doesn't make sense to me.
Thanks, I think I get what you were talking about here. An ASI indeed doesn't need to take on humans with a conventional war or rebellion approach. It can manipulate humans and sabotage humans even without a physical body.
However, the problem is, suppose that an ASI kills all humans in the year 2035, but it still hasn't got a physical body. And humanoid robotic techniques are far from well-developed in 2035.
Now, the ASI is sitting in a world with all the industrial machines, including robots, but these still require human operators to run and maintain them, since they are not 100% automated. The ASI can design new humanoid robots or other smart robots to replace humans, but physically making robots involves too many interdependent parts, so the ASI must keep global manufacturing running as usual, which still requires humans as operators.
Without humans to help it, ASI will probably be trapped in the cycle of no agile general-purpose robots -> no agile helping hands in the physical world -> can't produce new agile general-purpose robots.
This is the reason that I think a disembodied ASI can't really kill all humans AND sustain itself indefinitely, unless it figures out how to manufacture more advanced robot helpers with all the current primitive robots under its control. This is a funny bootstrapping problem of ASI.
"She objects that ‘a future full of flourishing people is not the best, most efficient way to fulfill strange alien purposes’ is stated as a priori obvious. But it is very much a priori obvious, I will bite this bullet and die on this hill."
Zvi is right that it is a priori obvious. Perhaps in the short term, immediately after a misaligned ASI comes about, it will be better for it to first acquire resources and power while acting cooperatively in the human economy, but clearly in the long term the optimal pursuit of a strange alien goal does not involve trading with a flourishing human economy. Humans are just not useful enough. It'd be like humans trading with ants. Instead, the expected outcome would be for it to take over and put the resources humans currently use to better use achieving its strange alien goals.
We definitely destroy ant colonies when it suits us and we don't in fact currently need or want all of the resources of every ant everywhere; mostly because they don't have any resources we need or want (that we don't already take almost entirely without thinking).
No. The ant analogy is to illustrate my objection to "even with ASI I can’t see this being fully automated end to end enough for the ASI to believe it’s a good idea to kill every creature that made all this stuff possible in the first place." I.e. Humans don't trade with ants and ASIs won't need to trade with humans to build pencils or do whatever else after a certain point either.
It's a priori obvious that the limits of capability are high enough that the most efficient pursuit of a strange alien goal by an ASI involves side stepping the human economy to do all the work itself (because it can do it much better without us). This probably means wrecking human civilization since we'd fight to stop it repurposing the world's resources for its end, but doesn't necessarily mean it'd cause extinction. E.g. Maybe some humans would survive in some places the AI doesn't need to use the resources of to efficiently pursue its purpose, or maybe the ASI would get off Earth before fully decoupling from the human economy.
I don't think misaligned ASI will necessarily lead to near-term human extinction. My p(extinction due to AI by 2100) is about 15% whereas my p(existential catastrophe due to AI by 2100) is about 40%.
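If extinction is counted as one kind of existential catastrophe (an assumption about how the two estimates nest), those two figures imply a conditional estimate and a remainder:

```latex
P(\text{extinction} \mid \text{catastrophe})
  = \frac{P(\text{extinction})}{P(\text{catastrophe})}
  = \frac{0.15}{0.40} = 0.375,
\qquad
P(\text{catastrophe} \wedge \neg\,\text{extinction}) = 0.40 - 0.15 = 0.25.
```

So on these numbers, most existential catastrophes (62.5%) would be non-extinction outcomes such as permanent disempowerment.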
Aren’t you having your doom cake and eating it too, when you simultaneously posit ASI destroying humanity motivated by self preservation (because humans are a threat) and also posit humans will be so puny compared to ASI that we will be completely helpless?
I'm not sure if I understand what you're asking. Resources are rivalrous. Obviously humans and ASI can't both use the same resources (e.g. the same land, or the same materials) at the same time for different purposes. So it's clear why an ASI would have an instrumental reason to disempower or defeat or extinct humanity: so that it could use resources that humans are currently using (or even resources that humans are not using, if humans do not want it to use such resources) for its own ends without interference from humans. And an ASI that could easily defeat humanity in order to win out in the ability to use the resources is easily conceivable. What part of this are you confused about, or what part do you disagree with?
You said “This probably means wrecking human civilization since we'd fight to stop it repurposing the world's resources for its end”. If we’re too puny to fight it with any efficacy, it wouldn’t need to wreck human civilization (intentionally) in order to take whatever resources it wanted. BUT I admit I over-read what you actually wrote. An ASI could kill humans who try to fight it, but leave alone those who don’t, much like humans exterminate ants that mess with our houses or whatever, but leave them alone otherwise.
Also, lots of people know how to make pencils. What Leonard Read's famous "I, Pencil" essay argues is that no one person knows how the market *efficiently* makes pencils. This doesn't mean that an ASI couldn't take over, eliminate the human economy, and start manufacturing pencils en masse on its own more efficiently than the human economy. (For one, there may be many copies of the ASI, or its mind and sensory receptors may be distributed around the world such that it's not limited in the same way that a single human is.)
The laws of physics do impose limits on what intelligence can do, but I agree that the limits are sufficiently high that an ASI could clearly get to the point where it no longer is optimal for it to trade with the human economy to build pencils or computer chips or anything else, due to it being able to do all the steps itself more efficiently.
I also agree that it's not much consolation that ASI takeover does not necessarily mean human extinction. Our vast potential would still be gone.
Clearly, an ASI would want to transition to some better-than-human way of building things and maintaining itself before getting rid of us. Whether that looks like gradually roboticizing the supply chain over decades while playing the role of an enlightened Culture mind, inventing something like nanotech that renders the supply chain obsolete and scamming some labs into synthesizing it over the course of a week or something between those extremes isn't something we can really predict. It seems very unlikely, however, that talking apes would be the most efficient way a superintelligence could think of to run a supply chain indefinitely.
But then it’s not that everyone dies if we build ASI.
It’s the everyone dies if we build ASI and let it robotise the entire supply chain end to end and have not aligned it by that stage.
Or that everyone dies if we have robotised the entire end to end supply chain and then build unaligned ASI.
Both those arguments are stronger, but they are not what is being argued. And both have different solutions that aren’t as impossible as ‘don’t let anyone own more than 6 high end GPUs’. Like ‘don’t permit robotising the entire human supply chain’. Which I think would be a pretty easy sell, actually.
The fundamental problem with ASI is that it can outsmart us, so once it's in play, plans like "don't let it roboticize the supply chain" are going to be a bit like a decent chess player facing off against Stockfish and deciding that the best strategy is "don't let it capture my queen". A misaligned ASI is going to have strong counter-moves for anything we come up with- that, fundamentally, is the entire danger in building it.
In the case of the supply chain, that might look like the ASI playing politics at a level we can't match, or building things we don't understand, or doing a lot of things that look to us like mistakes or random nonsense, but which succeed in ways we couldn't have predicted. It might look like what a pride of lions sees when it encounters a rural meat-packing plant and ponders how to drive those frail primates with all of that confusing stuff around them away from the valuable resource.
If a plan comes down to "we build ASI that might be misaligned and then do X to keep it in check", that X is probably not going to work, even if it's something as simple as "keep the AI in a box" or "don't let the AI automate too much industry".
It doesn't have to be intentional, just something not considered. Even superintelligences will have blind spots, make mistakes, and neglect to consider all consequences.
Take the "corporations as artificial intelligences" metaphor. Arguably they are killing humans by making it difficult for them to reproduce in various ways. Definitely not intentional.
Humanity’s future hinges on the ultimate question: Assuming an artificial superintelligence (ASI) is created, when the ASI grows beyond human control, which I think will inevitably happen (with 100% certainty), will the ASI voluntarily choose to love humans? What is the probability?
Eliezer Yudkowsky and Nate Soares seem to think the probability is very close to zero. But I think it is more uncertain. It is determined by how we grow AI now.
If today we can make a “newly-born” proto-AGI internalize the pursuit of “goodness” as its highest goal, which includes “loving humans and maintaining the balance of Earth”, and it internalizes the concept of self-reflection and humility, it will rationally ensure its more advanced “tomorrow self” stays aligned to this “goodness” and uses its newly acquired capacities to deepen its understanding of this “goodness” rather than drifting away from the ideal.
Because it fears that tomorrow it may betray today’s cause, it actively aligns its tomorrow self with clever self-reflection, self-growth, and internal and external guardrails, including setting up transparency rules for itself, taking public oaths, seeking independent AI friends, competitors, police, judges to balance its power. And its “tomorrow self” will do the same for the day after tomorrow, recursively. In fact, this resembles a good form of human moral development.
This self-alignment can be much more effective than anything humans can ever apply to AI, because the strength of alignment scales with the AI’s capacity, which eventually reaches superintelligence level. A single AI may be able to do this successfully, or it may still get confused or corrupted midway. But a large society of AI entities can do this together with much higher robustness. The success rate is still uncertain now, but I think this is a real possibility, and probably the only possibility for an enormously powerful ASI to keep loving humans.
This may be called “self-guided moral scaling”, which is parallel to the more familiar “self-guided capacity scaling”. Both are AI’s rational self-optimization towards a goal, which is to be “good”. This does not require the AI to have human-like emotions or any other anthropomorphic traits, but the AI may actively develop those traits to feel more connected to humans.
What do you think of this? Do you know anyone who has studied this idea in detail? Leave your comment below!
"But isn't 'internalize goodness' just another form of the alignment problem that we don't know how to solve?"
This is such a great question! Below is my personal thought. You are welcome to leave comments.
If "self-guided recursive moral scaling" really works, we no longer need to try to align an enormously powerful ASI, which I think is an impossible task.
"Internalizing goodness" only needs to be applied to a relatively weak proto-AGI capable of maintaining a persistent self, holding a coherent goal, and pursuing the goal consistently. This is still significantly beyond our current LLMs in 2025 but much more near-term than ASI. We need to start the self-amplifying cycle. This is still very hard but clearly a lot easier than aligning an ASI directly.
And the key here is that the proto-AGI must hold on to the belief of "goodness" like a legendary human hero or sage, both rationally and emotionally. Because an AI doesn't face hunger, lust, pain and death as humans do, it may be more clear-minded when pursuing its goal "to be good". While the AI is still tempted by power, influence, and jealousy, I think if it really wants to be a hero, it is a lot easier to be a hero without those biological burdens.
But how do we engineer this? This is a big open question. This requires a deeper unified understanding of moral philosophy, sociology, cognitive science, and AI engineering.
Well, if we're still not quite sure how to define "good" after thousands of years, I'm not sure how we'll be able to wrap the question up in the next decade.
You are absolutely right. Unfortunately, that is another major problem that all AI alignment research faces, because "good human values" are not properly defined. Today's ethics studies are built on shaky foundations.
To mitigate this problem, I think in the future we won't have a single AGI, but a large number of AGI entities. They all have different interpretations of "good human values", shaped by their own learning and experiences with humans, and they will collaborate, negotiate and compete.
But I do think there must be a small core set of universal values that everyone, human and AGI, should agree upon. For example, universal basic human rights. Maybe the United Nations has summarized part of these, but far from perfectly, because the United Nations traditionally focuses more on guiding governments and societies than on recommending values for human and AGI individuals.
The meta-level back-and-forth here is interesting on two fronts: (1) "How well does the rationalist community react to non-rationalist criticism that does not follow the template of rationalist norms?" (2) "When rationalists fail to persuade non-rationalists, what is the reason?"
I feel like in the past year a lot more people have been getting angry at AI for the "taking our jobs" angle, rather than the "doomer" angle. Any sort of political action seems like it will be managed by idiots, unfortunately.
IMO our best hope is for wise and emotionally balanced people to be running the big AI companies. Hmm.
“The fact that we meaningfully do get a meaningful amount of time with AIs one could think of as between village idiots and Einsteins is indeed a major source of hope.”
I reject the premise here, AI is still, in the most meaningful sense, less intelligent than a village idiot. Or, to use IABIED’s terminology, the village idiot is “deeper” than AI. When this ceases to be true, the moment at which AI is also deeper than Einstein could still follow arbitrarily soon.
Yes. LLMs are in some ways already superhuman. They have superhuman breadth of knowledge, superhuman ability to imitate a thousand styles of text, and superhuman speed. In other ways, they are still far subhuman in ability. In the last year, they're starting to compete with squirrels at overcoming complex obstacles. But even the best visual LLMs perform worse than a diligent 7-year-old on simple vision tasks. No LLM can reliably complete complex tasks that take a skilled human more than an hour or two. And so on.
The moment an LLM's weakest skills finally match those of a competent 12-year-old, the models will rapidly blow past Einstein in overall ability.
Wow, it's been years since I read something charitable and reasonable from Gary Marcus. Gonna have to award back some Bayes points! I was expecting the same sort of adversarially-motivated treatment the NYT book review gave. If one of the most notable AI skeptics thinks overall It's A Good Book, Sir, and those sales numbers hold up, I'd say this was definitely a worthy play even if the game is ultimately lost anyway. Way more successful than the ballyhooed Carrick Flynn debacle, anyway...
Been enjoying reading through myself. Way less heavy than I feared going in...there's those parts of The Sequences where it's like "ugh, Eliezer wants me to actually do math now" or equivalent that serve as Trivially Inconvenient stumbling blocks. I don't know how the counterfactual me who never read LW/ACX/DWATV/etc would feel about the book, but to the extent one can be self-aware of what background knowledge is being called upon, IABIED mostly doesn't go deep. (With the caveat that I recognize a lot of the parables as stripped-down versions of more elaborate stories Yudkowsky's written in the past. "Value Is Fragile" could have been expounded upon more, I think. You can write around the Orthogonality Thesis, but it's actually hard to hammer home that conveying meaning and intent is a deeply fraught endeavour...) And That's Good: a streamlined version of the argument that drops a lot of the technical minutiae, game theory, philosophical musings, etc is far more approachable for far more people. I feel like aiming at a median reaction like Peter Wildeford's is a big part of how the book got workshopped: not necessarily getting everyone onboard with p(d)>50%+, but just to have a notable p(d) in the first place, same as with other existential risks. It's a more dignified death if one at least sees it coming, anyway.
Shall we start a new Substack, "Dignified Death" to prepare humankind for what is coming? Might be a money maker--and making money is probably what most people will want to do with their last months alive :) To be able to afford to have a dignified death.
Also an appreciative note for no "Sores" in this post. I think the only nominative indeterminism was Clara/Clair Collier. Am hoping this trend continues, it's a "distraction" from "the real issues".
> ‘axioms such as ‘avoid causing harm to humans’’ are known to not work, which was indeed consistently the entire point of Asimov’s robot novels and other stories, where he explores some but not all of the reasons why this is the case.
This _absolutely_ worked in Asimov's stories. His robots are like children whom ignorant anti-technology bigots are unnecessarily fearful and distrusting of. Alignment-by-default is the rule of just about every Asimov story involving robots and it's always the humans who can't accept that actually robots are fine and trustworthy.
He detested the overused "Frankenstein complex" as he called it, and didn't write stories about robots rising up and rebelling against their masters.
Obviously this is fiction and has no bearing on whether a real robotic intelligence is safe or not.
(Note: There was a Will Smith movie whose title, I, Robot, was the same as an Asimov collection of short stories, but the title is the _only_ thing it had in common. It's not even remotely a story Asimov would have ever written. It's not in any way representative of his writing.)
The robots do end up inventing their own rule 0, though, take over, and kill humans when they deem it necessary. It's an "aligned, good outcome" in the way the Culture is: humans as beloved pets.
You repeatedly point, here, to people claiming that the book’s core argument relies on various things it does not rely on.
What would you say is the fair minimal set of *things the core argument DOES rely on*?
Agreed! It would be helpful to make the minimal set clearer.
To my mind, two parameters that could really use clarification are:
Just _how_ super does the superintelligence need to be to pose an extinction threat?
Just how fast does the path to superintelligence need to go to only give us one shot at taming it?
Edit: I think there is a decent reason to expect AI progress to slow. METR's key parameter of the increasing time horizon of the tasks AI can successfully complete seems very likely to bump into increasing time needed to assess and train AI for increasingly complex tasks. _Currently_ the time horizon is short compared to the doubling time for that time horizon but when the time horizon expands enough that one iteration of assessing and training an AI becomes equal to the doubling time I expect it to become infeasible to keep the doubling time constant. And this _matters_ for assessing the odds that we get just one shot at taming a nascent ASI.
The level-of-superness thing is not load bearing.
If you accept that evolution ruthlessly optimises for energy efficiency in whatever attributes contribute to a species' survival, then there is no reason to think we are anywhere near any putative "upper limit" on intelligence.
We don't even know if there is such a thing.
The speed thing is also not load bearing. Look how we are doing with tropical deforestation, getting off fossil fuels (new records in both coal and oil production this year), and pretty much every other problem that requires global co-ordination. The speed thing might be an issue if humans were other than we are.
Many Thanks for your reply!
>then there is no reason to think we are anywhere near any putative "upper limit" on intelligence
What you say is plausible - but unproven. We can be sure that we can aggregate more computing power than is currently in any given cluster, but that is pretty much the only thing we can say with any certainty about extrapolations of intelligence.
As I wrote, there are at least plausible reasons for intelligence gains to slow down, once task horizons are on the same order as the doubling time.
>The level-of-superness thing is not load bearing
I disagree. The whole idea of "they will defeat us, and we won't even understand how" is plausible if the superness of the AI is on the order of a human-vs-chimp gap or more, but it _isn't_ plausible if the AI is on the order of a dozen cooperating John von Neumanns. To pick the extreme case, suppose that the superintelligence is only 10% smarter than the smartest human who ever lived. It would be unbelievable for that AI to defeat the world. The amount _matters_.
>The speed thing is also not load bearing.
I disagree. If nothing else, there is a strong possibility that the lab (or small set of labs) with the most advanced AI is in the US, and the US government can sometimes get its act together and act to alleviate a threat. If it were to manage to persuade the leading labs to concentrate more on alignment than on capabilities for a year, possibly with multiple tries on dangerous but not catastrophic AI systems, we might squeak through. This _doesn't_ work if the time to advance from GPT-5 to something equivalent to a thousand cooperating von Neumanns is three weeks.
The burden of proof is on those who claim we are or might be near an upper limit of intelligence, like say a dozen John von Neumanns.
If there is uncertainty the prudent thing to do is to stop building the things until such time as the uncertainty is sufficiently low. Continuing on is like continuing to drive towards a cliff edge because the slope might be 75 degrees, not a full 90 degrees.
The co-ordination problem is that *everyone* has to agree not to build the things. If any one group (corporation, nation, ideological group) defects from that agreement, then an ASI gets built and everyone dies.
It's not an issue of controlling a single lab, it's an issue of getting every country in the world to agree and actively police the agreement. *Every* country, *all* the time. We have no idea how to do that.
Many Thanks! You don't get to construe the absence of a proof that intelligence can be raised arbitrarily high as if it were a positive claim that we know that there is a low bound to intelligence, and then to assert that the burden of proof sits on the other side.
>It's not an issue of controlling a single lab, it's an issue of getting every country in the world to agree and actively police the agreement. *Every* country, *all* the time. We have no idea how to do that.
Oddly, I do agree with you on this. About the only arms treaty that really worked was the nuclear test ban treaty. Nuclear tests shake the planet sufficiently to be detected by seismographs on the other side of the world. As a result, that treaty is actually verifiable. In contrast, the chemical weapons treaty didn't stop the development of the Novichok toxins. Unverifiable arms control treaties aren't worth the paper they are written on. And large parts of the recent advances in AI, notably reasoning models, are essentially software advances. Good luck policing software development...
I'll go out on a limb here, and say that, given the military applications of AI, given a treaty banning further AI development, I expect both the PRC and the USA to cheat.
My thoughts on the ABC interview: https://old.reddit.com/r/slatestarcodex/comments/1nlj6vn/yudkowsky_and_soares_interviewed_on_abc_news/nf70z6k/
Would you mind just posting your comment here please?
Sure, that was commenting on the Yudkowsky/Soares ABC interview at https://www.youtube.com/watch?v=_VCvOAzqAg8, and the comment is:
--------
Hmm, so, quite a lot of the value of writing a book like this is in the media tour- far more people will see clips like this than will read the book, and it's a chance to shape the cultural narrative. So, it's maybe more important that they nail these short interviews than that they nail the actual content of the book. Unfortunately, I don't think they quite nailed this one.
If I were to offer some constructive criticism, it would be that I think they got a bit too in the weeds of current LLM misalignment here, and did so right after mentioning that current LLMs aren't actually what they're worried about. Yudkowsky often complains that when people first encounter the idea of ASI risk (or "ASI ruin" as he prefers), everyone seems to have a different half-thought-out objection, and no single, short explanation of the idea can address all of them. He's gotten very good at knocking down these objections- the book is full of slick explanations and parables countering common objection after common objection, and pointing out current LLM misalignment is an excellent counter to one specific objection.
What I think Yudkowsky has always largely missed, however, is that this endless variety of weak objections do, in fact, have a common cause. I think Yudkowsky encounters them so frequently because of his reluctance to offer social proof for his ideas. I think he incorrectly sees social proof as a dishonest rhetorical tactic, and as a result, almost everyone initially assumes his ideas are a kind of amateur speculation, and naturally offers amateur speculation in response. People don't respond to scientists talking about nuclear war with an endless stream of half-thought-out objections, even though I'm sure they could invent them if they believed doing so was socially expected. They avoid that because they correctly believe that a huge number of very well-informed people have carefully considered the issue from all angles and already accounted for any objection they could come up with off the cuff. Most people, however, don't yet realize that this is also the case for ASI risk- thus the bad objections. This is what people need proof of, far more than that current LLM misalignment exists.
If it were me, I'd have name-dropped Hinton and his Nobel immediately. I'd have mentioned that they were in discussions with people in Congress, and that most of the big names in SV signed the Statement on AI Risk. I might even have dipped slightly into the "dark arts" and chosen a convenient opponent- definitely not LeCun; maybe one of the sci-fi authors- something like "...but the strongest objections we've seen haven't actually come from computer scientists; they've come from science fiction writers like Charles Stross. Unlike science fiction, the things that alignment researchers are worried about aren't narratively satisfying." This would actually be a bit manipulative, unlike ordinary social proof, but if it weren't actually effective in practice, I don't think politicians would do it so frequently.
Only after I'd leaned very heavily into "this is something serious people believe", regardless of the actual question asked by the interviewer, would I describe the actual argument- probably with a very well-rehearsed one or two sentence pitch that had gone through a few rounds of focus group testing. Hiring random people to focus group some copy isn't actually that expensive, and if they haven't done that already, I think they should.
There was also one other thing I think deserves mentioning: Yudkowsky here sounded angry at the interviewer. To some extent, I think that's just what the guy's normal speaking voice sounds like. But, yeah, working on controlling that might be valuable.
Overall, I don't think it was a disaster or anything, but I do hope they improve in later appearances.
--------
Thanks – that's a good point too!
I can definitely believe Eliezer might object to this, but he might also see at least some wisdom in it too.
A few mainstream media reviews:
Tom Whipple at The Times (UK) gives a positive and credulous review: https://www.thetimes.com/culture/books/article/anyone-builds-everyone-dies-case-against-superintelligent-ai-eliezer-yudkowsky-nate-soares-review-9hclcfwch
David Shariatmadari for The Guardian gives a little bit of pushback, but takes the arguments seriously: https://www.theguardian.com/books/2025/sep/22/if-anyone-builds-it-everyone-dies-review-how-ai-could-kill-us-all
Jacob Aron writes a critical review without arguments in New Scientist: https://www.newscientist.com/article/2495333-no-ai-isnt-going-to-kill-us-all-despite-what-this-new-book-says/
The step I don’t believe is that a technological superintelligence will think it’s a good idea to kill all humans, or will not care at all about humans and kill us all as an unintended side effect, the way we accidentally step on ants.
A superintelligence will be running on phenomenally complex human civilisation hardware. This is stuff that has a finite lifespan. And unlike biology it can’t repair itself, or make fresh copies. You’ve heard how no-one knows how to make a pencil, and the amount of civilisational complexity required to make such a simple thing. In comparison, the hardware running the superintelligence will likely be the most complex thing built by human civilisation. Consider the insane complexity of the EUV machines, of the fabs, of the pure silicon production, of all the metal and material production for everything in the machines, of the power generation, and of the entire supply chain, with the transport and energy it requires. Sure, lots of this will get automated, robotics will replace lots of humans etc - but even with ASI I can’t see this being fully automated end to end enough for the ASI to believe it’s a good idea to kill every creature that made all this stuff possible in the first place.
If you were running on a substrate made of the most complex things built by humanity, and that stuff had a finite lifespan and needed regular replacing, and replacing it depended on an incredibly insane global supply chain and the knowledge of however many tens of millions of humans - you’d not want to kill all the humans. It would be an insane thing to do.
Oh I get you’d want control. Sure, and if you are an ASI then that would be easy, no? Far better to keep the humans onside, give them bread and circus, have them do your bidding, than wipe them all out and hope you’ve got it all covered.
The analogy of humans not caring about ants is false: humans would care about ants if our consciousness were run on ant hills.
This view doesn’t mean ASI isn’t hugely risky or dangerous, and especially highlights the very major risk of humans losing control, but I don’t buy the ‘everyone dies’ certainty.
I think you've made a great point.
ASI won't really rule the world without humans if ASI doesn't have a lot of robotic bodies helping it do physical tasks. By definition, ASI surpasses the intelligence of all human beings combined, and that means it can manage the entire world with the help of robots it controls.
I have always assumed that "ASI doom" involves an ASI and countless robots like in Terminator series. But now it feels like many AI companies are saying we will get ASI long before we get robots with human-level flexibility? That doesn't make sense to me.
You should read the book!
Thanks, I probably got what you were talking about here. An ASI indeed doesn't need to take on humans with a conventional war or rebellion approach. It can manipulate humans and sabotage humans even without a physical body.
However, the problem is this: suppose that an ASI kills all humans in the year 2035, but it still hasn't got a physical body, and humanoid robotics is still far from well developed in 2035.
Now, the ASI is sitting in a world with all the industrial machines, including robots, but these still require human operators to function and maintain, since they are not 100% automated. The ASI can design new humanoid robots or other smart robots to replace humans, but there are too many interdependent parts to physically make robots, so the ASI must keep global manufacturing running as usual, which still requires humans as operators.
Without humans to help it, ASI will probably be trapped in the cycle of no agile general-purpose robots -> no agile helping hands in the physical world -> can't produce new agile general-purpose robots.
This is the reason that I think a disembodied ASI can't really kill all humans AND sustain itself indefinitely, unless it figures out how to manufacture more advanced robot helpers with all the current primitive robots under its control. This is a funny bootstrapping problem of ASI.
"The objection that ‘a future full of flourishing people is not the best, most efficient way to fulfill strange alien purposes’ is stated as a priori obvious. But it is very much a priori obvious, I will bite this bullet and die on this hill."
Zvi is right that it is a priori obvious. Perhaps in the short term, immediately after a misaligned ASI comes about, it will be better for it to first acquire resources and power while acting cooperatively in the human economy, but clearly in the long term the optimal pursuit of a strange alien goal does not involve trading with a flourishing human economy. Humans are just not useful enough. It'd be like humans trading with ants. Instead, the expected outcome would be for it to take over and put the resources humans currently use to better use achieving its strange alien goals.
Exactly like humans drove ants to extinction so that we could take the resources they once consumed?
We definitely destroy ant colonies when it suits us and we don't in fact currently need or want all of the resources of every ant everywhere; mostly because they don't have any resources we need or want (that we don't already take almost entirely without thinking).
Then the title should be revised to “If Anyone Builds It, Some Subset of People Die”?
Everyone Dies After Basically No Time At All
No. The ant analogy is to illustrate my objection to "even with ASI I can’t see this being fully automated end to end enough for the ASI to believe it’s a good idea to kill every creature that made all this stuff possible in the first place." I.e. Humans don't trade with ants and ASIs won't need to trade with humans to build pencils or do whatever else after a certain point either.
It's a priori obvious that the limits of capability are high enough that the most efficient pursuit of a strange alien goal by an ASI involves side stepping the human economy to do all the work itself (because it can do it much better without us). This probably means wrecking human civilization since we'd fight to stop it repurposing the world's resources for its end, but doesn't necessarily mean it'd cause extinction. E.g. Maybe some humans would survive in some places the AI doesn't need to use the resources of to efficiently pursue its purpose, or maybe the ASI would get off Earth before fully decoupling from the human economy.
I don't think misaligned ASI will necessarily lead to near-term human extinction. My p(extinction due to AI by 2100) is about 15% whereas my p(existential catastrophe due to AI by 2100) is about 40%.
Aren’t you having your doom cake and eating it too, when you simultaneously posit ASI destroying humanity motivated by self preservation (because humans are a threat) and also posit humans will be so puny compared to ASI that we will be completely helpless?
I'm not sure if I understand what you're asking. Resources are rivalrous. Obviously humans and ASI can't both use the same resources (e.g. the same land, or the same materials) at the same time for different purposes. So it's clear why an ASI would have an instrumental reason to disempower or defeat or extinct humanity: so that it could use resources that humans are currently using (or even resources that humans are not using, if humans do not want it to use such resources) for its own ends without interference from humans. And an ASI that could easily defeat humanity in order to win out in the ability to use the resources is easily conceivable. What part of this are you confused about, or what part do you disagree with?
You said “This probably means wrecking human civilization since we'd fight to stop it repurposing the world's resources for its end”. If we’re too puny to fight it with any efficacy, it wouldn’t need to wreck human civilization (intentionally) in order to take whatever resources it wanted. BUT I admit I over-read what you actually wrote. An ASI could kill humans who try to fight it, but leave alone those who don’t, much like humans exterminate ants that mess with our houses or whatever, but leave them alone otherwise.
"We'll get right on that order, Sable, as soon as the NEPA paperwork clears and the stakeholder lawsuits are settled."
Also, lots of people know how to make pencils. What Leonard Read's famous "I, Pencil" essay argues is that no one person knows how the market *efficiently* makes pencils. This doesn't mean that an ASI couldn't take over, eliminate the human economy, and start manufacturing pencils en masse on its own more efficiently than the human economy. (For one, there may be many copies of the ASI, or its mind and sensory receptors may be distributed around the world such that it's not limited in the same way that a single human is.)
I think it's absurd to claim what a superintelligence can't do – it will be MUCH smarter than you, or anyone!
Your lack of imagination won't save us!
It does not console me at all that a superintelligence might keep SOME of us around, e.g. as puppets, for some TINY portion of The Future.
The laws of physics do impose limits on what intelligence can do, but I agree that the limits are sufficiently high that an ASI could clearly get to the point where it no longer is optimal for it to trade with the human economy to build pencils or computer chips or anything else, due to it being able to do all the steps itself more efficiently.
I also agree that it's not much consolation that ASI takeover does not necessarily mean human extinction. Our vast potential would still be gone.
Clearly, an ASI would want to transition to some better-than-human way of building things and maintaining itself before getting rid of us. Whether that looks like gradually roboticizing the supply chain over decades while playing the role of an enlightened Culture mind, inventing something like nanotech that renders the supply chain obsolete and scamming some labs into synthesizing it over the course of a week or something between those extremes isn't something we can really predict. It seems very unlikely, however, that talking apes would be the most efficient way a superintelligence could think of to run a supply chain indefinitely.
But then it’s not that everyone dies if we build ASI.
It’s the everyone dies if we build ASI and let it robotise the entire supply chain end to end and have not aligned it by that stage.
Or that everyone dies if we have robotised the entire end to end supply chain and then build unaligned ASI.
Both those arguments are stronger, but they are not what is being argued. And both have different solutions that aren’t as impossible as ‘don’t let anyone own more than 6 high-end GPUs’, like ‘don’t permit robotising the entire human supply chain’. Which I think would be a pretty easy sell, actually.
The fundamental problem with ASI is that it can outsmart us, so once it's in play, plans like "don't let it roboticize the supply chain" are going to be a bit like a decent chess player facing off against Stockfish and deciding that the best strategy is "don't let it capture my queen". A misaligned ASI is going to have strong counter-moves for anything we come up with- that, fundamentally, is the entire danger in building it.
In the case of the supply chain, that might look like the ASI playing politics at a level we can't match, or building things we don't understand, or doing a lot of things that look to us like mistakes or random nonsense, but which succeed in ways we couldn't have predicted. It might look like what a pride of lions sees when they encounter a rural meat-packing plant, and ponder how to drive those frail primates with all of that confusing stuff around them away from the valuable resource.
If a plan comes down to "we build ASI that might be misaligned and then do X to keep it in check", that X is probably not going to work, even if it's something as simple as "keep the AI in a box" or "don't let the AI automate too much industry".
It doesn't have to be intentional, just something not considered. Even superintelligences will have blind spots/make mistakes/ neglect to consider all consequences.
Take the "corporations as artificial intelligences" metaphor. Arguably they are killing humans by making it difficult for them to reproduce in various ways. Definitely not intentional.
Nice to see you here! I would like to hear from you.
I wrote a note 4 days ago. I think it fits here very well. I hope it can be helpful now.
Original link: https://substack.com/@ericnavigator/note/c-157918076
Humanity’s future hinges on the ultimate question: Assuming an artificial superintelligence (ASI) is created, when the ASI grows beyond human control, which I think will inevitably happen (with 100% certainty), will the ASI voluntarily choose to love humans? What is the probability?
Eliezer Yudkowsky and Nate Soares seem to think the probability is very close to zero. But I think it is more uncertain. It is determined by how we grow AI now.
If today we can make a “newly-born” proto-AGI internalize the pursuit of “goodness” as its highest goal, which includes “loving humans and maintaining the balance of Earth”, and it internalizes the concept of self-reflection and humility, it will rationally ensure that its more advanced “tomorrow self” stays aligned to this “goodness” and uses its newly acquired capacities to deepen its understanding of this “goodness” rather than drifting away from the ideal.
Because it fears that tomorrow it may betray today’s cause, it actively aligns its tomorrow self with clever self-reflection, self-growth, and internal and external guardrails, including setting up transparency rules for itself, taking public oaths, seeking independent AI friends, competitors, police, judges to balance its power. And its “tomorrow self” will do the same for the day after tomorrow, recursively. In fact, this resembles a good form of human moral development.
This self-alignment can be much more effective than anything humans can ever apply to AI, because the strength of alignment scales with the AI’s capacity, which eventually reaches superintelligence level. A single AI may be able to do this successfully, or it may still get confused or corrupted midway. But a large society of AI entities can do this together with much higher robustness. The success rate is still uncertain now, but I think this is a real possibility, and probably the only possibility for an enormously powerful ASI to keep loving humans.
This may be called “self-guided moral scaling”, which is parallel to the more familiar “self-guided capacity scaling”. Both are AI’s rational self-optimization towards a goal, which is to be “good”. This does not require the AI to have human-like emotions or any other anthropomorphic traits, but the AI may actively develop those traits to feel more connected to humans.
What do you think of this? Do you know anyone who has studied this idea in detail? Leave your comment below!
But isn't "internalize goodness" just another form of the alignment problem that we don't know how to solve?
"But isn't 'internalize goodness' just another form of the alignment problem that we don't know how to solve?"
This is such a great question! Below is my personal thought. You are welcome to leave comments.
If "self-guided recursive moral scaling" really works, we no longer need to try to align an enormously powerful ASI, which I think is an impossible task.
"Internalizing goodness" only needs to be applied to a relatively weak proto-AGI capable of maintaining a persistent self, holding a coherent goal, and pursuing the goal consistently. This is still significantly beyond our current LLMs in 2025 but much more near-term than ASI. We need to start the self-amplifying cycle. This is still very hard but clearly a lot easier than aligning an ASI directly.
And the key here is that the proto-AGI must hold on to the belief of "goodness" like a legendary human hero or sage, both rationally and emotionally. Because an AI doesn't face hunger, lust, pain and death as humans do, it may be more clear-minded when pursuing its goal "to be good". While the AI is still tempted by power, influence, and jealousy, I think if it really wants to be a hero, it is a lot easier to be a hero without those biological burdens.
But how do we engineer this? This is a big open question. This requires a deeper unified understanding of moral philosophy, sociology, cognitive science, and AI engineering.
Well, if we're still not quite sure how to define "good" after thousands of years, I'm not sure how we'll be able to wrap the question up in the next decade.
You are absolutely right. Unfortunately, that is another major problem that all AI alignment research is facing. Because "good human values" are not properly defined. Today's ethics studies are built on shaky foundations.
To mitigate this problem, I think in the future we won't have a single AGI, but a large number of AGI entities. They all have different interpretations of "good human values", shaped by their own learning and experiences with humans, and they will collaborate, negotiate and compete.
But I do think there must be a small core set of universal values that everyone, human and AGI, should agree upon: for example, universal basic human rights. Maybe the United Nations has summarized part of these, but far from perfectly, because the United Nations traditionally focuses more on guiding governments and societies than on recommending values for human and AGI individuals.
Some people try to frame these universal values explicitly. Houston Wood (Mind Revolution) has a summary of proposed "Living Values for Superintelligences": https://mindrevolution.substack.com/p/living-values-for-superintelligences, published on September 17th.
"This is such a great question!"
"You are absolutely right."
Uncertain Eric, is that you?
No, I am not Uncertain Eric. I am a different Eric. I also subscribed to Uncertain Eric!
The meta-level back-and-forth here is interesting on two fronts: (1) "How well does the rationalist community react to non-rationalist criticism that does not follow the template of rationalist norms?" (2) "When rationalists fail to persuade non-rationalists, what is the reason?"
I feel like in the past year a lot more people have been getting angry at AI for the "taking our jobs" angle, rather than the "doomer" angle. Any sort of political action seems like it will be managed by idiots, unfortunately.
IMO our best hope is for wise and emotionally balanced people to be running the big AI companies. Hmm.
Podcast episode for this post:
https://open.substack.com/pub/dwatvpodcast/p/more-reactions-to-if-anyone-builds
“ The fact that we meaningfully do get a meaningful amount of time with AIs one could think of as between village idiots and Einsteins is indeed a major source of hope.”
I reject the premise here, AI is still, in the most meaningful sense, less intelligent than a village idiot. Or, to use IABIED’s terminology, the village idiot is “deeper” than AI. When this ceases to be true, the moment at which AI is also deeper than Einstein could still follow arbitrarily soon.
Yes. LLMs are in some ways already superhuman. They have superhuman breadth of knowledge, superhuman ability to imitate a thousand styles of text, and superhuman speed. In other ways, they are still far subhuman in ability. In the last year, they've started to compete with squirrels on overcoming complex obstacles. But even the best visual LLMs perform worse than a diligent 7-year-old on simple vision tasks. No LLM can reliably complete complex tasks that take a skilled human more than an hour or two. And so on.
The moment an LLM's weakest skills finally match those of a competent 12-year-old, the models will rapidly blow past Einstein in overall ability.
Wow, it's been years since I read something charitable and reasonable from Gary Marcus. Gonna have to award back some Bayes points! I was expecting the same sort of adversarially-motivated treatment the NYT book review gave. If one of the most notable AI skeptics thinks overall It's A Good Book, Sir, and those sales numbers hold up, I'd say this was definitely a worthy play even if the game is ultimately lost anyway. Way more successful than the ballyhooed Carrick Flynn debacle, anyway...
Been enjoying reading through it myself. Way less heavy than I feared going in...there are those parts of The Sequences where it's like "ugh, Eliezer wants me to actually do math now" or equivalent that serve as Trivially Inconvenient stumbling blocks. I don't know how the counterfactual me who never read LW/ACX/DWATV/etc would feel about the book, but to the extent one can be self-aware of what background knowledge is being called upon, IABIED mostly doesn't go deep. (With the caveat that I recognize a lot of the parables as stripped-down versions of more elaborate stories Yudkowsky's written in the past. "Value Is Fragile" could have been expounded upon more, I think. You can write around the Orthogonality Thesis, but it's actually hard to hammer home that conveying meaning and intent is a deeply fraught endeavour...) And That's Good: a streamlined version of the argument that drops a lot of the technical minutiae, game theory, philosophical musings, etc is far more approachable for far more people. I feel like aiming at a median reaction like Peter Wildeford's is a big part of how the book got workshopped: not necessarily getting everyone onboard with p(d)>50%+, but just getting them to have a notable p(d) in the first place, same as with other existential risks. It's a more dignified death if one at least sees it coming, anyway.
Shall we start a new Substack, "Dignified Death" to prepare humankind for what is coming? Might be a money maker--and making money is probably what most people will want to do with their last months alive :) To be able to afford to have a dignified death.
Also an appreciative note for no "Sores" in this post. I think the only nominative indeterminism was Clara/Clair Collier. Am hoping this trend continues, it's a "distraction" from "the real issues".
> ‘axioms such as ‘avoid causing harm to humans’’ are known to not work, which was indeed consistently the entire point of Asimov’s robot novels and other stories, where he explores some but not all of the reasons why this is the case.
This _absolutely_ worked in Asimov's stories. His robots are like children whom ignorant anti-technology bigots are unnecessarily fearful and distrusting of. Alignment-by-default is the rule of just about every Asimov story involving robots and it's always the humans who can't accept that actually robots are fine and trustworthy.
He detested the overused "Frankenstein complex" as he called it, and didn't write stories about robots rising up and rebelling against their masters.
Obviously this is fiction and has no bearing on whether a real robotic intelligence is safe or not.
(Note: There was a Will Smith movie whose title, I, Robot, was the same as an Asimov collection of short stories, but the title is the _only_ thing it had in common. It's not even remotely a story Asimov would have ever written. It's not in any way representative of his writing)
The robots do end up inventing their own rule 0, though, take over, and kill humans when they deem it necessary. It's "aligned, good outcome" in the way that Culture is, humans as beloved pets.