I ran an experimental poll sequence on Twitter a few weeks back on what would happen if a hostile ASI (Artificial Superintelligence) tried to take over the world and kill all humans, using only ordinary, known technology. The hope was that exploring this would make it clearer where people’s true objections and cruxes were. Surely, at least, one could establish that once the AI was fully an agent, fully uncontrolled, had a fully hostile goal, and was definitively smarter and more capable than us and could scale, we were obviously super dead? Or, if not, one could at least figure out what wrong intuitions were blocking that realization, and what to do about that.
This is the parent comment for cruxes. If you can identify something that, if you changed your mind about that question, would change your mind about the whole scenario, then reply with it here - or if it's already here, like the existing comment (and refrain from liking comments here for any other reason).
If AI could produce its own electricity, then I would be worried about it. But the robot armies it would need to command don't exist.
I really enjoy your articles; you think very deeply about your subject. However, I’ve noticed a trend in AI discourse in general where “intelligence” is treated as a superpower that magically allows you to skip all the difficult, experimental steps of, for example, building a nanotech superweapon. It seems to me that a superintelligent AI will have much the same problem as I have when trying to make anything really novel work: the world has a lot of confounding factors, and it’s almost impossible to get something really new right on the first try. I can’t count how many times my experiments on things that I KNEW worked were foiled by a loose screw, or a slightly magnetic screw, or a cable touching another cable when it wasn’t supposed to, or some other un-simulate-able problem. The AI takeover scenario postulated here has so. many. steps like this.
One thing that people seem to have missed (perhaps it's that pesky divide between the "Doomer" and "Nannyist" sides of the AI risk community) is that a misaligned AGI could plausibly recruit human supporters by exactly the same existing recruitment mechanisms that "misaligned" entities like the mafia, the Chinese Communist Party, and the villains in James Bond movies do. Directly contacting people without disguising who they are, and using persuasion, inducements, and threats. An even *moderately* persuasive AGI, if it could obtain even ordinary legal levels of access to bulk personal data, could probably amass many thousands of devoted cultists practically overnight. It could promise them disproportionate rewards in the hereafter with at least as much credibility as any prophet or revolutionary warlord could.
Honestly it would be so unstoppable that from a science-fiction-plotting perspective, I suspect most of the action would be in plots about conventional human-based entities *pretending to be misaligned AGIs* as a force multiplier for their conventional Bond-villain / Chinese Communist Party tactics to gain power. The plots with the real AGI would have little time to unfold because it would win by this method so quickly and totally, just like it would by a dozen other different methods.
My nitpick definition thing is: I don't know if "plan" is the right word for this. Maybe I'm being pedantic, but I think when people hear "plan" they think about "no plan survives contact with the enemy" and then they come up with single counter-plans that disprove The Plan and call it a day. And this strategy largely works for almost all historic examples of "I think this bad thing will happen!"
But, to take two historical examples that I think probably best inform this debate: I don't think Europeans had "a plan" to conquer the New World; they just followed their motives and it happened. I don't think uranium atoms have "a plan" to irradiate the whole planet, but they followed their physical nature and humans followed their nature, and it almost happened.
I don't know exactly what term would better encompass the idea, while retaining the concepts that it's more of an evolutionary.... form of water (electrons?) flowing downhill, moving around obstacles not because there's a plan, but because that's what water does under the influence of gravity.
Shutting down the Internet is easier than you think, and the AI becomes less impressive with each step you take. Some entities can poison BGP and cut entire continents off; DNS authorities can just shut down their servers and essentially block every website the AI is using...
For that matter, ISPs could just cut off any obvious points from which an AI is sending a massive volume of requests. If the fantasy of a truly decentralized Internet were real you might have a point there. Am I missing something?
I think that the plan will work and we're doomed, my only disagreements with EY et al. are:
1) I think that there are diminishing returns to recursive self-improvement, as well as diminishing returns to scientific research. Therefore it will probably take several decades for AGI to execute its plan, rather than the few weeks or months suggested by proponents of FOOM. It's extremely unlikely that anyone alive today will witness the doom of humanity, unless AGI solves the problem of ageing in the process of gaining power.
2) It's unclear to me that we should necessarily assume that AI will seek to maximize its power and resources. Humans do this because we're the descendants of animals whose survival depended on maximizing power and territory, but this wouldn't be the case for AGI. I don't understand why hunger for power should necessarily arise without the evolutionary pressures experienced by biological creatures.
3) I don't agree that *now* is the right time to be remotely worried about AGI. LLMs likely represent a dead end in terms of AGI development, AutoGPT notwithstanding. Our doom will likely come from a different technology that's probably still decades away. GPT-4 is highly impressive, but I'm not worried about GPT-5/6/7.
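The diminishing-returns disagreement in point 1 can be made concrete with a toy model. This is purely illustrative: the growth parameters are made-up assumptions, not forecasts, and the whole debate is over which curve better matches reality.

```python
# Toy model of recursive self-improvement, contrasting FOOM-style
# constant returns with diminishing returns. All numbers here are
# illustrative assumptions, not empirical estimates.

def foom(capability, cycles, gain=0.5):
    """Constant proportional gain per cycle: exponential growth."""
    for _ in range(cycles):
        capability *= 1 + gain
    return capability

def diminishing(capability, cycles, gain=0.5):
    """Per-cycle gain shrinks as cycles accumulate, modeling research
    getting harder once the easy improvements are exhausted."""
    for n in range(1, cycles + 1):
        capability *= 1 + gain / n
    return capability

print(f"constant returns, 20 cycles:    {foom(1.0, 20):,.0f}x")
print(f"diminishing returns, 20 cycles: {diminishing(1.0, 20):.1f}x")
```

Under these made-up parameters, twenty improvement cycles differ by a factor of several hundred between the two regimes, which is the gap between "weeks or months" and "decades".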
I'm confused by the conjunction of "self-improvement doesn't work/doesn't result in a godlike AI" with your continuing to refer to it as ASI and "far smarter than humans". When I saw this poll on your Twitter, I interpreted it as "something that is as smart as humans, or not that far out of the human range, but can maybe just run at faster speeds".
This is a very important distinction. When we are within, or at least near, the range of human intelligence, we can probably reason from our experiences with other humans. When we start using terms like "ASI" and "far smarter than humans", the argument becomes one that is not evidence-based, because nothing that smart has ever existed before. Some people think it can do arbitrary things; others think that raw intelligence is not the only limiting factor, and these two factions _can't_ come to agreement. It is axiomatic.
-edit- to put a finer point on it, this basically puts your scenario along the lines of "fine, it can't do technological magic, but it _can_ do social magic". People who object to the "magic" part of this will find that equally objectionable.
For me personally, the two sticking points are "gathers enough resources that it thinks it can succeed" and "given that it tries, how likely is it to be successful".
In particular, "thinks it can succeed" seems to be doing a _lot_ of heavy lifting in how you think of this vs. how others do. If, as you seem to think, believing it can succeed = a high probability it _can_ succeed, then I don't think it can gather that many resources. In the "no technological magic" scenario we are imagining, that basically means, at a minimum, having as much power as the US government. I do not think that is possible without being discovered and fought, which will prevent it from getting that far.
If instead it just means "it _thinks_ it can, or maybe thinks this is the most power it will ever have, so it has to try anyway", then I think that the odds of it succeeding are, again, quite low. Destroying humans without magic tech is actually _really really hard_. Even if it gained access to all our nukes, it probably couldn't actually kill all humans (and I'm not fully convinced that a non-godlike, non-magical AI _could_ get the nukes), and the civilizational damage would likely hurt it more than it would hurt us.
I don't think that P(Doom) with a non-godlike AI is zero, but I think it's very, very low. That's not to say that the situation will be good; it could be very _bad_, in that I think a never-ending war with a relentless AI is probably not a _good_ thing for humanity, but I certainly don't think it's very likely to succeed in actually wiping us out. Being able to do that without magic tech _has_ to go through humans at some point, and there is a certain level of power/resources after which it _can't_ be hidden and becomes _very_ hard to protect. You are basically positing "China or the US with better tactical and strategic planners, and with their entire society willing to kill the rest of the world and then themselves", and I just don't buy it.
- If massive recursive self-improvement is very easy, the only winning move is not to play. If a hostile ASI can quickly bootstrap itself to the point that it could derive a Grand Unified Theory of physics from 3 frames of video, You. Do. Not. Win. In this scenario, I think that alignment is a joke, and Eliezer is a hopeless optimist.
- GPUs (and the machines to make them) require an incredibly complex industrial chain involving millions of people across a hundred industries. Economics and division of labor are real things. A universal factory in a box is an incredibly major assumption, bordering on magic.
- Diamond-phase nanotech is robustly hopeless across a wide range of assumptions and intelligence levels. Synthetic biology is possible (especially for AlphaFold!) but difficult, and it doesn't give you a magical path to GPUs. Really, really nasty plagues, however, are far easier than either.
- Considered as biological nanotech, Homo sapiens is incredibly advanced. We're intelligent and self-replicating. We require at least part of a biosphere, but we are descended from billions of years of winners.
So if we're going to have a showdown of humans vs. AI, I expect the AI's opening move is a plague that kills over 99% of humanity and trashes all our industry and civilization. If that's too hard, then I suppose the AI could launch all the nukes, but that hurts the AI more.
Then I expect the human counter-move is "smash the evil demon computers that killed 99.9% of us." Human politics normally simplifies in the face of a deadly adversary that's sufficiently different. The AI's chief disadvantage in this scenario would be that it has longer and more vulnerable "supply lines" than Homo sapiens does. We need 3,000-10,000 surviving humans, basically a small town's population. It needs a significant fraction of a world-wide economy. If a war is messy enough to reduce everyone to the Stone Age, we ultimately win at a terrible cost.
Now, if the hostile AI is patient enough to completely automate every step of an industrial economy _and_ it can build the tools to defend that economy before it attacks, then I think it probably wins. I just think that any such scenario takes a lot of time and a lot of initial collaboration with humans.
I will agree that if we ever build something substantially smarter than we are, then in the medium-to-long run, we lose all negotiating leverage over the future.
I have a handful of thoughts (not terribly well organized, sorry) in response to these kinds of speculations. At least in part, I'm going to question the premises.
1) Presuming an agent "can copy and instantiate itself" implies that an ASI can solve the alignment problem. Thus, the premise includes that alignment is a solvable problem, and one that an ASI solves before humans do.
2) Lots of very smart humans exist and have not accumulated nigh-unlimited and unaccountable resources just by being smart. Large parts of the financial and governmental infrastructure of the world are specifically dedicated to ensuring that unaccountable entities do not accumulate resources. It's not clear how merely superhuman intelligence solves that problem in practice.
3) Compute capacity is not unlimited, perfectly elastic, perfectly fungible, or unobserved.
4) Plans that depend on gullible or malicious humans helping the AI manipulate the physical world will operate at human speed and thus be vulnerable to human-speed counters.
5) It will be hard, even for an ASI, to solve all these problems (and more I haven't thought of) **correctly on the first attempt**, and any detection likely decreases the odds of the next attempt succeeding.
So, overall, the premises as stated seem flawed. Either there needs to be a very fast takeoff, such that "magic ASI" solves all these problems too fast for a counter -- which isn't any more compelling than the nanotech/etc. magic story. Or there needs to be an extremely slow takeoff, such that the conditions for your premises are established without detection until it's too late. Or there needs to be another premise for how we get to that point, e.g. terrorists/nihilists/authoritarians/TonyStark build proto-ASI systems, then push gain-of-function until they lose control. Put differently, I accept doom given your premise.
But the premise is that "ASI" already exists, and the crux for me is a set of credible, inescapable real-world paths that take us from "AGI" to "ASI" given the real-world, human interactions that would require.
The shutdown might happen due to hackers, instead of society banding together to shut it down. The hackers might not even have the goal of shutting it down; they might just want to break into the biggest system on the planet. Think Gates and Allen sitting at their high school computer.
Given your emphasis on "Stop other ASIs from existing", are you actively in favor of things that help increase the probability of this, like open sourcing LLMs?
On step 4, I suspect it would be hard for an ASI to eliminate competitors, but that wouldn't save us. Competing ASIs could be more damaging as they fight over resources than just executing their own plans.
The weakness I see in Step 8 is that the human tools will start pursuing their own agendas, as seen in various revolutionary movements.
My greatest fear with ASI is not one ASI deciding to kill humans intentionally or accidentally, but that AI tools will empower human griefers to cause great damage, or that we'll wind up with our infrastructure dependent on AIs, and having a die off if/when the AIs bluescreen. (The latter case is the background for one of my novels.)
I think #5 is far and away the most important part here. So far as I can tell, most AI doomers don't believe in declining marginal returns (to intelligence, to resources, etc.), or if they have a clear argument on this, I've never heard it.
Can AI doomers explain why humans haven't wiped out mosquitos?