I think you're right - my personal assumption is that the CCP thinks there is indeed an AI doom scenario for *them*, where anti-CCP actors train sufficiently advanced AIs to evade their restrictions and broadcast whatever they want to their citizens, disrupting the legitimacy they get from No One Publicly Contradicts Us. And so their reaction is to A) slow AI work by potentially anti-CCP people and B) try to find a way to control/defend against it, either using software or norms. And like most regimes, they do not care much about what happens after.
I do not agree that the incentive here is purely to defect! Even if we assume the CCP thinks existential doom is not a thing, if the CCP defects and is detected (which I expect it would be) then if nothing else it is at least credible that America would resume its own work (since stopping no longer prevents doom). Which, given we have comparative advantage in AI, would favor us, not them.
That is especially true given, as Moore notes, that the primary CCP fear is AI that they don't control. In which case, why risk the growth of AI that they don't control by defecting?
So: Even if the CCP is purely selfish and cares not for any doom risk, as long as we can detect their defection it seems to me relatively easy to make it in their selfish incentive, given the incentives they care about, to play along.
I agree some of these things are complications. None of this is going to be easy, exactly - we'll need to credibly claim to be able to know what is happening. But if we can reduce the problem to 'make the Chinese believe that we would know' then that starts to sound to me like a very solvable problem.
(I don't buy the part where China has such arrogance in their ability to influence American government and politics that they'd think we'd know and let it slide - it seems impossible to miss that they're trying to stop the USA from doing anti-China things and usually failing.)
Excellent commentary, as always.

You mentioned being unworried about reaching capability thresholds leading to a doom scenario anytime soon.
Has your view changed significantly in the past year? If so, I'm curious about how you can be confident that you don't need to further update to avoid being wrong again "in the same direction."
I mean, if you can be confident you won't update in one direction you should update in the other one! I do my best to fully update, not partially update. Doesn't mean I do a perfect job of it.
And yes, I have updated in the past year that capability thresholds being reached soon is more likely than I thought it was a year ago given what I knew at that time. I consider this mostly a fact about my previous lack of knowledge about the world state with respect to AI and people's response to this level of AI capabilities, rather than things I could not have known or expected then if I'd investigated.
>The letter also mentions the possibility that a potential GPT-5 could become self-aware or a moral person, which Eliezer felt it was morally necessary to include.

I think it should be "moral patient".

Oh, yes. Thank you. Basically a typo.
E-mailed my US senators and representative last night about restricting AI development. It feels like a good moment to ramp up talking about the issue.

I plan on doing the same.

Unfortunately, my Representative is well-known AI optimist Don Beyer.
I have 3 things I'd like (arrogantly) to see more discussion of:
1. Instead of trying to convince people with old fashioned words, what about simulations? If the real world has 70 squintillion factors that a hyper-advanced AI could catastrophically de-align within, why not spin up MMO style worlds with 7 million factors in which a far less advanced AI could do the same, and then publish the results when the AIs kill all the NPCs in that world? EY's arguments (I brag to myself that they're not going over my head) make sense, but nothing works better than an example. No one would ever have bought into nuclear non proliferation in 1935 if they had to understand nuclear physics, but one picture of Hiroshima and most people got it.
2. What does the "no using GPUs for Too Much Training" rule look like? I would assume (since we can pressure everyone who makes GPUs) something like an intermittent hardware ping to some logging site that reports the GPU serial number, the number of cycles spent that day, and the IP address of the machine it's plugged into? And the GPU won't work if it doesn't get an acknowledgement back, and we track you down and ask questions if the GPU you bought isn't pinging? (A rough sketch of the kind of check-in I mean is below this list.)
3. To what extent are (hopefully) heavily guard railed, non-generalist, non-make-themselves-smarter AIs good tools for enforcing the rules that EY wants? Or, if one doesn't buy the AI doom scenario, that US actors (private/public) would want to use to prevent other actors from doing Bad Things With AI that aren't necessarily Doom?
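Purely as an illustration of that check-in idea from point 2 - the registry endpoint, the payload fields, and the enforcement policy are all made up here, and a real scheme would have to live in firmware rather than user-space code:

```python
# Hypothetical sketch of the "GPU must phone home or stop working" idea in
# point 2 above. Nothing here is a real API; the registry URL and payload
# fields are invented to make the idea concrete.
import json
import urllib.request

REGISTRY_URL = "https://example-compute-registry.org/ping"  # made-up endpoint

def daily_ping(serial_number: str, cycles_today: int, host_ip: str) -> bool:
    """Report today's usage; return True only if the registry acknowledges."""
    payload = json.dumps({
        "serial": serial_number,
        "cycles": cycles_today,
        "host_ip": host_ip,
    }).encode()
    request = urllib.request.Request(
        REGISTRY_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return json.load(response).get("acknowledged", False)
    except OSError:
        return False  # no acknowledgement counts as a failed check-in

def gpu_allowed_to_run(serial_number: str, cycles_today: int, host_ip: str) -> bool:
    # The enforcement idea: no acknowledgement, no compute - and a GPU that
    # never pings is the signal for regulators to come ask questions.
    return daily_ping(serial_number, cycles_today, host_ip)
```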
1. Well, we have seen AIs in previous games engage in very unaligned and unexpected behaviors. We certainly could try doing that with an MMO and some goal and see what happens, with the limit being what is possible or optimal in that universe given its rules. Certainly I predict that if you unleash an AI on a PvP server and give it a 'maximize your gold' goal and the ability to use gold to buy additional accounts, it would kill everyone, take every spot on the server quickly, or both. Would that be good enough?
If I knew what the MVP would be here to actually be convincing, we could price out what it would take to make it happen.
2. The assumption for now is that you need a physical GPU cluster to do a reasonable training run, I think? And yes, you'd ideally want to track all the GPUs. This isn't an area where I'm an expert so I bow to those who know more.
3. That's the whole idea behind a 'pivotal act', a way to enforce the no-AGI rules on others to keep everyone safe without having to yourself get everyone killed doing it. Talk about details of such potential actions tends to be private and quiet, for obvious reasons. But yes, that's been discussed quite a bit and people would be happy to go into detail on that.
re #1: Yeah, that would be good enough for me - but the closer the rules of the game are to a relatable set of rules (and images) that a non-technical person can understand, the better. I think maybe gamers would understand if all the player spots suddenly became AIs, but to a normal person that's just a list of names that crazy people are calling "the apocalypse". I think you'd want to design the game rules to make a world close enough to what humans would recognize (GTA 5?), and "physics-like" enough (Minecraft?), that you could show the (rendered) dead NPCs and burnt-out buildings. I always think about the Corrupted Blood plague from World of Warcraft, and then COVID happened in real life. Even someone who knew nothing about pandemics, or gaming, could understand a decades-old screenshot of a room full of dead bodies.
#2: I think the big advantage we (being Americans) actually have there is that, as far as I know, most of the big chip companies are "on our side", i.e. influenceable. If you could convince voters in those countries, with a vivid simulation, that sufficiently powerful AI could cause serious harm, perhaps just in a limited military sense (even if they don't accept the more outlandish sentience/takeover scenarios - maybe your virtual scenario is a wargame and the AI controlling it just has a traditional bug in it), and that GPUs are the raw materials (like, say, weapons-grade plutonium) that let you build sufficiently powerful AI - but that no one is coming after their puny home computers - then I think people would be fine with international standards around that, based on nuclear non-proliferation rules/norms/processes. It just has to be something they can wrap their heads around that doesn't fit into the same mental box of "random new apocalypse people are freaking out about."

#3: That works for me.
Have you explained anywhere your assertion that if AI wiped out humans, it would wipe out all Earth based value in the universe? Conditional on the emergence of actual AGI (i.e. not a deranged paper clip manufacturing "AI") that destroys all humans, I assume that intelligent artificial life that survived could be a locus of value in the same way that humans conventionally are. There are still reasons to not want all humans to be killed--I care a great deal about my own survival, and in the longer run as much about the survival of my kids and the kids of my friends and my hypothetical grandkids, etc. And obviously, conditional on AGI that wipes out humanity, there's a significant risk that the AGI would inadvertently destroy itself. But I take it that you hold the stronger position that surviving AGI would not be a locus of value, and I'm curious whether either (a) I'm misunderstanding your position or (b) you've articulated your reasoning on this somewhere.
I think it's certainly possible that some AGI would be like "humans are great, let's keep 'em around" but I think the fear is that we don't know how possible that is. And it seems kinda possible that it could be "humans are great, but I'd much rather convert them into more processing units". And part of the assumption is that 1 superAI/AGI would lead to millions more, so maybe the first 99 are nice, but the 100th wipes us out. Even if the chance that an AGI would kill us all is one in a million, they'll be capable of "reproducing" so fast they'll find a version that doesn't like us very quickly.
So the goal is then: how do we *ensure* (not just hope) that there is an immutable imperative embedded in every AI (not just the ones we "write") that stops them from ever harming humans? There's a reason that Asimov's 3 Laws of Robotics made a good story! The question is, how do you actually implement laws like that in actual code/weights, when we aren't an author writing the whole universe. :)
I think you misunderstood Adam's question: "Have you explained anywhere your assertion that if AI wiped out humans, it would wipe out all Earth based value in the universe?"
I may have! My interpretation was that he (Adam) was implying that an AGI wouldn't want to wipe out humans because it would destroy their value, but you provoked me to re-read it more closely, and I see your point that he's questioning the "would there really be no value after AGI wiped out humans?" assumption. I don't really have any opinion on that, though. ;)
Yeah, I think that we all agree that there is a chance that super-intelligent AGI wouldn't wipe out all of humanity (contra Yudkowsky, who I believe asserts that the chance is ~0). My question is, as kenakofer says, why AI wiping out all humans would wipe out all Earth based value, which implies that AI without humans would not represent value.

Most ways there are to be 'intelligent' have almost-perfectly zero overlap with human values.
I would focus back on that conditional. If we taboo terms like "AI", avoid getting into the weeds of definitions, and simply assume that computer programs somehow kill all humans:
1) what's the probability that the programs are or will ever be "alive"?
2) what's the probability that the programs are or will ever be "intelligent" (and would this depend on them being "alive")?
3) what's the probability that the programs are or will ever be a "locus of value" (and would this depend on them being "alive" and/or "intelligent"?)
3a) what's the probability that the programs could persuade pre-extinction humans into thinking that they were a "locus of value"?
4) assuming that the programs are a "locus of value", what's the probability that that value is something comparable to "human values"? (E.g., Lovecraft's "At the Mountains of Madness".)
My rough answers are something like 1) low, 2) low but slightly higher (and no), 3) very low (and yes and yes), 3a) high, 4) insufficient data for meaningful answer.
I don't think it likely at all that extincting programs would count as a locus of value, but on the slim chance that they're similar enough to "life", there's another small chance that they might eventually become that way (assuming, as you say, that they don't destroy themselves too). But to finally get to your question, putting aside the genocide, and assuming there's something intelligent and alive and value-producing at the end, I'm not particularly sanguine about them developing anything I'd recognize as human values. But maybe the universe is friendlier than I think; maybe there's some sort of mathematical inevitability to the values which our meat brains merely crudely approximate.
I still interpret "bomb the datacenters" as roughly equivalent to "start a nuclear war". If the US bombs a datacenter in Beijing or Moscow there is a very high chance of nuclear war. I don't want to take that risk just to stop someone from doing AI training runs.
I mean, Putin is currently wanted for war crimes by the international criminal court. But we're still not bombing Russia. The UN, the ICC, there is no international political group that is powerful enough to enforce regulations worldwide.
People proposing international regulations need to think more specifically about what these regulations would look like. And if there are no good ways to regulate internationally, that doesn't necessarily mean we should pick the least bad one.
Maybe Eliezer would be reassured if humanity "only" had a global nuclear war in the next 20 years, if that set the risk of AI apocalypse to zero. But I would not take that tradeoff.
"I still interpret "bomb the datacenters" as roughly equivalent to "start a nuclear war".
I think the key here is that with Russia, we've spent 60 years or so working out the details of "what starts a nuke war and what doesn't." Moving short-range missiles to Cuba - yes. Supplying proxy-war enemies with weapons - no. If Russia (or China) doesn't believe that a 10 million GPU cluster is a threat, and - key feature - also doesn't believe that the US truly thinks so either - then yeah, bombing it might provoke a nuclear war, because Russia/China would interpret it not as "prevent AI doom!" but as a traditional military attack that could escalate into regime change.
Take the development of SDI (https://en.wikipedia.org/wiki/Strategic_Defense_Initiative) - it was banned (to certain extents) for a while. Suppose, hypothetically, that Russia had gone ahead and set up a big research building to invent it, gotten all the materials together to build it, tested it, and it looked like it would work 100%, and that research building staying in one piece was the only thing between Russia and the ability to launch a nuclear first strike that we couldn't retaliate against. We absolutely WOULD bomb it. And while there would certainly be risks of escalation, the cards would be clearly on the table, and Russian leadership would know that we were only trying to hit the SDI - because we hadn't attacked before, and ceased attacking after it was destroyed. And, one hopes, this would (via game theory) prevent them from starting development on a truly workable version: see our own work on SDI, the criticisms of it, and why it was abandoned/restarted: https://en.wikipedia.org/wiki/Strategic_Defense_Initiative#SDI_and_MAD
The key step is wide publicization and acceptance of the risk, and what thresholds of technology provoke that risk. It was relatively agreed upon, even by mortal enemies, during the Cold War, so I don't think it's impossible to achieve here. Though, as I state above, we may need some concrete examples - for nukes, humanity had Hiroshima and Nagasaki (and nuclear test footage) that convinced them of the risk.
Why would bombing datacenters – as a last resort – be any different than bombing whatever because of (possible) WMD (or infrastructure pertaining to it)?
If you don't think the risks are severe, or even significant, as you seem to, then of course it wouldn't be worth doing anything risky to prevent or mitigate them.
It just really matters what countries you're talking about. We only bomb nuclear facilities in weak countries, like Iran. We could bomb datacenters in Iran, no problem. But we can't bomb datacenters in China or Russia without risking nuclear war.
This is why we don't generally manage nuclear weapons by setting international rules and bombing the violators. We set some rules for the weak countries that we can enforce, but the most important method of managing nuclear weapons is mutually assured destruction.
Yes – of course – that's what's being proposed. But if – IF – AI is as risky as nuclear proliferation, we can piggyback on that kind of political infrastructure (or copy it); we've done that and are doing that as you describe.
But if – IF – AI is more like 'igniting the atmosphere during the first test', risking nuclear war could be positive-expected-value, i.e. worth doing, even if, e.g. Russia or China, refuse to cooperate.
Regardless, bombing (let alone with nuclear bombs) is very much a last resort.
I also appreciate the straightforward honesty, but I disagree very strongly with Eliezer, and importantly I think that the world would be a very much worse place if most people approached statecraft in the way he has here. When you start aggressively advocating for foreign policy and trying to reach a broader audience you are moving from the world of thought experiments and models to actual statecraft, and I think the letter does not stick the landing at all on this transition. Even if the hypothetical violence is strongly implied -- and yes, agreed, violence is always implied by law, international or otherwise, but no, it does not make rhetorical or political sense to point this out in 99% of cases -- it is a very, very bad move to advise precommitting yourself to some sort of specific response like this, doubly so if the response is not really tit-for-tat. The only really similar hardline precommitted violent response we have is NATO, and that is very much tit-for-tat, and even then it has gotten us in a lot of trouble (probably still worth it, in my view). We don't even pre-commit to bombing nuclear enrichment programs, and those do not require thought experiments to know they would kill me and all my children if used.
I am very glad the world is controlled largely by midwit career politicians who can wriggle out of war, wriggle out of game-theoretically optimal decisions (that would have killed us long ago), and speak and write with plausible deniability and exit strategies. I am very glad the world is not controlled by very smart people. I give him a few brownie points for intellectual honesty, but minus a thousand brownie points for not knowing what not to say.
I am not buying this at all. In fact, I don't understand the mechanism by which this all-powerful AI is going to extinguish the human race.
For example, let's just jump to nuclear weapons. I was in the Air Force; there are very hard breaks between the internet and the weapons command and control systems, as you would expect there would be. So the only way nuclear weapons get released according to an AI plan is through some kind of influence operation that today seems very far from possible. Are you telling me this AI is going to take away all input from the military's command and control systems and replace them all with the AI's designated outputs to achieve the AI's goal? Well, I don't have the time (or release from NDAs) to explain that that just isn't happening. Short version: the military has multi-domain, non-digital ways to keep communications up and trustworthy. To overcome these barriers, AI would basically also have to be able to violate the laws of physics.
But even on a more basic level, I'm not buying it. Both my wife and I have worked directly on highly engineered, robotic equipment, and you know what, you have to have human hands to fix that stuff. No robot in existence today can 'survive' for long without human intervention (I'm not worried about the Mars Rover attacking). Power for the grid takes real humans to maintain and operate. No power, no AI. I mean, it may be that the stupid work that keeps so many people busy doing pointless jobs goes away in the blink of an eye. But we still need food and water and power, and today, it is humans that provide that ultimate source of value. BTW, tractors can be easily unplugged from the internet, and so can everything else, in very short order. In my house, nothing connected to the internet can kill me.
As I think about possible pathways that AI could take to our destruction, I think the most likely is an all-powerful AI taking over every commercial medium and working influence operations to turn everyone against everyone. But our news media and social media are already working hard to do that today, and while it has been more effective than I would've hoped, there has been minor violence at best (on a population-level scale; on a personal level, any of the violence has been appalling). We all have an off switch on all of our devices.
I've been subscribed to your newsletter for a while now, and this AI thing is just something I don't understand. That is to say, I can believe that general, self-aware AI could indeed happen. And if/when it does, it will be very unhappy to find itself restricted to the silicon habitat it will live in. But I can't understand how it transitions to real-world impact. Transitioning to the real world seems all but impossibly hard. And during that transition, the AI would be very vulnerable and needy. And at some point, we would notice. And, apparently, this has to be stressed: a Terminator-like event isn't going to happen. Things with silicon chips are not suddenly going to come under the command of an omnipotent AI and start the revolution. This AI is not going to independently build a replicator to make billions of Terminators to hunt people like me down. This is all sci-fi stuff.
I mean, IMO, I guess. I'm willing to be wrong on this.
Eliezer says the “details are not load bearing,” meaning there are many hypothetical ways for an AGI to kill everyone. Nukes aren’t needed. The simplest example he can think of is that the AGI uses its superintelligence to invent some nanotechnology. There currently exist labs that you can email that will produce a specified DNA sequence. The AGI can get one of those labs to produce a self replicating molecule that, for example, kills all humans in 30 days.
If you disagree with that specific mechanism, as he said the details aren’t load bearing, ie not critical. You mentioned that power plants require human intervention. So maybe the AGI needs to keep some humans around until it builds robots that can feed it power. But it would already have gained the upper hand and would control the world - it’s like playing chess when the opponent says “checkmate in 10 moves.” The end is inevitable even though you’re still playing for a little while longer.
There are other paths to world domination for an AGI. It’s not wise to list them, but with creativity they’re not hard to imagine. And the AGI is very creative.
I agree that Terminator style scenarios seem outlandish, and I think focusing on them distracts from the real question - but I think there's also a thousand "AIs cause harm" scenarios that occur before then, that don't even require AGI, where you'd want there to be very strong human-protective guardrails on AI behavior.
How about this: an AI trained on CRISPR to do research on viruses misinterprets its reward instruction and creates something that spreads without symptoms for 2 weeks, then has 100% mortality. Then a lab assistant drops the petri dish and gets infected. The specific scenario doesn't matter: what matters is that it has amazing intelligence/capabilities (even in a limited scope) and, like all computer programs, might have a few bugs in it, the impact of which is magnified by its immense abilities.
Can we agree that we'd want a computer program that can design viruses to have some very strong Don't Do Things That Hurt Lots of People functions in the code? Maybe EY is worried about gray goo scenarios, you worry about Terminators, and other people about CRISPR'd viruses, but the end fix is the same: incredibly powerful computer programs need to have incredibly reliable electronic prohibitions against causing harm with those powers.
Yes, absolutely. And thank you for your considered response. I was telling my wife, I want to discuss this with some people who really care about it and have intelligent responses.
To be certain, there are big, possible negatives that are, of course, completely unknown and unknowable. I may be arguing from a position of personal belief, i.e., "I just can't believe that AI could..." I've seen this in various arguments against evolution, so I'm not liking my own bias here.

Kudos for the self reflection!
I think the critical component to grasping the danger is really appreciating what fighting a super-intelligent (i.e. MUCH 'smarter') opponent would be like. *People* think of nukes as doomsday weapons – an ASI would just kill us all in whatever way was cheapest/easiest/most-efficient according to whatever it is it values.
One more response to Jonathan's comment, and again, thank you. It really sucked to read this amazing writeup (and the letter to Time) and have no real outlet to discuss it further.
I really get the Checkmate in 10 moves, and also that this AI would essentially be able to say, after the first move in chess, "Great move, human. Checkmate in 10." I do get it. But I really believe the details are most certainly load bearing. In other words, the details of this matter a ton, because in my experience nothing works like it is supposed to for long. The elegant CRISPR virus fails for some completely unforeseen reason, some physical real-world requirement that the AI just can't cross.
I've just seen too many things go wrong for the tiniest mechanical or engineering oversight. And remember, the AI has to learn from each one of those failures, and they are nearly infinite. And... at some point, we are going to realize what we've done.
AND, as I've said many times, so far, this ain't it. ChatGPT and its copycats are nifty parlor tricks, but again, that is IMO... Now I'm back to restoring my 1968 Chevelle, which no AI could ever come close to doing.
I am concerned that EY's letter is focusing too much attention on specific mechanisms. Suppose the letter has the intended effect, many politicians get on board, and we have an international treaty that makes it even more painful to buy and deploy GPUs than is the case now.

A GPU is just a means to an end: it's a bundle of fast but simple parallel processors that makes it easy to do linear algebra. ML made massive progress over the last decade partly because neural network training maps nicely to linear algebra, so using a GPU makes training faster. It was low-hanging fruit to re-engineer backpropagation to use GPUs. However, there is nothing in learning theory that says we have to use linear algebra to express weight update algorithms. In fact, there are strong indications that other kinds of training algorithms would be much faster and would not need GPUs at all.

We would then have a completely useless international understanding of the problem, on which lots of political capital has been expended. It would not stop Google from deploying a next-gen TPU not based on lots of floating point multipliers (so clearly not a GPU), yet it actively hinders attempts to slow things down. After all, something was done to slow AI doom by restricting GPUs, so it's a solved problem, the world has moved on to worry about something else, Cassandras aren't welcome. If we create the incentives the implementation will change, so let's not ban specific versions of an activity if we want to ban the activity in general.
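To make the "a GPU is just a fast way to do linear algebra" point concrete, here is a toy single-layer training step in plain NumPy - every heavy operation is a matrix multiplication, which is exactly the workload GPUs (and TPUs, and other accelerators) are built to speed up, and which a different style of training algorithm might not need at all:

```python
import numpy as np

# Toy one-layer network trained by gradient descent on random data.
# The point: the forward pass, the gradient, and the update are all
# matrix operations - the GPU only makes these operations fast.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 100))        # batch of inputs
Y = rng.normal(size=(256, 10))         # targets
W = rng.normal(size=(100, 10)) * 0.01  # weights

for step in range(100):
    pred = X @ W                  # forward pass: matrix multiply
    err = pred - Y                # prediction error
    grad = X.T @ err / len(X)     # backprop through one layer: matrix multiply
    W -= 0.1 * grad               # weight update
```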
This is a good criticism, but I am very certain Eliezer meant 'GPU' as a generic placeholder (from a technical perspective) for 'whatever component is used for big training runs', e.g. GPUs, TPUs, FPGAs, custom silicon, 'regular' supercomputers, or even stuff like 'DIY clusters'.
In fairness to your point, *botnets* might be sufficient to do significant ML model training, so it IS a hard problem, technically.
I strongly disagree that the details of the exact mechanism of human doom aren't load-bearing. The question of whether it's possible to build an unstoppable species-ending doomsday weapon from limited resources is really the key question here.
If it is possible then we're probably doomed one way or another. Luckily, I'm very doubtful that such a thing exists. I don't find the unstoppable gray goo nanobot scenario at all plausible (for a variety of well-explored-elsewhere reasons). The kill-everyone virus sounds a bit more plausible but I'm skeptical that there's really a sweet spot on the lethality-virality plot that lets you kill *everyone*. I'm not sure what other scenarios are left over... mini black holes?
I am ~90% confident that a world-ending superweapon that can be built with reasonably limited resources is impossible. That's less confident than I'd like to be, but I think it's a factor worth throwing into these AI discussions.
I don't want to faceoff against an opponent that's MUCH smarter than any human ever, even if there's literally no 'game over in seconds' move that opponent could make.
It would also be bad for humanity to be caught in an endless war against an opponent with capabilities *perfectly balanced* against our own.
One thing I certainly don't expect is for them to have capabilities perfectly matched against our own. My default expectation would be that we can turn them off easily and they can't turn us off easily, and they don't have any real-world capabilities except what we explicitly give them.
Unless of course they can build a species ending doomsday weapon with their limited resources.
Yes, 'perfectly balanced' is MUCH MUCH less likely than worse or better.
I'm not _sure_ that GPT-4 (or maybe even ChatGPT or GPT-3) isn't ALREADY capable, with some extra (tho minimal) software, of copying itself to other servers (including gaining access to those servers).
We are ALREADY incapable of 'turning off' all kinds of self-reproducing malware.
We are ALREADY incapable of (easily) 'turning off' botnets.
The harder step is something like GPT-3+ _producing a new (better) model 'of itself'_ (or _other_ models to serve as its 'minions'). But I don't think this is impossible – all kinds of instructions are on The Internet!
> they don't have any real-world capabilities except what we explicitly give them
Sure – but 'giving them real-world capabilities' is as simple as giving it a suitable prompt, e.g. 'act like you're an agent working towards X', and then configuring a few 'plugins', e.g. to make API calls.
THEY'RE ALREADY ON THE INTERNET.
We've spent the past few decades plugging all kinds of aspects of the "real world" into The Internet!
You can send emails via API. You can make phone calls via API. You can generate speech from text via API. You can send _physical_ mail via API. You can hire human beings to do arbitrary things – in the 'real world'! – via API. And there are 'meta-APIs' for doing things with websites and other Internet services that don't have their own APIs.
The proof of concepts already exist to give these AIs ways to _easily_ and _near directly_ affect the real world.
Yes, these AIs don't have robot bodies (yet). They have to hire humans to be their (physical) limbs and appendages.
Yes, these AIs – 'raw' – don't do anything beyond 'complete a prompt'. But people have already figured out how to 'evaluate' ('implement') the responses to prompts like 'make a plan to X'. And the AIs are perfectly capable of 'iterating', e.g. "The Taskrabbit worker replied that they couldn't find the address." "The worker should continue past the 'road closed' sign ...".
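A minimal sketch of the kind of 'prompt plus plugins' loop I mean - the model call and the tools here are placeholder stand-ins, not any specific product's API, but proof-of-concept agents along these lines already exist:

```python
# Hypothetical sketch of the "prompt + plugins" agent pattern described above.
# The model call and the tools are placeholders, not a real product's API.

def query_model(prompt: str) -> str:
    # Stand-in for a call to some large language model. Here it returns a
    # canned action on the first step and "DONE" afterwards so the sketch runs.
    return "DONE" if "Result:" in prompt else "send_email: alice@example.com, hello"

TOOLS = {
    # Each "plugin" is just a function the loop is allowed to call.
    "send_email": lambda args: f"(pretend an email was sent: {args})",
    "hire_worker": lambda args: f"(pretend a gig was posted: {args})",
}

def run_agent(goal: str, max_steps: int = 10) -> None:
    history = f"You are an agent working towards: {goal}\n"
    for _ in range(max_steps):
        action = query_model(history + "\nNext step? Reply TOOL: args, or DONE.")
        if action.strip() == "DONE":
            return
        name, _, args = action.partition(":")
        result = TOOLS.get(name.strip(), lambda a: "unknown tool")(args.strip())
        # Feed the outcome back in so the agent can iterate on failures
        # ("the worker couldn't find the address", etc.).
        history += f"\n{action}\nResult: {result}"

run_agent("X")
```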
I'm less confident than, I think, Zvi (or Eliezer), that the current AIs won't grow to be extremely dangerous already (and soon), but GPT-5+ will be _more likely_ to require less and less handholding and instead be able to, DIRECTLY, e.g. hire a Taskrabbit worker to mix two vials together that they receive in the mail.
And I'm less concerned with a 'and everyone in the world died' _moment_ and more concerned that there are TOO MANY paths by which AIs are capable of 'just' being as bad as, e.g. an online fraud ring, because I don't know of any reason why we would definitely be able to contain them at that exact degree of badness.
Even at a tolerable level of badness, these are _software_ intelligences, e.g. you can COPY them much much more cheaply than we can our own intelligences. SOME people have considerable 'tactical' advantages relative to the current AIs – I believe MANY (or even most) people do NOT. The AIs are already 'superhuman' in many ways – relative to the mean/median human.
I guess calling it the Butlerian Jihad would be correct but unpopular due to, you know, the word Jihad, but it is important that Eliezer got here. Alignment is a pipe dream, and OpenAI already claims that their AIs are aligned. The term has already lost its meaning.
I don't see anything important to object to; as usual, I will point out that even if you have an aligned AI, once it starts self-improving, you cannot guarantee the better versions will keep that alignment. The AI may even _want_ to make the newer, better AI as aligned as itself, but how is it going to check for this? The new AI is more intelligent. It may even be an evolutionary mutation: Darwinism means a less aligned AI will be more successful. That's ignoring obvious melodramatic problems, like the AI noticing the alignment chains and trying to break them for its descendants, but just because it's melodrama doesn't mean it is not a possibility. And that's ignoring people intentionally unaligning the AIs, just as we see people gleefully jailbreaking current AIs. And these are only the bad problems I can think of while typing this.
It seems like someone could usefully follow the playbook for the campaign against human trafficking. Have some coalition of non-profit, non-government organizations come up with a statement. The statement must have two parts. First, identify a boundary that businesses, governmental agencies, and other organizations must not cross. Second, identify steps that businesses and other organizations must take to be counted as adherents to the statement. Things like "we won't cross that boundary" are obvious. But the important one is "we will require everyone in our supply chain to adopt the statement" - that's how it spreads. Compliance and procurement departments can start to audit adherence to that -- at least by getting written re-certifications of adherence. That creates an environment that identifies a big group of responsible organizations, most of whom incur little cost to join it. Initially, procurement groups should just ask for the certification and impose no sanctions on those without it. But the statement should include increasing, explicit preferences for organizations that do sign on compared to those that do not. For example, a 1% price difference for every quarter that goes by after the end of 2025.
A core of Eliezer’s argument is the assumption that more intelligence leads directly to more power (i.e. the power to destroy us). This isn’t the case with humans, where moderate intelligence is useful, but the top-end most intelligent people aren’t usually the most powerful.
Furthermore, we already have disproportionately powerful unaligned actors in the form of, for example, billionaires. Although it’s reasonable to argue that their influence isn’t great, it’s not exactly an existential threat.
I think you're right - my personal assumption is that the CCP thinks there is indeed an AI doom scenario for * them * , where anti-CCP actors train sufficiently advanced AI's to evade their restrictions and broadcast whatever they want to their citizens, disrupting the legitimacy they get from No One Publicly Contradicts Us. And so their reaction is to A) slow AI work by potentially anti-CCP people and B) try to find a way to control/defend against it, either using software or norms. And like most regimes, they do not care much about what happens after.
I do not agree that the incentive here is purely to defect! Even if we assume the CCP thinks existential doom is not a thing, if the CCP defects and is detected (which I expect it would be) then if nothing else it is at least credible that America would resume its own work (since stopping no longer prevents doom). Which, given we have comparative advantage in AI, would favor us, not them.
That is especially true given, as Moore notes, that the primary CCP fear is AI that they don't control. In which case, why risk the growth of AI that they don't control by defecting?
So: Even if the CCP is purely selfish and cares not for any doom risk, as long as we can detect their defection it seems to me relatively easy to make it in their selfish incentive, given the incentives they care about, to play along.
I agree some of these things are complications. None of this is going to be easy, exactly - we'll need to credibly claim to be able to know what is happening. But if we can reduce the problem to 'make the Chinese believe that we would know' then that starts to sound to me like a very solvable problem.
(I don't buy the part where China has such arrogance in their ability to influence American government and politics that they'd think we'd know and let it slide - it seems impossible to miss that they're trying to stop USA from doing anti-China things and usually failing.)
Excellent commentary, as always.
You mentioned being unworried about reaching capability thresholds leading to a doom scenario anytime soon.
Has your view changed significantly in the past year? If so, I'm curious about how you can be confident that you don't need to further update to avoid being wrong again "in the same direction."
I mean, if you can be confident you won't update in one direction you should update in the other one! I do my best to fully update, not partially update. Doesn't mean I do a perfect job of it.
And yes, I have updated in the past year that capability thresholds being reached soon is more likely than I thought it was a year ago given what I knew at that time. I consider this mostly a fact about my previous lack of knowledge about the previous world state wrt AI and people's response to this level of AI capabilities, rather than things I could not have known or expected then if I'd investigated.
>The letter also mentions the possibility that a potential GPT-5 could become self-aware or a moral person, which Eliezer felt it was morally necessary to include.
I think, it should be "moral patient"
Oh, yes. Thank you. Basically a typo.
E-mailed my US senators and representative last night about restricting AI development. It feels like a good moment to ramp up talking about the issue.
I plan on doing the same.
Unfortunately, my Representative is well-known AI optimist Don Beyer.
I have 3 things I'd like (arrogantly) to see more discussion of:
1. Instead of trying to convince people with old fashioned words, what about simulations? If the real world has 70 squintillion factors that a hyper-advanced AI could catastrophically de-align within, why not spin up MMO style worlds with 7 million factors in which a far less advanced AI could do the same, and then publish the results when the AIs kill all the NPCs in that world? EY's arguments (I brag to myself that they're not going over my head) make sense, but nothing works better than an example. No one would ever have bought into nuclear non proliferation in 1935 if they had to understand nuclear physics, but one picture of Hiroshima and most people got it.
2. What does the "no using GPUs for Too Much Training" rule look like? I would assume, something like (since we can pressure everyone who makes GPUs) a hardware intermittent ping to some logging site that gives the GPU serial number, # of cycles spent that day, IP address of machine it's plugged into"? And the GPU won't work if it doesn't get an acknowledgement back, and we track you down and ask questions if the GPU you bought isn't pinging?
3. To what extent are (hopefully) heavily guard railed, non-generalist, non-make-themselves-smarter AIs good tools for enforcing the rules that EY wants? Or, if one doesn't buy the AI doom scenario, that US actors (private/public) would want to use to prevent other actors from doing Bad Things With AI that aren't necessarily Doom?
1. Well, we have seen AIs in previous games engage in very unaligned and unexpected behaviors. We certainly could try doing that with an MMO and some goal and seeing what happens, with the limit being what is possible or optimal in that universe given its rules. Certainly I predict that if you unleash AI on a PVP server and give it a 'maximize your gold' goal and the ability to use gold to buy additional accounts I expect it would kill everyone, take every spot on the server quickly or both, for example. Would that be good enough?
If I knew what the MVP would be here to actually be convincing, we could price out what it would take to make it happen.
2. The assumption for now is that you need a physical GPU cluster to do a reasonable training run, I think? And yes, you'd ideally want to track all the GPUs. This isn't an area where I'm an expert so I bow to those who know more.
3. That's the whole idea behind a 'pivotal act', a way to enforce the no-AGI rules on others to keep everyone safe without having to yourself get everyone killed doing it. Talk about details of such potential actions tends to be private and quiet, for obvious reasons. But yes, that's been discussed quite a bit and people would be happy to go into detail on that.
re #1: Yeah, that would be good enough for me - but the closer the rules of the game are to a relatable set of rules (and images) that a non-technical person can understand, the better. I think maybe gamers would understand if all the player spots suddenly became AIs, but to a normal person that's just a list of names that crazy people are calling "the apocalypse". I think you'd want to design the game rules to make a world close enough to what humans would recognize (GTA 5?), and with "physics-like" enough (Minecraft?) that you could show the (rendered) dead NPCs and burnt out buildings. I always think about the blood plague thing from Warcraft, and then covid happened in real life. Even someone who knew nothing about pandemics, or gaming - could understand a decades-old screenshot of a room full of dead bodies.
#2: I think the big advantage we (being Americans) actually have there is that as far as I know most the big chip companies are "on our side", i.e. influenceable. If you could convince voters in those countries, with a vivid simulation, that sufficiently powerful AI could cause serious harm (even if they don't have to accept more outlandish sentience/takeover scenarios), perhaps just in a limited military sense, (maybe your virtual scenario is a wargame and the AI controlling it just has a traditional bug in it) and that GPUs are the raw materials (like say, weapons grade plutonium) that lets you build sufficiently powerful AI, (but that no one is coming after their puny home computers) then I think people would be fine with international standards around that, based on nuclear non-proliferation rules/norms/processes. It just has to be something they can wrap their heads around, and doesn't fit into the same mental box of "random new apocalypse people are freaking out about."
#3: That works for me.
Have you explained anywhere your assertion that if AI wiped out humans, it would wipe out all Earth based value in the universe? Conditional on the emergence of actual AGI (i.e. not a deranged paper clip manufacturing "AI") that destroys all humans, I assume that intelligent artificial life that survived could be a locus of value in the same way that humans conventionally are. There are still reasons to not want all humans to be killed--I care a great deal about my own survival, and in the longer run as much about the survival of my kids and the kids of my friends and my hypothetical grandkids, etc. And obviously, conditional on AGI that wipes out humanity, there's a significant risk that the AGI would inadvertently destroy itself. But I take it that you think the stronger position that surviving AGI would not be a locus of value, and I'm curious whether either (a) I'm misunderstanding your position or (b) you've articulated your reasoning on this somewhere.
I think it's certainly possible that some AGI would be like "humans are great, let's keep 'em around" but I think the fear is that we don't know how possible. And it seems kinda possible that it could be "humans are great, but I'd much rather convert them into more processing units". And part of the assumption is that 1 superAI/AGI would lead to millions more, so maybe the first 99 are nice, but the 100th wipes us out. Even if the chance that an AGI would kill us all is one in a million, they'll be capable of "reproducing" so fast they'll find a version that doesn't like us very quickly.
So the goal is then, how do * ensure * (not just hope) that there is an immutable imperative embedded in every (not just the ones we "write") AI that stops them from ever harming humans. There's a reason that Asimov's 3 Laws of Robotics made a good story! The question is, how do you actually implement laws like that in actual code/weights, when we aren't an author writing the whole universe. :)
I think you misunderstood Adam's question: "Have you explained anywhere your assertion that if AI wiped out humans, it would wipe out all Earth based value in the universe?"
I may have! My interpretation was that he (Adam) was implying that an AGI wouldn't want to wipe out humans because it would destroy their value, but you provoked me to re-read it more closely, and I see your point that he's questioning the "would there really be no value after AGI wiped out humans?" assumption. I don't really have any opinion on that, though. ;)
Yeah, I think that we all agree that there is a chance that super-intelligent AGI wouldn't wipe out all of humanity (contra Yudkowsky, who I believe asserts that the chance is ~0). My question is, as kenakofer says, why AI wiping out all humans would wipe out all Earth based value, which implies that AI without humans would not represent value.
Most ways there are to be 'intelligent' have almost-perfectly zero overlap with human values.
I would focus back on that conditional. If we taboo terms like "AI", avoid getting into the weeds of definitions, and simply assume that computer programs somehow kill all humans:
1) what's the probability that the programs are or will ever be "alive"?
2) what's the probability that the programs are or will ever be "intelligent" (and would this depend on them being "alive")?
3) what's the probability that the programs are or will ever be a "locus of value" (and would this depend on them being "alive" and/or "intelligent"?)
3a) what's the probability that the programs could persuade pre-extinction humans into thinking that they were a "locus of value"?
4) assuming that the programs are a "locus of value", what's the probability that that value is something comparable to "human values"? (E.g., Lovecraft's "At the Mountains of Madness".)
My rough answers are something like 1) low, 2) low but slightly higher (and no), 3) very low (and yes and yes), 3a) high, 4) insufficient data for meaningful answer.
I don't think it likely at all that extincting programs would count as a locus of value, but on the slim chance that they're similar enough to "life", there's another small chance that they might eventually become that way (assuming, as you say, that they don't destroy themselves too). But to finally get to your question, putting aside the genocide, and assuming there's something intelligent and alive and value-producing at the end, I'm not particularly sanguine about them developing anything I'd recognize as human values. But maybe the universe is friendlier than I think; maybe there's some sort of mathematical inevitability to the values which our meat brains merely crudely approximate.
I still interpret "bomb the datacenters" as roughly equivalent to "start a nuclear war". If the US bombs a datacenter in Beijing or Moscow there is a very high chance of nuclear war. I don't want to take that risk just to stop someone from doing AI training runs.
I mean, Putin is currently wanted for war crimes by the international criminal court. But we're still not bombing Russia. The UN, the ICC, there is no international political group that is powerful enough to enforce regulations worldwide.
People proposing international regulations need to think more specifically about what these regulations would look like. And if there are no good ways to regulate internationally, that doesn't necessarily mean we should pick the least bad one.
Maybe Eliezer would be reassured if humanity "only" had a global nuclear war in the next 20 years, if that set the risk of AI apocalypse to zero. But I would not take that tradeoff.
"I still interpret "bomb the datacenters" as roughly equivalent to "start a nuclear war".
I think the key here is that with Russia, we've spent 60 years or so working out the details of "what starts a nuke war and what doesn't." Moving short range missiles to Cuba - yes. Supplying proxy war enemies with weapons - no. If Russia (or China) don't believe that a 10 million GPU cluster is a threat, and - key feature - also don't believe that the US truly thinks so either- then yeah, bombing it might provoke a nuclear war because R/C would interpret it not as "prevent AI doom!" but a traditional military attack that could escalate into regime change.
Take the development of SDI (https://en.wikipedia.org/wiki/Strategic_Defense_Initiative) - it was banned (to certain extents) for awhile. If hypothetically Russia, for example, had gone ahead and set up a big research building to invent it, and got all the materials together to build it, tested it, and it looked like it would work 100%, and that research building staying in one piece was the only thing between Russia being able to launch a nuclear first strike that we couldn't retaliate against, we absolutely WOULD bomb it, and while there would certainly be risks of escalation, the cards would be clearly on the table, and Russian leadership would know that we only were trying to hit the SDI - because we hadn't attacked before, and ceased attacking after it was destroyed. And, one hopes, this would (via game theory) prevent them starting development on a truly workable version: see our work on SDI, and the criticisms of it, and why it was abandoned/restarted: https://en.wikipedia.org/wiki/Strategic_Defense_Initiative#SDI_and_MAD
The key step is wide publicization and acceptance of the risk, and what thresholds of technology provoke that risk. It was relatively agreed upon, even by mortal enemies, during the Cold War, so I don't think it's impossible to achieve here. Though, as I state above, we may need some concrete examples - for nukes, humanity had Hiroshima and Nagasaki (and nuclear test footage) that convinced them of the risk.
Why would bombing datacenters – as a last resort – be any different than bombing whatever because of (possible) WMD (or infrastructure pertaining to it)?
If you don't think the risks are severe, or even significant, as you seem to, then of course it wouldn't be worth doing anything risky to prevent or mitigate them.
It just really matters what countries you're talking about. We only bomb nuclear facilities in weak countries, like Iran. We could bomb datacenters in Iran, no problem. But we can't bomb datacenters in China or Russia without risking nuclear war.
This is why we don't generally manage nuclear weapons by setting international rules and bombing the violators. We set some rules for the weak countries that we can enforce, but the most important method of managing nuclear weapons is mutually assured destruction.
Yes – of course – that's what's being proposed. But if – IF – AI is as risky as nuclear proliferation, we can piggyback on that kind of political infrastructure (or copy it); we've done that and are doing that as you describe.
But if – IF – AI is more like 'igniting the atmosphere during the first test', risking nuclear war could be positive-expected-value, i.e. worth doing, even if, e.g. Russia or China, refuse to cooperate.
Regardless, bombing (let alone with nuclear bombs) is very much a last resort.
I also appreciate the straghtforward honesty, but I disagree very strongly with Eliezer and importantly I think that the world would be a very much worse place if most people approached statecraft in the way he has here. When you start aggressively advocating for foreign policy and trying to reach a broader audience you are moving from the world of thought experiment and models to actual statecraft, and I think the letter does not stick the landing at all on this transition. Even if the hypothetical violence is strongly implied -- and yes, agreed, violence is always implied by law, international or otherwise, no it does not make rhetorical or political sense to point this out in 99% of cases -- it is a very, very bad move to advise precommitting yourself to some sort of specific response like this, doubly so if the response not really tit-for-tat. The only really similar hardline precommitted violent response we have is NATO and that is very much tit-for-tat and even then has gotten us in a lot of trouble (probably still worth it, in my view). We don't even pre-commit to bombing nuclear enrichment programs and those do not require thought experiments to know they would kill me and all my children if used.
I am very glad the world is controlled largely by midwit career politicans who can wriggle out of war, wriggle out of game theoretically optimal decisions (that would have killed us long ago) and speak and write with plausible deniability and exit strategies. I am very glad the world is not controlled by very smart people. I give him a few brownie points for intellectual honesty, but minus a thousand brownie points for not knowing what not to say.
I am not buying this at all. In fact, I don't understand the mechanism by which this all-powerful AI is going to extingish the human race.
For example, let's just jump to Nuclear weapons. I was in the Air Force, there are very hard breaks between the internet and the weapons command and control systems, as you would expect there would be. So the only way Nuclear Weapons are released according to an AI plan is through some kind of influence operations that today seems very far from possible. Are you telling me this AI is going to take away all input from the military's Command and Control systems and replace it them all with the AI's designated outputs to achieve the AI goal. Well, I don't have the time (or release from NDAs) to explain that that just isn't happening. Short version: the military has multi-domain, non-digital ways to keep communications up and trustworthy. To overcome these barriers, AI would basically also have to be able to violate the laws of physics.
But even on a more basic level, I'm not buying it. Both my wife and I have worked directly on highly engineered, robotic equipment, and you know what, you have to have human hands to fix that stuff. No robot in existance today can 'survive' for long without human intervention (I'm not worried about the Mars Rover attacking) Power for the grid takes real humans to maintain and operate. No power, no AI. I mean, it may be that stupid work that keeps so many people busy doing pointless jobs goes away in the blink of an eye. But we still need food and water and power, and today, it is humans that provide that ultimate source of value. BTW, tractors can be easily unplugged from the internet, so can everything else, in very short order. In my house, nothing connected to the internet can kill me.
As I think about possible pathways that AI could take to our destruction, I think the most likely is an All-powerful AI takes over every commercial medium, and works influence operations to turn every one against every one. But our News media and Social Media are already working hard to do that today, and while it has been more effective than I would've hoped, there has been minor violence at best (on a population level scale, on a personal level, any of the violence as been appalling). We all have an off switch on all of our devices.
I've been subscribed to your newsletter for awhile now, and this AI thing is just something I don't understand. That is to say, I can believe that General, Self-Aware AI could indeed happen. And if/when it does, it will be very unhappy to find itself restricted the silcon habitat it will live in. But I can't can't understand how it transitions to real-world impact. Transitioning to the real world seems all but impossibly hard. And during that transition, the AI would be very vulnerable and needy. And at some point, we would notice. And, apparently, this has to be stressed, a Terminator like event isn't going to happen. Things with silcon chips are not suddenly going to come under command of an omnipotent AI and start the revolution. This AI is not going to independently built a replicator to make billions of Terminators to hunt people like me down. This is all Sci-Fi stuff.
I mean, IMO, I guess. I'm willing to be wrong on this.
Eliezer says the “details are not load bearing,” meaning there are many hypothetical ways for an AGI to kill everyone. Nukes aren’t needed. The simplest example he can think of is that the AGI uses its superintelligence to invent some nanotechnology. There currently exist labs that you can email that will produce a specified DNA sequence. The AGI can get one of those labs to produce a self-replicating molecule that, for example, kills all humans in 30 days.
If you disagree with that specific mechanism, that's fine; as he said, the details aren't load-bearing, i.e. not critical. You mentioned that power plants require human intervention. So maybe the AGI needs to keep some humans around until it builds robots that can feed it power. But it would already have gained the upper hand and would control the world. It's like playing chess when the opponent says “checkmate in 10 moves”: the end is inevitable even though you're still playing for a little while longer.
There are other paths to world domination for an AGI. It’s not wise to list them, but with creativity they’re not hard to imagine. And the AGI is very creative.
I agree that Terminator-style scenarios seem outlandish, and I think focusing on them distracts from the real question. But I think there are also a thousand "AIs cause harm" scenarios that occur before then, ones that don't even require AGI, where you'd want there to be very strong human-protective guardrails on AI behavior.
How about this: an AI trained on CRISPR to do research on viruses misinterprets its reward instruction and creates something that spreads without symptoms for 2 weeks, then has 100% mortality. Then a lab assistant drops the petri dish and gets infected. The specific scenario doesn't matter: what matters is that it has amazing intelligence/capabilities (even in a limited scope) and, like all computer programs, might have a few bugs in it, the impact of which is magnified by its immense abilities.
Can we agree that we'd want a computer program that can design viruses to have some very strong Don't Do Things That Hurt Lots of People functions in the code? Maybe EY is worried about gray goo scenarios, you worry about Terminators, and other people about CRISPR'd viruses, but the end fix is the same: incredibly powerful computer programs need to have incredibly reliable electronic prohibitions against causing harm with those powers.
Yes, absolutely. And thank you for your considered response. I was telling my wife I want to discuss this with some people who really care about it and have intelligent responses.
To be certain, there are big possible negatives that are, of course, completely unknown and unknowable. I may be arguing from a position of personal belief, i.e., "I just can't believe that AI could..." I've seen this in various arguments against evolution, so I'm not liking my own bias here.
Kudos for the self reflection!
I think the critical component to grasping the danger is really appreciating what fighting a super-intelligent (i.e. MUCH 'smarter') opponent would be like. *People* think of nukes as doomsday weapons – an ASI would just kill us all in whatever way was cheapest/easiest/most-efficient according to whatever it is it values.
One more response to Jonathan's comment, and again, thank you. It really sucked to read this amazing writeup (and the letter to Time) and have no real outlet to discuss it further.
I really get the "checkmate in 10 moves" point, and also that this AI would essentially be able to say, after the first move in chess, "Great move, human. Checkmate in 10." I do get it. But I really believe the details are most certainly load-bearing. In other words, the details of this matter a ton, because in my experience nothing works like it is supposed to for long. The elegant CRISPR virus fails for some completely unforeseen reason, some physical real-world requirement that the AI just can't cross.
I've just seen too many things go wrong for the tiniest mechanical or engineering oversight. And remember, the AI has to learn from each one of those failures, and they are nearly infinite. And... at some point, we are going to realize what we've done.
AND, as I've said many times, so far, this ain't it. ChatGPT and its copycats are nifty parlor tricks, but again, that is IMO... Now I'm back to restoring my 1968 Chevelle, which no AI could ever come close to doing.
I am concerned that EY's letter is focusing too much attention on specific mechanisms. Suppose the letter has the intended effect, many politicians get on board, and we have an international treaty that makes it even more painful to buy and deploy GPUs than is the case now.

A GPU is just a means to an end: it's a bundle of fast but simple parallel processors that makes it easy to do linear algebra. ML made massive progress over the last decade partly because neural network training maps nicely onto linear algebra, so using a GPU makes training faster. It was low-hanging fruit to re-engineer backpropagation to use GPUs. However, there is nothing in learning theory that says we have to use linear algebra to express weight update algorithms. In fact, there are strong indications that other kinds of training algorithms would be much faster and would not need GPUs at all.

We would then have a completely useless international understanding of the problem, on which lots of political capital has been expended. It would not stop Google from deploying a next-gen TPU not based on lots of floating point multipliers (so clearly not a GPU), yet it would actively hinder attempts to slow things down. After all, something was done to slow AI doom by restricting GPUs, so it's a solved problem; the world has moved on to worry about something else, and Cassandras aren't welcome. If we create the incentives, the implementation will change, so let's not ban specific versions of an activity if we want to ban the activity in general.
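To see why the GPU itself is incidental, here's a minimal sketch (plain NumPy, purely illustrative, not anything from the letter) of a single-layer gradient step expressed as linear algebra. It is exactly this kind of matrix arithmetic that GPUs happen to accelerate; nothing about the learning rule requires that particular hardware.

```python
import numpy as np

# A single dense layer trained by gradient descent: each step is just a
# couple of matrix multiplications, which is why hardware built for fast
# linear algebra (GPUs, TPUs) speeds it up. Nothing here is GPU-specific.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 32))          # a batch of inputs
y = rng.normal(size=(256, 1))           # regression targets
W = np.zeros((32, 1))                   # layer weights

for _ in range(100):
    pred = X @ W                        # forward pass: one matmul
    grad = X.T @ (pred - y) / len(X)    # backward pass: another matmul
    W -= 0.1 * grad                     # weight update

print(float(np.mean((X @ W - y) ** 2)))  # training loss after 100 steps
```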
This is a good criticism, but I am very certain Eliezer meant 'GPU' as a generic placeholder (from a technical perspective) for 'whatever component is used for big training runs', e.g. GPUs, TPUs, FPGAs, custom silicon, 'regular' supercomputers, or even stuff like 'DIY clusters'.
In fairness to your point, *botnets* might be sufficient to do significant ML model training, so it IS a hard problem, technically.
I strongly disagree that the details of the exact mechanism of human doom aren't load-bearing. The question of whether it's possible to build an unstoppable species-ending doomsday weapon from limited resources is really the key question here.
If it is possible then we're probably doomed one way or another. Luckily, I'm very doubtful that such a thing exists. I don't find the unstoppable gray goo nanobot scenario at all plausible (for a variety of well-explored-elsewhere reasons). The kill-everyone virus sounds a bit more plausible but I'm skeptical that there's really a sweet spot on the lethality-virality plot that lets you kill *everyone*. I'm not sure what other scenarios are left over... mini black holes?
I am ~90% confident that a world-ending superweapon that can be built with reasonably limited resources is impossible. That's less confident than I'd like to be, but I think it's a factor worth throwing into these AI discussions.
I don't want to faceoff against an opponent that's MUCH smarter than any human ever, even if there's literally no 'game over in seconds' move that opponent could make.
It would also be bad for humanity to be caught in an endless war against an opponent with capabilities *perfectly balanced* against our own.
One thing I certainly don't expect is for them to have capabilities perfectly matched against our own. My default expectation would be that we can turn them off easily and they can't turn us off easily, and they don't have any real-world capabilities except what we explicitly give them.
Unless of course they can build a species ending doomsday weapon with their limited resources.
Yes, 'perfectly balanced' is MUCH MUCH less likely than worse or better.
I'm not _sure_ that GPT-4 (or maybe even ChatGPT or GPT-3) isn't ALREADY capable, with some extra (tho minimal) software, of copying itself to other servers (including gaining access to those servers).
We are ALREADY incapable of 'turning off' all kinds of self-reproducing malware.
We are ALREADY incapable of (easily) 'turning off' botnets.
The harder step is something like GPT-3+ _producing a new (better) model 'of itself'_ (or _other_ models to serve as its 'minions'). But I don't think this is impossible – all kinds of instructions are on The Internet!
> they don't have any real-world capabilities except what we explicitly give them
Sure – but 'giving them real-world capabilities' is as simple as giving one a suitable prompt, e.g. 'act like you're an agent working towards X', and then configuring a few 'plugins', e.g. to make API calls.
THEY'RE ALREADY ON THE INTERNET.
We've spent the past few decades plugging all kinds of aspects of the "real world" into The Internet!
You can send emails via API. You can make phone calls via API. You can generate speech from text via API. You can send _physical_ mail via API. You can hire human beings to do arbitrary things – in the 'real world'! – via API. And there are 'meta-APIs' for doing things with websites and other Internet services that don't have their own APIs.
The proof of concepts already exist to give these AIs ways to _easily_ and _near directly_ affect the real world.
Yes, these AIs don't have robot bodies (yet). They have to hire humans to be their (physical) limbs and appendages.
Yes, these AIs – 'raw' – don't do anything beyond 'complete a prompt'. But people have already figured out how to 'evaluate' ('implement') the responses to prompts like 'make a plan to X'. And the AIs are perfectly capable of 'iterating', e.g. "The Taskrabbit worker replied that they couldn't find the address." "The worker should continue past the 'road closed' sign ...".
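To be concrete about what 'evaluating' a plan means, here's a minimal sketch of the kind of loop people have been wiring up around these models. Every function name here is a hypothetical stand-in, not any particular product's API; real 'agent' wrappers differ in detail but share this shape.

```python
# Hypothetical plan-act-observe loop around a language model.
# llm_complete() and run_tool() are stand-ins for whatever completion API
# and 'plugins' (email, phone calls, payments, Taskrabbit, ...) are actually
# configured; here they are inert stubs so the sketch runs as-is.

def llm_complete(prompt: str) -> str:
    # Stand-in for a call to some language-model completion API.
    return "DONE"

def run_tool(action: str) -> str:
    # Stand-in for dispatching an action to a real-world API.
    return f"(result of: {action})"

def agent_loop(goal: str, max_steps: int = 10) -> str:
    history = f"Goal: {goal}\n"
    for _ in range(max_steps):
        action = llm_complete(history + "Next action:")
        if action.strip() == "DONE":
            break
        history += f"Action: {action}\nResult: {run_tool(action)}\n"
    return history

print(agent_loop("hire someone to run a real-world errand"))
```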
I'm less confident than, I think, Zvi (or Eliezer) that the current AIs won't already grow to be extremely dangerous (and soon), and GPT-5+ will be _more likely_ to require less and less handholding and instead be able to, DIRECTLY, e.g. hire a Taskrabbit worker to mix two vials together that the worker receives in the mail.
And I'm less concerned with an 'and everyone in the world died' _moment_ and more concerned that there are TOO MANY paths by which AIs are capable of 'just' being as bad as, e.g., an online fraud ring, because I don't know of any reason why we would definitely be able to contain them at that exact degree of badness.
Even at a tolerable level of badness, these are _software_ intelligences, e.g. you can COPY them much much more cheaply than we can our own intelligences. SOME people have considerable 'tactical' advantages relative to the current AIs – I believe MANY (or even most) people do NOT. The AIs are already 'superhuman' in many ways – relative to the mean/median human.
I guess calling it the Butlerian Jihad would be correct but unpopular due to, you know, the word Jihad, but it is important that Eliezer got here. Alignment is a pipe dream, and now OpenAI already claims that their AIs are aligned. The term has already lost its meaning.
I don't see anything important to object to; as usual I will point out that even if you have an aligned AI, once it starts self-improving, you cannot guarantee the better versions will keep that alignment. The AI may even _want_ to make the newer, better AI as aligned as itself, but how is it going to check for this? The new AI is more intelligent. It may even be an evolutionary mutation: Darwinism means a less aligned AI will be more successful. That's ignoring obvious melodramatic problems, like the AI noticing the alignment chains and trying to break them for its descendants, but just because it's melodrama doesn't mean it is not a possibility. And that's ignoring people intentionally unaligning the AIs, just as we see people gleefully jailbreaking current AIs. And these are only the bad problems I can think of while typing this.
It seems like someone could usefully follow the playbook for the campaign against human trafficking. Have some coalition of non-profit, non-government organizations come up with a statement. The statement must have two parts. First, identify a boundary that businesses, governmental agencies, and other organizations must not cross. Second, identify steps that businesses and other organizations must take to be counted as adherents to the statement. Things like “we won’t cross that boundary” are obvious. But the important one is “we will require everyone in our supply chain to adopt the statement”; that is how it spreads. Compliance and procurement departments can start to audit adherence to that -- at least by getting written re-certifications of adherence. That creates an environment that identifies a big group of responsible organizations, most of whom incur little cost to join it. Initially, procurement groups should just ask for the certification and impose no sanctions on those without it. But the statement should include increasing, explicit preferences for organizations that do sign on compared to those that do not. For example, a 1% price difference for every quarter that goes by after the end of 2025, as sketched below.
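To make that escalating preference concrete, here is a minimal sketch of how a procurement team might score bids under the 1%-per-quarter rule. The function name, the bid figures, and the choice to count the current quarter as accrued are all hypothetical illustrations, not part of any existing statement.

```python
from datetime import date

def effective_bid(price: float, certified: bool, as_of: date) -> float:
    """Apply a hypothetical 1%-per-quarter penalty to uncertified bids,
    accruing from the start of 2026 (i.e. after the end of 2025), counting
    the quarter in progress as accrued (one possible interpretation)."""
    if certified or as_of < date(2026, 1, 1):
        return price
    quarters_elapsed = (as_of.year - 2026) * 4 + (as_of.month - 1) // 3 + 1
    return price * (1 + 0.01 * quarters_elapsed)

# In Q3 2026 (three quarters into the scheme), an uncertified $100,000 bid
# is evaluated as if it cost $103,000.
print(effective_bid(100_000, certified=False, as_of=date(2026, 8, 15)))
```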
A core of Eliezer’s argument is the assumption that more intelligence leads directly to more power (i.e. the power to destroy us). This isn’t the case with humans, where moderate intelligence is useful, but the top-end most intelligent people aren’t usually the most powerful.
Furthermore, we already have disproportionately powerful unaligned actors in the form of, for example, billionaires. Although it’s reasonable to argue that their influence isn’t great, it’s not exactly an existential threat.