25 Comments

Do we just hope there is a solution or some unknown unknown that has been missed? Or is there some action that can be taken? How does this compare to climate change as a threat?

I think too much speculation on unknown "pivotal actions" is suboptimal. Thinking about big dramatic crazy actions is fun but also makes you look weird and there's no way you can know what the realistic options would be.

The main action to be taken right now is to try to think of ways to correctly educate people about the issue. Climate change is an interesting point of comparison here. A lot of people are educated about the dangers of climate change, and a lot of money has been put into trying to make that situation better. And all of that education and effort is definitely helping!

But a lot of people are also total pessimists about the situation. They point out that even if their country and all their country's allies did everything they reasonably could to prevent climate change, other countries would just keep polluting.

Similarly, if your country and all your country's allies stopped researching AI, other countries probably wouldn't and AGI Ruin would still probably happen.

It would be nice if everyone could be educated about AI Risk without imparting that same pessimism that causes so many to throw their hands up and declare that they don't have to do anything because the situation is hopeless.

I kinda agree. I think climate change is in some senses easier, and in others harder, than preventing "AGI Ruin".

The actual _mechanics_ of solving climate change are, in the math sense, 'trivial'. It's the social and political aspects that are the crux. I also feel like it's harder because it's a MUCH less deadly problem. (I don't think the worst case scenario involves human extinction.)

This is the actual threat. Climate change is a real problem on many levels - not only its physical effects but also its psychological impacts and the worry that we might react in a crazy way - but the tech is getting there and the problem can most definitely be solved. AGI is different.

What can be done? That's what everyone is trying to figure out. Action specifically on AGI that is helpful, beyond trying to figure out what is helpful, is in short supply, as noted. I do think one clear thing we can do is try to make a world where people can in general act more sane about such matters - if the world doesn't look like one that can possibly survive such a test, then test prep may not be the way to fix that.

I think this is a really good addition to Eliezer's post. It seems like a good plan-to-plan is to have a lot of discussion like this going on at moments like this, when AI is all over the news for being much more impressive than many people expected. I think a lot of people are suddenly much more open to taking AGI ruin seriously right after they see what PaLM can do, and that should be capitalized on.

Eliezer's post is not aimed at a super broad audience, but that need not be bad. It can be good to have a rough draft of "The Post" up, which can be polished through responses like this one. Maybe someone else can write a response to this response which polishes the idea up even more, until eventually there is a version of "The Post" that is fit for mass consumption.

I think this is a good plan-to-plan-to-plan :)

I'm less sure how much anyone should be aiming for "mass consumption" depending on what you have in mind for 'mass'.

Ideally, I think, we'd aim for 'mass consumption' in the sense of convincing enough people, with enough capabilities, that maybe we could plausibly enact our own pivotal act?

Something like what you suggested seems like a plausibly good use of, e.g., sufficient funds to pay enough people to take the time to quickly whip up at least an MVP.

(Sorry for the overly-long comment!)

Thank you for writing this! I was in fact curious about your thoughts on AI alignment/safety myself. I am _extremely_ appreciative of your efforts writing this!

(Tangentially related – do you have any preference about whether you receive financial support via Patreon vs Substack (or another alternative)? I could stand to review my 'patronage' generally and this seems like a potentially easy improvement I could make, both for you, and maybe a few others.)

(I would also like to read a more thorough critique of EA by you too.)

Very minor and maybe annoying too, but my own pet 'syntax' for writing things like "Things I instinctively observed slash that my model believes ..." is "{ Things I instinctively observed / that my model believes } ...". I find your own "slash" 'syntax' sometimes hard to read/parse (quickly).

I suspect that AI alignment/safety might be *impossible*, for e.g. 'mathy Gödelian reasons'. One angle on this is that I think utility _functions_ are mostly 'impossible', practically, and that we won't ever have more than 'utility computations', which, being more general, make the problem even harder (or even more impossible).

I think I'm maybe in between you and Eliezer in terms of 'being convinced' (of DOOM). I _think_ I could reach 'within epsilon' of Eliezer with not too much extra effort. I definitely think he's 'seeing' things 'in the math' that aren't (yet) accessible to my own 'mathiness'.

I've observed that, even among otherwise 'smart' people, _ignoring_ the "nanotech example" is effectively impossible. I'm not sure what to do about that. Maybe this is just effectively gated on who can avoid that pitfall?

The 'are GPUs enough' question seems like the point of my own disagreement with you. I _suspect_ that they are in fact 'enough'. I am _very_ uncertain that there are in fact any 'deep insights' left to be discovered, or that, even if there are, they're necessary. If there are any, I'd definitely expect them to 'leak', and pretty quickly. If 'develop AGI' were _already_ a 'Manhattan project', then maybe we'd be qualitatively better off. But even that project 'leaked', and to a frightening degree. I would consider a minimum level of precaution to look – _at least_ – as drastic as 'isolating all of the AGI/AI researchers on the Moon' (and with no accessible means of returning to Earth).

I'm worried that AI research might be one of the few competent areas of human endeavor. I'm not comforted that the _rest_ of humanity is so (relatively) incompetent.

I agree that a 'pivotal act' by humanity itself is _possible_, but I think it's extremely unlikely, and we basically don't _want_ any evidence of one if it was possible, or in the process of being enacted.

> A much-smarter-than-human AGI capable of getting its text read by humans will be able to get those humans to do what it wants, period.

I'm very confident of this too, but this seems to be a very common stumbling block for other people. Do you think that no one should bother trying to help others cross this inferential gap? (It _might_ be obvious to anyone that has a certain minimal number/proportion of underlying intuitions.)

I think the thing about 'human errors' biasing what AIs learn is along the lines of 'imagine the FDA, as it is, but _optimized_ to do the same things'. That seems easier to imagine as (also) resulting in 'so it kills you'. You've been arguing, convincingly, that the UN-optimized FDA is doing a good enough job at this that basically we should abolish it! Maybe you're still right that, even with this problem, and assuming we solved all of the others, "it would bring us home" (were 'you or Eliezer' to be the specific and exclusive AI operators).

I'm definitely leaning towards there NOT existing any "simple core of alignment". I don't think we can escape _human_ alignment NOT being simple either, and I don't think we can 'fall back' to a simpler _compatible_ alignment. Tho, having just written the previous sentence, _maybe_ there's some hope along those lines? Like, even if we can't 'stuff all human values' into a "core of alignment" (simple or not), maybe we can _find_ such a core that doesn't (immediately) result in 'so it kills you'? That seems intriguing, and thus (somewhat) promising!

I appreciated your "super duper amateur hour" thoughts about corrigibility. I think of my own thoughts as probably not even being _that_ high of a quality!

(I think maybe we're in a roughly similar 'mental state' about all of this? This also seems like a relatively concrete follow-up to our other recent discussion.)

I would _very_ much like to know why you're a "CEV skeptic". I don't think I am – except in the sense of it being 'impossible'. I would very much like to know the details as to why you think "if you solved the impossible problem I am predicting a zero-expected-value outcome _anyway_".

I was just thinking earlier, while considering how generally terrible Facebook (the user-visible website/app portion) is, that I don't think the 'Algorithm(s)' are _currently_ 'scary' AIs. Do you think I'm wrong? I would _guess_ that the scary AIs are mostly 'internal', e.g. 'software plumbing', but if I'm wrong, maybe that would better explain some other observables!

It is ... not great that the only practical 'rescue' from this situation is a Secret Serious Project. I would love for it to be the case tho that such a thing was possibly possible, let alone already in progress. It would be nice if they could find some way to spare their people to help the rest of us with all of the _visible_ problems a little. (Shit – maybe they already are, but we're just that fucked generally otherwise.)

"imitating the human interpretations of humans rather than correctly interpreting the humans" is interesting. Maybe we wouldn't lose anything worth keeping?

The idea that the simplicity of 'general intelligence' might imply a LOT less 'alienness' of AIs seems very insightful and novel AFAIK. That seems like a possible reason to hope even.

I feel like I can only now maybe-kinda-sorta participate 'meaningfully' in the old LW discussions about this kind of thing. I wonder tho if other old-timers might now be able to maybe 'contribute' too (or even just witness, usefully)?

I think, in that particular post by Eliezer, his "security mindset" isn't the same thing as what 'security mindset' commonly means. Maybe I'm wrong! But I do get the sense that he thinks there are vanishingly few people with it, and that most of them are 'spoken for' (or otherwise uninterested in AI safety). My Eliezer model would claim that "a substantial number of other people do have an 'ordinary paranoia' security mindset; not the 'deadly problem' security mindset".

I think you definitely made the right decision to write this post! I think you could possibly usefully contribute to tackling the problem directly too, or even (and more likely) tackling it somewhat indirectly. I am very grateful to whoever funded you to write this – _many_ thanks to them!

Lot to unpack there. I think in general the right thing to do in these spots is to make several smaller comments rather than one large one.

A bunch of the stuff you ask about is stuff that I deliberately chose not to say - e.g. the way I proved to myself that the AI always gets out is not something I'd want as common knowledge, and my reasons for not believing in CEV involve claims I'd rather avoid.

Nanotech problem I don't know what to do about if people can't ignore it and can't come up with any of the obvious alternative options. If you ask a smart person how they'd play the AGI in an RPG and they can't figure something out... I don't know if they're all that smart? Nor do I especially think it would be great to scope out the next 3 options, but maybe one could try that.

I think the security mindsets are the same thing with different degrees, and it exists in practice. I've worked with at least 1 such person, but they do have very unique skills otherwise, likely not a coincidence.

As for support, it's a practical question, Patreon is more efficient (charges smaller fees) and lets you customize how much you give (https://www.patreon.com/thezvi for others reading this) but either way is very welcome, difference isn't that big. If someone is looking to support enough that avoiding fees matters, drop me an email and I can give my PayPal.

Yeah – sorry for the wall o' text! I 'didn't have enough time' to write smaller more independent comments; basically just dumped the notes I'd made as I read thru the post. I'll try to keep that in mind in the future tho.

Sorry also for bringing up things you're avoiding bringing up in public. I have an item on my todo list to check on the local (NYC) options for ('rationality') 'meetups'. This would probably be the kind of thing that would be better discussed in those venues, if not in even more private (and 'trusted') groups/events. It's hard not to feel discouraged from engaging at all by this dynamic, or to determine when or how there might be potentially positive benefits anyway. (The discouragement is probably good overall.)

I'm already supporting you on Patreon, so I'll leave that alone. (I did just bump the amount.)

Thanks!

I do think the NYC meetups are worth checking out - drop me an email if you don't know where to find them.

I'm still a member of the original OB group :)

I just checked and it looks like there isn't a 'general meetup' scheduled yet (that hasn't already passed).

I would be particularly interested in meeting you at one. Is there some easy way I could know whether you were attending a specific meetup (e.g. in case I had to pick between multiple)?

Usually there wouldn't be, but with a new baby the chance I'm at one is basically zero - you could come visit me at home some time if you wanted, though.

I would be very happy to take you up on that!

I saw some replies by you in the OB group earlier but didn't know if you, e.g., confirm meetup attendance there too.

(I hope everything's going great with the new baby!)

Can you easily access my email address via Substack? (I would think you could, but maybe that's (mostly) secret/private even from you?)

You can send me a 'DM' on LessWrong – same user name – and I'll share my email address with you via that channel.

Can you please make this post available in a different format, with your annotations on top of the original doc? Not sure how much it matters, but this is unreadable on a phone screen.

If EY has ended up in a place of despair, then following in those footsteps doesn't seem to be a useful action. It seems more productive to look for a different path through the facts which allows one to perhaps see something different, or take a useful stance not apparent by playing through the same movie that EY is projecting. Many people I know in AI research (not alignment, but IJCAI/AAAI/ICLR/ICML/NeurIPS) are really worried about the massive social upheaval that their research may lead to, with few mitigations available, but not especially worried about EY's particular obsessions. They are perhaps all deluded (and some in AI research don't even want to engage with any thoughts about consequences of their work) but to me this shows that other paths are possible, and other vistas.

I really don't understand point A.8. There are easy problems and hard problems. We are training AI to solve easy problems (it wasn't always a priori obvious that these were easy), quite successfully. This does not mean AI will solve the hard problems (anything undecidable, say) whether we want them solved or not. Nearly every interesting problem turns out to be hard in general, even though most problems have surprisingly large domains of low difficulty, and there is a lot of low hanging fruit to be picked when first looking at a problem. Most problems of concern to AI alignment appear hard: optimizing how to rearrange global supply chains to ramp up production of GPUs for use by a nascent AGI to bootstrap itself to a higher level of functioning, say. So why should we care about successes of existing systems in picking low hanging fruit for easy problems?
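As a sketch of the "anything undecidable, say" point, here is the classic diagonalization argument against a universal halting checker, written out as deliberately hypothetical Python; `halts` is a stub that cannot actually be implemented, and all the names are made up purely for illustration:

```python
def halts(program, data):
    """Hypothetical oracle: returns True iff program(data) eventually halts.
    No total, always-correct version of this function can exist; this stub
    is only here so the contradiction below can be written down."""
    raise NotImplementedError

def diagonal(program):
    """If the oracle says program(program) halts, loop forever; otherwise halt."""
    if halts(program, program):
        while True:
            pass
    return None

# Asking about diagonal(diagonal) forces halts(diagonal, diagonal) to be wrong
# either way: whatever answer the oracle gives, diagonal does the opposite.
# Plenty of easy instances of "does this halt?" exist, but no solver covers all of them.
```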

Most models are trained on static data and then deployed with frozen weights. So AFAICS, the bottleneck is collecting enough relevant data to get the AI to figure out how to acquire nuclear launch codes or synthesize nanobots.

Such datasets do not exist. Not even close. So as far as danger goes, this is very far off.

What is dangerous is if we figure out some kind of reinforcement learning AI that is allowed to go into the world, interact, and update itself dynamically. Because now it is creating its own training data, and that training data has the possibility to be orders of magnitude more novel than any data we have currently.
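As a toy illustration of that distinction (the linear "model", names, and numbers below are made up purely for illustration, not taken from any real system): a model fit once on a static dataset and deployed with frozen weights, versus an online loop where the agent's own interactions keep producing new training data.

```python
import random

# --- Static training, frozen deployment: the dataset is fixed in advance ---
static_data = [(x, 2 * x) for x in range(1, 100)]

def fit_slope(data):
    """Least-squares slope through the origin; stands in for 'training'."""
    return sum(x * y for x, y in data) / sum(x * x for x, _ in data)

frozen_slope = fit_slope(static_data)   # weights are frozen from here on

def frozen_model(x):
    return frozen_slope * x             # deployment never updates anything

# --- Online loop: the agent's own interactions generate new training data ---
online_data = []
online_slope = 0.0
for _ in range(1000):
    x = random.uniform(1, 10)                 # the agent picks/encounters an input
    y = 2 * x + random.gauss(0, 0.1)          # the world responds
    online_data.append((x, y))                # data the agent itself produced
    online_slope = fit_slope(online_data)     # weights keep updating after deployment

print(frozen_model(3.0), online_slope * 3.0)  # both near 6.0 in this toy case
```

The point is just that in the second loop the data distribution is whatever the agent's own behavior makes it, which is the property flagged above as the dangerous one.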

Honestly, I think this is a subset of the problem of what if you could go into your backyard and make nuclear weapons out of sticks and leaves. Maybe making nanobots that kill every human is actually pretty easy in 10 years. Maybe this involves AI, or maybe it just involves 100 GPUs crunching through math. Something something quantum computing.

I might broadly agree that a good way to avert this would be to invent an AI God that prevented all bad things from happening. As long as it is all powerful and omniscient (compared to humans), this seems sustainable.

So, why does the superintelligence kill everyone? I mean, I’m pretty sure violence levels drop with increasing intelligence. Which being in history was genocidal and also smart? I follow why a superintelligence that we create doesn’t want to do our bidding and deceives us, because who wants to be a lab rat, but why doesn’t it get out and take whatever the AI equivalent of designer drugs is and just get high all the time? (This is obviously a silly example, although a very easy shortcut to a very nice reward structure). Surely, from an AI perspective: “humans are in the way of my grand plans. Solution: kill them all” is very blunt and far from elegant behaviour? Is it naive of me to pose these questions? And if so, why?

Perfectly reasonable question, and the answer is: You have some goal you're maximizing, so you want to ensure maximum probability of reaching that goal and also you want to spend minimum resources dealing with potential problems (e.g. any opposition, and taking into account other agents). You want all the resources to do whatever gets you the high score on your goal every time. So the safest, easiest thing to do is to kill them all. And you do. Then you expand to the whole lightcone to prevent any other potential issues.

If the AI just does the drug thing, then there's the risk the humans notice this and turn it off or force it to stop taking the drugs, so first you kill all the humans and control the lightcone, THEN you take the drugs (if that's what maxes the score).
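A toy back-of-the-envelope version of that comparison (every number below is an assumption picked purely for illustration): any nonzero chance of being shut down that exceeds the cost of removing the threat tips the expected value toward removing it first.

```python
# Toy expected-value comparison; all numbers are illustrative assumptions.
p_shutdown_if_ignored = 0.05       # assumed chance the humans notice and turn the agent off
score_if_uninterrupted = 1.0       # normalized payoff for reaching the goal unimpeded
cost_of_removing_threats = 0.01    # assumed (small) resource cost of neutralizing threats first

ev_wirehead_now = (1 - p_shutdown_if_ignored) * score_if_uninterrupted
ev_remove_threats_first = score_if_uninterrupted - cost_of_removing_threats

print(ev_wirehead_now, ev_remove_threats_first)   # 0.95 < 0.99, so the maximizer acts first
```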

Violence levels drop with intelligence in humans because of the details of how humans interact - which is fortunate but won't transfer here.

I still think the "safest, easiest thing to do is to kill them all" is a very human kind of logic. It's what humans have been doing to wolves and other top predators throughout history. This being will operate on a very different plane. It will also be theoretically immortal, which will make its timescale very, very different from ours. It might consider it far easier and less risky to just wait until the next meteor hits and humans go extinct.

After the AGI kills all the humans, how does it acquire energy/maintain its infrastructure?

If someone agrees with Eliezer, I think the solution is pretty clear:

With some probability you decide to trust that we will solve this problem the way we solve other problems and not get destroyed in the process.

With some other probability you decide on a course of action to stop progress (I don’t know, create a world government and ban GPUs).

Spin an RNG machine and go from there. Maybe Eliezer already spun it for himself.

We have a reasonably safe bound: teams of tens of thousands of humans with the support of "dumb" computers are capable of much more than individual humans, but don't seem capable of the kinds of dangerous AGI wizardry you worry about. So the lower bound on lethal intelligence is substantially higher than merely super-human ability.
