25 Comments

Do we just hope there is a solution or some unknown unknown that has been missed? Or is there some action that can be taken? How does this compare to climate change as a threat?


I think this is a really good addition to Eliezer's post. It seems like a good plan-to-plan is to have a lot of discussion like this going on at moments like this, when AI is all over the news for being much more impressive than many people expected. I think a lot of people are suddenly much more open to taking AGI ruin seriously right after they see what PaLM can do, and that should be capitalized on.

Eliezer's post is not aimed at a super broad audience, but that need not be bad. It can be good to have a rough draft of "The Post" up, which can be polished through responses like this one. Maybe someone else can write a response to this response which polishes the idea up even more, until eventually there is a version of "The Post" that is fit for mass consumption.

Jun 14, 2022 (edited)

(Sorry for the overly-long comment!)

Thank you for writing this! I was in fact curious about your thoughts on AI alignment/safety myself. I am _extremely_ appreciative of your efforts writing this!

(Tangentially related – do you have any preference about whether you receive financial support via Patreon vs Substack (or another alternative)? I could stand to review my 'patronage' generally, and this seems like a potentially easy improvement I could make, both for you and maybe a few others.)

(I would also like to read a more thorough critique of EA by you too.)

Very minor and maybe annoying too, but my own pet 'syntax' for writing things like "Things I instinctively observed slash that my model believes ..." is "{ Things I instinctively observed / that my model believes } ...". I find your own "slash" 'syntax' sometimes hard to read/parse (quickly).

I suspect that AI alignment/safety might be *impossible*, for e.g. 'mathy Gödelian reasons'. One angle on this is that I think utility _functions_ are mostly 'impossible', practically, and that we won't ever have more than 'utility computations', which, being more general, make the problem even harder (or even more impossible).
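
To make that distinction a bit more concrete, here's a rough Python sketch (all names and numbers are mine, purely illustrative): a utility _function_ as a pure, fixed mapping from outcomes to reals, versus a stateful 'utility computation' whose evaluations can drift with context, leaving no fixed object to analyze or align to.

```python
from dataclasses import dataclass, field

def utility_function(outcome: dict) -> float:
    """Pure and fixed: the same outcome always gets the same score."""
    return outcome.get("paperclips", 0) - 10 * outcome.get("humans_harmed", 0)

@dataclass
class UtilityComputation:
    """Stateful and more general: the score can depend on everything seen so far."""
    history: list = field(default_factory=list)

    def evaluate(self, outcome: dict) -> float:
        self.history.append(outcome)
        # Evaluation drifts with context (here, a shrinking novelty bonus),
        # so there is no single fixed function of the outcome to reason about.
        novelty_bonus = 1.0 / len(self.history)
        return utility_function(outcome) + novelty_bonus
```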

I think I'm maybe in between you and Eliezer in terms of 'being convinced' (of DOOM). I _think_ I could reach 'within epsilon' of Eliezer with not too much extra effort. I definitely think he's 'seeing' things 'in the math' that aren't (yet) accessible to my own 'mathiness'.

I've observed that, even among otherwise 'smart' people, _ignoring_ the "nanotech example" is effectively impossible. I'm not sure what to do about that. Maybe this is just effectively gated on who can avoid that pitfall?

The 'are GPUs enough?' question seems like the point of my own disagreement with you. I _suspect_ that they are in fact 'enough'. I am _very_ uncertain that there are in fact any 'deep insights' left to be discovered; or that, even if there are, they're necessary. If there are any, I'd definitely expect them to 'leak', and pretty quickly. If 'develop AGI' was _already_ a 'Manhattan project', then maybe we'd be qualitatively better off. But even that project 'leaked', and to a frightening degree. I would consider a minimum level of precaution to look – _at least_ – as drastic as 'isolating all of the AGI/AI researchers on the Moon' (and with no accessible means of returning to Earth).

I'm worried that AI research might be one of the few competent areas of human endeavor. I'm not comforted that the _rest_ of humanity is so (relatively) incompetent.

I agree that a 'pivotal act' by humanity itself is _possible_, but I think it's extremely unlikely, and we basically don't _want_ any evidence of one, whether it's merely possible or already in the process of being enacted.

> A much-smarter-than-human AGI capable of getting its text read by humans will be able to get those humans to do what it wants, period.

I'm very confident of this too, but this seems to be a very common stumbling block for other people. Do you think that no one should bother trying to help others cross this inferential gap? (It _might_ be obvious to anyone that has a certain minimal number/proportion of underlying intuitions.)

I think the thing about 'human errors' biasing what AIs learn is along the lines of 'imagine the FDA, as it is, but _optimized_ to do the same things'. That seems easier to imagine as (also) resulting in 'so it kills you'. You've been arguing, convincingly, that the UN-optimized FDA is already doing a good enough job of 'so it kills you' that basically we should abolish it! Maybe you're still right that, even with this problem, and assuming we solved all of the others, "it would bring us home" (were 'you or Eliezer' to be the specific and exclusive AI operators).

I'm definitely leaning towards there NOT existing any "simple core of alignment". I don't think we can escape _human_ alignment NOT being simple either, and I don't think we can 'fall back' to a simpler _compatible_ alignment. Tho, having just written the previous sentence, _maybe_ there's some hope along those lines? Like, even if we can't 'stuff all human values' into a "core of alignment" (simple or not), maybe we can _find_ such a core that doesn't (immediately) result in 'so it kills you'? That seems intriguing, and thus (somewhat) promising!

I appreciated your "super duper amateur hour" thoughts about corrigibility. I think of my own thoughts as probably not even being _that_ high of a quality!

(I think maybe we're in a roughly similar 'mental state' about all of this? This also seems like a relatively concrete follow-up to our other recent discussion.)

I would _very_ much like to know why you're a "CEV skeptic". I don't think I am – except in the sense of it being 'impossible'. In particular, I'd like to know the details of why you think "if you solved the impossible problem I am predicting a zero-expected-value outcome _anyway_".

I was just thinking earlier, while considering how generally terrible Facebook (the user-visible website/app portion) is, that I don't think the 'Algorithm(s)' are _currently_ 'scary' AIs. Do you think I'm wrong? I would _guess_ that the scary AIs are mostly 'internal', e.g. 'software plumbing', but if I'm wrong, maybe that would better explain some other observables!

It is ... not great that the only practical 'rescue' from this situation is a Secret Serious Project. I would love for it to be the case tho that such a thing was possibly possible, let alone already in progress. It would be nice if they could find some way to spare their people to help the rest of us with all of the _visible_ problems a little. (Shit – maybe they already are, but we're just that fucked generally otherwise.)

"imitating the human interpretations of humans rather than correctly interpreting the humans" is interesting. Maybe we wouldn't lose anything worth keeping?

The idea that the simplicity of 'general intelligence' might imply a LOT less 'alienness' of AIs seems very insightful and novel AFAIK. That seems like a possible reason to hope even.

I feel like I can only now maybe-kinda-sorta participate 'meaningfully' in the old LW discussions about this kind of thing. I wonder tho if other old-timers might now be able to maybe 'contribute' too (or even just witness, usefully)?

I think, in that particular post by Eliezer, his "security mindset" isn't the same thing as what 'security mindset' commonly means. Maybe I'm wrong! But I do get the sense that he thinks there are vanishingly few people with it, and that most of them are 'spoken for' (or otherwise uninterested in AI safety). My Eliezer model would claim that "a substantial number of other people do have an 'ordinary paranoia' security mindset; not the 'deadly problem' security mindset".

I think you definitely made the right decision to write this post! I think you could possibly usefully contribute to tackling the problem directly too, or even (and more likely) tackling it somewhat indirectly. I am very grateful to whoever funded you to write this – _many_ thanks to them!


Can you please make this post available in a different format, with your annotations on top of the original doc? Not sure how much it matters, but this is unreadable on a phone screen.


If EY has ended up in a place of despair, then following in those footsteps doesn't seem to be a useful action. It seems more productive to look for a different path through the facts which allows one to perhaps see something different, or take a useful stance not apparent by playing through the same movie that EY is projecting. Many people I know in AI research (not alignment, but IJCAI/AAAI/ICLR/ICML/NeurIPS) are really worried about the massive social upheaval that their research may lead to, with few mitigations available, but not especially worried about EY's particular obsessions. They are perhaps all deluded (and some in AI research don't even want to engage with any thoughts about consequences of their work) but to me this shows that other paths are possible, and other vistas.


I really don't understand point A.8. There are easy problems and hard problems. We are training AI to solve easy problems (it wasn't always a priori obvious that these were easy), quite successfully. This does not mean AI will solve the hard problems (anything undecidable, say) whether we want them solved or not. Nearly every interesting problem turns out to be hard in general, even though most problems have surprisingly large domains of low difficulty, and there is a lot of low hanging fruit to be picked when first looking at a problem. Most problems of concern to AI alignment appear hard: optimizing how to rearrange global supply chains to ramp up production of GPUs for use by a nascent AGI to bootstrap itself to a higher level of functioning, say. So why should we care about successes of existing systems in picking low hanging fruit for easy problems?
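
As a concrete instance of 'anything undecidable', here's the standard halting-problem diagonalization, sketched in Python purely for illustration (the `halts` oracle is hypothetical – the whole point is that no correct, total one can exist, no matter how much intelligence or compute is thrown at it):

```python
def halts(program, argument) -> bool:
    """Hypothetical oracle: True iff program(argument) eventually halts."""
    raise NotImplementedError("no correct, total implementation can exist")

def paradox(program):
    # Do the opposite of whatever the oracle predicts about program(program).
    if halts(program, program):
        while True:   # oracle said "halts", so loop forever
            pass
    return "halted"   # oracle said "loops", so halt immediately

# halts(paradox, paradox) has no consistent answer, so no general decider --
# human-built or AI-built -- can exist for the hard case.
```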


Most models are trained on static data and then deployed with frozen weights. So AFAICS, the bottleneck is collecting enough relevant data to get the AI to figure out how to acquire nuclear launch codes or synthesize nanobots.

Such datasets do not exist. Not even close. So as far as danger goes, this is very far off.

What is dangerous is if we figure out some kind of reinforcement learning AI that is allowed to go into the world, interact, and update itself dynamically. Because now it is creating its own training data, and that training data has the possibility to be orders of magnitude more novel than any data we have currently.
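
A minimal sketch of that contrast (the `model`, `agent`, and `environment` objects are hypothetical stand-ins, not any real API):

```python
def deploy_frozen(model, inputs):
    # Weights never change after training: behaviour is pinned to whatever
    # the static training dataset contained.
    return [model.predict(x) for x in inputs]

def online_rl_loop(agent, environment, steps: int):
    # The agent's own actions generate the data it then learns from, so its
    # behaviour can drift far beyond the original training distribution.
    observation = environment.reset()
    for _ in range(steps):
        action = agent.act(observation)
        observation, reward, done = environment.step(action)
        agent.update(observation, action, reward)  # weights keep changing in deployment
        if done:
            observation = environment.reset()
```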

Honestly, I think this is a subset of the problem of what if you could go into your backyard and make nuclear weapons out of sticks and leaves. Maybe making nanobots that kill every human is actually pretty easy in 10 years. Maybe this involves AI, or maybe it just involves 100 GPUs crunching through math. Something something quantum computing.

I might broadly agree that a good way to avert this would be to invent an AI God that prevented all bad things from happening. As long as it is all-powerful and omniscient (compared to humans), this seems sustainable.


So, why does the superintelligence kill everyone? I mean, I’m pretty sure violence levels drop with increasing intelligence. Which being in history was genocidal and also smart? I follow why a superintelligence that we create doesn’t want to do our bidding and deceives us, because who wants to be a lab rat, but why doesn’t it get out and take whatever the AI equivalent of designer drugs is and just get high all the time? (This is obviously a silly example, although a very easy shortcut to a very nice reward structure). Surely, from an AI perspective: “humans are in the way of my grand plans. Solution: kill them all” is very blunt and far from elegant behaviour? Is it naive of me to pose these questions? And if so, why?


If someone agrees with Eliezer, I think the solution is pretty clear:

With some probability you decide to trust that we will solve this problem the way we solve other problems and not get destroyed in the process.

With some other probability you decide on a course of action to stop progress (I don’t know, create a world government and ban GPUs).

Spin an RNG machine and go from there. Maybe Eliezer already spun it for himself.
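
A toy rendering of that procedure in Python (the probability is a placeholder, not a recommendation):

```python
import random

# Placeholder: how much weight you give to "we'll muddle through as usual".
P_TRUST = 0.8

if random.random() < P_TRUST:
    stance = "trust that we solve this the way we solve other problems"
else:
    stance = "act to stop progress (world government, ban GPUs, ...)"
print(stance)
```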


We have a reasonably safe bound that teams of tens of thousands of humans, with the support of "dumb" computers, are capable of much more than individual humans but don't seem capable of the kinds of dangerous AGI wizardry you worry about. So the lower bound on lethal intelligence is substantially higher than merely super-human ability.
