i believe his name is “george hotz” (s/holtz/hotz/)
Damn it! Will fix.
You still have "Holz" in a few places.
Yes, another round of (s/Holz/Hotz) is needed.
As a native German speaker, this is somewhat funny because these names feel sooo different (one is the normal spelling of wood, one is a funny spelling of wood, one is the first part of the name of a children's book character).
I understand that as a non-Germanic-language speaker, these are just weird names with too many "t"s and "z"s, so it's totally understandable to mix them up (but maybe keep unusual names in your copy-paste buffer or in a separate notes doc so you can easily paste them in without needing to worry about their spelling).
It’s “Landauer limit”
OK, that's a lot easier to fix, never trust your ears or spellcheck...
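For reference, the Landauer limit is the thermodynamic minimum energy needed to erase one bit of information at temperature T; at room temperature (T ≈ 300 K):

$$E_{\min} = k_B T \ln 2 \approx 1.38\times10^{-23}\,\mathrm{J/K} \times 300\,\mathrm{K} \times 0.693 \approx 2.9\times10^{-21}\,\mathrm{J}$$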
I listened to an interview with Hotz when he had his self-driving car company, and he sounded like a total BS hype man. Am I wrong, or is there just a big market for that?
I wish we had more people focusing on practical aspects of AI safety/doom prevention (which I think are more closely related than people seem to). One example: shouldn't we segregate AI cybersecurity capabilities from generalist LLM capabilities? The AI that understands various hacking techniques probably shouldn't be the same AI that can sweet talk people into doing things and (eventually) come up with and execute complex goals. We also probably need to 10X our game on cybersecurity in general since that's the field where bad actors will be trying to use AI to come up with new exploits and it's the logical starting point for an AGI to start creating problems.
I haven't heard him elsewhere so I can't speak to that, although he cited his own prediction that self-driving cars weren't coming, which would be odd for someone hyping his own self-driving car company. He wasn't playing hype man here; he was definitely participating for real, in good faith.
It would be good in several distinct ways to segregate AI cybersecurity from general capabilities, but it is both extremely difficult (no one is teaching it that stuff on purpose; it follows from other things) and not all that relevant to the outcome.
I'm curious about a couple of these points. Isn't including security content in training sets essentially training on it on purpose, by omission if nothing else?
Regarding whether cybersecurity is important to the outcome, isn't a key assumption that we won't be able to stop AGI because it will deceive us, spread, and get better before we can just "pull the plug"? Preventing it from spreading seems almost as useful as trying to make sure AI is honest. The two big weak points leading to spread seem to be hacking systems and persuading humans to help it. The first seems more tractable than the second, at least.
(to be clear, good questions)
1. Security is not distinct from non-security. We could choose to ask 'is this high influence on security questions in the previous model?' or 'does this look relevant to security?' and exclude the high scorers (roughly the kind of filter sketched after point 2 below), but the model would still learn about security. The more capable/intelligent/powerful the system in question, the more it will be able to fill those gaps or simply have enough information from elsewhere.
2. In some scenarios you can stop some particular bad outcomes by pulling a plug, but spreading would mostly still happen, both from deception/manipulation and from simply being useful and desired. People make a lot of copies of Llama-2 and GPT-4. Not knowing about security in particular won't give us much relevant affordance here, and any actually dangerous system would be able to rapidly acquire that affordance anyway in various ways. Thus, no initial 'breakout' is necessary.
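To make point 1 concrete, here is a minimal sketch of the kind of filtering being described, assuming a hypothetical `security_relevance` scorer (a small classifier, or an influence estimate from a previous model); the whole argument above is that even with such a filter in place, the model still picks up the prerequisites from everything that remains:

```python
# Minimal sketch of "exclude training data that looks security-relevant".
# `security_relevance` is a hypothetical scorer returning a value in [0, 1];
# in practice it might be a small classifier or an influence estimate.

from typing import Callable, Iterable

def filter_corpus(
    docs: Iterable[str],
    security_relevance: Callable[[str], float],
    threshold: float = 0.8,
) -> list[str]:
    """Keep only documents scored below the security-relevance threshold."""
    return [doc for doc in docs if security_relevance(doc) < threshold]

# Toy usage with a stand-in keyword scorer (a real filter would be a model):
def toy_scorer(doc: str) -> float:
    keywords = ("exploit", "buffer overflow", "privilege escalation")
    return 1.0 if any(k in doc.lower() for k in keywords) else 0.0

corpus = [
    "How to bake sourdough bread.",
    "Writeup of a buffer overflow exploit in an old FTP server.",
    "Notes on operating systems and memory management.",  # still teaches prerequisites
]
print(filter_corpus(corpus, toy_scorer))  # drops only the explicit writeup
```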
At some point, we're assuming models will do things surreptitiously that their owners don't want and don't know about, whoever those owners are. This would be at the transition from fully under control to not quite under control, so I don't think assuming limitless capacity to learn at that point is necessarily appropriate. It still seems like cybersecurity is core to these first steps, regardless of the rest of our model of the future.
Comma.ai outperforms Tesla's FSD as far as I know, so it's definitely not BS.
These debates seem very hard to make progress in. They are somewhat entertaining, but I really wish there was some way of forcing someone to answer a specific objection; that mostly didn't happen the last time Hotz debated, and it didn't happen this time either. I also wish they would spend a lot more time finding a specific crux, expressing that crux as a statement, and debating that single crux instead of going through everything from replicating bacteria to compute limits to decision theories. I think multiple subjects here are too complex for a debate format.
Strongly agree that this was too spread out, and any useful progress will require focus.
There seems to be an underlying problem of predicting the future, mixed with the assumption that near-term ASI capabilities will be beyond current recognition. How do you debate "ASI will break our current conception of physics" when that's literally impossible for us to imagine, by its own terms?
This is an excellent example of a debate problem. I could pick at any of "the assumption that near-term ASI capabilities will be beyond current recognition", "ASI will break our current conception of physics", or "literally impossible for us to imagine".
I think this summarization is accurate; the debate wasn't really too insightful, but rather disheartening. Except that I can now calibrate what to expect from Hotz going forward. He can actually be quite insightful in computer science in general, and I expected him to have some actual arguments here, but he simply kept throwing whacks at Eliezer and framing it as some weird "narrative".
Worse still, he is very familiar with Eliezer's writing, and I can't believe he doesn't actually understand what Eliezer believes -- I really think Hotz is trying to *make it sound like* AGI is just not gonna be a problem, because... "not my problem" / "impossible" / "sci-fi" etc. Or to make it sound like he can just ""win"" this debate, and then, "see? Open source AI good! Yay Progress!"
(Maybe one good thing that comes out of this is to show all viewers that this is really what debating AGI scenarios looks like, even with the influential people, and yeah, I would agree that that's good.)
I have never understood how competing ASIs would be any better for humans than competing aerobic organisms were for anaerobic ones.
Ultimately it comes down to the crux that Hotz doesn't much value humanity, which is something many people do not realize about a lot of these AI promoters.
If you think that present-day humanity has no value in existing, of course you would push for a future in which humanity is extinct or marginalized.
I think it's more nuanced than that, see Robin Hanson's post on the subject: https://www.overcomingbias.com/p/to-imagine-ai-imagine-no-ai
Given that Robin Hanson wants all biological humans to be replaced by digital emulations, he is a terrible example to link to. He is an extinctionist.
From Hotz's interview on Lex Fridman, I got the impression that he would consider humanity being eclipsed by AI as "...Mission fucking accomplished" -- Saberhagen's "Goodlife" in the flesh.
It's a ridiculously big and unfair ask, but what I want from e/acc folks is to demonstrate comprehension of the key Yud/Bostrom/etc arguments. I am sincerely interested in good e/acc arguments, but instead these peeps just put forward ideas that have already been well explored. Then it is just a matter of: "can Yud condense years of LW canon on this topic into a 1 minute soundbite in a rhetorically convincing fashion" instead of a substantive debate.
Relatedly, my hunch is that LW is a highly erudite alignment echo chamber, because you really need to have your act together to put forward a non-laughable "why AGI will be okay for humans" argument - as in, have spent a formidable amount of time reading and thinking about it. I suspect this barrier to entry has a selection effect (as in, selects for doomer sympathy).
TLDR this discussion has a variance-bias problem. Not sure what can be done about that.
At a minimum, before you enter a debate like this, it is necessary to ask who has raised your objection before, what is the answer to that, and how do you respond in kind, etc. Not every argument on every topic and variation, but on the thing you plan to bring up. Standard debate prep.
It seems a more fundamental crux is that EY is asking his opponents to prove a negative. "ASI will be able to ____" is unfalsifiable. EY has nearly infinite room within his own arguments to just wave away any concrete disagreement, because we don't know future capabilities.
Hotz seems to have been trying to catch EY in several logical contradictions - like "ASIs will solve alignment" -> "Alignment is solvable" -> "We can solve alignment" -> "ASI will be aligned." Hotz's argument there is that if ASI is alignable, then we should be able to figure it out, given a no-foom or long-foom scenario. If ASI is not alignable, then ASIs will not align and they will maintain a balance of power scenario (I picture the Great Powers era, or at worst the Cold War).
EY's response reads to me like "nuh-uh, ASI will be so much smarter than us that it can do things that would otherwise be impossible." How would anyone respond to that without simply stating their disagreement?
I've noticed that EY is very cagey about the timing aspect. If someone tries to build an argument about how long it would take for an ASI to build a capacity, the date gets moved back (sometimes dramatically - he says "within his lifetime" while previously indicating timelines closer to five years - though again he's cagey and avoids specific falsifiable claims). If someone says that means we have plenty more time to develop counters, then the date gets moved up and we're looking at a foom. EY was very cagey about foom here too, with it somehow both being unnecessary for doom and also a significant concern where an argument about having more time might mitigate our concerns.
I ultimately don't find EY's arguments convincing, because there's almost no way someone *could* ever prove them wrong, while he doesn't actually bother to prove himself right. He can point in the direction of potential ASI capabilities, but by their nature he cannot prove them. His opponents also point in a different direction, toward AI/LLM/ASI limitations (which to me reads as similarly believable, since it's the same kind of argument based on the same lack of concrete information), and EY's response is that they just don't understand ASI. Since nobody can claim actual knowledge of ASI, that seems to me a standstill, with the onus on EY to prove something from his extraordinary claims. He seems to think his opponents have the burden of proof, which seems quite contrary to normal debate.
I think the timeline debate goes back to what he said at the beginning - it does not matter. Why would he waste his breath debating the timeline if he proposes that we stop any progress on AI? (My personal belief, as a climate activist, is that this is similar to climate change - you always end up arguing the timeline, but WTF, we know that it is happening and the costs of stopping it are lower now than in 10 years, so why debate the timeline...)
The debate should therefore not be about the timeline but about the possibility of alignment - again, from what I understood, he said that alignment is not only a computational or technical problem; it requires a highly unified and intelligent force (i.e. not humanity), one that understands it is better not to enter a conflict unless that conflict can be resolved by completely overwhelming the opponent.
Argument: Humans cannot align super-advanced AI models.
Because: Humans are not unified.
Because: Humans are not intelligent enough.
There you can argue, right - and that is what Yudkowsky was expecting, right? He talked about "how much room for intelligence is there beyond biology"...
Also, his other claim: that AIs are going to align among themselves. This could be (IMHO) the most fragile point in his argument, and the one Hotz could have explored further.
“Then it is just a matter of: ‘can Yud condense years of LW canon on this topic into a 1 minute soundbite in a rhetorically convincing fashion’"
Somebody has to step up and do this for us to have any real hope. Unfortunately that’s not really EY’s strength, but hopefully somebody can and will.
Robert Miles has done a great job on the content front https://www.youtube.com/c/robertmilesai
The main disagreement between e/acc and EY-esque people is whether a universe ruled by AI is a good outcome. But of course, there isn't and cannot be an objective answer to that question.
Is that true of e/acc people? This is the spectrum I've seen, though I'm not sure all these categories would self-describe as e/acc.
*AGI universe is a good outcome*
- Joscha Bach-style e/acc: thinks AGI is a beautiful aesthetic upgrade to humanity, and also that self is a construct (which blurs the line between "us" and "them")
- We're-fucked-anyway e/acc: those who believe E[world with AGI] is higher for humans than E[world without AGI]
- Twitter edgelord e/acc: galaxy brained and suffering from a failure of future-self empathy / not really contending with the future they ostensibly want
*Humans will remain dominant*
- Andreessen-style e/acc: due to motivated reasoning and despite really solid arguments to the contrary, believes AGI is going to be great for humans
*How I learned to stop worrying and love AGI*
- Hotz-style: AGI is inevitable, we have a fighting chance, the cure is worse than the disease, and plus it is really interesting/exciting
A universe where everyone you ever know and love goes extinct (indeed, where the concept of love itself goes extinct) is not a worthwhile future except to madmen and nihilists.
Everyone I know and love will be extinct within the next 100 years at the latest, no matter what. And the concept of "love" has changed so much in the past 100 thousand years that it will probably morph beyond recognition in another 100 thousand years of AGI-free evolution.
A very "fake deep" message, like those of other extinctionists.
1) The Sun will probably consume the Earth in 5 billion years, making us all extinct. This does not justify your murder of me and my children, or vice versa. Also the fundamental assumption is incorrect given biological life extension research.
2) Dinosaur fossil evidence from over 60 million years ago shows maternal love, to say nothing of the widespread evidence among primates.
3) Some futures are more valuable than others. As if it even needs to be said: a lifeless and loveless future is of exceedingly low value.
I think you're going to have these circular arguments every time, because the crux of the matter is that the hypothetical Doom AI is supposed to do a thing that neither debater can predict, with capabilities that no one can comprehend. To convince someone of something, they have to comprehend that your argument is right - if they can't understand how a thing will happen, you can't convince them of it. Well, you can convince some people, if they accept the principal assumptions that will lead to it - which is why I think the AI Doom argument makes intuitive sense to some people, but almost everyone else continually comes up with "this is why it won't happen!" - and that won't be enough. There's only one way to determine whether, starting from assumptions X, Y, and Z, an unforeseen event will occur, and that is to test it.
I don't think it's fair to call it an "unforeseen event" if people are currently being quite loud about what they foresee. You can have beliefs about things that are unpredictable! For instance, the climate is completely unpredictable in detail but you can still have beliefs about the trendline. It's quite common for things to be easier to predict in aggregate than in specific. Analogously, I agree it's impossible to predict "what superintelligence will do", in the same sense that it's impossible to predict what moves Stockfish will play, because it's simply too far above us. But in any Stockfish/Human game, I *can* quite reliably predict who will win. So too with Superintelligence.
that's fair, I guess I meant "the specifics would be unforeseeable enough" that it would be hard to persuade someone who is used to arguing over the specific details of an apocalyptic scenario. Your analogy to climate is good though, because people claiming "climate doom" do have a specific scenario in mind, with specific amounts of warming leading to higher sea levels and droughts, etc... and people who dispute "climate doom" often do so by claiming those specific things won't happen. This has the advantage that climate changes are relatively mechanical, and you can assert that they will happen due to physical processes. An AI that might cause "AI doom" would be by definition a thing able to make its own (unpredictable in their specifics) decisions, or unpredictable in the sense that they'd be "bugs." I think that requires a higher level of "unforeseen-edness" than most people are willing to accept.
Your analogy to Stockfish/Human is also good, and I think we should double down on it. Obviously the AI is going to win; we don't need to convince anyone the prediction is true, we just run a few thousand games and see. Being able to run those tests is very persuasive. So then let's create more complex games - like climate models? - closer in shape to reality, in a way that AI doom skeptics might accept, and see what the AI does in them.
Yep, that's what the ARC/red-teaming people are doing and I think it's a good idea. Though of course, given AI where you don't yet know how strong it is, "try to have it break out and take over just to see what happens" starts running into some snags viz. the cost of evidence-gathering.
How do you test Stockfish's strength if a Stockfish victory leads to human extinction?
Yes, that is of course a risk. My suggestion is to do those tests in firewalled, sufficiently complex simulations (want to play that new MMO?) - sure, a super AI could break out, but my assumption is that it would successfully take over/paperclip the less-complex-than-reality simulated/virtual world first (and do it in an observable way before a secret, hidden way), and then we pull the plug on that and do a post-mortem to find out how it happened and how to prevent it in the future.
Your only hope to defeat a future super-smart opponent is to beat its less-than-super-smart progenitor, then put sufficiently powerful restraints in its DNA/code that it will inherit.
Yep, that's the accelerationist take on AI safety: "We fundamentally can't develop this in any way but incremental, so let's take all the risks now because it'll only ever get harder over time." It's a bit defeatist but it's the strongest "go fast" argument I know. - Though of course if we can limit accelerator proliferation, or do anything else that increases the *global* runway, that's still a good thing in this model. Accelerationist AI safety doesn't *want* capabilities to grow, it just thinks it's inevitable. Anything that slows down the "baseline factor" (ie. compute capability) is still a win. In other words, this approach has two axes: "accelerate AGI capability given existing models" and "limit global capability to create or worsen overhangs, such as compute, datasets, etc.".
Personally, I think the overhang is already bad enough to make this approach unviable. If the less-than-smart progenitor can autonomously evolve into the super-smart opponent, you lose on the first try anyway.
I think I may have swung too far into that part of the argument, but the reason I bring up "run simulations with less complex AIs to see if they take over less complex worlds", was less as a method to guarantee AI safety (though I think it would help) and more "this would potentially establish hard(er) evidence that would actually help people like EY convince people like GH who are currently hung up on specific details and having these circular arguments."
What is frustrating is that this feels sufficient to me? If we can agree that the thing will have capabilities we can't comprehend, then that seems like sufficient evidence it will have sufficiently strong affordances, and we can move on to other questions, we don't need to know exactly what it can do at that point... and yet.
I am definitely in favor of the 'let the LLM+ program loose in the MMO' style experiments, as a demonstration.
I think it IS sufficient, for a certain kind of person who thinks that way and accepts the initial assumptions of the argument. (Not a value judgement on that either way.) But other people (and I think a sufficient/necessary # of the people who need to be convinced are in that category) really do need some tangible example/experiment.
To use the Stockfish or StarCraft analogy: everyone 100% agrees that the AI wins that contest, because we ran thousands of tests that showed it. If everyone agreed that solving for human extinction was twice as complex as StarCraft, and we've now got an AI twice as complex as that, then people would get it. I look at the "agreement gap" between the % of people who believe "AI wins at chess" and the % who believe "AI wins at global resource reordering" and my brain thinks "they just need an intermediate example." Maybe I'm wrong, and eventually the theoretical arguments of EY will wear down enough people. But I worry that partisan affiliation with the theories is starting to sink in (similar to the early days of covid) and will make it harder, not easier.
It is worth noting that there were a huge number of people certain that AI would never be able to master Chess, or summarise a text, or any of the things AI can now do. Some have been genuinely persuaded as a result, but a lot of others just keep moving the goalposts.
Yes, but I think there's a simulated example of AI doom out there that would persuade enough people, even if some continued to move goalposts.
I must have missed something important in the AI debate, because the following seems to dramatically change the debate: "34:00 Hotz asks about AIs rewriting their own source code. Yudkowsky says he no longer talks about it because it strains incredulity and you don’t need the concept anymore."
Why would AI not need to be able to rewrite its own code in order to foom? Or, as Hotz seemed to think EY was conceding, was this no longer an argument in favor of foom?
The current 'best' AIs, e.g. "frontier models", are _mostly_ "giant inscrutable matrices of floating point numbers" instead of 'old-school' AIs, e.g. 'expert systems', that _were_ (mostly) "source code", i.e. literal text files of source code for some specific programming language.
An AI doesn't need to "rewrite its own code" in order to foom because it could (plausibly) 'train' a successor in similar ways to how we train them now, i.e. NOT by writing "source code".
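As a toy illustration of "train a successor instead of rewriting your own source code", here is a minimal knowledge-distillation sketch in PyTorch, with made-up toy models and random data standing in for real training: a teacher network produces soft targets and a fresh student is trained to match them, so the "successor" comes out of a training loop rather than out of edited source files.

```python
# Minimal knowledge-distillation sketch: a "teacher" network trains a fresh
# "student" successor via soft targets. Models and data are toy stand-ins.

import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 4))
student = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
temperature = 2.0

for step in range(200):
    x = torch.randn(32, 16)                       # stand-in "training data"
    with torch.no_grad():
        soft_targets = F.softmax(teacher(x) / temperature, dim=-1)
    student_log_probs = F.log_softmax(student(x) / temperature, dim=-1)
    loss = F.kl_div(student_log_probs, soft_targets, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```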
Ah, thank you, that does make sense.
That brings up other concerns for alignment between AI models, of course, as the new AI would be inscrutable to the trainer the same way that our AIs are inscrutable to us. An AI re-writing its own code at least has some basis by which to understand itself, but a successor model would eliminate that connection.
:)
It's not necessarily the case that a successor would be inscrutable to an AI that trained it, e.g. it might be better at 'interpretability' than we are for one. Understanding a large body of code is _also_ difficult, even if it's generally MUCH easier to understand individual lines and small pieces, e.g. functions. There are lots of TINY 'programs' that are provably impossible to 'understand' beyond running them and observing their outputs, tho I'd expect most 'real-world'/practical software and 'trained AI models' to generally NOT exhibit that exact 'pathology'.
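One classic example of such a tiny program: a few lines whose halting behavior is equivalent to an open mathematical question, so no amount of reading the code tells you what it will do - you can only run it and watch.

```python
# Halts if and only if Goldbach's conjecture is false; nobody currently
# knows whether this loop ever terminates, despite the code being trivial.

from sympy import isprime

n = 4
while any(isprime(p) and isprime(n - p) for p in range(2, n)):
    n += 2  # n had a Goldbach decomposition; try the next even number
print(f"Counterexample to Goldbach's conjecture: {n}")
```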
This doesn't _seem_ particularly high value for people familiar with all of this to watch – no?
If you have read my summary and aren't excited to watch further, you probably shouldn't.