Not a lull. Just over-hyped by you and others.
This feels uncharitable, but ultimately I have to agree. Predictions from 6-12 months ago anticipated profound changes from existing LLMs by now. The slow incrementalists seem to be winning the argument.
this seems like as good a place as any to put it down in writing:
i predict we will see profound changes in LLM capacity in the next 6-12 months which are, in some vague sense, of a higher magnitude than the current plateau is flat
Can you say why you expect this? Or what you expect it to look like?
You may have written about this elsewhere, but what are your thoughts about the argument that even if alignment was achieved, we'd still be screwed because a bad actor could just align an AI to be bad, including existential-risk level bad? Is the strategy just: if we don't figure out alignment we're quite possibly toast, so let's figure it out, and if there are problems afterwards, we'll cross that bridge when we get to it?
I think this is a super neglected problem, which is a central issue in what I call the 'Phase 2' problem - you have to survive Phase 1 and solve alignment first, and then you need to survive the resulting dynamics where by default everyone gets hold of their AIs and can change their alignment settings and direct their goals and actions, and then the laws of evolution/capitalism/competition/moloch/etc take over and also some people will be malicious actors, and whoops. So yes, 'solve alignment' is not well enough defined to be a victory condition.
(Also it's a strong reason that even if you decide to let people have their own instantiations and variations of AGIs or even ASIs they can direct, this is incompatible with them being open source, you have to guard against certain types of changes.)
Isn't this what the pivotal act idea is about? Y'know melt all the GPUs that are not in safe hands or whatever.
Yup!
At least some methods, like QACI, make it impossible for most people to align it to something evil, because they basically find what you would want at reflective equilibrium
>I don’t see this as consistent. If you get AGI in 2-8 years, you get ASI in a lot less than 2-8 more years after that.
So this sentence (and FOOM/fast takeoff arguments more generally) assumes a certain (low) ramp-up in the difficulty of increasing intelligence, but I have never seen seriously addressed the possibility that the difficulty of increasing intelligence might be exponential/super-exponential. I'm not arguing that it _is_, because I don't know (and I think it's probably true that _no one_ "knows"), but it seems to me at least possible that each increase in intelligence is more difficult than the previous step made you smarter (if that makes sense).
Imagine an (extremely simplified) intelligence ladder with Steps A, B, and C.
Figuring out Step A when you are on the ground takes some amount of time. Once you figure it out, you are smarter now, and figuring out Step B will necessarily take less time than it would have _from the ground_, so in some sense it will "take less time". But if Step B is a sufficiently more difficult step, it still might take you longer than the original Step A took, even if it takes less than the counterfactual where you are still ground-level smart.
I feel like I'm not making my point very clear, but hopefully you understand what I'm trying to say.
To summarize: I do not see a reason why it is _necessarily and obviously_ true that once you get to AGI (or any other increase in intelligence from where we are now), ASI (or any further step beyond whatever increase you are talking about) will take significantly less time. It _might_, but I don't understand why it _must_.
The presumption is that getting true AGI, if you get it, allows the AGI to then act as a researcher (on alignment, capabilities or anything else) or do any other cognitive task, with copies that learn from each other and are much faster and cheaper to use than humans, so this is a several-orders-of-magnitude speed-up to further work - the AI is doing the work where currently humans are doing the work.
I wouldn't quite say this _must_ dominate, but expecting anything else seems super weird, and presumes an orders of magnitude jump in difficulty at exactly this level, which we have no reason to presume.
I don't expect or not expect it, and I don't think that it has to occur at exactly this level.
So I think part of my confusion (or disagreement) is that I don't have a sense of whether this has already been true (or that we simply don't know how true it has been up to this point). Part of the problem is that we don't have (that I know of) a good objective measure of intelligence that operates well across domains and contexts and across extremely wide ranges of intelligence. If we did, it would be trivial to show how the intelligence of software has changed (and how quickly it has changed) over the past 50 years or so. Or alternatively, we have only had software for which the question "how smart is it" even makes sense for, at most, a decade or two (generously), so even if we had such a measure, not enough time has really passed to know how the rate of development of such software has changed.
Basically, I'm not convinced we have a good idea of how rate-of-change of intelligence of software has changed over time up to this point, and it's not obvious to me that it's been speeding up, which means we don't require any special evidence to think it might continue that way for AGI and ASI.
“Build the AGI that will give us everything we want” (or words to that effect)... see, this is where all my cultural programming starts jumping up and down screaming, ‘No, you never do that, never goes well, always goes super badly, don’t you know you’re supposed to THROW THE RING INTO THE CRACKS OF MOUNT DOOM?’ I mean, given the multiple stacks of impossible problems you envisage before you even reach the stage of, “we possibly survive, but the future is highly uncertain and humans are by default irrelevant,” some variation on Mount Doom has got to be the preferred outcome, no? Or do you see “the AGI that gives us everything we want”, not as a huge red flashing warning sign, but as a big enough prize to justify a different approach?
It's amazing how much this is the monkey's paw we have been warned about, and we do not seem to be changing course.
Yeah, I suppose in addition to all the fables delivering the warning, I should also take into account all the fables delivering the warning that we are very bad at heeding warnings...
I get the people who don't think doom is likely and are working on things that can help humanity. I get the people who think doom is likely and want a full stop.
I don't get the people who think doom is likely but still want to move forward. Zvi seems like he's in this third category, so maybe he can explain his position.
We can _probably_ "move forward", safely – with the AI models we have now, e.g. integrating them into, basically, everything. Doom is likely because we probably won't stop or pause at any particular level of ('frontier') model capability.
I suspect Zvi would take a full stop with both hands if the offer was convincing. But yes, my original question is directed at him...
Oh you give me a Zones of Thought style solution and I will happily take it. Offer me an actual pause I'll take that too.
Thanks for clarifying. Often your posts praise new capabilities and expanded use of AI, which left me with the impression that you wanted to keep pushing forward for the most part, but didn't want certain things (agentic AI, for instance).
I would think that often praising AI innovations, and the frequency with which you use these programs, run counter to your actual goal, and that it may be more convincing to use neutral language about new capabilities and negative language about AI generally.
I may now have to reread both of those. Thanks for the reminder...
One thing that does make me optimistic is how old and unchallenged the world is. China is reduced to building islands in the South China Sea and pretending to care about the Himalayas; India is reduced to pretending that Pakistan matters. And then you've got Europe and North America. And class and religion.
If AI had been on the borders of realisability any time from the 16th to the 20th centuries, then there would have been no hope whatsoever. We'd all have been eaten alive by entities with complex theories about belonging, whatever anyone tried to do.
- "tabula rosa" should be "tabula rasa"
>Beware of AI-generated garbage articles
Question: are these worse than the human-generated equivalent?
Around 2005, I started noticing that it was usually a waste of time trying to learn a skill via Google. Even the "good" results at the top tended to be full of subtle misinformation. It was better to find a smart person/website and learn from them.
In the 00s, Google's enshittification got worse. For certain terms, the entire first page would be ads or unreadable SEO rubbish auto-generated by what looked like a 1930s-style Markov chain. Remember eHow and Wikihow, and how terrible those were? And how they just sat at the top of Google results for years and years before Panda stomped them in 2011?
We need to remember that the "human written" vs "AI written" debate is mostly of interest to people with skin in the game: eg, copywriters and webmasters. The average guy doesn't care how a webpage was written. He just wants his question answered. It's irrelevant to most users whether the text they're reading was written by a human or an LLM or a demon-possessed bowl of alphabet soup. They just want to know how hot to bake their damned pizza.
IME the big difference is that human written crap still cost something, so it wasn't an issue for sufficiently obscure topics. This is no longer true.
I don't think Tegmark and Omohundro are arguing that you can necessarily find a proof of safety if one exists, rather they are arguing you shouldn't trust a system you cannot prove is safe.
However, as you correctly note, any "proof of safety" only proves whatever notion of safety you can formally define, and this is unlikely to ever be sufficient in the real world. Most successful exploits of "provably secure" cryptographic systems succeed through pathways that are not captured by the notion of security used in the proof (e.g. side-channel attacks, social engineering, etc.); such attacks don't invalidate the security proof or the hardness assumption on which it rests, but they still break the system.
But even if someone with a god-like security mindset formalized a notion of safety that covered every possible base, I think this approach is doomed to failure for another reason. Gödel's second incompleteness theorem states that any sufficiently powerful mathematical theory cannot be proved consistent within that theory. A possible corollary is that any sufficiently powerful AI cannot be proved safe by those who created it (or by itself or any other AI of similar power). This suggests that, as I think you have pointed out several times, there are really only two possibilities: (1) powerful AI won't exist any time soon and we are fine, or (2) we are not fine.
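For concreteness, here is a minimal statement of the theorem being invoked (a sketch only; the step from this to "an AI cannot be proved safe by its creators" is an informal analogy, not a formal corollary):

```latex
% Gödel's second incompleteness theorem, standard form:
% if T is consistent, recursively axiomatizable, and extends Peano Arithmetic,
% then T does not prove its own consistency statement Con(T).
\[
  T \text{ consistent, recursively axiomatizable, } T \supseteq \mathrm{PA}
  \;\Longrightarrow\;
  T \nvdash \mathrm{Con}(T)
\]
```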
I'm glad you're talking about the problem of simply being outcompeted. I agree almost no one is thinking seriously about this. I think the closest most AI-risk people get is in asking "Whose values will we align it with?", which elides the possibility that it might be no one's values. Perhaps an analogous question is "whose values is the military-industrial complex aligned with?"
“If you get AGI in 2-8 years, you get ASI in a lot less than 2-8 more years after that.”
Curious if you can expand on that? Does it hinge on whether you believe LLMs will continue through directly to ASI via scale, or do you think it applies even if LLMs level out at near-AGI due to data-limits or limits of next-token-prediction (because presumably it’s the sheer volume of cheap AI-research that leads to ASI)?
If AGI is able to do human level cognitive work, except with AI advantages, time frames change a lot.
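As a toy illustration of how much the time frames change - every number below is a hypothetical assumption chosen only for the arithmetic, not a claim about real capabilities or compute budgets:

```python
# Back-of-envelope sketch: "human-level cognitive work plus AI advantages".
# All parameters are made-up assumptions to illustrate the orders-of-magnitude
# point; compute limits, coordination costs, and diminishing returns are ignored.

human_researchers = 10_000    # assumed size of the relevant human research workforce
ai_copies = 1_000_000         # assumed number of AGI instances that can be run
speed_multiplier = 10         # assumed serial speed advantage of each copy over a human

effective_workforce = ai_copies * speed_multiplier
speedup = effective_workforce / human_researchers  # ~1,000x in this toy setup

years_at_current_pace = 8     # upper end of the "2-8 years" framing above
calendar_days = years_at_current_pace / speedup * 365

print(f"Effective speed-up: {speedup:,.0f}x")
print(f"~{years_at_current_pace} years of current-pace progress in ~{calendar_days:.0f} days")
```

Under those made-up assumptions, the same cognitive work compresses from years to days, which is the sense in which ASI would arrive in a lot less than another 2-8 years.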
> Is generative AI in violation of copyright? Perhaps it is.
Imagine that US courts find generative AI in violation. (The nuances are beyond me.) What could politicians do?
-Could the attorney general reprioritize this aspect of copyright enforcement to the lowest possible level? That leaves enforcement to civil suits.
-Could Congress pass, and the president sign, a bill restricting the scope of copyright law? That could take LLMs off the hook entirely.
Would politicians do this for nothing, or start bargaining? If the latter,
-Would they ask for something related to LLMs?
-Would they ask for something bigger/broader that Meta, Microsoft, and Alphabet could do? (Considering how expensive being found in violation would be.)
What if one party held a government trifecta at the time? How partisan might the demands be, in their likely impact if not their phrasing?
> I remind everyone that we don’t let Chinese AI (or other) talent move to America, so we cannot possibly care that much about winning this battle.
And on August 8,
> No, seriously, we could devastate China if we allowed skilled immigrants to come contribute here instead of there (or from other places as well), until we do this I do not want to hear anyone talk about having to ‘beat China’ some other way, and I will keep talking about this.
What would this mean for espionage? Surely some AI talent would remain in China (due to denial of exit permission, genuine patriotism, satisfaction, indolence, better pay, etc.). Spies in the US would try to feed that stay-behind talent with everything they learn, right?
My guess: this tradeoff is substantially worthwhile for the US - for ML in particular and engineering talent in general. Thinking broadly, relative GDP growth seems like a big deal when trying to deter a rising power from starting a war.
But my guess is poorly informed. I would like to see the espionage risk evaluated seriously. For example, where would the FBI hire people capable of hunting a twenty-fold increase in Chinese spies in US ML? Does it have enough people at present?
That argument might track if there were a hard block on *all* Chinese immigrants in sensitive fields, but there isn't; there's a weird lottery with low caps that lets in more than enough people to be a security threat while still turning away something like 90% of applicants
RE: Google searches, my sense recently has been that I’ve seen a lot of AI-generated images shitting up Image Search. Deliberate attempts to replicate the trend have failed, though.
how has no one commented on the Your Mom zinger? really caught me off guard, well done
Those aren't Roon's beats, they're quoting the final verse of ERB's Gates v Jobs rap battle. Which incidentally is excellent.
https://www.youtube.com/watch?v=njos57IJf-0