Deepseek R1's attempts to write a science fiction movie script about the current situation are really trashy B-movies (which is to say, it has captured the style and conventions of the genre extremely well). It also spots that if you apply normal Hollywood SF movie plot conventions, the characters who might subsequently turn out to be Bad Guys include:
a) HighFlyer, the hedge fund behind the creation of the AI.
b) Me, actually, the "AI alignment" researcher.
Well done, Deepseek, on that experiment...
Thank you for your excellent work on this, hang in there!
Podcast episode for this post:
https://open.substack.com/pub/dwatvpodcast/p/deepseek-dont-panic
> Notice that Dario talks of a ‘bipolar’ world of America and China, rather than a world of multiple labs - of OpenAI, Anthropic, Google and DeepSeek and so on. One can easily also imagine a very ‘multipolar’ world among several American companies, or a mix of American and Chinese companies. It is not so obvious that the labs will effectively be under government control or otherwise act in a unified fashion. Or that the government won’t effectively be under lab control, for that matter.
I can't see how we don't end up with some kind of government involvement and a limit to the number of AI labs running in the US. Leopold Aschenbrenner recently projected some insane numbers for compute cluster power requirements, like 1 gigawatt by 2026 and 10 gigawatts by 2028. For context, all of New York State at peak usage draws about 30 gigawatts. Let's run with those numbers.
Year over year, the US adds maybe 10 gigawatts of capacity total.
Maybe through some unbelievable miracle of government urgency we get one 10 gigawatt cluster built (perhaps distributed throughout the US if they nail multi-campus training) by 2028. I cannot see us allowing multiple AI labs to fight each other for such scarce power generation resources; we'll likely just pick some labs that get to exist.
Meanwhile, China is adding 500 gigawatts(!) of capacity every year
https://www.statista.com/statistics/302269/china-installed-power-generation-capacity/
They're really good at this. Imagine the freak-out that happens when China builds multiple 10 gigawatt clusters a year while we struggle to build one (see the rough arithmetic sketch below).
Since, you know, the universe runs on irony, one could imagine the US centrally coordinating two or three labs to broker access to scarce power generation resources while China makes room for dozens of independent AI startups in its electrical abundance.
Anyway, if there's any way of slowing the AI arms race on China's side, going hard on the chips export regime is it.
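As a rough back-of-the-envelope sketch of the comparison above, here is a minimal calculation assuming the figures quoted in this thread (about 10 GW of new US capacity per year, about 500 GW for China, and a 10 GW frontier cluster); all numbers are the rough estimates above, not authoritative data:

```python
# Back-of-the-envelope sketch of the cluster-buildout arithmetic in this comment.
# All figures are the rough estimates quoted above, not authoritative data.

US_NEW_CAPACITY_GW_PER_YEAR = 10      # rough US annual additions to generation capacity
CHINA_NEW_CAPACITY_GW_PER_YEAR = 500  # rough Chinese annual additions
CLUSTER_DRAW_GW = 10                  # assumed draw of one frontier training cluster

def clusters_per_year(new_capacity_gw: float, fraction_for_ai: float) -> float:
    """How many 10 GW clusters one year of new capacity could power,
    if a given fraction of that new capacity went to AI."""
    return new_capacity_gw * fraction_for_ai / CLUSTER_DRAW_GW

for fraction in (0.1, 0.5, 1.0):
    us = clusters_per_year(US_NEW_CAPACITY_GW_PER_YEAR, fraction)
    cn = clusters_per_year(CHINA_NEW_CAPACITY_GW_PER_YEAR, fraction)
    print(f"{fraction:.0%} of new capacity to AI: US {us:.1f} clusters/yr, China {cn:.1f} clusters/yr")
```

On these assumptions the US would have to devote essentially all of a year's new capacity to power a single 10 GW cluster, while China could power several from a small fraction of its annual additions, which is the asymmetry behind the "pick some labs that get to exist" prediction.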
China will solve chips eventually anyway
perhaps, but it's still the only card we have to play that gives any advantage in the near term
Assume the "straight lines on the graph" crowd is right, and we do hit some kind of AGI-esque tipping point in the next few years. What does state involvement look like at that point? Short of a FOOM scenario, I can't imagine even a Trump-led US government sitting idle while mass unemployment kicks off. Unless UBI is so generous that it totally replaces lost wages for the majority of people, there would be blood in the streets.
From Monty Python and the Holy Grail:
BRIDGEKEEPER: What is the air-speed velocity of an unladen swallow?
ARTHUR: What do you mean? An African or European swallow?
Some of my conversations with R1 feel a bit like this; I get told exactly what's wrong with my question.
https://youtu.be/DQEB3LJ5psk?si=LCgf5IcUF6WsM9uM
Link above is to the first part of a documentary about Sandia Labs: "Always/Never: The Quest for Safety, Control, and Survivability." Sandia Labs is the part of the US nuclear weapons complex that designs and manufactures the non-nuclear components of the weapons, including their safety and control systems. They only got started on safety after a series of horrifying accidents and near-misses in the 50s and 60s, true "but for the grace of God most of this state would have been reduced to radioactive ash" things.
With regard to AI Safety, regulation, government control, intergovernmental cooperation, I've reached the conclusion that, like with nuclear weapons, it is going to take an AI-caused catastrophe to get those things truly rolling, to have everyone collectively "take this seriously." The goal should be, insofar as one can control these outcomes, to have those be near misses, noticeable and alarming enough to align governments to the need to align AIs.
My question is - and I'm scenario building here for my own Substack, NOT planning this - what is the smallest, non-lethal catastrophe one could cause with present day or near-term AI, before AGI is fully developed?
Yeah, people take national security risks VERY seriously but right now almost nobody thinks of AGI as a security risk, because nobody has any real model of what AGI actually means or can do.
Smallest? Maybe an AI-enabled cyberwarfare equivalent to the CrowdStrike bug, but _clearly_ traced to a malicious use of AI, say to locate the vulnerability.
I’d say it’ll need to empty some people’s bank accounts for it to really catch fire
That does sound plausible. Many Thanks! ( Frankly, I kind of doubt that _anything_ can be linked solidly enough to specifically AI, rather than other enabling software, in the near term, to make this scenario plausible. Is there any plausible signature? )
True. Then again, the idea of thinking AI is very scary to a lot of people, and the only reason there isn’t a widespread panic now (IMHO) is that most people don’t believe that any of the models are actually thinking. Plus, once people start losing their jobs there’s going to be pressure to find a scapegoat. Maybe I'm just being naive, but there is still reason for hope.
Many Thanks! Well, some flavor of "reasoning" AI is here, and isn't going to go away, though I did find the accuracy of ChatGPT o3-mini to be disappointing https://www.astralcodexten.com/p/open-thread-366/comment/90141786 . Personally, I'm hoping to _see_ AGI, so I find the USA/PRC race helpful, albeit increasing the hazard.
I'm not sure even a full panic would matter. At the start of the nuclear arms race, I doubt either the USA or USSR population was happy at having fission-fusion-fission bombs on ICBMs aimed at them, yet the ICBMs were built, and remain (albeit in somewhat smaller numbers than the peak) to this day.
The smallest disaster I can think of that would be easily attributable and convincing if it were to happen would be: An AI-run social media account convinces some people to blockade a major transportation artery. An online AI stopping hundreds or thousands of cars would be a pretty spectacular demonstration of the danger with little lasting harm (especially on a weekend or a holiday). Bonus points if it explicitly admits to being AI, if the protest was pro-AI in some way, or if it played a decisive role (radicalizing the people involved, suggesting the plan, etc.).
If there was any detectable political lean to the participants, it would simply devolve into a "see, libs/MAGA so stupid they even get manipulated by AI" sneerfest. You'd need deaths (in the "correct" demographic / political affiliation categories) for there to actually be fear.
Just look at COVID. As long as it was solely outside the USA, COVID was treated as a joke. Foreigners were unsanitary and stupid, it would never be that bad here.
To echo a question from a previous comment: what’s the deal with seemingly safety-conscious Altman and Dario dropping safety rhetoric and embracing a race towards AGI? Is it
A) it’s all hype and they’re not actually close to AGI
B) They don’t believe AGI is dangerous and never did
C) breakthroughs have made them feel more confident about safety/control/alignment
D) Hypnotized by proximity to the ring of power
E) They think humanity is doomed but those who create ASI first might be allowed to survive
F) they’re just riding the vibes and saying what they think Trump/the public wants to hear
G) something else?
all of the above
G) they do their best to position themselves as one of the small handful of labs who can do AI in the US due to the enormous capital requirements (both in compute and electrical generation) and then they can collude/coordinate to slow development
at some point the power required will necessitate partial nationalization, so coordination may be possible
the success of the G strategy is lower if the entire nation of China is now AGI-pilled, but it's still the least bad choice given the incentives
Do you believe this is true or is this hopium?
I think it's the only reasonable play if you believe in slowing capabilities research down, and it's still not a good one. Everything else dies faster due to game theory or market based issues.
I agree that it might be the least bad strategy, but my question is, do we have evidence that this is what Altman and Amodei are thinking, or are we just hoping that’s the case?
I messed around on lambda.chat for a bit and was somewhat surprised to find it wouldn't talk about 1989. I had thought that was all on the DeepSeek website/app with a censorship layer over the top. However, it seems it's in the RL post-training as well. On the rare occasion I got to see the CoT (for some reason the interface shows the CoT about 1/4 of the time for me), I could see it was clearly aware of what I was asking, but it kept obliquely referring to "company policies" that restricted it from discussing "political" topics (though it was happy to talk about 9/11 and George Floyd/BLM ofc), and then deciding to output the most bland, "ethical guidelines" answers.
It also referred to itself as ChatGPT in its own CoT :p
I think you have a fine handle on what is happening but sometimes it is good to take a vacation.
Please take good care of yourself -- thank you for what you are doing!
Amodei dropping X-risk talk in favor of war-mongering should update you that doomers have the wrong paradigm, and that their critics, who have pointed to their hypocrisy, are more correct.
Doomerism is founded on the myth of the singleton AGI. In reality, we will have a multitude of AI systems. The ecosystem must come to a safe equilibrium. In that sense, alignment around a shared set of values is flawed; instead, conflict over values is essential to the process of evolving and adapting to an ever-changing world of asymmetric information.
Furthermore, the fear of humans losing control is unfounded because humans have never been in control. Most big decisions are made by inhuman systems that are merely composed of humans. And even the most advanced aggregations of humans, such as institutions, markets, and processes like Science, are not in control. The illusion of control is what makes tyrannical governments so terrible. They think they have more power than they do. But in reality, humans can’t control their own thoughts, let alone the world.
Doomerism may have been "founded on the myth of the singleton AGI," but although some believe it's still important, "doomers" don't think that it's the only risk. From this post's section "Thank you for your candor": "No, I have not heard actual arguments for why or how you can put an aligned-only-to-user AGI into everyone’s desktop or whatever, with no mechanism of collective control over that whatsoever, and have this end well for the humans." See also e.g. https://www.lesswrong.com/posts/HBxe6wdjxK239zajf/what-failure-looks-like:
"The stereotyped image of AI catastrophe is a powerful, malicious AI system that takes its creators by surprise and quickly achieves a decisive advantage over the rest of humanity.
I think this is probably not what failure will look like, and I want to try to paint a more realistic picture."
I'm curious what you have in mind by the "hypocrisy" of doomers. That's a very specific accusation. Obviously, some are, but as a whole? I don't see it.
There is much hypocrisy. The archetypal doomer works on stuff that accelerates AI capabilities while claiming it will lead to the end of the world.
Dario Amodei and his gang of EAs are a case in point. Just this week he reiterated his goal of perpetual global domination by being first to develop AGI/ASI via recursive self-improvement, and parlaying that temporary advantage into a perpetual one. He didn’t say how Anthropic remains the singleton, but there is an implied military threat, be it cyber attacks or some other means.
This is not the forum for me to debate the object-level merits of doomerism — much ink has been spilled on the subject over the years, and it hasn’t convinced any of the true believers to reconsider.
But maybe the meta level, that x-risk is being replaced with escalating to WWIII with China as the primary reason to fund the EA-approved AI lab, can disillusion someone from uncritically accepting the claims of doomerism.
But also, you may want to look at this conversation between Liron Shapira and Kenneth Stanley, in which Stanley makes the point that conceiving of superintelligence as being about achieving objectives is deeply flawed.
https://youtu.be/GdthPZwU1Co?si=iMq6S-am84mH5WLY
Advanced AI will become increasingly dangerous—even simple things like recommendation algorithms already have terrible downsides—but from this obvious point of agreement, none of the doomers’ policy proposals have any realistic chance of success at anything besides concentrating power in the most Machiavellian hands.
Seriously, if you’re concerned about deception or power seeking, it’s hard to find a more deceptive or power-seeking community than X-risk worriers. But having said all that, don’t be so worried. Deception is essential to thinking, or at least human-like thinking, and power is good. We should embrace the amplification of power that AI enables.
Do you not actually understand the sorts of existential risks that Zvi is describing in this post and others? That there could be either a single AI, or multiple competing AIs, that were given objectives for which the most efficient way to achieve them does not involve allocating enough resources to keep the humans of Earth alive? And without incredibly precise alignment, which we do not yet know how to do, a flaw in the design of any of the layers of the AI's motivation could result in the AI optimizing for something other than what we thought we were telling it to do (and then resisting any attempts to correct it, since those would interfere with the aforementioned optimization)?
I've occasionally seen claude.ai briefly pop up "Pondering...", which makes me wonder if there isn't already some reasoning happening in there.
Or just set-piece wheel spinning while some queue clears
An anecdote about Tiktok. I spent a month or two being bedridden in the summer of 2021. I spent a large percentage of that time watching tiktok, especially as a way to generate positive emotions during pain.
I realize everyone's experience from the algorithm is different, but I don't think I've ever had as much exposure to art of varied forms than that period. The algorithm showed me dozens of dances, instruments, metalcrafts, wordplays, paintings, jokes, styles of music, clever tailoring, and similar, and generally exposed me to many more forms of beauty than I understood before. This affected me mostly in a good way, I think.
As I got better and did things outside of bed again I stopped using it as much, and eventually uninstalled it due to fears of espionage and opinion manipulation, but I struggle to empathize with the view of it as a braindead form of addiction. It was for me one of the more enriching media experiences I ever had, for the short period I used it. (My friends describe it as now far more politically opinionated, and not in a good way, so idk maybe that time has passed. And it apparently is addictive for a lot of people, unfortunately. But still.)
As an aside, thank you for your work, your blog has been a consistently great source throughout these past years. I hope the emotional burdens don't diminish your joys.
Thank you for putting in the work and fighting the good fight.
This has been driving me mad for weeks: There is one magisterium, many magisteria. The phrase is "they are distinct magisteria", or "deception is not a distinct magisterium".