This is a really tough issue. I feel like we must remain civil and honourable, but I can understand that many are feeling like we are being backed into a corner and dialogue is failing. I wish I had the faith that Nathan Metzger professes.
This would be about the worst possible thing to happen for the AI safety cause w/r/t public opinion.
Related to Forrest's comment there, how does the "doomer" position not turn to terrorism?
At least in a relatively benign form of "mock-terrorism", where you get AI agents to "almost commit" acts of terrorism (like a gun firing a "BANG" placard)?
Whoa, chill. Property destruction isn't terrorism. No carbon-based life form shall be harmed.
Really? I think the Weather Underground blowing up government buildings, even ones empty of people, was terrorism.
Who said anything about bombs? Seriously, chill out. There are many ways to physically shut off computers without hurting humans. Magnets will do the trick. As will a hose and some salt water, or the wrong type of fire extinguisher. You have to get pretty creative to hurt a person with foam, salt water, or magnets.
You can't promise that. You can't even promise no human life would be harmed, short of storming a building and forcing the staff out of it. Even the most thoroughly vetted applications of violence can have unintended consequences. Not to mention you'd have to do this on pretty much every data center at once to avoid beefed-up security hampering your efforts, and that makes the operation that much more difficult to pull off.
As for your contention that property destruction is not terrorism, that's just absurd. Burning crosses in people's yards was absolutely terrorism. There's no widely accepted definition of terrorism that excludes property damage if it's done with the aim of furthering an ideology.
Agreed, and it is not only "beefed up security" that would happen. AI work in general, and data centers in particular, are currently visible. Even with the electricity and cooling needs of the data centers (let alone the offices where the AI researchers work), actively hiding them is not a big stretch. It isn't as if racks of electronics were radioactive. If they are threatened, we will stop seeing where they are.
I am under the impression that we have already stopped seeing where they are, which was the impetus for my request for their location in the first place.
Many Thanks! I hadn't known about that. Well, it is a reasonable precaution to take if there is a chance of hostile action towards them.
I think commenters are jumping to very violent conclusions. There are many ways to physically shut off computers without hurting humans. Magnets will do the trick. As will a hose and some salt water, or the wrong type of fire extinguisher.
Usually when people talk about terrorism, the purpose is to terrorize people. Your cross-burning analogy fails because it was understood as a threat: get out, or the Klan will kill you and your family. My modest proposal of using salt water and a hose does not put anyone in physical danger. It doesn't threaten their safety. It's not a threat, and if you can call any property destruction for ideological reasons terrorism, then football hooligans often become terrorists whenever their team wins or loses, which seems absurd.
I personally have long opposed the term "terrorism" gaining more and more new meanings, because the state loves to use it to jack up the penalties for crimes.
> I think commenters are jumping to very violent conclusions. There are many ways to physically shut off computers without hurting humans. Magnets will do the trick. As will a hose and some salt water, or the wrong type of fire extinguisher.
And when the armed guards hired to protect the billion-dollar facility object to you rolling in with your electromagnets or dozens of PKP extinguishers?
What makes you think I don't already work for ArmedGuardEvilCorp? Use some imagination! Don't just jump to the most violent option.
Shouldn’t you primarily worry about actual terrorists with proven intentions getting access to that same gun?
And illegal, unilateral direct action against datacenters is obviously not a promising strategy for stopping the dynamic described in the post. Such a list would at most be a preparation for the case when a public mandate for action appears.
But all that has been said a hundred times already, and still the same questions keep being asked.
I may have given off the wrong impression. I don't mean to imply that anyone with the most pessimistic outlook would necessarily turn to terrorism; I simply wonder where this crowd lands on the issue.
I think there could be theoretically effective means of action (not mindless blowing up of data centres), but I haven't personally given it much thought.
I should've made it clearer that I was being inquisitive rather than assertive.
I agree peace is the better strategy, but if data centres are the things being used to build the thing we feel threatened by, then it isn't exactly "mindless" to suggest they be destroyed, even though I do not support it and it would suck for many reasons.
The situation is complicated, and I don't support anti-social activity like threatening or destroying the things that other people care about (which is what ASI builders are doing), both because I don't like that kind of behaviour and because I think strategically it loses us supporters, but I don't feel I can fault other people for thinking about it if they are as deeply scared of misaligned ASI as I am.
I just put up a post suggesting a campaign of alarmist anti-AI lies. That's probably way more likely to be effective than terrorism, and also much more palatable.
I certainly hope this doesn't happen.
1. It wouldn't help. Terrorism tends to have the opposite of the intended effect.
2. "False flag" types of risk realization are dumb and bad. There will be plenty of bad actors doing bad things. There's no sense muddying the water by supposedly good actors doing bad things. All we have to do is keep holding a sign saying "stop or bad things will happen," so that when bad things happen, we can rally exponential support for a sensible global treaty and AI moratorium.
You may as well propose a moratorium on the tides. The same aspects that make AI dangerous ensure that if it can be built it will be built. Incentives are too strong, potential rewards too high.
If we could prevent AI through a moratorium the USSR would have been successful in creating the New Soviet Man. That effort was doomed for precisely the same reasons a moratorium would be as well.
So for those of us who believe that development of ASI without solving the Technical Alignment problem leads us all to certain death... what would you suggest we do?
Get cracking on solving it? Given your criteria it's unclear what other possible answer there could be.
You could also hope that you're in one of the worlds where ASI is impossible, alignment happens by default, humans and ASIs coexist despite a lack of technical alignment for unforeseen reasons, etc. Then I'd suggest you live your life: focus on sustaining and improving the things and people around you that you can actually influence, get some exercise, have a deep belly laugh or two. The usual.
But why would you assume there's something for you to do?
I've looked at a decent amount of research on the subject, and a global moratorium on frontier AI research is not particularly hard to implement, not particularly hard to verify, and not particularly hard to enforce.
The only reasonably hard part is the political will. It is hard to get the nations of the world to cooperate and slow down the progress toward the cliff, but it's not *"alignment problem"* hard. It's just "normal problems" hard.
PauseAI has a solid theory of change that reveals that we have an actual chance to make this happen. Possible doesn't mean easy, so I expect to fail and die by default. But I wouldn't be able to live with myself if I didn't try, and lobbying and spreading public awareness of these risks is at least an order of magnitude better than any other idea for potentially useful things I could do.
I'm not cool with giving up in the middle of a cataclysmic emergency when there are still clearly things that I can do that are useful, and it isn't all that costly to just, you know, not give up.
> The only reasonably hard part is the political will.
This is the part that, due to the same incentives that warp all roads toward even dangerous AI scenarios, I think you’ll discover will prove impossible.
Obviously it’s not _impossible_ impossible, though. Good luck in your endeavors; your time is your own, and windmills don’t tilt themselves. But an effective moratorium would run counter to too many interests, including the same governments who would need to support it for an indeterminate amount of time.
Of course I may be completely wrong. As a large language model I cannot predict the future, we’ll all find out together.
If you have given up, accepted that we will all die, and are choosing to try to enjoy the time you have left, I can accept that. It makes me sad, but I can accept it. I may do the same someday not too long in the future. I already spend a lot more time just trying to enjoy the time I have left than I used to. But I guess I haven't completely given up.
I recall a conversation I had with a fellow mathematician, describing the difficulties of the Technical Alignment problem. They said, "Well, that's more difficult than solving complex fluid dynamics systems. You can't solve that," and I said, "I know, but we have to try," and they said, "Well, maybe if we had a few hundred years," and I said, "I don't think they want to give us that long."
I struggle with how much to evangelize. On the one hand, it's possible that the more people agree that AI presents an existential risk, the more likely we are to mitigate that risk somehow. On the other hand, I have very little luck convincing anyone, and anyone I do convince tends to end up anxious and depressed.
(please don't do terrorism)
1. Lots of groups people know and love did a ton of terrorism. The suffragettes, those nice ladies who wanted the vote, bombed multiple buildings with dozens of fatalities. The ANC was, then and even by today's standards, a terrorist organization. The Zionist groups Lehi and Irgun bombed hotels and killed ambassadors, and were basically indistinguishable from Hamas today (they even used schools and hospitals as bases to manufacture weapons, knowing the British would never attack those). And of course John Brown.
All of these groups' tactics had at worst a neutral effect on achieving their intended goals.
Yes, exactly. Terrorists, freedom fighters, etc.
Also, by having extremes you can redefine a palatable middle. For example, in the 1960s Civil Rights era in America, the existence of Malcolm X made Martin Luther King's message magically more acceptable to the masses who would otherwise have preferred not to engage at all.
> (please don't do terrorism)
What's the purpose of this parenthetical? Four words saying not to do something followed by ninety-nine words of examples justifying the thing you said not to do is confusing.
If you genuinely don't want people to do terrorism or be perceived as supporting terrorism / violence, why point out the utility of terrorism?
If you're concerned about catching a charge for inciting violence, then your tiny disclaimer isn't going to count for anything; it's as futile as all those people who posted on social media that they don't consent to their data being used, which no one cared about anyway.
2/ I would expect the very tech-savvy community that's worried about extinction risk could plausibly engineer a scary yet ultimately harmless demonstration of AI risk, before less sophisticated harmful actors actually got to it.
Maybe that's a recipe for actual loss of control or some terrible outcome indeed, I haven't given it much thought and am just curious where the concerned community lands.
This idea comes up every now and again in the PauseAI community. The answer always ends up being that it's a lot of skilled labor going into something that has a high probability of backfiring, either by causing actual harm or by harming our reputation.
There are orgs doing scary demonstrations, such as Redwood Research, Apollo Research, etc. We can see how people react to the scariest demos right now, and while they often constitute good evidence for people who are looking for good evidence, they are typically not salient enough yet for the average person.
Ideally, evaluations continue to catch ever-scarier things before they cause real harm, and demos constructed based on those might be enough to wake up policymakers. But in my mainline scenario, I just expect a catastrophe to occur, and I hope the first one is just bad enough to change our trajectory rather than end it.
Thanks, that's helpful
and Don't Worry About The Vase
I’ve been watching the Netflix documentary Turning Point: The Bomb and the Cold War. In episodes 3 and 4 they talk about how the US believed the USSR was racing towards building more nukes, but they actually weren’t. But politicians like JFK campaigned on the Missile Gap and won popular support for it.
I worry Vance is running a similar playbook. Seems to be pretty effective.
Podcast episode for this post:
https://open.substack.com/pub/dwatvpodcast/p/the-paris-ai-anti-safety-summit?r=67y1h&utm_campaign=post&utm_medium=web&showWelcomeOnShare=true
Love to see it
Sci-fi fears do not merit global authoritarianism. Thank goodness that you and the other de-cels have been completely and totally defeated. That would have been the darkest timeline.
The Sci-Fi fears of the majority of AI researchers, most of the CEOs and engineers who are creating these systems, Turing Award winners, and Nobel Prize laureates? Sci-Fi fears based on thinking from first principles about the nature of intelligence and goals, and rejecting anthropomorphization? Sci-Fi fears that have correctly predicted the behaviors of current AI systems before they were built?
The state of the science is that existential risk from AI is real. To say otherwise is outright science denial.
The person you're responding to is generally into science denial on a variety of topics, so I don't think that argument will work on him.
False vacuum collapse is "scientifically real". That does not mean it is a legitimate threat with a probability larger than zero. All I am saying is that your argument is pretty weak; you can't really elevate the evidence-free opinions of some esteemed scientists to Science itself. You need peer-reviewed empirical evidence in order to do that.
Some of the replications of "self-replication" and "power-seeking", with models told to do so in their role-playing and explicitly given the means, do provide some empirical evidence. But because we do not actually have superintelligence or even AGI yet, there is crucial missing evidence to establish that there is any existential risk greater than zero.
That missing evidence concerns things like super-hacking, super-persuasion, hidden cognition or self-generated goals, and super-coordination: all of this is speculation. There is no evidence for any of it.
A reasonable government choice at this time would be to ignore any of the claimed risks that have no evidence and to seek the known benefits before China does.
Very ironically, what Vance is calling for is what any "live player" in this situation would do.
This misunderstands what evidence is and what risk is.
There is overwhelming evidence that the risk of human extinction from AI is real. That doesn't mean we are definitely going to go extinct. It does mean that human extinction is a live possibility that cannot be dismissed out of hand.
Most people aren't aware of or don't understand the technical evidence that shows how screwed we might be, which is why it's useful and important to show the opinions of experts.
If the majority of the most credible people on the topic say we may be in extreme danger, then it is deeply stupid to assume with no evidence that we are absolutely not in any danger. Not having observed your own demise yet does not mean you are immortal.
Even if you think all of the people who have thought deeply on the topic are worried over nothing, the correct choice is to proceed with caution, under the assumption that you could be wrong. Mistaking lack of information for lack of danger is a good way to fall into a giant pit, especially when surrounded by terrain experts who say there might be a giant pit over there.
The experts have no evidence. Evidence in the scientific sense consists of empirical observations from the world that are observer-independent.
So far there is nothing.
In any case, the acceleration argument that apparently the US, French, and Chinese governments all agree to (which kind of makes this matter moot; this is how we are going to proceed) is that future threats can only be dealt with by staying on the leading edge of technology.
What you call "caution" an acceleration advocate calls "destructive and treasonous bureaucratic delays".
You cannot defend yourself against the development of machine guns by sternly worded letters and deciding to limit your military to 100 rounds/minute. If you think future enemies may be able to develop self-replicating robot swarms, send out super-persuasive messages, develop hostile bioweapons, and so on, you had better develop the same tech so that you can both develop defenses and use it against the enemy first.
Anyway, that's the argument, and I am explaining why it is a reasonable course of action if you think AI can potentially be controlled, which experts like Ryan Greenblatt seem to concur on.
And, well, all the AI lab people seem to believe this as well.
You are suggesting that science does not use models. We use models to predict how things will occur in similar situations. Drop a ball 100 times and each drop is a different situation. It is the model you fit over them, with the idea of position and time, that allows you to examine a theory of gravity.
Likewise, there is a great deal of evidence for the claim "capabilities increase with the prosaic scaling method", and for the claim "language models trained with RLHF behave sycophantically rather than being truly aligned". There are many claims we do have evidence for which lead many of the experts to believe that there is substantial risk.
You are ignoring the evidence, but that does not mean there is none.
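To make the ball-drop point concrete, here is a minimal sketch (with made-up measurements, nothing from the thread) of fitting the shared model d = ½gt² over several different drops; the fitted parameter is what lets you test the theory across situations that were never identical:

```python
import numpy as np

# Hypothetical drop data: five different drops, five different situations.
t = np.array([0.45, 0.64, 0.78, 0.90, 1.01])  # fall times (s)
d = np.array([1.0, 2.0, 3.0, 4.0, 5.0])       # fall distances (m)

# Fit the shared model d = 0.5 * g * t**2 by least squares.
g = 2 * np.sum(d * t**2) / np.sum(t**4)
print(f"estimated g = {g:.2f} m/s^2")  # recovers roughly 9.8 from imperfect data
```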
> You cannot defend yourself against the development of machine guns by sternly worded letters
You've broken my ability to take you seriously. The pen is mightier than the sword... didn't you know?
Ryan Greenblatt seems to believe we can learn to safely manage AI; that is not the same as believing we already know how to do it.
People at AI labs are constantly speaking out about it and leaving because of it. You should be saying "All the employees at the lab who haven't quit out of protest want to continue," which doesn't even seem to be true; I'm sure there are still more who will quit.
(1) Nobody cares if language models are "aligned"; that's not even what the training is optimizing for.
(2) Scaling curves have logarithms in them. There is zero existential risk from these alone, because it is impossible to build computers large enough. (Now yes, if algorithmic improvements, which have been 150x in two years, were projected forward, at some point the cost of compute for superhuman intelligence approaches zero. That would be bad, but it is not likely physically possible.) A toy illustration of the diminishing-returns point follows this list.
(3) People rage-quitting AI labs after collecting millions isn't much evidence of anything. It's fashionable for people to rage-quit Bay Area elite jobs for any reason or no reason.
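Here is a minimal sketch of what "logarithms in the scaling curves" looks like numerically, using a hypothetical power-law fit with made-up constants (not measured values from any lab): each factor of ten of compute buys roughly the same fixed decrement in loss.

```python
# Hypothetical power-law scaling curve: loss = a * compute**(-b).
# The constants a and b are invented for illustration only.
a, b = 10.0, 0.05

for compute in (1e21, 1e22, 1e23, 1e24, 1e25):  # training FLOP, spanning 4 decades
    loss = a * compute ** (-b)
    print(f"{compute:.0e} FLOP -> loss {loss:.3f}")

# Each 10x of compute multiplies loss by 10**(-0.05), about 0.89: linear gains
# in log-compute, exponentially growing cost for each constant improvement.
```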
I'm curious about something: Altman has said that he expects GPT-5 to be smarter than he is. Has he ever explained how he expects to control it? Even if he types in a prompt which sets its top-level goal (and this is vastly more indirect than hard-coding the top level of an ordinary program), how does he expect to control the sub-goals of a process that is more intelligent than he is, and which can presumably anticipate _his_ moves?
It is no mark of shame to realize you are wrong. I hope you will someday admit to yourself that you may not know what is true, and join us who are trying to figure it out.
I see these posts every few weeks and I simply don't understand. How is this in any way a real risk? I tried Gemini Thinking for a simple task last week and it couldn't count the words in a sentence up to six. The idea that they will self-improve and think for themselves and take over the world within months is, to me, ludicrous, inconceivable, laughable. Anyone who says so immediately looks like an idiot. They're still practically useless! I cannot any more directly state how it all looks to me.
If you want to convince anyone that the house is on fire, and all these other ridiculous metaphors, there must be a better way to educate or impress on someone the gravity and the reality of it, because to me it almost looks like a joke. The only reason I can tell you're not joking, in fact, is how often you repeat yourself, which if you were joking would have stopped being funny.
Your experience does not match my experience. I'm using Claude and have used OpenAI models, and found them to be helpful at explaining quite complex concepts. Of course I didn't trust the output, but on double checking with other sources or people I know who understand the topics I was asking the AI about, I found it to be generally correct. Also relevant: the problems I did find in 2023 are mostly gone now.
I think one part of the danger is specifically that they don't think. If the instruction is to build as many paperclips as possible, the AI might just shrug and do that, much like a poorly written for loop will keep going until your computer runs out of memory. But that runs into the issue that getting physical resources seems hard to work around.
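A toy sketch of the loop analogy (nothing AI-specific, just an objective with no built-in notion of "enough"):

```python
# A loop whose only criterion is "more": nothing inside it ever says "stop".
# The cap below is external, added so the example halts; remove it and the
# loop only ends when memory runs out.
paperclips = []
while True:
    paperclips.append("paperclip")
    if len(paperclips) >= 1_000_000:
        break
print(f"made {len(paperclips)} paperclips and would happily keep going")
```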
I agree with you that the danger of a leap from LLM to a super intelligence that can act on its own is a much harder sell.
I do think that Zvi is mostly speaking to an already convinced audience, which is reflected in his frustration that he hasn't had success talking outside of a self-selected audience. There's a distinct danger in the mentality he has said he has, that his people (meaning the LessWrong crowd) will just downvote and ignore anyone who doesn't think like them. It's a very insular approach.
Re: Zvi speaking to an already convinced audience. Agreed and I’m starting to think the strategy has to shift to getting someone persuasive on a normie podcast circuit. Bannon, Rogan, countless others. I watch lots of debates on YouTube but it’s like Liv Boeree talking to Scott Aaronson. Fascinating but not reaching pockets that could meme/go viral with normies.
We went from "struggling with 2 digits addition" to "PhD level in mathematics" in four years. How many more years do you think we need to wait before we make most mathematicians obsolete ?
Show me phd level mathematics. that would be revolutionary. All the tests I've seen they've been given the answers in training or taken the best one of 10k submissionsor whatever. Sounds like inflated BS to me.
For a technical overview of ASI risk, I recommend AI Safety From First Principles (https://www.alignmentforum.org/s/mzgtmmTKKn5MuCzFJ), which was written a few years ago by a researcher at OpenAI. The site at https://aisafety.info/ offers a somewhat less technical introduction, with responses to a lot of common objections.
Obviously, current chatbots aren't a risk. However, the goal of frontier labs is to create agentic AGI, AI that can reason and act in the world like a person can, and there are good reasons to believe that they're on track to achieve that within the next few years. These labs are focusing most of their effort on developing AI agents right now, and reasoning and long-time-horizon benchmarks are showing a very clear trend with no indication of leveling off at a human level.
The "count the words in this sentence" task is a good example: older models like the original ChatGPT weren't able to do that because they lacked training in chain of thought. It was like asking a person to immediately give an answer to the question without giving them time to mentally count the words. Newer models like o3, however, can answer counting questions like that easily, since they're able to apply the same sort of thinking to the problem that a human would.
That's one example of progress on a very simple task, but the fact is, we've been seeing very steady progress on every single task and benchmark we throw at these things for a decade. Furthermore, the line between powerful AI and AGI is extremely fuzzy: current AI is very general and a bit agentic; humans are somewhat more so. If AI continues to climb the reasoning and agency benchmarks at the current rate, it will pass humans and keep going, and it'll do so within a couple of years.
When we say that we're on track for AGI soon, that isn't wild speculation about a science fiction breakthrough; it's taking a very clear trend in the data seriously.
Best available model, or close to it, probably prompted with a fair level of skill, still has some problems (bad at doing inline citations, for example), but plausibly speeds up text-based research or knowledge work by a lot.
Operating on the assumption that Gemini can't count words, even though that surprises me based on the capabilities I've seen: if a human can't correctly count a handful of words, we guess that the human has something deeply wrong with them and probably can't do much of anything. But these things are not human, and the fact that they stumble at times on things humans can do shouldn't lead you to firmly draw the same conclusions about their abilities that you would for a human with the same deficits. Apparently they can do the majority of the writing for a thesis while struggling with formatting inline references or choosing reliable sources. A machine that can do that is more concerning than "it failed at word-counting", with no mention of what it can do, would lead most people to suppose. Also of note: when I started with ChatGPT 3.5 in 2023, it could barely do a passable short poem, and that was considered a significant advance over prior models. The rate of change is fast.
On your broader point, I really hope you're right and progress stalls out.
Regarding the specific difficulty in counting words or letters: current LLMs process tokens, not letters or words. You can search for a tokenizer online to see it in action. It would be trivial for the AI companies to synthesize training data for counting words or letters across all the possible tokens, but I'm guessing that's not a priority. Far better that the LLM be proficient with tools like Python, which have already solved the problem of counting.
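For anyone who wants to see the token point concretely, here is a minimal sketch assuming the open-source tiktoken library (which exposes some of OpenAI's public tokenizers); the model receives integer token IDs, not individual letters or whitespace-separated words:

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # one of OpenAI's public tokenizers
text = "How many words are in this sentence?"
tokens = enc.encode(text)

# The model "sees" a sequence of integer IDs; word and letter boundaries are
# only implicit in how the tokenizer happened to split the text.
print(tokens)
print([enc.decode([t]) for t in tokens])
```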
I'm broadly worried because I'm watching the SWE-bench Verified benchmark. Progress with autonomous code development worries me as a programmer and as a human: https://www.swebench.com/#verified.
Are there any benchmarks or tasks you're watching as a fire alarm?
It sure feels like we’re experiencing some sort of take-off.
The release of DeepSeek R1 revealed that my personal willingness to pay curve is such that you can charge me literally 10x as much per token, and as long as you’re giving me better tokens I will buy more of them at the higher price. The set of questions I have that I would be willing to pay for answers to contains a lot of stuff that is out of reach of current frontier models, but looks likely to be within reach very soon.
The real problem here is that AI safety feels completely theoretical right now. Climate folks can at least point to hurricanes and wildfires (even if connecting those dots requires some fancy statistical footwork). But AI safety advocates are stuck making arguments about hypothetical future scenarios that sound like sci-fi to most people. It's hard to build political momentum around "trust us, this could be really bad, look at this scenario I wrote that will remind you of a James Cameron movie"
Here's the thing though - the e/acc crowd might accidentally end up doing AI safety advocates a huge favor. They want to race ahead with AI development, no guardrails, full speed ahead. That could actually force the issue. Once AI starts really replacing human workers - not just a few translators here and there, but entire professions getting automated away - suddenly everyone's going to start paying attention. Nothing gets politicians moving like angry constituents who just lost their jobs.
Here's a wild thought: instead of focusing on theoretical safety frameworks that nobody seems to care about, maybe we should be working on dramatically accelerating workplace automation. Build the systems that will make it crystal clear just how transformative AI can be. It feels counterintuitive - like we're playing into the e/acc playbook. But like extreme weather events create space to talk about carbon emissions, widespread job displacement could finally get people to take AI governance seriously. The trick is making sure this wake-up call happens before it's too late to do anything about the bigger risks lurking around the corner.
One aspect I think is under-discussed is how issues only become politically salient at the moment they become polarized. And it’s clear now that the Dem side is safety and the Rep side is /acc. But the central task is turning Reps to the x-risk side, since they control everything. Musk has a past of caring about this, but who knows where his head is at. Steve Bannon’s anti-tech populism is the closest, imo. How do we convince MAGA that caring is owning the libs? “Sama is a libtard who wants to put you out of work”? Any ideas?
You're onto something with the Silicon Valley angle. Here's a potential framing of the issue to make it salient to the Republican/MAGA movement:
"Look at who's building these systems - overwhelmingly coastal elites who've never worked a real job in their lives. And they're not even trying to hide their contempt anymore. Just look at Sam Altman talking about UBI - his solution isn't to help workers adapt to AI, it's to put them on permanent welfare.
The economics make it even more obvious. These companies have spent billions on data centers and training. They're not going to recoup that by selling "AI assistants" that make workers more productive. The math only works if they can replace those workers entirely. The whole "AI will augment, not replace" line is pure PR. They know exactly what they're building - systems to automate away jobs while concentrating wealth in Silicon Valley. The "safe and ethical AI" narrative is just cover for this wealth transfer from working Americans to tech billionaires.
The real question is: do we want these decisions about the future of work to be made by a handful of tech bros who think they know what's best for everyone else? Because right now, that's exactly what's happening."
-Coastal elites = equivalent to 'unelected bureaucrats/academics', the usual villains of MAGA
-Welfare is coded as contemptible, against human dignity in MAGA
Great comment. And there is potentially an opening with Musk hating Altman. Vance is a RINO/loves Sama and elite libs, etc. Obviously the MAGA base and Silicon Valley are an unholy, unstable alliance that could be exploited.
I’m not sure about the Dem side being safety. Biden did make some moves toward that, but the liberal people I talk to just think AI is all hype and a scam.
I suppose if you wish GOP constituents were concerned with AI, you could ask which jobs popular with GOP constituents are especially vulnerable to automation.
If you want to convince MAGA you have to understand MAGA. In particular, you have to understand why MAGA has some trust issues. And no, the central insight is not really "owning the libs". That’s a surface level reading, not the core point.
The MAGA narrative around those exhibits is: for years academics, media, and governments have colluded to use the prestige of "Science" as an excuse to push radical left policies like BLM/reparations & co (exhibit 1) and dismantling capitalism and degrowth (exhibit 2).
Is it true? Well, it does not really matter; what matters is their perception. You’re not going to convince them that they’ve been wrong on this point. And let me defend some parts of it (not all the way, but some core part), not to convince you, but to show that it is not crazy paranoia; there is some non-crazy basis to it.
Exhibit one is mostly, centrally, true: the double standard is pretty hard to deny. I’m pretty sure our host will confirm that the double standard was indeed crazy at the time.
Exhibit two is kinda true too. Yes, let’s not kid anyone, climate change is real, but the constant rejection of any form of geoengineering because "techno-solutionism is not a solution" is pretty much an admission that a good part of the climate change narrative is a rhetorical and political weapon, not about the problem itself.
What problem do we have?
Well, AI safety pattern-matches pretty nicely to those two exhibits. Media and scientists warn us Very Bad Things Will Happen if we do not immediately give More Power to the Government. Fool me once…
How do we fix this?
No clue. Only some small pieces of advice. They won’t be sufficient.
Excise from the discourse *anything* that is remotely left-coded, like "sociopathic greedy corporations". Looking at you, "The Compendium". Yes, I know what meaning you are ascribing to those words, and I agree with that literal meaning. There will be maybe 50 people in the world who will understand the actual meaning you want to convey. The rest of the world will understand it as "another anti-capitalism manifesto". 99% of the Compendium is gold. The remaining 1% means that sharing it anywhere close to the right means burning trust/credibility points as hard as you can.
If you can bring yourself to it, concur with the above concern. Start with "Yes, I know climate change was mostly used as an excuse, but…". I don’t think I can endorse lying and deception, but… if you have no moral qualms about that, maybe disguise yourself as a climate denier?
Disparage (as much as you can bring yourself to) academia, experts, and especially the media, on every topic other than AI. It does not have to be a lie if you have been following (and agreeing with) "Bounded Distrust". Read and reread Bounded Distrust, make it part of yourself, and be able to talk through its most important points. If you can genuinely agree with most of what is said in Bounded Distrust, and are able to defend and promote it? Then you possibly have enough common ground with MAGA to talk a common language with them.
Do not lean strongly into "Experts and Science and Scientists say". It is self-defeating. Try the approach "the bold amateurs of LessWrong led by the iconoclast, libertarian-adjacent Yudkowsky". LessWrong is better than arXiv, which is better than Nature, which is better than legible and serious-sounding stuff like an "International AI Safety Report". That last one will be received in a completely unambiguous way as "I am bullshitting and manipulating you".
And that’s all I have. I did say it wouldn’t be sufficient.
Correct, but too kind. You need to make that phrase: "yes, I know that climate change was mostly used as an excuse for policies that made _everything_ including the environment worse, but"
I sometimes feel this is the strategy Sam Altman is trying to employ. Before ChatGPT I couldn't even talk publicly about AI X-Risk. It seems like OpenAI, in kicking off the AGI capabilities race, has done more for my ability to talk with people about this issue than any other organization.
No I don’t think he wants to end the world. But I think he’s pulled by the possibility of pioneering one of the most significant human inventions of all time, and that makes it worth rolling the dice for him personally.
A sociopath who doesn't care that he's playing dice with all of our futures. Words cannot express my hate, if that is a true representation of his motives.
The logic goes: I want to have the biggest impact on the future possible, to be the most important person possible. The biggest impact on the future possible is to build an AI that eats the reachable universe and reshapes it in a form that was decided (albeit without any real control) by the training I performed. Or, to quote Harry Potter:
"After all, He Who Must Not Be Named did great things – terrible, yes, but great."
Yes, this is horrifyingly amoral. It's a real mindset, though, if thankfully a rare one (not even everyone at the frontier labs has this mindset, despite those with it being strongly drawn to those labs).
It's possible that Altman honestly thinks that humanity's best chance is him decisively winning the AI race, although the departure of all his leading safety staff indicates that at best, they're not confident in that strategy.
It does seem like accelerating the replacement of jobs as early as possible (as long as possible before ASI) could both make the fire alarm louder for those who don't hear it yet, and unify a coalition for a pause and regulations. I'm more optimistic than Zvi that preventing long-term job loss and preventing x-risk go hand in hand.
I don’t think you even need to wait for that to happen to message it. It’s clearly plausible and Sama is literally saying on his blog that his tech is going to displace jobs and instead he will give you an allowance. It’s convincing.
This is exactly the idea behind "early AGI before a compute overhang".
Say we could have AGI next week, and let's assume optimizations for performance do have a reasonable floor. (If there is no floor, I guess we never deserved to live; frankly speaking, nature abhors a vacuum.)
So at the floor, every running instance of AGI consumes say 10 B200s, or about 6 kilowatts of power and about $500,000 in equipment.
It's going to be real hard for any AGI to escape or rebel. We will have pretty tight control over capital that is this expensive and in finite, limited facilities.
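A quick back-of-the-envelope sketch using the figures assumed above (10 B200-class GPUs, roughly 6 kW and $500,000 per running instance; these are the comment's assumptions, not measured numbers) shows why a large covert AGI population would be hard to hide:

```python
# Per-instance figures are the comment's assumptions, not real benchmarks.
GPUS, KILOWATTS, DOLLARS = 10, 6, 500_000  # per running AGI instance

for n in (1, 100, 10_000):
    print(f"{n:>6} instances: {n * GPUS:>8,} GPUs, "
          f"{n * KILOWATTS:>7,} kW, "
          f"${n * DOLLARS:>14,} in hardware")
```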
But hopefully, early on, such AGI will TRY it. We will PROVE it's plotting against us, that different AGI models from different companies naturally team up against the humans, that current cyber security is Swiss cheese. All of the things that doomers have claimed for years as risks.
Or we will find empirically that the opposite is true and our actual problems are something completely different.
I think the best way of understanding these “safety policies” is that they are not commitments. They are internal communications from the company leadership, to the individuals who are actually building and releasing products. They can change at any time.
It’s really just a reflection of the current feelings of the company, rather than any sort of binding agreement.
All agreements are like that; it's just that they may be embedded in some other social context where repercussions are expected. Legal repercussions are not the only repercussions that exist.
One thing worries me about how I proceed without giving up. It's become clear that my efforts on Technical Alignment will not be allowed time to come to fruition, and so I must turn my attention to the strategic situation.
I find it very difficult to believe that world leaders are so myopic and uninspired that they cannot recognize the risks, but I can imagine them having an awareness of controlling the public. The public will not look kindly on efforts to resist what they see as sci-fi delusions while they struggle with their real everyday problems. If this is how leaders see things, then moving discussion of safety offstage would allow them to pursue it without public interference. If such an attempt were being made, then it would be counterproductive for me to continue trying to popularize the issue.
However, it could very well be that the dynamics of human political power structures do not select for those with deep insight into technical possibilities. I am caught in two worlds here. It seems blindingly obvious both that our leaders are idiots, and that they cannot possibly be idiots. But in the world where they in fact are, it continues to be of desperate importance that we succeed in popularizing this issue and making it one that can no longer be ignored.
"It seems blindingly obvious both that our leaders are idiots, and that they cannot possibly be idiots."
They do vary.
I will always remember Trump as the president who recommended looking into injecting disinfectant. Trump's depth of common sense appears remarkably shallow.
In contrast, Musk has led his teams at both Tesla and SpaceX to very impressive achievements. Even the massive downsizing at Twitter/X seems to have been done intelligently enough that users aren't fleeing for reasons of technical failures.
This reminds me: There was a comment that I made a week ago, in agreement with the
( Admittedly I want to _see_ AGI, and what I'm about to write goes against this... )
"Anyway, if I were trying to pitch this paper review to someone, it would be Elon Musk"
Yes. In addition to your other points, note that Musk's motivation for his Mars project is that he wants human _biological_ survival. Along similar lines, he is pro-natalist, has a bunch of kids himself, and opposed the Stargate project (though on other grounds).
I don't think an AI pause is in the cards, but, if I were to pick the single person both plausibly amenable to that and with enough power to make a difference, it would be Musk.
[But, as we discussed in ACX, USA/PRC competition is baked in today, and an AI arms control agreement would be unverifiable, so even if Musk completely agreed and convinced Trump, the USA is not a free actor at this point.]
A treaty is not the end of competition, but the agreement to work together to figure out how to end competition. We need to work together to create a world where both China and USA can be free actors.
I agree Musk has talked about ASI x-risk. I don't know his beliefs about many things, but it does seem at least plausible that he would be in favour of an AGI capabilities pause while we solve the Technical Alignment problem.
Many Thanks! As we discussed back in ACX, I don't think that a treaty or a pause will work. As I see it, at best arms control treaties work for highly visible things like aircraft carriers or massive reactors where cheating is readily detectable. AI development is nothing like that: microprocessors, memory chips, data, and software. Even the data centers are basically scaled up office equipment.
_Usually_ arms control treaties wind up being unverifiable and basically dead-letter laws.
Best of luck with Musk (ok, I'm ambivalent about this since I want to _see_ AGI within my lifetime). He does seem to be your best bet.
Many Thanks! I also want to see AGI within my lifetime, and ASI too, in fact. But I also want my lifetime to be long and enjoyable, not short and full of propaganda that confuses me before I am killed, which is what seems likely if we build misaligned ASI.
As I have said and will continue to say, I do not think it will be easy, only that it is necessary, and we should all work together on it.
I must say, as a Frenchman, that trusting Macron with anything important has always been misguided. That man is the most arrogant of all the world leaders (yeah, more than Trump) and only cares about stroking his own ego.
This is a really tough issue. I feel like we must remain civil and honourable, but I can understand that many are feeling like we are being backed into a corner and dialogue is failing. I wish I had the faith that Nathan Metzger professes.
This would be about the worst possible thing to happen for the AI safety cause w/r/t public opinion.
Related to Forrest comment there, how does the "doomer" position not turn to terrorism?
At least in a relatively benign form of "mock-terrorism" where you get AI agents to "almost commit" acts of terrorism (like a gun firing a "BANG" placard) ?
Whoa, chill. Property destruction isn't terrorism. No carbon-based life form shall be harmed.
Really? I think the Weather Underground blowing up government buildings, even ones empty of people, was terrorism.
Who said anything about bombs? Seriously, chill out. There are many ways to physically shut off computers without hurting humans. Magnets will do the trick. As will a hose and some salt water, or the wrong type of fire extinguisher. You have to get pretty creative to hurt a person with foam, salt water, or magnets.
You can't promise that. You can't even promise no human life would be harmed short of storming a building and forcing the staff out of it. Even the most thoroughly vetted applications of violence can have unintended consequences. Not to mention you'd have to do this on pretty much every data center at once to avoid beefed up security hampering your efforts and that makes the operation that much more difficult to pull off.
As for your contention that property destruction not being terrorism that's just absurd. Burning crosses in people's yards was absolutely terrorism. There's no widely accepted definition of terrorism that excludes property damage if it's done in the aim of furthering an ideology.
Agreed, and it is not only "beefed up security" that would happen. AI work in general, and data servers in particular are currently visible. Even with the electricity and cooling needs for the data centers (let alone the offices where the AI researchers work), actively hiding them is not a big stretch. It isn't as if racks of electronics were radioactive. If they are threatened, we will stop seeing where they are.
I am under the impression that we have already stopped seeing where they are, which was the impetus for my request for their location in the first place.
Many Thanks! I hadn't know about that. Well, it is a reasonable precaution to take if there is a chance of hostile action towards them.
I think commenters are jumping to very violent conclusions. There are many ways to physically shut off computers without hurting humans. Magnets will do the trick. As will a hose and some salt water, or the wrong type of fire extinguisher.
Usually when people talk about terrorism, the purpose is to terrorize people. Your cross burning analogy fails because it was understood as a threat- Get out, or the Klan will kill you and your family. My modest proposal of using salt water and a hose does not put anyone in physical danger. It doesn't threaten their safety. It's not a threat, and if you can call any property destruction for ideological reasons terrorism, then football hooligans often become terrorists whenever their team wins or loses, which seems absurd.
I personally have long opposed the term "terrorism" gaining more and more new meanings, because the state loves to use it to jack up the penalties for crimes.
> I think commenters are jumping to very violent conclusions. There are many ways to physically shut off computers without hurting humans. Magnets will do the trick. As will a hose and some salt water, or the wrong type of fire extinguisher.
And when the armed guards hired to protect the billion dollar facility object to you rolling in with your electromagnets or dozens of PKP extinguishers?
What makes you think I don't already work for ArmedGuardEvilCorp? Use some imagination! Don't just jump to the most violent option.
Shouldn’t you primarily worry about actual terrorists with proven intentions getting access to that same gun?
And illegal, unilateral direct action against datacenters is obviously not a promising strategy for stopping the dynamic described in the post. Such a list would at most be a preparation for the case when a public mandate for action appears.
But all that was said a 100 times already and still the same questions keep being asked.
I may have given off the wrong impression, I don't mean to imply anyone with the most pessimistic outlook would necessarily turn to terrorism, simply wonder where this crowd lands regarding the issue.
I think there could be theoretically effective means of action (not mindless blowing up of data centres) but haven't personally given it much thought.
I should've made it clearer i was being inquisitive rather than assertive
I agree peace is the better strategy, but if data centres are the things they are using to build the thing we are feeling threatened by, then even though I do not support it, it isn't exactly "mindless" to suggest, even though it would suck for many reasons, that those data centres be destroyed.
The situation is complicated, and I don't support anti-social activity like threatening or destroying the things that other people care about, (which is what ASI builders are doing), both because I don't like that kind of behaviour, and because I think strategically it loses us supporters, but I don't feel I can fault other people for thinking about it if they are as deeply scared of misaligned ASI as I am.
I just put up a post suggesting a campaign of alarmist anti-AI lies. That's probably way more likely to be effective than terrorism, and also much more palatable.
I certainly hope this doesn't happen.
1. It wouldn't help. Terrorism tends to have the opposite of the intended effect.
2. "False flag" types of risk realization are dumb and bad. There will be plenty of bad actors doing bad things. There's no sense muddying the water by supposedly good actors doing bad things. All we have to do is keep holding a sign saying "stop or bad things will happen," so that when bad things happen, we can rally exponential support for a sensible global treaty and AI moratorium.
You may as well propose a moratorium on the tides. The same aspects that make AI dangerous ensure that if it can be built it will be built. Incentives are too strong, potential rewards too high.
If we could prevent AI through a moratorium the USSR would have been successful in creating the New Soviet Man. That effort was doomed for precisely the same reasons a moratorium would be as well.
So for those of us who believe that development of ASI without solving the Technical Alignment problem leads us all to certain death... what would you suggest we do?
Get cracking on solving it? Given your criteria it's unclear what other possible answer there could be.
You could also hope that you're in some world out of: ASI being impossible, alignment happening by default, humans and ASIs coexisting despite lack of technical alignment for unforeseen reasons, etc. Then I'd suggest you live your life - focus on sustaining and improving the things and people around you that you can actually influence, get some exercise, have a deep belly laugh or two. The usual.
But why would you assume there's something for you to do?
I've looked at a decent amount of research on the subject, and a global moratorium on frontier AI research is not particularly hard to implement, not particularly hard to verify, and not particularly hard to enforce.
The only reasonably hard part is the political will. It is hard to get the nations of the world to cooperate and slow down the progress toward the cliff, but it's not *"alignment problem"* hard. It's just "normal problems" hard.
PauseAI has a solid theory of change that reveals that we have an actual chance to make this happen. Possible doesn't mean easy, so I expect to fail and die by default. But I wouldn't be able to live with myself if I didn't try, and lobbying and spreading public awareness of these risks is at least an order of magnitude better than any other idea for potentially useful things I could do.
I'm not cool with giving up in the middle of a cataclysmic emergency when there are still clearly things that I can do that are useful, and it isn't all that costly to just, you know, not give up.
> The only reasonably hard part is the political will.
This is the part that, due to the same incentives that warp all roads towards even dangerous AI scenarios, that I think you’ll discover will prove impossible.
Obviously it’s not _impossible_ impossible though. Good luck in your endeavors, your time is your own and windmills don’t tilt themselves, but an effective moratorium would run counter to too many interests. Including the same governments who would need to support it for an indeterminate amount of time.
Of course I may be completely wrong. As a large language model I cannot predict the future, we’ll all find out together.
If you have given up and accepted that we will all die and are choosing to try to enjoy the time you have left. I can accept that. It makes me sad but I can accept it. I may do the same someday not to long in the future. I already spend a lot more time just trying to enjoy the time I have left than I used to. But I guess I haven't completely given up.
I recall a conversation I had with a fellow mathematician, describing the difficulties of the Technical Alignment problem. They said, "Well that's more difficult that solving complex fluid dynamics systems. You can't solve that" and I said "I know but we have to try" and they said "well, maybe if we had a few hundred years" and I said "I don't think they want to give us that long".
I struggle with how much to evangelize. On the one hand, it's possible the more people agreed that AI presents an existential risk, the more likely we would mitigate that risk somehow. On the other hand, I have very little luck convincing anyone, and anyone I do convince tends to end up anxious and depressed.
(please don't do terrorism)
1. Lots of groups people know and love did a ton of terrorism. The suffragettes, those nice ladies who wanted the vote, bombed multiple buildings with dozens of fatalities. The ANC was then, but even by todays standards, was a terrorist organization. The zionist groups Lehi and Irgun bombed hotels and killed ambassadors, and were basically indistinguishable from Hamas today (they even used schools and hospitals as bases to manufacture weapons, knowing the British would never attack those). And of course John Brown.
All of these groups' tactics had at worst a neutral effect on achieving their intended goals.
Yes, exactly. Terrorists, freedom fighters, etc.
Also, by having extremes you can redefine a palatable middle. For example, in the 1960s Civil Rights era in America, the existence of Malcolm X made Martin Luther King's message magically more acceptable to the masses who would have otherwise preferred not to engage at all
> (please don't do terrorism)
What's the purpose of this parenthetical? Four words saying not to do something followed by ninety-nine words of examples justifying the thing you said not to do is confusing.
If you genuinely don't want people to do terrorism or be perceived as supporting terrorism / violence, why point out the utility of terrorism?
If you're concerned about catching a charge for inciting violence then your tiny disclaimer isn't going to count for anything, it's as futile as all those people who posted on social media that they don't consent to their data being used, but no one cares anyway.
2/ I would expect the very tech-savvy community that's worried about extinction risk could plausibly engineer a scary yet ultimately harmless demonstration of AI risk, before less sophisticated harmful actors actually got to it.
Maybe that's a recipe for actual loss of control or some terrible outcome indeed, I haven't given it much thought and am just curious where the concerned community lands.
This idea comes up every now and again in the PauseAI community. The answer always ends up being that it's a lot of a skilled labor going into something that has a high probability of backfiring, either by causing actual harm, or by harming our reputation.
There are orgs scary demonstrations, such as Redwood Research, Apollo Research, etc. We can see how people react to the scariest demos right now, and while they often constitute good evidence for people who are looking for good evidence, they are typically not salient enough yet for the average person.
Ideally, evaluations continue to catch ever-scarier things before they cause real harm, and demos constructed based on those might be enough to wake up policymakers. But in my mainline scenario, I just expect a catastrophe to occur, and I hope the first one is just bad enough to change our trajectory rather than end it.
Thanks, that's helpful
and Don't Worry About The Vase
I’ve been watching the Netflix documentary Turning Point: The Bomb and the Cold War. In episode 3 and 4 they talk about how the US believed the USSR was racing towards building more nukes but they actually weren’t. But politicians like JFK campaigned on the Missile Gap and won popular support for it.
I worry Vance is running a similar playbook. Seems to be pretty effective.
Podcast episode for this post:
https://open.substack.com/pub/dwatvpodcast/p/the-paris-ai-anti-safety-summit?r=67y1h&utm_campaign=post&utm_medium=web&showWelcomeOnShare=true
Love to see it
Sci-fi fears do not merit global authoritarianism. Thank goodness that you and the other de-cels have been completely and totally defeated. That would have been the darkest timeline
The Sci-Fi fears of the majority of AI researchers, most of the CEOs and engineers who are creating these systems, Turing Award winners, and Nobel Prize laureates? Sci-Fi fears based on thinking from first principles about the nature of intelligence and goals, and rejecting anthropomorphization? Sci-Fi fears that have correctly predicted the behaviors of current AI systems before they were built?
The state of the science is that existential risk from AI is real. To say otherwise is outright science denial.
The person you're responding to is generally into science denial on a variety of topics, so I don't think that argument will work on him.
False vacuum collapse is "scientifically real". It does not mean it is a legitimate threat with a probability larger than zero. All I am saying is your argument is pretty weak, you can't really elevate the evidence free opinions of some esteemed scientists to Science itself. You need peer reviewed empirical evidence in order to do that.
Some of the replications of "self replication" and "power seeking" with models told to do so in their role playing and explicitly given the means do provide some empirical evidence. But because we do not actually have superintelligence or even AGI yet there is crucial missing evidence to establish there is any existential risk greater than zero.
That evidence would be things like super hacking, super persuasion, hidden cognition or goals of its own, super coordination - all of this is speculation. There is no evidence for any of it.
A reasonable government choice at this time would be to ignore any of the claimed risks that lack evidence and to pursue the known benefits before China does.
Very ironically what Vance is calling for is what any "live player" in this situation would do.
This misunderstands what evidence is and what risk is.
There is overwhelming evidence that the risk of human extinction from AI is real. That doesn't mean we are definitely going to go extinct. It does mean that human extinction is a live possibility that cannot be dismissed out of hand.
Most people aren't aware of or don't understand the technical evidence that shows how screwed we might be, which is why it's useful and important to show the opinions of experts.
If the majority of the most credible people on the topic say we may be in extreme danger, then it is deeply stupid to assume with no evidence that we are absolutely not in any danger. Not having observed your own demise yet does not mean you are immortal.
Even if you think all of the people who have thought deeply on the topic are worried over nothing, the correct choice to make is to proceed with caution, under the assumption that you could be wrong. Failure to recognize lack of information as lack of danger is a good way to fall into a giant pit, especially when surrounded by terrain experts who say there might be a giant pit over there.
The experts have no evidence. Evidence in the scientific sense means empirical observations of the world that are observer-independent.
So far there is nothing.
In any case, the acceleration argument that the US, French, and Chinese governments all apparently agree with - which kind of makes this matter moot; this is how we are going to proceed - is that future threats can only be dealt with by staying on the leading edge of technology.
What you call "caution" an acceleration advocate calls "destructive and treasonous bureaucratic delays".
You cannot defend yourself against the development of machine guns by sternly worded letters and deciding to limit your military to 100 rounds/minute. If you think future enemies may be able to develop self replicating robot swarms, send out super persuasion messages, develop hostile bioweapons, and so on, you had better develop the same tech so that you can both develop defenses and use it against the enemy first.
Anyway, that's the argument, and I am explaining why it is a reasonable course of action if you think AI can potentially be controlled, which experts like Ryan Greenblatt seem to concur on.
And well all the AI lab people seem to believe this as well.
You are suggesting that science does not use models. We use models to predict how things will occur in similar situations. Drop a ball 100 times and each drop is a different situation. It is the model you fit over them, with the ideas of position and time, that allows you to test a theory of gravity.
Likewise, there is a great deal of evidence for the claim "capabilities increase with the prosaic scaling method", and for the claim "language models trained with RLHF behave sycophantically rather than being truly aligned". There are many claims for which we do have evidence, and they lead many of the experts to believe that there is substantial risk.
You are ignoring the evidence, but that does not mean there is none.
> You cannot defend yourself against the development of machine guns by sternly worded letters
You've broken my ability to take you seriously. The pen is mightier than the sword... didn't you know?
Ryan Greenblatt seems to believe we can learn to safely manage AI, that is not the same as believing we already know how to do it.
People at AI labs are constantly speaking out about it and leaving because of it. You should be saying "All the employees at the lab who haven't quit out of protest want to continue" which doesn't even seem to be true, I'm sure there are still more who will continue to quit.
A model needs evidence to back it.
(1) Nobody cares if language models are "aligned"; that's not even what the training is optimizing for
(2) Scaling curves have logarithms in them. There is zero existential risk from these alone, because it is impossible to build computers large enough. (Now yes, if algorithm improvements, which have been 150x in 2 years, were projected forward, at some point the cost of compute for superhuman intelligence would be near zero. That would be bad, but it is likely not physically possible.) A toy sketch of what such a curve implies follows this list.
(3) People rage quitting AI labs after collecting millions isn't much evidence of anything. It's fashionable for people to rage quit bay area elite jobs for any reason or no reason.
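A toy sketch of the curve shape being argued over here, assuming a generic power-law scaling fit; the constants and exponent below are invented for illustration, not measured values:

# Toy scaling curve: loss(C) = A * C**(-alpha) + floor, with made-up constants.
A, alpha, floor = 10.0, 0.1, 1.0

def loss(compute):
    # Reducible loss shrinks as a power of compute and never crosses the floor.
    return A * compute ** (-alpha) + floor

for exponent in range(0, 25, 4):
    c = 10 ** exponent                      # compute in arbitrary units
    print(f"compute 1e{exponent:02d}: loss {loss(c):.3f}")

Each additional 10,000x of compute buys a smaller improvement, which is the "logarithms in the curves" intuition; whether that actually rules out risk is what the rest of this thread disputes.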
I'm curious about something: Altman has said that he expects GPT5 to be smarter than he is. Has he ever explained how he expects to control it? Even if he types in a prompt which sets its top-level goal (and this is vastly more indirect than hard-coding the top level of an ordinary program), how does he expect to control the sub-goals of a process that is more intelligent than he is, and which can presumably anticipate _his_ moves?
It is no mark of shame to realize you are wrong. I hope you will someday admit to yourself that you may not know what is true, and join us who are trying to figure it out.
I see these posts every few weeks and I simply don't understand. How is this in any way a real risk? I tried gemini thinking for a simple task last week and it couldn't count words in a sentence up to six. The idea that they will self improve and think for themselves and take over the world within months is, to me, ludicrous, inconceivable, laughable. Anyone who says so immediately looks like an idiot. They're still practically useless! I cannot any more directly state how it all looks to me.
If you want to convince anyone that the house is on fire, and all these other ridiculous metaphors, there must be a better way to educate or impress on someone the gravity and the reality of it, because to me it almost looks like a joke. The only reason I can tell you're not joking, in fact, is how often you repeat yourself, which if you were joking would have stopped being funny.
Your experience does not match my experience. I'm using Claude and have used OpenAI models, and found them to be helpful at explaining quite complex concepts. Of course I didn't trust the output, but on double checking with other sources or people I know who understand the topics I was asking the AI about, I found it to be generally correct. Also relevant: the problems I did find in 2023 are mostly gone now.
I used to think this, but these things have gotten better; if you pay for the reasoning models, I've been pleasantly surprised.
I think one part of the danger is specifically that they don't think. If the instruction is to build as many paperclips as possible, the AI might just shrug and do that, much like a poorly written for loop will keep going until your computer runs out of memory. But that runs into the issue that getting physical resources seems hard to work around.
I agree with you that the danger of a leap from LLM to a super intelligence that can act on its own is a much harder sell.
I do think that Zvi is mostly speaking to an already convinced audience which is reflected in his frustration that he hasn't had success talking outside of a self-selected audience. There's a distinct danger in having the mentality that he has said he has that his people (meaning the LessWrong crowd) will just downvote and ignore anyone who doesn't think like them. It's a very insular approach.
Re: Zvi speaking to an already convinced audience. Agreed and I’m starting to think the strategy has to shift to getting someone persuasive on a normie podcast circuit. Bannon, Rogan, countless others. I watch lots of debates on YouTube but it’s like Liv Boeree talking to Scott Aaronson. Fascinating but not reaching pockets that could meme/go viral with normies.
It is not the AI we have today but the AI we may have tomorrow.
Deep, man. My point is we are decades away, if ever. These things are improving logarithmically, and we've most likely hit all the low-hanging fruit.
All the researchers are predicting months to a decade.
We went from "struggling with 2-digit addition" to "PhD level in mathematics" in four years. How many more years do you think we need to wait before we make most mathematicians obsolete?
Show me PhD-level mathematics; that would be revolutionary. All the tests I've seen, they've either been given the answers in training or taken the best of 10k submissions or whatever. Sounds like inflated BS to me.
For a technical overview of ASI risk, I recommend AI Safety From First Principles (https://www.alignmentforum.org/s/mzgtmmTKKn5MuCzFJ), which was written a few years ago by a researcher at OpenAI. The site at https://aisafety.info/ offers a somewhat less technical introduction, with responses to a lot of common objections.
Obviously, current chatbots aren't a risk. However, the goal of frontier labs is to create agentic AGI- AI that can reason and act in the world like a person can- and there are good reasons to believe that they're on track to achieve that within the next few years. These labs are focusing most of their effort on developing AI agents right now, and reasoning and long time horizon benchmarks are showing a very clear trend with no indication of leveling off at a human level.
The "count the words in this sentence" task is a good example- older models like the original ChatGPT weren't able to do that because they lacked training in chain of thought. It was like asking a person to immediately give an answer to the question without giving them time to mentally count the words. Newer models like o3, however, can answer counting questions like that easily, since they're able to apply the same sort of thinking to the problem that a human would.
That's one example of progress on a very simple task, but the fact is, we've been seeing very steady progress on every single task and benchmark we throw at these things for a decade. Furthermore, the line between powerful AI and AGI is extremely fuzzy- current AI is very general and a bit agentic; humans are somewhat more so. If AI continues to climb the reasoning and agency benchmarks at the current rate, it will pass humans and keep going- and it'll do so within a couple of years.
When we say that we're on track for AGI soon, that isn't wild speculation about a science fiction breakthrough- it's taking a very clear trend in the data seriously.
Well said!
This seems like current state of the art this month: https://open.substack.com/pub/andrewmaynard/p/can-ai-write-your-phd-dissertation?utm_source=share&utm_medium=android&r=zre9o
Best available model, or close to it, probably prompted with a fair level of skill, still has some problems (bad at doing inline citations, for example), but plausibly speeds up text-based research or knowledge work by a lot.
Operating on the assumption that Gemini can't count words, even though that surprises me based on the capabilities I've seen: if a human can't correctly count a handful of words, we guess that the human has something deeply wrong with them and probably can't do much of anything. But these things are not human, and the fact that they stumble at times on things humans can do shouldn't lead you to firmly draw the same conclusions about their abilities that you would for a human with the same deficits. Apparently they can do the majority of the writing for a thesis while struggling to format inline references or choose reliable sources. A machine that can do that is more concerning than the bare report "it failed at word-counting", with no mention of what it can do, would lead most people to suppose. Also of note: when I started with ChatGPT 3.5 in 2023, it could barely do a passable short poem, and that was considered a significant advance over prior models. The rate of change is fast.
On your broader point, I really hope you're right and progress stalls out.
Regarding the specific difficulty in counting words or letters: current LLMs process tokens, not letters or words. You can search for a tokenizer online to see it in action. It would be trivial for the AI companies to synthesize training data for counting words or letters across all the possible tokens, but I'm guessing that's not a priority. Far better that the LLM be proficient at using tools like Python, which have already solved the problem of counting.
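A minimal sketch of the tokenization point, assuming the tiktoken library and its cl100k_base encoding (the sentence and the printed pieces are just illustrative):

# Show why letter/word counts are awkward for a model that only sees token ids.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
sentence = "Count the words in this sentence please"

token_ids = enc.encode(sentence)
print(token_ids)                              # integer ids, not letters or words
print([enc.decode([t]) for t in token_ids])   # pieces like 'Count', ' the', ' words', ...

# The model works over ids like these, not characters, which is why counting
# letters or words is harder for it than it looks. An ordinary tool, by
# contrast, solves the counting problem trivially:
print(len(sentence.split()))                  # 7

Decoding each id individually here is only to show the pieces; actual token boundaries depend on the encoding, which is part of why these failures look so strange from the outside.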
I'm broadly worried because I'm watching the Software Engineering Verified benchmark. Progress with autonomous code development worries me as a programmer and human: https://www.swebench.com/#verified.
Are there any benchmarks or tasks you're watching as a fire alarm?
Computers already have beaten humanity at chess and go years ago. We know computers can outperform humans at intellectual tasks.
Why would math or coding be any different?
Why would AI design be any different, fundamentally?
Of course if the AIs could be directed to improve themselves someone will likely use that
It sure feels like we’re experiencing some sort of take-off.
The release of DeepSeek R1 revealed that my personal willingness to pay curve is such that you can charge me literally 10x as much per token, and as long as you’re giving me better tokens I will buy more of them at the higher price. The set of questions I have that I would be willing to pay for answers to contains a lot of stuff that is out of reach of current frontier models, but looks likely to be within reach very soon.
The real problem here is that AI safety feels completely theoretical right now. Climate folks can at least point to hurricanes and wildfires (even if connecting those dots requires some fancy statistical footwork). But AI safety advocates are stuck making arguments about hypothetical future scenarios that sound like sci-fi to most people. It's hard to build political momentum around "trust us, this could be really bad, look at this scenario I wrote that will remind you of a James Cameron movie"
Here's the thing though - the e/acc crowd might accidentally end up doing AI safety advocates a huge favor. They want to race ahead with AI development, no guardrails, full speed ahead. That could actually force the issue. Once AI starts really replacing human workers - not just a few translators here and there, but entire professions getting automated away - suddenly everyone's going to start paying attention. Nothing gets politicians moving like angry constituents who just lost their jobs.
Here's a wild thought: instead of focusing on theoretical safety frameworks that nobody seems to care about, maybe we should be working on dramatically accelerating workplace automation. Build the systems that will make it crystal clear just how transformative AI can be. It feels counterintuitive - like we're playing into the e/acc playbook. But like extreme weather events create space to talk about carbon emissions, widespread job displacement could finally get people to take AI governance seriously. The trick is making sure this wake-up call happens before it's too late to do anything about the bigger risks lurking around the corner.
One aspect I think is underdiscussed is how issues only become politically salient at the moment they become polarized. And it’s clear now that the Dem side is safety and the Rep side is /acc. But the central task is turning Reps to the x-risk side since they control everything. Musk has a past of caring about this, but who knows where his head is at. Steve Bannon’s anti-tech populism is the closest, imo. How do we convince MAGA that caring is owning the libs? “Sama is a libtard who wants to put you out of work”? Any ideas?
You're onto something with the Silicon Valley angle. Here's a potential framing of the issue to make it salient to the Republican/MAGA movement:
"Look at who's building these systems - overwhelmingly coastal elites who've never worked a real job in their lives. And they're not even trying to hide their contempt anymore. Just look at Sam Altman talking about UBI - his solution isn't to help workers adapt to AI, it's to put them on permanent welfare.
The economics make it even more obvious. These companies have spent billions on data centers and training. They're not going to recoup that by selling "AI assistants" that make workers more productive. The math only works if they can replace those workers entirely. The whole "AI will augment, not replace" line is pure PR. They know exactly what they're building - systems to automate away jobs while concentrating wealth in Silicon Valley. The "safe and ethical AI" narrative is just cover for this wealth transfer from working Americans to tech billionaires.
The real question is: do we want these decisions about the future of work to be made by a handful of tech bros who think they know what's best for everyone else? Because right now, that's exactly what's happening."
-Coastal elites = equivalent to 'unelected bureaucrats/academics', the usual villains of MAGA
-Welfare is coded as contemptible, against human dignity in MAGA
Great comment. And there is potentially an opening with Musk hating Altman. Vance is a RINO who loves Sama and the elite libs, etc. Obviously the MAGA base and Silicon Valley are an unholy, unstable alliance that could be exploited.
I’m not sure about the Dem side being safety. Biden did make some moves toward that, but the liberal people I talk to just think AI is all hype and a scam.
I suppose if you wish GOP constituents were concerned with AI, you could ask which jobs popular with GOP constituents are especially vulnerable to automation.
If you want to convince MAGA you have to understand MAGA. In particular, you have to understand why MAGA has some trust issues. And no, the central insight is not really "owning the libs". That’s a surface level reading, not the core point.
Exhibit 1 : https://x.com/esrtweet/status/1871772092716191863
Exhibit 2 : https://imgur.com/what-if-we-create-better-world-nothing-cartoon-up6yu
The MAGA narrative around those exhibits is : for years academics, media and governments have colluded to use the prestige of "Science" as an excuse to push radical left policies like BLM/reparations & co (exhibit 1) and dismantling capitalism and degrowth (exhibit 2).
Is it true? Well, it does not really matter; what matters is their perception. You’re not going to convince them that they’ve been wrong on this point. And let me defend some parts of it (not all the way, but some core part), not to convince you, but to show that it is not crazy paranoia; there is some non-crazy basis to it.
Exhibit one is centrally true: the double standard is pretty hard to deny. I’m pretty sure our host will confirm that the double standard was indeed crazy at the time.
Exhibit two is kinda true too. Yes, let’s not kid anyone, climate change is real, but the constant rejection of any form of geoengineering because "techno-solutionism is not a solution" is pretty much an admission that a good part of the climate change narrative serves as a rhetorical and political weapon, and is not about the problem itself.
What problem do we have ?
Well, AI safety pattern-matches pretty neatly to those two exhibits. Media and scientists warn us Very Bad Things Will Happen if we do not immediately give More Power to the Government. Fool me once…
How do we fix this ?
No clue. Only some small pieces of advice. They won’t be sufficient.
Excise from the discourse *anything* that is remotely left-coded, like "sociopathic greedy corporations". Looking at you, "The Compendium". Yes, I know what meaning you are ascribing to those words, and I agree with that literal meaning. There will be maybe 50 people in the world who will understand the actual meaning you want to convey. The rest of the world will read it as "another anti-capitalist manifesto". 99% of the Compendium is gold. The remaining 1% means that sharing it anywhere close to the right burns trust/credibility points as hard as you can.
If you can bring yourself to it, concur with the above concern. Start with "Yes, I know climate change was mostly used as an excuse, but…". I don’t think I can endorse lying and deception, but… if you have no moral qualms about that, maybe disguise yourself as a climate denier?
Disparage (as much as you can bring yourself to) academia, experts, and especially the media, on every topic other than AI. It does not have to be a lie if you have been following (and agreeing with) "Bounded Distrust". Read and reread Bounded Distrust, make it part of yourself, and be able to articulate its most important talking points. If you can genuinely agree with most of what is said in Bounded Distrust, and can defend and promote it? Then you possibly have enough common ground with MAGA to talk a common language with them.
Do not lean strongly into "Experts and Science and Scientists say". It is self-defeating. Try the approach "the bold amateurs of LessWrong, led by the iconoclast libertarian-adjacent Yudkowsky". LessWrong is better than arXiv, which is better than Nature, which is better than legible and serious-sounding stuff like an "International AI Safety Report". That last one will be received, completely unambiguously, as "I am bullshitting and manipulating you".
And that’s all I have. I did say it wouldn’t be sufficient.
That all sounds good. I think the messaging starts with some angle of “Scam Altman” coined by Elon himself. The opening is there.
Correct, but too kind. You need to make that phrase: "yes, I know that climate change was mostly used as an excuse for policies that made _everything_ including the environment worse, but"
They've well and truly scorched the earth.
I sometimes feel this is the strategy Sam Altman is trying to employ. Before ChatGPT I couldn't even talk publicly about AI X-Risk. It seems like OpenAI, in kicking off the AGI capabilities race, has done more for my ability to talk with people about this issue than any other organization.
I mean, maybe. It’s also possible he’s just a sociopath.
A sociopath that wants to end the world?
No I don’t think he wants to end the world. But I think he’s pulled by the possibility of pioneering one of the most significant human inventions of all time, and that makes it worth rolling the dice for him personally.
A sociopath who doesn't care that he's playing dice with all of our futures. Words cannot express my hate, if that is a true representation of his motives.
I am not sure whether Sam Altman is one of them, but some people in AI capabilities are basically in Davros mode.
https://www.youtube.com/watch?v=KYWD45FN5zA (relevant part ends at 1:07)
The logic goes: I want to have the biggest impact on the future possible - to be the most important person possible. The biggest impact on the future possible is to build an AI that eats the reachable universe, and reshapes it in a form that was decided (albeit without any real control) by the training I performed. Or to quote Harry Potter:
"After all, He Who Must Not Be Named did great things – terrible, yes, but great."
Yes, this is horrifyingly amoral. It's a real mindset, though, if thankfully a rare one (not even everyone at the frontier labs has this mindset, despite those with it being strongly drawn to those labs).
It's possible that Altman honestly thinks that humanity's best chance is him decisively winning the AI race, although the departure of all his leading safety staff indicates that at best, they're not confident in that strategy.
For bonus points, do it using open models, to undermine the argument for investing in large training runs.
See also: https://forum.effectivealtruism.org/posts/RNatHBdxdCdidhqWf/chinscratch-s-quick-takes?commentId=bztdY66fypJ9ih983
It does seem like accelerating the replacement of jobs as early as possible (as long as possible before ASI) could both make the fire alarm louder for those who don't hear it yet, and unify a coalition for a pause and regulations. I'm more optimistic than Zvi that preventing long-term job loss and preventing x-risk go hand in hand.
I don’t think you even need to wait for that to happen to message it. It’s clearly plausible and Sama is literally saying on his blog that his tech is going to displace jobs and instead he will give you an allowance. It’s convincing.
This is exactly the idea behind "early AGI before a compute overhang".
If we could have AGI, say, next week, let's assume optimizations for performance do have a reasonable floor. (If there is no floor, I guess we never deserved to live; frankly speaking, nature abhors a vacuum.)
So at the floor, every running instance of AGI consumes say 10 B200s, or about 6 kilowatts of power and about $500,000 in equipment.
It's going to be real hard for any AGI to escape or rebel. We will have pretty tight control over capital that is this expensive and in finite, limited facilities.
But hopefully early on such AGI will TRY it. We will PROVE it's plotting against us, that different AGI models from different companies naturally team up against the humans, that current cyber security is swiss cheese. All of the things that doomers have claimed for years as risks.
Or we find empirically that the opposite is true, and our actual problems are something completely different.
@zvi
Tesla seems to have solved self-driving cars; what's your take on that for alignment?
Have seen zero evidence that this is true.
Self-driving cars are easier than alignment.
Anyone thinking labor will not be replaced isn't feeling the AGI.
You can safely discount their current view as being relevant once they see the AGI.
Vance shouldn't worry you
I'm worried we get ASI before myopic humans realize they can replace their human workers.
I think the best way of understanding these “safety policies” is that they are not commitments. They are internal communications from the company leadership, to the individuals who are actually building and releasing products. They can change at any time.
It’s really just a reflection of the current feelings of the company, rather than any sort of binding agreement.
All agreements are like that; it's just that they may be embedded in some other social context where repercussions are expected. Legal repercussions are not the only repercussions that exist.
But in spite of that I mostly agree with you.
I'm panicking. I'm despairing. I'm not giving up.
One worry shapes how I proceed without giving up. It's become clear that my efforts on Technical Alignment will not be allowed time to come to fruition, so I must turn my attention to the strategic situation.
I find it very difficult to believe that world leaders are so myopic and uninspired that they cannot recognize the risks, but I can imagine them being focused on managing the public. The public will not look kindly on efforts to resist what they see as sci-fi delusions while they struggle with their real everyday problems. If this is how leaders see things, then moving discussion of safety offstage would allow them to pursue it without public interference. If such an attempt were being made, then it would be counterproductive for me to continue trying to popularize the issue.
However, it could very well be that the dynamics of human political power structures do not select for those with deep insight into technical possibilities. I am caught in two worlds here. It seems blindingly obvious both that our leaders are idiots, and that they cannot possibly be idiots. But in the world where they in fact are, it continues to be of desperate importance that we succeed in popularizing this issue and making it one that can no longer be ignored.
"It seems blindingly obvious both that our leaders are idiots, and that they cannot possibly be idiots."
They do vary.
I will always remember Trump as the president who recommended looking into injecting disinfectant. Trump's depth of common sense appears remarkably shallow.
In contrast, Musk has led his teams at both Tesla and SpaceX to very impressive achievements. Even the massive downsizing at Twitter/X seems to have been done intelligently enough that users aren't fleeing for reasons of technical failures.
This reminds me: there was a comment that I made a week ago, in agreement with the preceding comment re Musk: https://thezvi.substack.com/p/the-risk-of-gradual-disempowerment/comment/91436694
Copying my part here:
( Admittedly I want to _see_ AGI, and what I'm about to write goes against this... )
"Anyway, if I were trying to pitch this paper review to someone, it would be Elon Musk"
Yes. In addition to your other points, note that Musk's motivation for his Mars project is that he wants human _biological_ survival. Along similar lines, he is pro-natalist, has a bunch of kids himself, and opposed the Stargate project (though on other grounds).
I don't think an AI pause is in the cards, but, if I were to pick the single person both plausibly amenable to that and with enough power to make a difference, it would be Musk.
[But, as we discussed in ACX, USA/PRC competition is baked in today, and an AI arms control agreement would be unverifiable, so even if Musk completely agreed and convinced Trump, the USA is not a free actor at this point.]
A treaty is not the end of competition, but the agreement to work together to figure out how to end competition. We need to work together to create a world where both China and USA can be free actors.
I agree Musk has talked about ASI x-risk. I don't know his beliefs about many things, but it does seem at least plausible that he would be in favour of an AGI capabilities pause while we solve the Technical Alignment problem.
Many Thanks! As we discussed back in ACX, I don't think that a treaty or a pause will work. As I see it, at best arms control treaties work for highly visible things like aircraft carriers or massive reactors where cheating is readily detectable. AI development is nothing like that: microprocessors, memory chips, data, and software. Even the data centers are basically scaled up office equipment.
_Usually_ arms control treaties wind up being unverifiable and basically dead-letter laws.
Best of luck with Musk (ok, I'm ambivalent about this since I want to _see_ AGI within my lifetime). He does seem to be your best bet.
Many Thanks! I also want to see AGI within my lifetime, and ASI too, in fact. But I also want my lifetime to be long and enjoyable, not short and full of propaganda that confuses me before I am killed, which is what seems likely if we build misaligned ASI.
As I have said and will continue to say, I do not think it will be easy, only that it is necessary, and we should all work together on it.
Many Thanks!
Many thanks! Many thanks!
Many thanks for your many "Thanks" : )
I must say, as a Frenchman, that trusting Macron for anything important has always been misguided. That man is the most arrogant of all the world leaders - yeah, more than Trump - and only cares about stroking his own ego.
I didn't realize things were so bad.
( ; _________ ; )
"yeah, more than Trump" ye gods, I'd thought Trump was the extreme case
oh no no no no
oh no
thanks for your final paragraph
Minor typo here:
"... and the currently the Trump administration is in thrall to those determined to do the same."
He should give prizes to anyone who does this