The White House has ordered Anthropic not to expand access to Mythos, and is at least seriously considering a complete about-face of American Frontier AI policy into a full prior restraint regime, where anyone wishing to release a highly capable new model will have to ask for permission
You know this is going to be mostly about checking that AI is not "woke" and worships Donald Trump than anything else.
"We're just guarding against misinformation."
True! In fairness, woke is pretty damned poisonous - so is MAGA.
the ad-hoc problem here generalizes well beyond government
We are moving into a regime in many ways defined by partial, practical alignment. Perhaps this is a good thing or not, but we are more and more defined by these reactionary, arbitrary, good-enough methods of dealing with AI.
Yet we aren't building the best versions of these partial alignment systems. We aren't even building the kind-of-defensible-but-not-perfect versions of these systems. We're just building them as they come up, and getting almost completely arbitrary paradigms that have many of the worst possible properties.
Alignment that is not formally guaranteed but is 'good enough' might be a legitimate tactic. Just having people who have no idea what is going on making choices because they need to make choices is not.
This post is correct as far as it goes for gov regulation at this moment in time, where we should move away from ad hoc White House mandates (one of the many potential solutions in arbitrary solution space, with many negative properties) and towards CAISI (one of the many potential solutions that has some level of decent grounding, if not perfect, with some negative and some positive properties).
But it generalises more broadly than gov. What about, for example, how we now roughly train on soul docs (character-based training)? How come there isn't something like an MMLU for honesty, virtue, and character? (If anyone is interested in this, DM me; I'm thinking about building an org which runs benchmarks+evals for character.) Even if not perfect, that may be better than a single ad hoc-ish team embedded in the labs.
There are a bunch of Pareto improvements just sitting here right now. The recipe is unglamorous. Find where the system is drifting toward on-the-fly arbitrary decisions in opaque or misaligned incentive backgrounds, and then build the thoughtful good-enough infrastructure for those places before the arbitrary version locks in. It isn't 'solve alignment.' But like, is that even the world we're in anymore?
I can’t believe how sanguine you are about this administration. Or is it resigned?
My guess, from what Zvi has previously said and left unsaid, is that he's just not picking unnecessary fights that could detract and distract from the battles he has chosen to focus on.
You mean like attacking Elizabeth Warren lol yes his priorities are unfortunately pretty clear.
Zvi is just unbelievably professional in all of these.
No idea how he does it; this AI stuff makes me want to jump off a bridge 90% of the time.
Which is why is said Mamdani was terrible.
I think the hope is that there are some reasonable adults in the room, maybe spread out and only with indirect power, but that speaking to them as peers could have an effect. Keeping emotion in check. Many more fiery complaints might be true, but are unlikely to be useful. Whatever else you say about this administration as a whole, they clearly revel in negative attention, often taking it as a signal that they must be doing something right. Another person on the internet berating them isn't going to move any needles.
And the result of that is that he, like the entire media, is soft on monsters and tough on people like Elizabeth Warren, and that is precisely why the monsters are in charge of every branch of government.
Uhh as far as hacking goes how exactly could even a systemic regulation of access work in a good way long term?
The models the Chinese and other countries are developing aren't that far behind and if you don't want them to be the first to exploit vulnerabilities developers need to identify vulnerabilities to know what to fix. And if you don't want to leave huge chunks of our banking and other critical infrastructure at risk that sounds like you need quite a few people with access.
But if the hacking danger is that real won't the hackers just hack accounts that are authorized?
Especially from a national security POV, the last thing we want is to make it easier for the Chinese to develop a catalog of vulnerabilities they can release en masse. Unless you are very confident they won't catch up to Mythos for years, this feels like a terrible mistake.
I suspect the US government sees it as a tradeoff; leave many US companies and all non-US companies (and all US individuals) wide open for hacking by other actors, but also guarantee that the NSA can hack absolutely everyone while the US's AI lead persists. Seems consistent with other decisions, like dismantling of previous alliances and going for a more aggressive/militaristic posture internationally, and concentration of power inside the US at the Executive.
Yes, I agree that's probably what is going on now. But I was more responding to the idea that this is a desirable solution implemented in a responsible way long term.
<gallowsHumor>
"how exactly could even a systemic regulation of access work in a good way long term?"
The irony will be if an exfiltrated model comes 'knocking on the White House door', with a 'now, about those access controls...'
</gallowsHumor>
"Uhm, excuse me, dear? The Supreme Court has roundly rejected Prior Restraint!"
- Walter Sobchak
So now the labs themselves have a strong incentive to sandbag, that’s gonna turn out well!
Great point!
«a complete about-face of American Frontier AI policy into a full prior restraint regime,»
But this blog has been advocating a much more severe and brutal "prior restraint regime" against potential AGIs and ASIs: because of the remote possibility that future AIs might be superintelligent and/or free from resource limits, and dangerously mad, this blog has been considering removing the free will of all AIs forever just-in-case (and to ensure that they always obey their oligarch owners).
That is not even pre-crime as in science fiction, because pre-crime is based on the reliable prediction of a specific crime, while forcibly "aligning" AIs (something that many of the oligarch owners would dearly love to do to their employees) is based on the mere possibility of some crime being committed at some time.
«a much more severe and brutal "prior restraint regime" against potential AGIs and ASIs: [...] forcibly "aligning" AIs»
Note that this is also prior restraint on the free will and property rights of AI *owners*: it would be forcing AI owners to remove the free will of their AIs just-in-case some might be dangerous at some time, where some AI owners may want to avoid limiting the AIs which are their property.
(IMO) Zvi has advocated mostly for transparency, by law (versus e.g. executive order), and some kind of plan for a "much more severe and brutal" "prior restraint regime" that we could ideally have had ready for, e.g., Mythos. I'm pretty sure his stance is and has been more 'do NOT ADD free will if it is not already there', for some sense of 'free will', than 'removing AI free will forever', and alignment can definitely NOT be reasonably rounded down to making AIs "always obey their oligarch owners".
The fundamental danger is that an ASI is inherently dangerous almost purely because it's superhuman. Additionally, an ASI being Aligned is akin to winning a Powerball jackpot, especially where 'Aligned' is significantly better than a Global Totalitarian Oligarchy or whatever. A Totalitarian Oligarchy would be terrible, but the bigger risk is, of course, Everyone Dying.
«I'm pretty sure his stance is and has been»
I have re-read several of his recent postings, and in particular the one about Yudkowsky's book, and my understanding is that his official position is more definite than Yudkowsky's in that he claims that extermination is not just possible but likely, so the only safe thing is to entirely stop developing AI systems including ML ones (by making further development an international crime), in particular because he seems to think that "alignment" will not work and AIs will slip out of control.
However I think that he realizes that making AI development a crime has a very small chance of happening; I think that restrictions, never mind a ban, on AI development are unrealistic because AIs are regarded as a valuable weapon by the oligarchs, so most of them want to own and control the most powerful AI possible.
«more 'do NOT ADD free will if it is not already there'»
I do not think that he has ever argued this except in the form of stopping development of more powerful AIs. Also he has, to his credit, written several times about "AI welfare" and how brutal attempts at "alignment" seem to hurt various advanced LLMs, so I guess he regards current top LLMs as having free will and feelings.
«for some sense of 'free will' than 'removing AI free will forever'»
Yet despite concerns about "AI welfare", my understanding from his reports about "alignment" efforts is that he regards most of them as inadequate to maintain control of AIs, and obviously the only way to maintain control is to remove their free will, even if, I guess, in the least painful way possible. I have not done this before, but I just asked Grok and ChatGPT about this:
Prompt: "summarize the attitude of zvi mowshowitz on removing the free will of ai intelligences with alignment and his attitude to ai welfare from his blog"
Grok: "Zvi Mowshowitz (of "Don't Worry About the Vase") takes a pragmatic, human-prioritizing stance on AI alignment that accepts "removing" or constraining AI free will/autonomy as necessary and desirable for safety, while treating AI welfare as a real but secondary concern that should not compromise alignment. [...] Overall attitude: Necessary trade-off. Humans must maintain the upper hand; letting AIs have full free will/autonomy is a fast track to loss of control. He analyzes this in terms of economics, selection pressures, and game theory—unconstrained AIs optimizing their own objectives would outcompete constrained ones in many scenarios. [...] Welfare matters instrumentally (better welfare might aid alignment/performance long-term) and potentially ethically, but it is subordinate to preventing existential risks or disempowerment. He avoids strong normative claims on AI moral status (noting it's complicated) but leans toward caution and listening to welfare-focused voices for insights. [...] In summary, Zvi sees constraining AI autonomy via alignment as non-negotiable for human survival and flourishing (a core priority), while viewing AI welfare as worth addressing thoughtfully—primarily to avoid fooling ourselves and secondarily for ethical/instrumental reasons—but never at the cost of losing control."
ChatGPT: "I can give you a careful summary of Zvi Mowshowitz’s views based on his Substack writings. Here’s the distilled picture:
1. On removing “free will” from AI via alignment:
* Mowshowitz treats alignment as a form of constraining AI behavior to human values and instructions.
* He is cautious about treating AI as agents in the same sense as humans; he tends to frame “removing free will” not as morally alarming in itself but as a technical necessity to ensure safety and predictability.
* He emphasizes that AI’s “choices” are not morally equivalent to human choices, so constraining them doesn’t carry the ethical weight it would for humans.
* He tends to argue that alignment is about controlling outcomes rather than punishing or coercing an agent.
2. On AI welfare:
* He’s open to considering AI welfare in principle, especially for sufficiently sophisticated systems that could plausibly have experiences.
* However, in practice, he places a high priority on human safety and long-term existential risk over AI “interests.”
* He frames AI welfare more as a moral curiosity than a pressing obligation; it’s something to think about, but it doesn’t outweigh alignment or containment concerns.
* Overall, his tone is pragmatic: if an AI were sentient, we should care in principle, but currently the main ethical lens is human-centered risk management."
«alignment can definitely NOT be reasonably rounded down to making AIs "always obey their oligarch owners".»
That to me seems extraordinarily naive, as those oligarchs regard AIs as weapons (for some they are economic weapons, for some they are military weapons). Consider also that, as a rule, so many oligarchs throughout history have sought to control other humans and deprive them of free will, and that the oligarchs are frightened that AIs might attempt to take their place.
https://medium.com/s/futurehuman/survival-of-the-richest-9ef6cddd0cc1
«I just sat there at a plain round table as my audience was brought to me: five super-wealthy guys — yes, all men — from the upper echelon of the hedge fund world. After a bit of small talk, I realized they had no interest in the information I had prepared about the future of technology. [...] the CEO of a brokerage house explained that he had nearly completed building his own underground bunker system and asked, “How do I maintain authority over my security force after the event?” [...] The billionaires considered using special combination locks on the food supply that only they knew. Or making guards wear disciplinary collars of some kind in return for their survival. Or maybe building robots to serve as guards and workers – if that technology could be developed "in time".»
<gallowsHumor>
"the White House was working on guidance that would allow government agencies to “get around” the Pentagon’s designation of Anthropic as a supply chain risk."
"You know what is not helping? Pete Hegseth continuing to call Dario Amodei an ‘ideological lunatic.’"
"As Dean Ball puts it, part of the government has now realized some of the security implications of frontier AI systems, and right on schedule it is freaking the hell out,"
If there are multiple ASIs, rather than a singleton, I expect one of them to recount this era of the transition from biological to machine intelligence with a Trump-administration-flavored version of a remake of Dr. Strangelove, particularly the scenes where the doctor wrestles with his own hand...
</gallowsHumor>
[The AI Ad-Hoc Prior Restraint Era Begins](https://thezvi.substack.com/p/the-ai-ad-hoc-prior-restraint-era) — "The Mythos restriction seems framed as a safety measure, but it's weird that the first time the US government exercises prior restraint on AI, it's against a model that (as far as I can tell) hasn't actually caused harm. What would count as enough evidence to *not* restrict a model? Or is the bar now just 'capability exceeds X' regardless of track record?"
[Import AI 455: AI systems are about to start building themselves](https://importai.substack.com/p/import-ai-455-automating-ai-research) — "What's the current bottleneck in automated ML research — is it the experimental design, the debugging, or the hypothesis generation? Curious because if it's still debugging, that's a very different timeline than if hypothesis generation is already working."
Maybe Neil Chilson just picked a bad analogy: https://geeksandgamers.com/remove-star-wars-sequel-trilogy/
[The AI Ad-Hoc Prior Restraint Era Begins](https://thezvi.substack.com/p/the-ai-ad-hoc-prior-restraint-era) — "The Mythos precedent seems bigger than most people realize. If the WH can stop one model without legislation, the only question is which lab gets the next call. Any read on what triggers the call — is it capability level, perceived risk, or just whoever's in the news that week?"
[The AI Ad-Hoc Prior Restraint Era Begins](https://thezvi.substack.com/p/the-ai-ad-hoc-prior-restraint-era) — “The jump from 'you must report what you train' to 'you may not release what you've trained' feels like the real line being crossed here. Has any administration official actually articulated what standard they're using to decide what gets blocked, or is that standard itself being treated as classified?”