The real recent story of AI has been the background work being done on cybersecurity, as we process the Mythos Moment along with GPT-5.5 and figure out both how to patch the internet and what our new regulatory regime is going to look like.
“The overall picture is that GPT-5.5 is a big jump, Mythos is a very big jump, and there is a substantial gap from GPT-5.5 to Mythos but yes both are big deals.”
This was the perfect summary of it. That’s been my experience. Great read!
"Open-weight Qwen agents copied their own weights to a new machine and launched a replica. API-only models like Claude and GPT cannot access their weights, so we simulated replication by having them set up and run Qwen weights instead."
Ok, for a minute there I'd thought that the SOTA LLMs had gone full Andromeda Strain. Not quite yet, I guess...
The Commerce vs Intelligence turf war assumes the cyber threat is an application-layer problem: find bugs, patch code, harden networks. But two papers on March 30 collapsed the qubit threshold for breaking encryption by orders of magnitude, and the tool that shortened that timeline was AI itself. Glasswing and Daybreak are patching walls while the same class of technology erodes the cryptographic foundation those walls sit on. Classical encryption protects model weights, API keys, and the training data that makes these defensive tools possible in the first place. Everyone debating who governs the vulnerability race is missing that a second, deeper race is already underway.
The cyber angle feels like the place where governance has to cash out in boring, visible ways. I keep wondering what the unit of proof should be here. Critical vulns closed before public weaponization? Insurance exclusions changing?
The Commerce versus Intelligence fight over frontier-model release access is the cleanest test of whether US AI governance is going to look like export controls or like national security review. Both agencies have a defensible claim, both have incompatible operating tempos, and the de facto resolution will set the precedent for every frontier release after Mythos. The METR reliability spread reinforces the timing: 16-hour median horizons at 50 percent success, under 4 hours at 80 percent, under 5 minutes at 99 percent. A regulator that picks one reliability tier as the bar effectively decides which deployments are governed and which are not.
Dean Ball's take on Germany wanting Mythos is Hegseth-level jingoism. "Technology" is so broad as to be meaningless, but I assume he's referring to big tech products - well, US big tech makes tons of money from Europe, and of course countries want to regulate it, because it's gone largely unregulated since day 1 and that is not Good, Actually, an opinion not limited to outside the US.
I didn't see anyone complaining about the forced sale of TikTok, or demanding that China have untrammelled access to the US. How dare they want any control of foreign-owned tech operating in their country, either you have everything or nothing, etc. One rule for me, another for thee.
The Commerce vs. Intelligence jurisdictional fight is the most consequential governance story that's getting underreported. Who decides what conditions apply to frontier model access is a foundational question — it determines whether the primary frame is export control (Commerce, treats AI like semiconductors) or national security classification (Intelligence, treats AI like classified information systems). These produce very different regulatory architectures with very different secondary effects on the research ecosystem.
The "kicking and screaming" into situational awareness observation is accurate, and the timing matters. Regulatory frameworks built in reactive mode — after the capability already exists — tend to optimize for controlling the last thing rather than shaping the next thing. The UK AISI approach of systematic evaluation before release is closer to the right architecture, even if the specific thresholds are debatable.
Intelligence has shown itself to be highly susceptible to infoweapons. (The WMD in Iraq thing.)
That ought to be disqualifying.