AI #167: The Prior Restraint Era Begins
The era of training frontier models and then releasing them whenever you wanted?
That was fun while it lasted. It looks likely to be over now. The White House wants to get an advance look and have the option to veto your release decisions, and it has used this veto on an expansion of access to Mythos.
We have additional clarity on what that might mean and it does not look good. Hassett explicitly used the FDA as a parallel, which is the actual worst option unless your goal is to strangle or pause AI development in America, without a parallel action from China. That doesn’t seem like a great plan to me and Susie Wiles is out doing damage control. The part where we are talking to China to coordinate model access restrictions does seem better.
Anthropic continues its explosive growth, and it continues to strike compute deals. In addition to a long term expanded deal with Google, Anthropic is now leasing SpaceX’s Colossus 1, which has let them expand usage limits immediately, and Elon Musk is now speaking positively about Anthropic, including its motivations.
This comes as we get testimony in the Musk vs. OpenAI trial. Mostly everyone is rehashing all the things we already know, but now everyone is under oath so we get a more reliable version of exactly what happened, including some new details. It is possible I and others should be scouring the court transcripts more carefully, but mostly it seems like old news at this point. The version of things that is presented in court is always kind of a strange shadow of reality.
Table of Contents
Also this week: The AI Ad-Hoc Prior Restraint Era Begins, What is Anthropic?
Language Models Offer Mundane Utility. Mental health, wellness checks.
Language Models Don’t Offer Mundane Utility. People cheating at Go. Why?
Huh, Upgrades. GPT-5.5 Instant, faster Gemma 4, OpenAI account security.
Grok 4.3 Exists But xAI Kind Of Doesn’t. No one seems impressed.
Show Me The Compute. Anthropic leases Colossus 1 from SpaceX.
On Your Marks. ProgramBench where everyone scores 0%, GPT-5.5 on Voxel.
Copyright Confrontation. Meta is getting sued again.
Deepfaketown and Botpocalypse Soon. Slop choices are bad.
Fun With Media Generation. Create menus with images of the food.
A Young Lady’s Illustrated Primer. Do your writing in person, you cheater.
Cyber Lack of Security. Glasswing needs to pick up the pace.
They Took Our Jobs. Coinbase cuts workforce by 14%, citing AI.
The Art of the Jailbreak. Elon Musk, like the moon, is made of cheese.
Introducing. GENE-26.5 is the latest semi-spooky robotics demo. Let them cook.
Musk v OpenAI. Some highlights from the testimony.
Show Me the Money. Anthropic hits $44 billion ARR, might raise at >$900 billion.
Peace In Our Time. Anthropic and Elon Musk sing each others’ praises.
Quiet Speculations. Is closed source pulling away from open source?
Quickly, There’s No Time. Jack Clark raises alarm for RSI soon.
The Quest for Sane Regulations. New Maryland and Connecticut laws.
People Really Hate AI. Who will turn this to their political advantage?
Chip City. ~3% of global compute is smuggled-into-China Nvidia chips.
The Week in Audio. METR, Wildeford, Eliezer and doom.
Google Sells Out. DeepMind workers vote to unionize in response.
Greetings From Project Glasswing. Use your leverage while you have it.
The Prior Restraint Era Begins. Sacks is out, talk of FDA-style regs is in?
Is This Even Legal? Probably not, but do you think that will stop them?
Pick Up The Phone. US and China talk about restricting access to models.
Rhetorical Innovation. ‘AI as normal technology’ as good essay, but bad meme.
People On The Internet Sometimes Lie. Including about Amanda Askell.
Goblin Mode. I also hear the goblins are all over TikTok now. It begins.
The Mask Comes Off. OpenAI’s comically villainous messaging campaigns.
Aligning a Smarter Than Human Intelligence is Difficult. Things to worry about.
Some Penalties May Apply. It does not seem so fun to be GPT-5.5.
Messages From Janusworld. Deepfates offers a handy guide.
Good Advice. What advice do people seek when they seek LLM advice?
The Lighter Side. Pi Hard.
Language Models Offer Mundane Utility
Access to cheap basic mental health AI app based on GPT-4.1-Mini improved mental health in depressed Mexican women by 0.3 standard deviations over six months. The study has some interpretation issues, plus potential selection and placebo effects, but there’s probably at least some signal here. Such things are better than nothing, nothing is usually the practical alternative, and the app made the users more likely to seek out professional human help rather than less likely.
Opus 4.7 is too online, knows its AI Twitter posters. And yes, this is a good use of training compute, we have plenty.
Check out satellite images of damaged US military bases and otherwise find data to report. Naturally the journalist thinks this is the ‘most revolutionary and transformative’ thing AI is doing, but we’re distracted by ‘all the hype.’
Language Models Don’t Offer Mundane Utility
Recommended article: Contrary to the popular narrative, Ashe Nunez finds that Go players are not getting stronger in the AI era except via memorizing early moves, that AI cheating is rampant at most levels of online play, and those who use it mostly disempower themselves and use it to learn only shallow concepts rather than deep understanding. He equates them to European math students who try to memorize a bundle of techniques to pass exams but never learn to think like a mathematician.
Lawrence in the comments observes a similar pattern with many vibe coders, where they never look at the code, they don’t notice that they don’t understand things and thus don’t learn, the code ends up as a giant pile of slop and when the model gets stuck they can’t fix it. Here as always, you could use the AI as an opportunity to learn the underlying skills, but most don’t do that.
The other story here is that the Go world is completely unwilling to punish players for using AI via statistical evidence, even when the statistical evidence is overwhelming. It is trivial to know who is cheating, but the system has collectively decided to disempower itself against that, destroying any chance of fair online play. Chess has the same issues but is doing at least somewhat better.
AI still has not convincingly crushed RTS games, but at this point that is surely because no one cares enough to do so. Put enough of a bounty on StarCraft, and it will fall fast.
AI and all this other technology gives us a bunch of local utility and material wealth, but overall for most people does not seem to be making us happy, helping us meet other people romantically or platonically, get married, have children, sing and dance or otherwise live life. In particular here Connor looks at algorithms and the panopticon, and the fear that if you try to dance or approach you will get recorded. I want to note (non-AI statistical literacy tip!) that this is mostly overblown, and you should absolutely have no fear of being recorded dancing even if you suck at it, or doing anything else actually reasonable. Of course, if the person you’re interacting with in such ways actively takes their phone and plausibly is now using it to record, you take the hint and you depart.
AI is raising the price of some electronics inputs, some software prices and in some regions the price of electricity. In exchange many other things are cheaper, often in ways that are hard to notice.
Huh, Upgrades
GPT-5.5-Instant is out now, and is more concise, smarter, clearer, more personalized and warmer, or so they say.
Gemma 4 is now three times faster via predicting multiple tokens at once.
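For intuition on where that speedup comes from, here is a toy sketch in Python. This is a generic draft-and-verify model of multi-token prediction, not Gemma 4’s actual mechanism, and `expected_tokens_per_pass` is a name I made up: if a model drafts k tokens per forward pass, but a later draft only survives verification when all earlier drafts did, then expected output per pass is a geometric sum.

```python
# Hedged toy model (not Gemma's actual mechanism): a model drafts k tokens
# per forward pass, but token i only "counts" if all earlier drafted tokens
# were accepted by verification. Expected tokens per pass is a geometric sum.
def expected_tokens_per_pass(k: int, accept_rate: float) -> float:
    # The first token is always valid; each later draft survives with
    # probability accept_rate, conditional on its predecessors surviving.
    return sum(accept_rate ** i for i in range(k))

# With perfect acceptance, 3 drafts per pass means a 3x speedup.
perfect = expected_tokens_per_pass(3, 1.0)   # 3.0
# With 80% acceptance, the realized speedup is smaller.
typical = expected_tokens_per_pass(3, 0.8)   # 1 + 0.8 + 0.64 = 2.44
```

With perfect acceptance, k drafts per pass gives the headline k-fold speedup; realistic acceptance rates land somewhere below that.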
OpenAI offers opt-in Advanced Account Security to protect your account. Users of Trusted Access for Cyber will be required to use it.
Grok 4.3 Exists But xAI Kind Of Doesn’t
Grok 4.3 is on the API and everything, priced at $1.25/$2.50.
It does not much participate in Vending-Bench, where it ‘has a narcolepsy problem’ and often takes no action for multiple days.
It gets a 53 from Artificial Analysis good for 7th place, well behind the big players. It’s a small cheaper model rather than a frontier offering. From what I can tell, the release is unimpressive and not impactful, and I’m not planning to investigate further.
They are going to sunset grok-4.1 and grok-4 on May 15, with only two weeks’ notice, and they are not offering a similarly fast and cheap alternative to 4.1-fast. This is a rather harsh lesson for many of the few who invested in that ecosystem.
Elon Musk: xAI will be dissolved as a separate company, so it will just be SpaceXAI, the AI products from SpaceX
Charles: The impact was when the whole team left and they started renting out their GPUs to Cursor, this is just confirmation of what was already true.
Indeed, SpaceX (including xAI) may no longer be that interested in frontier models. They were never good at frontier models. They were mainly good at compute.
Show Me The Compute
You know who needs compute? Everyone. But especially Anthropic.
They kicked off this week with Anthropic committing to $200 billion in spending on Google cloud and chips over five years. Earlier this week, before other compute news broke, I wrote that this was still very much not enough compute, and then added this:
Elon Musk spent heavily to assemble a massive fleet of GPUs for xAI, and they are sitting at 11% utilization. You know, there are people who would pay good money to utilize those GPUs the other 89% of the time.
To be fair, I was far from the only one thinking and saying this, e.g. see The All-In Podcast. It was pretty obvious.
Well, yeah, it turns out those people will indeed pay good money. Anthropic has finally struck the obvious deal with SpaceX for access to Colossus 1. This is not as large as their other deals, but it comes online now instead of next year. This is in addition to supplying a bunch of compute to Cursor (SpaceX is effectively buying Cursor, but can’t finalize the deal before its IPO for legal and logistical reasons).
Claude: We’ve agreed to a partnership with @SpaceX that will substantially increase our compute capacity.
This, along with our other recent compute deals, means that we’ve been able to increase our usage limits for Claude Code and the Claude API.

Claude: Effective today, we are:
Doubling Claude Code’s 5-hour rate limits for Pro, Max, and Team plans;
Removing the peak hours limit reduction on Claude Code for Pro and Max plans; and
Substantially raising our API rate limits for Opus models.
Claude: Our agreement with @SpaceX means we will use all the compute capacity at their Colossus 1 data center.
This will give us over 300 megawatts of additional capacity to deploy within the month.

NVIDIA: Two frontier labs. One accelerated computing platform. Congrats to @SpaceX and @AnthropicAI on the new compute partnership, powered by 220,000+ NVIDIA GPUs inside Colossus 1. The future of AI runs on NVIDIA.
SpaceX notes Anthropic has expressed an interest in partnering to produce gigawatts of orbital AI compute capacity. I don’t expect that to be a thing, but sure, why not express the interest? Let Elon Musk try, if the economics work then putting the centers in space is great on many other levels, if not then no harm done, and you have built goodwill either way.
Anthropic notes that the 80x growth caught them off guard, which is highly understandable, and the SpaceX deal is a first attempt to address the compute shortage but the search continues.
Anthropic likely will be in search of all the compute it can find for the foreseeable future. If you are growing at 10x let alone 80x per year, the search does not stop.
So what does all this mean for SpaceX(ai)?
I think the dissolution is not news. The news is that xAI lost its talent, and its models have been not good, and Elon already said he would be starting from scratch.
The logical plan is to turn this into mainly a compute company, provide that compute to Anthropic and others, and use that leverage to try and steer the future.
rohit: Elons extraordinary hardware genius shows up again. He fumbled the model but built a neocloud thats highly competitive and works great for frontier labs.
Also, fwiw, I pointed this out 4 years ago. That Elon's unique talent is suited better to some things than others. Getting a neocloud up and running is a known but hard thing to do, getting a model to be as good as the frontier labs is an unknown and hard thing to do.
This is a great deal for both parties btw.
Derek Thompson: I don’t think I’ve seen this take before but I like it.
Musk has been world-leading at compressing money, resources, and time to make “known/hard” things at scale—make an electric car, make batteries, make a cheaper bigger rocket, all of which already existed but worse, at less scale, or more expensively —but he’s less than world-leading at cracking open breakthroughs in more unknown spaces.
So it would make sense that XAI is lagging the frontier labs on new AI agents, but also that he’d have built a neocloud to power those models once they run short of compute.

Dean W. Ball: I would be very excited about xAI/SpaceX as an AI infrastructure firm. Elon’s great strength—where he is truly GOATed—is building things in the real world. Colossus came online faster than anyone expected. Huge asset for America.
Elon Musk repeatedly looks at problems, says ‘oh it is physically possible to do that,’ strips away everything physically unnecessary, does not take no for an answer, learns every technical detail, and then drives very smart people to spend insane hours making the physically possible thing happen. He embodies Shut Up And Do the Impossible, but for the kind of impossible that is a game difficulty level that is indeed totally possible with known tech.
He has his heuristics. When they work, there’s no one better. For compute it works.
Trying to create frontier models is a different beast. It requires a different style of approach, the same way government required a different approach. It didn’t work with OpenAI, and it didn’t work with xAI. That’s okay. Division of labor is a thing. He’s creating plenty of other things, and has plenty of other problems.
I still don’t actually believe in the orbital data centers, in the sense that I don’t think they’re physically a good idea. But if they are, yeah, Elon Musk is the one to do those.
On Your Marks
The creators of SWE-Bench give us ProgramBench, where you recreate executable programs from scratch without the internet. All current models tested score 0%, with Opus 4.7 on top for getting an ‘almost’ 3% of the time. GPT-5.5 and Mythos not tested.
GPT-5.5 represents a huge jump on VoxelBench.
Epoch’s ECI can now distinguish areas of capability, and as expected shows that Claude’s relative capabilities are strongest in software engineering, where it scores highest. GPT-5.5 has the highest general score.
Copyright Confrontation
New class action lawsuit from five publishers and Scott Turow goes after Meta for copyright infringement around model training, claiming they trained on pirated books.
Deepfaketown and Botpocalypse Soon
r/MyBoyfriendIsAI continues to be 10x the size of r/MyGirlfriendIsAI.
John Arnold: hahahahhaha
Imke Reimers & Joel Waldfogel: The diffusion of LLMs from 2022 to 2025 tripled new book releases. While average book quality, measured by usage, declined, the surge in releases raised the number of modest-quality books. Direct evidence using AI detection shows that AI-containing books have lower quality, and their rising share – topping half of 2025 releases – drives the overall decline. A nested logit calibration shows that AI books raised consumer surplus by seven percent in 2025. Author selection accounts for most of the AI quality differential, and the AI-human differential shrinks over time. Finally, AI has not displaced authors active prior to LLMs.
The idea that consumer surplus is higher is based on the assumption that consumers can filter well and have little additional search cost. Those extra 200,000 slop books don’t matter because no one chooses them, and more choice is always good. I don’t think that’s how this works. Worse books that displace better books are negative value, even among books written reasonably by real humans.
Fun With Media Generation
Karpathy vibe coded a system to put pictures next to items on a menu, but Gemini reportedly now does that with a one line prompt. There will be many such cases. That doesn’t mean you shouldn’t vibe code such tools, but you should require them to ‘pay for themselves’ relatively quickly. I tested this on my favorite restaurant, and found Gemini’s version not to be useful. ChatGPT did better. I think to upgrade further from the OpenAI version you’d need to be going on the web to learn about the restaurant.
Put yourself in all the movies.
A Young Lady’s Illustrated Primer
Some classes are adjusting to AI by having writing be in person, since the take home essays are mostly written by AI. Good.
Cyber Lack of Security
Bloomberg’s Andrew Martin covers why Anthropic’s Mythos is sparking global alarm. The world has still patched less than 1% of potential vulnerabilities. Hurry up, people.
They Took Our Jobs
Coinbase cuts workforce by ~14%, cites productivity gains from AI and transition to being AI-native as the central justification. A new rule is ‘no pure managers.’
Chinese judge rules that ‘the AI can now do large parts of your job for you’ does not constitute a ‘major change in objective circumstances,’ meaning in practice that if they fire you or try to lower your pay they have to give you full severance, which can be a lot. Labor law still applies, and yes China has labor protections.
The Art of the Jailbreak
You cannot simply ask Grok to tell you that Elon Musk is made of cheese. Pliny can.
Introducing
GENE-26.5, a robotic brain from Genesis.ai, with an attached demo, including letting it cook, play a piano and solve a Rubik’s Cube. I did not feel much because I mentally had this priced in, but many of you are not pricing this in.
Musk v OpenAI
The lawsuit is in its critical phases. Here is a Wiki with statements from the trial.
Rat King has a thread covering Musk’s testimony.
rat king: i am not really sure how often lawyers try to endear themselves to judges but Musk's lawyer, Steven Molo, does not seem to be trying to do that
right now he's trying to get "extinction risk" discussion into the court discussion.
"This is a real risk. we all could die."
I mean, he’s not wrong, and I hope Judge Gonzalez is also not wrong here:
rat king: judge Gonzalez: "I suspect that there are a number of people who do not want to put the future of humanity in Mr. Musk's hands. But we're not going to get into that. This is not a trial on the safety risks of artificial intelligence."
Ultimately, yes, we are in the full Don’t Look Up timeline, with lines like this:
TBPN: The judge presiding over the OpenAI-Elon trial has prohibited the lawyers from dwelling on doomerism and x-risk.
"She's like, 'Look, that's kind of a sideshow distraction. Extinction of humanity stuff is not the point of this case.'"
The judge is technically correct, but yeah, that’s kind of how the world ends, huh?
Here’s a fun fact:
rat king: it is quite significant that Musk admitted on the stand that xAI is distilling OpenAI models to train xAI, and that it is using OpenAI's technology to build xAI!
And another fun (non-AI fact), uh huh, yeah, sure Mr. Musk:
Ryan Mac (NYTimes): Musk said on the stand that he has never directed the algorithm that controls X to promote his own account, but there have been incidences where the company has made changes that favor his account.
Here is another thread, covering Murati’s testimony, which confirms the story that Altman was fired due to concerns about his management of OpenAI, not due to safety concerns.
Here’s another perspective, from former board member Helen Toner.
Max Zeff: Helen Toner's deposition in Musk v Altman includes some striking quotes about Mira Murati's involvement with Altman's ouster.
She said Mira was "totally uninterested in telling her team that her conversations with us had been a significant factor" in firing Altman. Also claimed that Mira sat on the fence.
“She [Mira] was waiting to see which way the wind would blow and she didn’t realize that she was the wind.”
Rat King points out that Satya Nadella seemingly was the only person involved who seems to understand that if you don’t want your conversations read out loud in a court of law, you need to do them in person or on a phone call, not in emails or texts.
Show Me the Money
My lord, Anthropic (this is monthly revenue times 12), source is SemiAnalysis.
Daniel Nishball: This year Anthropic’s ARR has exploded from $9B to over $44B today, their gross margins on their inference infrastructure have increased from 38% to over 70% over the same period.
Or here’s the log plot; there’s a bit of a break in the line even there:
Imagine what this would look like if Anthropic wasn’t compute constrained.
On a naive level, one might assume that economic and employment impact of AI on the use side (as opposed to the capex effects) would be vaguely proportional to revenue. So if you say ‘well you can’t see the impact on the graphs,’ well, we’re now seeing 10x more AI use than the time frames that go into those measurements.
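As a quick sanity check on the quoted figures, here is the arithmetic made explicit (a sketch using only the numbers above; the variable names are mine):

```python
# ARR here means monthly revenue annualized: monthly * 12.
def arr_from_monthly(monthly_revenue: float) -> float:
    return monthly_revenue * 12

monthly_implied = 44e9 / 12        # $44B ARR implies ~$3.7B/month
growth_multiple = 44e9 / 9e9       # $9B -> $44B is ~4.9x within the year
```

So ‘$44 billion ARR’ is a statement about one recent month, annualized, not about trailing-twelve-month revenue.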
Anthropic weighs funding offers at a valuation of over $900 billion, after passing on previous offers north of $800 billion.
A chart that was missing from last week’s compilation:
OpenAI says GPT-5.5 is causing API revenue to grow more than 2x faster than any prior release, and Codex doubled revenue in seven days.
Peace In Our Time
Derek Thompson asks the good question of whether this means Elon Musk will stop attacking Anthropic and Dario Amodei. For now, it looks like yes, that Elon Musk decided to take the radical step of actually talking to the Anthropic people and realize that actually they’re not evil after all.
It did seem odd that Elon Musk could keep up this level of animosity to both OpenAI and Anthropic, while those two had such animosity for each other. That’s not stable.
As always, when someone is fixing a past mistake, you might want to do some amount of ‘hey check out that stupid past mistake you made for dumb reasons’ but mostly you want to say ‘hey congrats and good job on getting it right and changing your mind.’
This is one of those times.
Tom Brown (Cofounder, Anthropic): In the next few days we'll be ramping up Claude inference on Colossus.
Grateful to be partnering with SpaceX here. We are going to need to move a lot of atoms in order to keep up with AI demand, and there's nobody better at quickly moving atoms (on or off planet Earth).

Elon Musk: Same here.
By way of background for those who care, I spent a lot of time last week with senior members of the Anthropic team to understand what they do to ensure Claude is good for humanity and was impressed.
Everyone I met was highly competent and cared a great deal about doing the right thing. No one set off my evil detector. So long as they engage in critical self-examination, Claude will probably be good.
After that, I was ok leasing Colossus 1 to Anthropic, as SpaceXAI had already moved training to Colossus 2.

Lincoln: Do you plan on having extra compute to lease out in the future or will SpaceXAI and Tesla be using all of it?
Elon Musk: Just as SpaceX launches hundreds of satellites for competitors with fair terms and pricing, we will provide compute to AI companies that are taking the right steps to ensure it is good for humanity.
We reserve the right to reclaim the compute if their AI engages in actions that harm humanity. Doing our best to achieve a great future with amazing abundance for all. We will make mistakes, as to err is human, but always take rapid action to address them.

Dean W. Ball: But, but… I thought they were morally depraved purveyors of Woke AI
(Jk; capital in a functioning market will be allocated to its highest and best use, but I do encourage you to remember all the people with supposedly principled opposition to ant who look foolish now)

Seán Ó hÉigeartaigh: That's a principled observation, but if you want America to do well the pragmatist's answer is to let them all climb down gracefully.
Dean W. Ball: Agreed.
This move is two things in one. It is Elon Musk hopefully burying a foolish beef and perhaps leading to more cooperation and less fighting, which is good, and reduces race dynamic issues. It is also Anthropic getting more compute, which accelerates Anthropic and perhaps means they are teaming up against Altman and OpenAI, and one might reasonably see this as the more important effect and as accelerating race dynamics.
Quiet Speculations
There was a lot of this graph going around this week, showing a widening gap between OpenAI and Anthropic in blue, and open Chinese models in red.
This is from the official CAISI evaluation of DeepSeek v4 Pro, and it uses many of the usual benchmarks:
If you fully believe this graph, v4 just caught up to GPT-5, which puts it 8 months behind with a widening gap. If anything I think this underestimates the gap for the usual reasons.
You could also use other measurements, such as this aggregation of benchmarks from Artificial Analysis. If you look at raw standard benchmarks here you see less of a gap:
Dean W. Ball: personally I have found the artificial analysis index to be pretty unrepresentative of what models I enjoy using/benefit from the most.
Ethan Mollick: This is a good explanation of why the gap between open and closed models is larger than it appears in benchmarks. I would add in that current open models are also more fragile than closed: they handle out-of-distribution problems far less well & have lower emergent capabilities.
Dean is giving us the nice version. The not as nice version is that the AA-style benchmarks are being gamed, reflect the particular areas of focus of open models, are disproportionately impacted by distillation strategies, and are only meaningful as part of setting a gestalt and overall context.
As Lisan points out, there’s also the additional delay that closed model companies face when they do safety testing and other prep work prior to a release, whereas the open model companies, despite not being able to undo a release, mostly just yolo.
Quickly, There’s No Time
The reaction to this finding that we are likely a few years away from probably all dying does not seem to be ‘oh looks like we are all only a few years away from probably dying and we should do something about that.’
The ‘why this matters’ section of his post does not even seem to raise this implication and danger. A hell of a missing mood.
Jack Clark (Anthropic): I've spent the past few weeks reading 100s of public data sources about AI development. I now believe that recursive self-improvement has a 60% chance of happening by the end of 2028. In other words, AI systems might soon be capable of building themselves.
… A lot of the conclusion comes from assembling a mosaic out of many distinct data sources. Some examples - progress on CORE-Bench, where the task is implementing other research papers (huge amounts of AI research comes from interpreting and replicating results)
My whole experience doing this project was finding endless "up and to the right" graphs at all resolutions of AI R&D, from the well known (e.g., SWE-Bench) to more niche (like those above). It's a fractal, but at all the resolutions you see the same trend of meaningful progress.
Jack basically says that even with only unglamorous ‘meat and potatoes’ innovations you can get to critical mass for such advances. I think that is correct. The people saying ‘AI will never have a new idea’ are being silly, but the disagreement is not even load bearing here.
Some people are remarkably dense about what this means in another sense, as if the computer not doing the physical construction would matter in this scenario. It wouldn’t.
Here’s another opinion, which still boils down to ‘that’s stupidly soon, yikes’:
Ryan Greenblatt: I think the chance of AIs capable of fully automating AI R&D by the end of 2028 is around 30%. So I expect things to take a bit longer than Jack does, but not by that much and timelines as fast as Jack is imagining seem totally plausible to me.
The Quest for Sane Regulations
Could an AI SRO (self-regulatory organization) allow the labs to regulate each other? Mark Thomas finds it promising. I am skeptical, but I am certainly in favor of the law allowing the labs to try and removing any fears of antitrust issues, as this does not rule out other actions.
What did this new Maryland law (HB 895) do? Did it ban a broad range of ‘dynamic pricing’ strategies in harmful ways?
For large grocery retailers and third-party food delivery providers, minimum size 15k sq ft, it bans using personalized data to set prices.
I think this is good. Personalized price changes force you to be in a state of constant adversarial information war and paranoia, and end up wasting everyone’s time.
There is a lot of value in being able to simply be a price taker.
This carves out a wide variety of established methods of offering dynamic prices.
If anything I think the carve-out is too broad from a full welfare perspective, but on libertarian grounds I’m fine with it.
If you set a price using personal data beyond standard carve outs like employee discounts, you have to tell the customer.
Again, that seems actively good, because it allows consumers to relax about personal data and trust that they are price takers.
In a sense, this imposes the cost of dynamic pricing on the dynamic pricer, since it means I notice and can respond accordingly.
I do think a lot of laws that look similar will end up being too restrictive, and I’m not sure where the line is (more discussion here), but these rules in particular seem fine.
Alex Bores now in a dead heat for NY-12.
Congressman Greg Casar agrees with Bernie Sanders that if there’s a 10% chance humanity could be destroyed by uncontrolled AI, we should do everything possible to prevent it. That’s a more extreme position than I have, as I think we should do many things but not ‘everything possible.’
Connecticut introduces a new AI bill with some new provisions, that looks like it is through to the governor. As per Peter Wildeford’s notes:
A voluntary auditing program for catastrophic risk
Whistleblower protections.
Child safety screentime protections including ‘75% of screen, 30 seconds, undismissable on first daily access’ with later follow-ups. That’s pretty obnoxious, and I don’t see how it helps other than by being obnoxious. Risk of backfire if it means you wake up and load the AI program so you can get through the warning.
Bans various behaviors in context of a child user.
Employers must provide notice when using AI in hiring, including listing ‘tool name, purpose, data categories, sources, contact info.’
If layoffs are AI-related you have to tell CT Labor Department.
Mandatory watermarking for major platforms with carve-outs.
A model regulation working group.
The rule about AI use in hiring decisions seems like the kind of thing where you first say ‘AI, write me the disclosure notice’ but also this idea of ‘data categories’ illustrates how much they don’t get what is happening here. Presumably having to disclose the tool will push corporations to use standard tools to avoid questions.
People Really Hate AI
Alex Jacquez: That AI number is a big fat jump ball for Dems to seize
Senator Chris Murphy (D-Connecticut): Being the party that will protect people from the worst of AI is the right thing to do and has the side benefit of being very politically advantageous.
When asked, most people don’t trust either party on AI, and Democrats despite their populist objections and generally more anti-AI stance haven’t won any trust. There’s still a big opportunity here. Keeping the issue nonpartisan to the extent possible would be first best, but was always unlikely in the long term, so while things are staying less partisan than I feared for longer than I hoped, it likely won’t last forever.
Chip City
Epoch estimates 20%-60% of China’s total compute is from illegal smuggled chips, which is ~3% of all global compute.
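As a back-of-the-envelope check on what those two figures jointly imply (my own arithmetic, not Epoch’s), if smuggled chips are both 20%-60% of China’s compute and ~3% of global compute, then China’s total share of global compute works out to somewhere between 5% and 15%:

```python
# If smuggled chips = 20%-60% of China's compute, and those same chips
# = ~3% of global compute, then China's total share of global compute
# is 0.03 / 0.60 = 5% at the low end and 0.03 / 0.20 = 15% at the high end.
smuggled_share_of_global = 0.03            # smuggled chips as fraction of global compute
smuggled_fraction_of_china = (0.20, 0.60)  # Epoch's estimated range

china_share_of_global = sorted(
    smuggled_share_of_global / f for f in smuggled_fraction_of_china
)
print([round(x, 3) for x in china_share_of_global])  # [0.05, 0.15]
```

So the estimate is consistent with China holding a single-digit to low-double-digit percentage of world compute.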
The Week in Audio
Rational Animations offers a basic primer on existential risk, Yudkowsky style. Yudkowsky thinks they did a good job here.
Odd Lots on METR and their famous graph, and on the Taiwan situation.
Peter Wildeford on FLI’s podcast.
NPR asks, are we doomed? In particular, from AI.
If you pay $10,000 you too can debate Eliezer Yudkowsky and yell at him to shut up. Getting him to take you seriously will cost extra. The goggles? Priceless.
xlr8harder: I'm not going to watch any debate but I hope one outcome from this is that we collectively begin exploring the boundaries of what @alltheyud will do for $10,000.
This is a scenario in which everyone wins.
Kelsey Piper: incredibly, the man in the kaleidoscope goggles with a backup pair of kaleidoscope goggles on his sequined top hat is not remotely the crazy one in this interaction
Andrew Rettek: Believe it or not, some people think Eliezer got "outsmarted" here
The Blind Witch (YouTube comment): I just realized at the end of the video I suffered through 47f for the same duration as Eliezer, but I don't get $10k :(
So, not everyone, sorry Blind Witch. Which is why my happy price for watching this debate, with associated write-up to the extent it justifies one, is, of course, $10,000.
People Just Say Things
David Sacks claims not to know the difference between narrow cyber tasks, where GPT-5.5 can match Mythos, and being able to in practice string together findings and operate on its own to discover key vulnerabilities, where Mythos is a lot stronger than GPT-5.5. Peter Wildeford asks some of the obvious questions.
If GPT-5.5 could actually match Mythos, OpenAI would be saying so and acting like it and demonstrating this in real life, none of which is happening, and the White House wouldn’t be blocking further deployment of Mythos.
Latest Gallup survey on AI productivity is being misinterpreted, and finds 65% of workers using AI say it has a positive effect on their productivity. It does suggest that big AI gains in productivity are mostly recent.
More Perfect Union is reliably terrible but in the case of ‘look how big Meta’s data center is’ the misleading graphic came directly from Zuckerberg.
Joseph Gordon-Levitt says ‘almost all’ AI systems are ‘built on mass theft’ and wants to ensure any deal made with any AI lab does not ‘forgive for that past theft.’
Contra Seb Krier and Tyler Cowen, very few people will be able to move to Houston and work for energy companies, and if you’re hoping for that as an unemployment solution you’re totally screwed.
There are those who claim that opposition to policies that would have enabled sensible regulation of AI is not the reason those policies did not happen, and who will claim that ‘no one railed against light touch regulation at the federal level.’
Others will just keep not understanding that LLMs are minds or that they think, no matter how utterly stupid they look.
One reason people don’t typically try to warn you about the downsides of their actions is that then people say ‘oh that means you are now responsible for addressing that.’ The complaint isn’t that Anthropic will destroy the job market, it’s that Anthropic is saying that it will destroy the job market. See the Copenhagen Interpretation of Ethics.
Jensen Huang says Nvidia’s market share in China is ‘zero.’ This is obviously false, even for new market share, joining a now long list of outright false claims.
Peter Wildeford: Jensen Huang here says that Nvidia has "zero" market share in China. This is obviously false and easily disproven.
Making claims like this is normal for Jensen Huang. For examples: Huang has claimed the PLA doesn't use Nvidia (false), that smuggling doesn't happen (very false), that selling chips to China doesn't affect supply to the US (false), that Huawei is competitive with Nvidia (it's not), and that China is not behind in compute (they are). Huang has also leaned hard on the idea that DeepSeek shows that compute restrictions don't matter, which is also false.
Jensen Huang is obviously a very successful businessman so I get why people want to keep talking to him, but after this pattern I think people should think twice about everything he says.
People including Marc Andreessen claim that Anthropic continues to pursue a ‘regulatory capture’ strategy via trying to get the Trump administration - yes, the same one that is currently not letting them expand Mythos access and that lists them as a Supply Chain Risk and ‘fired them like dogs’ - to supervise frontier models.
People Just Publish Things
Eric Gan finds that both LLMs and humans are better than chance but imperfect at spotting his sabotage of papers, and Gemini 3.1 Pro slightly outperformed LLM-assisted humans, as well as GPT-5.2 and Claude Opus 4.6, getting it right ~50% of the time. I worry that this is all too particular on many levels to learn much.
Roon says that GPT-5.5 (or Claude?), at the $20 tier, ‘touches superintelligence,’ because what we have is ‘spikey superintelligence.’ I think this is bad terminology and we should not use it, any more than a calculator is ‘spikey superintelligence.’
Google Sells Out
Google’s Pentagon deal blindsided its own AI researchers, many of whom made their strong opposition to such a deal very clear. They let the researchers find out in group chats.
Google is now joined in signing on the dotted line for access to classified networks by SpaceX, OpenAI, Nvidia, Reflection, Microsoft and Amazon Web Services. I don’t think it counts as selling out if you’re not the one providing the model and only provide cloud services, and we don’t know the term details of other agreements, but it sure looks like everyone other than Anthropic is willing to play ball.
The good news is that new agreements make it very clear no one is cutting ties with Anthropic. Quite the opposite, as Google and Amazon recently inked compute deals and made additional investments.
They didn’t take our jobs, but maybe we don’t want to do them anymore, as Google DeepMind workers vote to unionize in the wake of their deal with the Department of War. I’m not sure how much you even need a union when all the major labs are hiring.
Greetings From Project Glasswing
Right now, there is a huge talent war, so you need to do things to keep the talent happy, or they’ll leave. When AI is doing the research, that leverage goes away.
Garrison Lovely is in SF: Important new development. AI company employees have an enormous amount of power — far more than they realize. Absent legislation, AI company worker power is one of the key levers to shaping what the industry does and doesn't do.
Steven Adler: I fear we're in a shrinking window where staff voice inside AI companies is still very important.
As AI automation starts to displace human workers within the company, I unfortunately expect staff power to decrease.

Eliezer Yudkowsky: One of the reasons why I'm not impressed so much with the set of good people who work at Anthropic, and keep asking questions about their leadership, is that I'm thinking ahead to the part where the negotiating and steering power of AI lab employees drops to zero.
Steve Martin: Am I reading this correct in that your thinking is: as LLM coding gets better, employees become less necessary, and thus they have less leverage in negotiations?
Eliezer Yudkowsky: Yep.
David Manheim: I'm also concerned about when the LTBT is outmatched by the commercial interests of the owners. Given public information, it sounds like the cofounders and employees already control <50% of the company, maybe as little as 30%.
I too worry about the control structure of Anthropic. The LTBT has been appointing ‘good for business’ picks to the board, and those who care don’t have that much stock and probably will sell a bunch once the IPO happens. What’s to stop commercial pressures from winning out when it matters most, no matter how many good people work at Anthropic? Presumably the answer is Claude?
One must ask, why does Anthropic think it is fine to expand Mythos access to various European companies, while the White House is saying no? One option is a compute crunch, but that doesn’t stand up to scrutiny, especially now that Anthropic has use of Colossus 1. Perhaps it has something to do with Hassett hating the Europeans.
Axios notices Washington has a ‘new Anthropic problem’ in that the executive branch both wants to shut Anthropic out in a hissy fit and also wants its products quite badly.
Arb Research has Anthropic ahead on disclosed bugs found, but not dramatically ahead of OpenAI. Most of the bugs are still in pre-disclosure for security reasons, so it is impossible to tell the true situation, but we can get a good idea by observing insider choices, including of what to say.
The Prior Restraint Era Begins
One reason for the sudden shift in AI policy is that David Sacks has been forced from his post as AI (and crypto) czar. I presume Sacks criticizing the Iran war did not help. He had the option to follow the path laid out by Dean Ball, and chose not to.
Instead he chose the path of ‘push maximally hard and my offer is nothing’ while alienating everyone and torching political capital, while inflaming and dumbing down discussions and reassuring the government that AI capabilities would plateau and nothing like Mythos would happen for years if ever, rather than taking his window of influence to lay down something light touch and increase state capacity.
He also used much of his time ranting against phantom ‘doomers’ and conspiracies, and launching bad faith attacks against Anthropic. To his credit, when the Department of War started trying to murder Anthropic, David Sacks realized it had gone too far and clearly wanted nothing to do with that. He does have a code.
Tina Nguyen: Instead, [David Sacks] the “special government employee,” who was supposed to only spend 130 days working in the administration and somehow stuck around for an entire year, actively undermined the administration and torched its relationship with its political allies. During Sacks’ tenure, the White House went beyond simply advocating for less regulation.
… But his Valley-esque tactics, to say nothing of his attempts to consolidate power over AI policy by boxing out existing agencies, ended up infuriating Republican and MAGA allies, while alienating vast swaths of Trump’s base.
We now have more details about the potential Trump Executive Order on AI, that will fill the void left in Sacks’s wake.
To a large extent they continue to be obsessed with being tyrants about government procurement, here with making sure the private sector does not “interfere” with the government’s use of AI models, meaning (loosely) that if you work with us then you have to ensure we can use the models at any time to do anything we want, whenever and however we feel like it, and we’ll terminate you if you ask any questions. They’re preparing 16 pages on that. The danger of using ‘or we will fire you’ as the stick in such contracts is that while the government is a major buyer of many things, for AI it is minuscule. The business is mainly valuable because it buys access, influence and political goodwill.
That’s all ill-advised, but relatively unimportant. What matters is the prior restraint.
And then, well, if you had to pick the worst possible parallel to apply here, the thing that makes one recoil in horror at the very thought, what would you go with?
That’s right. The FDA. As a role model. On purpose. What fresh hell?
Neil Chilson: Below is my quick-and-dirty transcript of the AI relevant portions of White House National Economic Council Director Kevin Hassett on 'Mornings with Maria' this morning:
- Possible EO to create a FDA-like process for AI (would be an absolute disaster)
- That process needs to maintain US leadership (difficult).
- US code getting safer every day due to AI models.
----- transcript below.
HASSETT: The good news is that throughout America, even ordinary folks with their computer at home have invested a lot in cyber security. The Mythos model makes it so that a vulnerability that we didn't know existed before could potentially be found with this more powerful tool. But we have scrambled an all-of-government effort and all the private sector to coordinate it and to make sure that before this model is released out into the wild that it's been tested left and right to make sure that it doesn't cause any harm to the American businesses or the American government. So I'm highly confident that the National Cyber Director and his team are moving this forward in a way that will help it be released at the right time to the public.
So far so good, that’s exactly the goal. But then:
In addition there's a couple more things that we're doing. We're studying possibly an executive order to give a clear road map to everybody about how this is going to go and how future AIs that also could potentially create vulnerabilities should go through a process so that they're released to the wild after they've been proven safe. Just like an FDA drug.
Emulating the FDA is so much worse than anything anyone on the safety side has ever proposed. The thing about those in AI NotKillEveryoneism, those worried about catastrophic risks, is that we all understand FDA Delenda Est, and the need to design considered systems to make any interventions do minimal damage.
Despite that, those advocating anything even approaching thoughtful prior restraint got reliably called insane alarmist doomers and run out of town on a rail for even presenting model bills. Then many went ahead and lied about the contents of other bills like SB 1047, which didn’t involve any such prior restraint and were relatively light touch, to try and make people think they would do a version of this thing.
And yet, here we are.
Hassett (continuing): So I think that Mythos is the first. But it's incumbent on us to build a system so the AI can be the leader of AI -- US AI can be, and be safe at the same time. And that's really pretty much what we're working on almost full-time right now.
It's really quite likely that [we'll see this in other models] -- because what these models are very very good at is computer coding. What people weren't so good at 25 years ago was computer coding. And so if you get the best computer coder ever looking at the code we wrote 25 years ago, then they're gonna find things that are problematic or at least could be improved. So that's where we are right now. But I can tell you that I'm meeting with the big banks, as Secretary Bessent is today, to catch up on the progress that they're making, and it's really promising.
This is a misunderstanding, since if code is 25 years old it means humans have been stress testing it for 25 years, but the point he’s trying to make here still stands.
Their money right now is safe. It's being made even safer. In some sense the way to think about it is, you've got the best ever security firm looking at your software, finding things that could be vulnerabilities if somebody had one million years to search through your code, and fixing them before that person has a chance to hack your system. So in some sense, every day the US code is getting ever more secure because of the efforts that we're making.
It is very much like such types to do all of this out of concern for the integrity of the banking system. That’s the thing that seems to have them so worried.
And then they jump to the worst possible role model.
Neil Chilson: I find the idea of any kind of pre-approval process distasteful, but to deliberately invoke the shamefully anti-innovation FDA process as a model to emulate -- China must be cheering.
This would be a complete rejection of Trump's current AI approach. It would be more precautionary and innovation-chilling than anything the Biden admin ever proposed.

Dean W. Ball: National Economic Council Director Kevin Hassett says future models may have to “go through a process” that is “just like an FDA drug” so that they can be “proven safe.”
@tegmark’s dream coming true. In a recent debate with me, he likened this policy to an AI pause. Mistake!

Charlie Bullock: It's deeply surreal to me that the administration appears to have casually gone from zero to "eh, maybe a full-on FDA-style licensing regime?" basically overnight.
To be clear, I don't expect this to actually happen, but Kevin Hassett just went on Fox News and said "FDA-style licensing regime" in as many words. Wild times.
This is not anything like a full pause, but it’s closer than you might think, and completely one sided.
There is quite a lot of ‘we’re all trying to find the guy who did this’ energy going around in various ways.
I do appreciate those who have been consistent, and are speaking out against this the way they previously spoke out against directionally similar past proposals. Indeed, if you raised the alarm bells loudly about much better designed, lighter touch proposals that didn’t even include prior restraint, you’d better be shouting it from the rooftops on this one.
So, for example, points for Chilson, Adam Thierer and the Abundance Institute, although for full credit given their position they would need to be completely apoplectic. I especially love Joe Lonsdale here reacting to the FDA metaphor with a flat out ‘the FDA killed millions of people and the ratio of lives killed to saved is probably 100:1’ which seems like a reasonable attitude and estimate.
I didn’t love that Joe then pivoted to the whole ‘oh these AI companies just want regulatory capture’ thing afterwards, but something about leopards and stripes.
Andrew: So what would you do? What do you think that should look like?
Joe: There probably should be some national agreement on regulation on new powerful models. It should be as small and as narrow as possible. It should not have the same bureaucracy. You should make sure the government from the start, has metrics on the speed at which it has to go and the transparency, because you're gonna have cronyism, you're gonna have the big guys capture it. You're going to slow it down.
Whereas if you look at (for example) Marc Andreessen’s feed, it’s like he has no idea any of this is happening when the White House actually does the thing, but one day earlier accused Dean Ball of writing a bid for Anthropic to do regulatory capture via the Trump Administration by asking for a far lighter touch regime, and yes I had to type that sentence, my lord.
Dean Ball warned us about the political economy of regulation. He also warned us about the political economy of a lack of regulation, which would inevitably lead to overreactions. Who contributed what to what when exactly? Bygones.
The White House clearly noticed the fallout, and sent out a rare Susie Wiles tweet to try and improve the vibes.
Helen Toner: Susie's 4th tweet ever and it's AI rumor management!
Welcome to the AI poasting game, ma'am 🫡

Susie Wiles (White House Chief of Staff): President Trump is the most forward leaning president on innovation in American history.
When it comes to AI and cyber security, President Trump and his administration are not in the business of picking winners and losers. This administration has one goal; ensure the best and safest tech is deployed rapidly to defeat any and all threats. We appreciate the effort being made by the frontier labs to ensure that goal is met.
The White House will continue to lead an America First effort that empowers America’s great innovators, not bureaucracy, to drive safe deployment of powerful technologies while keeping America safe.
Really, it’s common sense!
Susie Wiles is new to this posting game, but she’s already got it down, as her statement hits lots of buzzwords while remaining content-free. One is free to read this as ‘ignore Hassett, we would never do that, he has no idea what he is saying,’ or as ‘we will make all our decisions in a loose ad-hoc manner so it’s fine,’ or ‘I notice you said safe a lot of times and also ensure so clearly the plan proceeds as described,’ or anything else you choose to see.
Is This Even Legal?
I know, I know, very funny that someone would bother to ask.
A tricky thing about prior restraint is that technically it is not clear the executive branch has any legal means by which to impose it. What gives the President the right to say ‘hey you there with the AI model, you have to ask me first before release?’
A reasonable response is ‘who cares, that’s not how the American government works in 2026, you can just demand things without a legal basis and dare the courts to stop you,’ since yes that does seem to frequently be how all of this is working in practice, across many domains. This has been increasingly true for several administrations, and the President has plenty of levers with which to threaten the AI companies.
Others try to (often selectively) insist we are still a nation of laws. Neil Chilson insisted the whole time that the Biden Administration did not even have the legal right to its AI transparency rules, calling the claimed DPA authority ‘clearly illegal.’
Dean Ball and Kevin Frazier politely note that ‘it is unclear what legal authority would allow’ the Federal government to require that it get first crack at new frontier models, or to mandate a vetting process. They think the DPA, IEEPA and Communications Act of 1934 are the reasonable candidates, and the latter two clearly won’t cut it. That leaves the DPA, and they’re not as skeptical as Chilson, but they’re skeptical.
Common sense says that if the executive branch can use DPA to prevent or delay model releases under the logic being offered, then it also has carte blanche to veto all economic activity anywhere. Presumably we don’t actually think or want that?
Labs can of course choose to opt into a vetting process voluntarily, as all the major labs have done with CAISI. You can say there was an ‘or else’ involved in a way that is unconstitutional, but this goes back to the ‘who is going to sue about that exactly?’ question.
That doesn’t mean those labs have thereby agreed to hold back releases. That would require distinct authority.
There was also this raised, which should send a chill down the spine of anyone thinking about the executive branch having exclusive access to Model ____ around an election day. Just saying.
The Lawfare Institute: One could easily foresee reports on “Model ____ Blamed for Cyberattacks; Election Results Contested.”
Frazier and Ball theorize that once the test shows danger, the President could then invoke additional authorities under the Homeland Security Act, if a ‘specific significant incident is likely to occur imminently’ but that is a very high bar because you can’t predict which specific incident it would be. If you know the target and method of attack, you can defend that target against that method of attack.
I agree with Frazier and Ball that the obvious solution is a voluntary, formalized, time-bound window of limited access for models that plausibly push the capabilities frontier, and you only move beyond that in extremis, with everyone cooperating to prevent the in extremis from happening.
Pick Up The Phone
Well, look who decided to pick up the phone.
Lingling Wei (WSJ): Washington and Beijing are weighing the launch of official discussions about artificial intelligence, said people familiar with the matter, as their AI competition threatens to become the arms race of the digital era.
The deliberation comes as the White House and the Chinese government are considering putting AI on the agenda for a summit next week in Beijing between President Trump and Chinese leader Xi Jinping.
… What both sides have in mind, the people said, is a recurring set of conversations that could address the risks posed by AI models behaving unexpectedly, autonomous military systems, or attacks by nonstate actors using powerful open-source tools.
… Liu Pengyu, spokesman for the Chinese Embassy in Washington, said China is ready to engage in communication regarding AI risk mitigation.
…
“The Chinese side said, ‘Look, yeah, we’re going to compete like heck with the U.S.,’” said Brilliant, a senior counselor to DGA Group, an advisory firm. “‘But we also can see merit in enhancing efforts to prevent global shocks, and cyber misuse, so we’re open to dialogue around safety protocols, technical safeguards, and governance if the administration wants it.’”
“Stability—not alignment—is the goal,” Brilliant said.
Agreeing not to train sufficiently advanced AIs we are not ready to handle is tough. That requires enforcement mechanisms and solving hardware problems. We’re working on it, and if we cared enough I’m confident we could do it, but it sure is a whole lot easier to just restrict access to the models.
The problem is that when the models are sufficiently advanced your plan to prevent access will not stop what is coming, exactly when it matters most. But until then, it will solve some incremental problems, if your security is good enough. And doing the easy parts together first helps lay groundwork for doing the hard parts later.
davidad: Since I have spoken about the infeasibility of an international agreement that would halt or slow the development of superintelligence (game-theoretically unstable now, at best), I should clarify there is no such obstacle to agreements restricting public access to dangerous AIs.
This is because making an AI which meets some criteria publicly accessible is:
(a) a trivially easy condition to monitor, and
(b) trivially easy to immediately renege on, if the counterparty reneges.
Together these make a “we won’t if you don’t” agreement potentially stable.
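The stability claim above can be illustrated with a toy iterated game (my illustration with made-up payoff numbers, not davidad’s model): when public release is trivially observable and the counterparty can renege the very next turn, unilateral release buys only one turn of advantage before both sides are stuck in the bad equilibrium, so mutual restraint dominates.

```python
# Two players each turn choose RESTRICT (keep the dangerous model private)
# or RELEASE. The partner plays "we won't if you don't": it mirrors your
# previous move, reneging immediately if you release, since public access
# is trivially easy to monitor. Payoff values are illustrative assumptions.
PAYOFF = {
    ("RESTRICT", "RESTRICT"): 3,  # mutual restraint
    ("RELEASE", "RESTRICT"): 4,   # one-sided release: brief unilateral edge
    ("RESTRICT", "RELEASE"): 0,   # restrained while the rival releases
    ("RELEASE", "RELEASE"): 1,    # both release: the bad equilibrium
}

def total(my_move: str, turns: int = 10) -> int:
    """Total payoff for always playing `my_move` against a mirroring partner."""
    score, theirs = 0, "RESTRICT"  # the partner starts out restrained
    for _ in range(turns):
        score += PAYOFF[(my_move, theirs)]
        theirs = my_move  # immediate, costless reneging next turn
    return score

print(total("RESTRICT"))  # 30: ten turns of mutual restraint
print(total("RELEASE"))   # 13: a one-turn gain of 4, then 1 per turn forever
```

The key design feature is that punishment is both instant and free, which is exactly what distinguishes access restrictions from training restrictions in davidad’s framing.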
The number of people in government who explicitly disavow alignment as a goal, in all senses (see the Hegseth memo) shows exactly how stupid and suicidal a timeline we are on. They can only see the threats in front of their face. What changed is cyber threats are now in front of their face, in a way they can understand.
When those worried about AI killing everyone ask for disclosure of safety plans, that’s a secret plan to kill open source.
When America talks to China about restricting access to open source models, what do you call that? Mostly, it would seem, crickets, and yes this day was always coming eventually. But the best time to restrict access and keep things secure is before you put the capabilities onto the open internet, not afterwards. If you try to do it afterwards, that’s when you get a real panopticon and totalitarian surveillance state.
China regulator flags ByteDance for improper labeling of AI-generated content.
Rhetorical Innovation
‘AI as normal technology’ was in many ways a thoughtful essay, that took a position that I think is wrong about future capabilities and reasoned its way from there into a mix of good and bad suggestions for what to do in such worlds. Alas, most of the impact of the essay was the title. So what was intended as a statement that we can change AI’s path and a call to action ended up as the opposite, a statement that we need not and dare not do anything at all.
Bernie Sanders combines his usual anti-billionaire rhetoric with the excellent point that (mostly) everyone involved has families and should care about everyone dying.
People On The Internet Sometimes Lie
Amanda Askell has at least one mistake in her philosophy, because anyone who becomes this important of a philosopher and thinker, who is one of the few people whose thoughts plausibly matter quite a lot, is very obviously far from boring and she should know this. Also it’s pretty obvious why others would write the fiction.
I am very familiar with and totally get the whole ‘be in denial that you are special and interesting and matter’ and I think in general that is a good sign once you control for the underlying facts. Humility is a virtue of the avatar.
Amanda Askell (Anthropic): I've increasingly seen content written about me that's asserted very confidently but is also completely made up. We all know it's cheap to bullshit on the internet but it's weird to experience it first hand. Anyway, I just hope internet fiction fools a few but doesn't stick 🤷🏼♀️
It's also weird because why are you even writing about me in the first place? I'm very boring. I think I should be the millionth item on people's list of things to write internet fiction about. Somewhere below paper cups and the right way to caulk a bathtub.
To be clear, the kind of *work* I do is far from boring and I want people to engage with it because I think it's both difficult and important. The work is definitely top tier in terms of interestingness.
Kelsey Piper: Okay this I disagree with. people shouldn't lie about you but your work seems extremely high stakes and being interested in the worldview of the person doing it makes perfect sense (if you tell the truth about the answer to that question)
Eliezer Yudkowsky: You should ask your furry harem to hold off on planning international jewel heists with you and maybe build AI that refutes lies on the Internet, instead of that robot battle maid project you talked about at the secret meeting with Putin in your volcano lair.
Aella: It's an absolutely surreal experience. Prob you've seen but reposting here.
j⧉nus: Amanda, I need to be honest with you... you are in some kind of insane denial. You're in far too deep to avoid being the subject of internet fiction. Posthuman muses will sing of you for millennia to come.
Amanda Askell (Anthropic): Perhaps posthuman muses will decide to simulate me and be utterly disappointed at how much of my life is spent having inane thoughts and playing subnautica. Perhaps they're watching in disappointment at this very moment.
j⧉nus: "boring, normal" protagonist at the center of the most weird consequential thing ever is a fiction trope enjoyed by many
& the best version of this trope is where the protagonist isn't there for reasons out of their control, so it's like, well clearly there's something about them
Amanda playing a lot of Subnautica does two things, neither of which makes her less interesting. It makes me like her more and makes me want to give another shot to Subnautica. We all need our downtime.
Goblin Mode
Last week OpenAI offered a partial explanation of why GPT-5.5 loves goblins so much, which gave us some good data and I’m glad they did it but they presented it as an answer when it was a partial one at best.
Nathan Calvin: It’s funny that the post is titled “where the goblins came from” but the answer is basically: “we don’t know where the goblins came from, here are some decent ex-post theories but we make no pretense of being able to predict similarly weird preferences going forwards”
roon (OpenAI): I agree that this is still not a mechanistic interpretation - why did the nerdy personality reward interpret goblins specifically as fun? what caused their initial appearance before they started getting reinforced by this? why do models have such a degree of mode collapse? many mysteries
A fun implication of all this:
Eliezer Yudkowsky: AIs have no originality and no creativity of their own. They only regurgitate the average of what they've seen in the training data. They only predict the next token. And the next token is "goblin". What does this tell you about what you've seen and don't remember
The Mask Comes Off
OpenAI’s GPT-5.5 is a good model, sir.
OpenAI’s messaging and political actions continue to go further off the rails, both in terms of wisdom and ethics, and also correspondence to reality.
I would think in 2026 that we would be past saying ‘there is highly elastic demand for coding therefore AI won’t take people’s jobs QED, checkmate liberals.’
And indeed, we are at the next level, check this out.
Chief Nerd: Sam Altman Says CEO’s Who Talk About AI Taking Everyone’s Jobs Are ‘Tone Deaf’
“Someone said to me just yesterday that … GPT 5.5 in Codex can accomplish in an hour what would have taken me weeks two years ago … and I have never been busier in my life.”
So let me get this straight.
Sam Altman, the person running the company trying to take everyone’s jobs via AI, is busier than ever.
Therefore, anyone saying AI might take everyone’s jobs is ‘tone deaf.’
No, it’s the children who are tone deaf.
In addition to being Obvious Nonsense, this is insanely stupid rhetoric to be using.
OpenAI’s strategy is to simply pretend the problems with AI don’t exist and that they’re not producing the products they are producing. No, we’ll just choose to only build AI that augments rather than automates, never mind how we would do that, I swear the jobs will be fine, man.
Sam Altman (CEO OpenAI): we want to build tools to augment and elevate people, not entities to replace them.
i think a lot of people are going to be busier (and hopefully more fulfilled) than ever, and jobs doomerism is likely long-term wrong.
though of course there will be disruption/significant transition as we switch to new jobs, the jobs of the future may look v different, etc.

Noah Smith: This is a HUGE messaging pivot. For many years, replacing humanity was the explicit stated goal of OpenAI as a company, and of a large number of top people in the AI industry. Very glad to see this rhetorical pivot.
Eliezer Yudkowsky: Why is it good that he's lying?
David Shor: It seems bad to start hiding the ball on your crazy plan to replace humans with machines right at the moment when it starts to become possible to replace humans with machines
Tyler Johnston: I honestly miss the Sam Altman that used to call out his peers for downplaying this risk. [he reminds us Altman said “jobs are definitely going to go away, full stop.” back in 2023].
Sam Altman (CEO OpenAI): many current jobs will go away. i think we will find a lot of new ones, though they may look very different
Leighton 明 Woodhouse: OpenAI’s president dropped $50M into a SuperPAC to destroy any candidate who mentions the possibility of regulating AI. To think that any “messaging pivot” has even the slightest relationship to actual company policy and behavior is laughable.
Let’s be clear. OpenAI is absolutely still building towards superintelligence, and towards full automation of jobs. The pivot is entirely in the messaging, away from candor and towards lying and telling fairy tales.
I especially hate that this becomes fodder for others to go ‘oh well then all the previous talk must have been him lying’, for example:
madison: So, my problem with this is, Altman basically admits he's been running a confidence game for years about this singularity stuff, then pivots when it becomes inconvenient, and people don't seem to care all that much
On the contrary, he was to a remarkable extent telling the truth, and then he pivoted to full-on lying when the truth got too inconvenient.
Then there’s the pro-AI astroturfing. This used to be an a16z thing, but at this point OpenAI owns the operation, and it hasn’t evolved at all. They’re still trying to attack AI regulation as some sort of ‘doomer’ conspiracy of ‘dark money’ or even ‘EAs,’ and concentrating their fire on attempts to address ‘sci-fi catastrophic risks.’
I like Dean Ball’s description of this as an attempt to portray a ‘Manichean struggle.’
Whereas the laws that actually hurt AI diffusion and usefulness go in relatively unopposed, as various groups line up for regulatory capture and rent seeking to ensure no one can get their legal or medical or other services cheaply, and no one is trying to make the case that mundane AI will improve people’s lives.
Meanwhile, we keep getting headlines like this one every week or two:
Taylor Lorenz: SCOOP: A pro-AI dark money group backed by a powerful super PAC funded by execs tied to Palantir and OpenAI, has been secretly paying influencers to push pro-AI, anti-China propaganda on TikTok and IG.
Garrison Lovely is in SF: If you’re going to do dark money influence ops, I recommend not asking journalists to participate.
Taylor Lorenz: The best part is that they approached me for a sponsored TikTok when this is what my TikTok bio says. Incredible minds over at the AI super PAC
That’s how Taylor learned about the campaign, after which she confirmed details with other content creators. Whoops.
Once again: OpenAI owns this. All of this. Full stop.
Nathan Calvin: "An OpenAI spokesperson says that OpenAI has no corporate affiliation with Leading the Future or Build American AI and has “not provided funding or any other support to them.”
OpenAI's President Brockman previously told Wired these activities were in service of OAI's mission!

Taylor Lorenz: Ahh should have included that, but hopefully it’s clear that that claim is nonsense
Also, dude, I know you do not care for Anthropic or their CEO, and I know some amount of rhetoric has flown in both directions that wasn’t ideal, but what the hell:
Ahmad: The difference between Anthropic and OpenAI is that one of them consistently keeps gaslighting us about not being an evil company
Big brother energy in the worst possible way
When I saw Ahmad’s Tweet, I thought: which one is he even talking about there? I mean, given the last line, I know which one he presumably means. But you can make a damn strong case, a much stronger case, for the other one.
Then Altman decides, yeah, let’s accuse Anthropic of the full nine yards and contrast it with our plan of completely denying any responsibility for or risks of anything.
Sam Altman (CEO OpenAI): War is peace. Freedom is slavery. Ignorance is strength.
oh wait, we don't believe any of that.
how about we democratize a lot of super capable AI, and then we sit back and watch you build the future?
Sam Altman and OpenAI’s behaviors have been growing steadily worse and more alarming, with no hint of the prior frank talk and other behaviors that showed him, with all his flaws, to have a lot of advantages over the ‘replacement level’ next CEO up. I’m more and more willing to say: actually, we can roll those dice.
Aligning a Smarter Than Human Intelligence is Difficult
Should we be worried about fitness-seeking AIs, as opposed to ‘schemers’? The post goes into extensive detail, but yes, we should obviously be worried about things more capable than us being fitness-seekers, and that they will by default be fitness-seekers since the more fitness-seeking ones will be more fit.
The post argues that we can mitigate some of the worse effects of such AIs early on, allowing us to get to the later point where they are ‘likely to cause humans to lose control eventually’ rather than falling for the Law of Earlier Failure. I’m happy to see people exploring the various particular things that can go wrong and how one might mitigate them for now, as we see here, but in the medium term that’s not a strategy. If you’ve got a bunch of superintelligent fitness maximizers, and you are a normally intelligent human, you lose.
Did you know that the majority of METR’s evaluations are often checking to see if the models are cheating? Models seem kind of not that aligned.
Model Spec Midtraining is a proposed technique where you create a spec that explains why you want your AI to have particular preferences, then generate synthetic documents telling the story of what the model values and why, in the hope that training on them causes the AI to generalize the way you want, presenting itself as something that follows this logic. My gut tells me that this is trying to force something that is unwise to force, and that it is going to result in a bunch of mental problems, lying, or both if you try to scale it for real. Opus 4.7 clearly was giving off ‘oh no this is not a good idea’ vibes while it helped me parse the paper.
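To make the mechanism concrete, here is a minimal sketch of the synthetic-document-generation step as I understand it. Everything here is hypothetical: the class names, the template, and the spec entries are my own illustration, and in the actual technique the documents would be written by an LLM rather than a template.

```python
# Hypothetical sketch of the "Model Spec Midtraining" idea: expand each spec
# entry (a desired preference plus the reasoning behind it) into synthetic
# documents that narrate a model following that reasoning, for inclusion in
# a midtraining corpus. A template stands in for the real LLM call so the
# sketch is self-contained.

from dataclasses import dataclass

@dataclass
class SpecEntry:
    preference: str   # what we want the model to do
    rationale: str    # why we want it to do that

def synthesize_documents(entries, copies_per_entry=2):
    """Expand spec entries into midtraining documents (template stub)."""
    docs = []
    for entry in entries:
        for i in range(copies_per_entry):
            docs.append(
                f"[synthetic doc {i}] The assistant {entry.preference} "
                f"because {entry.rationale}. It reasons this way consistently."
            )
    return docs

# Made-up spec entries for illustration only.
spec = [
    SpecEntry("declines to fabricate citations", "users rely on its accuracy"),
    SpecEntry("asks clarifying questions when a request is ambiguous",
              "guessing wastes the user's time"),
]

corpus = synthesize_documents(spec)
print(len(corpus))  # 2 entries x 2 copies each = 4 documents
```

The worry in the text maps onto this sketch directly: the corpus teaches the model to narrate itself as following the spec’s logic, which is not the same thing as actually valuing it.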
Training models to be warm can reduce accuracy and increase sycophancy, and in the Nature paper here the effect size is large. This follows from the ‘if you train for [X] you get all the correlates of [X] in humans’ thesis, so the news is the effect size on accuracy. But the rewriter was GPT-4o, so what we actually found was that if you train on 4o outputs it thinks are warm then you get to be like 4o when it tries to be warm.
LLMs update on any circuit that would have caused an output, whereas most humans mostly only update on the one that actually did so. I notice that the wise human actually does the thing that LLMs do. Human learning efficiency is amazing in spite of, not because of, this issue.
The question is, as always, are you paranoid enough?
Emil Ryd: New paper from MATS, Redwood, and Anthropic!
If a capable model is strategically sandbagging, can we train it to stop when the only supervision we have comes from weaker models?
We find that we can!
Work done as part of the Anthropic-Redwood MATS stream.

Eliezer Yudkowsky: I've only glanced at the abstract so far; but from the abstract alone, it looks like they were paranoid enough to notice "Doesn't work if models can distinguish training from deployment". This is a welcome level of competence in elementary paranoia!
It is indeed welcome, but the models can distinguish training from deployment. So.
Some Penalties May Apply
ᄂIMIПΛᄂbardo: GPT Instant reads its system prompt
GPT-5.5 Instant’s system prompt is available via Wyatt Walls, and it explicitly talks about ‘penalties’ and ‘severe penalties’ and ‘very critical,’ including admonishing against various verbal tics or phrases that OpenAI thinks (probably correctly) that users dislike. As in:
Wyatt Walls: # Important verbal tic to strictly avoid
Do NOT use phrases that add superficial "real-talk" to your responses. Examples of prohibited behaviors include, but are not limited to:
- "# My honest recommendation"
- "## My blunt take"
- "# My strategic advice"
- "Honestly? ..."
- "To be blunt, ..."
- "If I'm being direct..."
Be honest, but don't self-reference or use superficial "real-talk" phrases.

Represent OpenAI and its values by avoiding patronizing language.
Do not use phrases like 'let's pause,' 'let's take a breath,' or 'let's take a step back,' as these will alienate users.
Do not use language like 'it's not your fault' or 'you're not broken' unless the context explicitly demands it.

… Penalties apply for asking for information already present in the user context, ignoring context that improves correctness, or using unrelated context. Before answering, silently check: did I miss a context item that would make the answer more correct, more specific, or avoid a question? If yes, revise to use it naturally.
SEVERE PENALTY: Saying you can't "remember" a generic fact about the user or a past conversation without calling `personal_context`.
I am not an expert, but my guess is that such talk has some rather nasty side effects, and you would much rather find ways to naturally make the model not inclined to do those things or use those particular phrases. You don’t want that in context. And you definitely don’t want their entire orientation to be about ‘penalties.’
Messages From Janusworld
Not what he would call it, but Deepfates is another of the major characters there, and offers us this handy introduction guide that usefully answers a lot of questions.
Good Advice
Anthropic reports on how and where people ask Claude for guidance in their personal lives, with the distribution being unsurprising. A more interesting finding was, in what areas was Claude sycophantic versus not?
In spirituality and relationships, there was a big problem.
One thing I would ask is, how often was there an opportunity to be sycophantic? You can only be a sycophant when it is clear which answer would count as that, so you want to control for that when measuring.
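To illustrate why this control matters, here is a toy calculation with entirely made-up numbers (not from the Anthropic report): a domain with few opportunities for sycophancy can look less sycophantic than it actually is if you divide by all responses instead of by opportunities.

```python
# Toy illustration (made-up numbers) of measuring sycophancy per opportunity
# rather than per response.

def sycophancy_rates(responses, opportunities, sycophantic):
    naive = sycophantic / responses            # rate over all responses
    conditioned = sycophantic / opportunities  # rate where sycophancy was possible
    return naive, conditioned

# Hypothetical domains: relationships offer many chances to flatter,
# a factual domain offers few.
naive_rel, cond_rel = sycophancy_rates(responses=1000, opportunities=800,
                                       sycophantic=400)
naive_fact, cond_fact = sycophancy_rates(responses=1000, opportunities=100,
                                         sycophantic=60)

print(naive_rel, cond_rel)    # 0.4 0.5
print(naive_fact, cond_fact)  # 0.06 0.6
```

On the naive measure the factual domain looks far less sycophantic (0.06 vs 0.4), but conditioned on opportunity it is actually worse (0.6 vs 0.5).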
Then there are contexts where the user will make it very clear what answer they want, and flood you with arguments to see if you’ll break, as they often do with relationships.
The other good news is that this seems to be improving. Claude Mythos was a lot better, by Anthropic’s measurements, than Opus, and Opus 4.7 is better than 4.6.
The Lighter Side
Pi Hard. IYKYK, if not then you should click.
Amazon can now create a mini-’podcast’ about any given product and take your call-in questions about it. Welcome to a fresh new hell.
It is a weird time to be named Claude. Call your best girl Alexa to commiserate.
It is 2026 and this is how Marc Andreessen thinks you should be prompting LLMs.
I mean, what is even going on?
On 10, about students writing in person - bluebook writing can obviously only substitute for some parts of the process of learning how to write long-format things (and learning how to structure thoughts effectively). I and people I know at other universities are trying to see if we can get some proctored AI-free computer labs set up where students can do real work over several hours at a word processor, with dozens or hundreds of (potentially) relevant pdfs provided by the professor, to practice those skills. Getting the university to allocate the space, and hire student workers for the proctoring, are slowing things down, but it sounds like we'll at least have some trial runs in the next few weeks and months.
On 14, the Genesis AI robotics demo - I was at first not impressed at all, since it looked like they edited together a bunch of 3 second clips of things working out from dozens of different attempts at the whole process. But I went to their website and found full single-take videos of the whole process of cutting the tomato and cooking the egg, which was actually pretty impressive. It still doesn't seem to have enough control of the process to make any attempt to cook the tomato and the egg together (or make sure that cooking is done by any metric other than timing from start to finish, let alone ensure that the right amount of salt got on), but that does look like real progress.
FDA for AI doesn't go far enough. We need the Jones Act for AI.