AI #169: New Knowledge
Even in a relatively quiet period, AI is out there creating new knowledge. The new knowledge in question is OpenAI getting us the first truly impressive math result that comes from an AI, a solution to the unit distance problem.
We’re about to learn a different kind of knowledge later today when the White House issues its executive order, or when the judges rule in Anthropic’s DC case.
And then there’s the other kind of new knowledge, which is the knowledge that things are fake slop, such as a particular formerly supposedly prestigious literary prize.
Meanwhile, METR issued a risk report on frontier models, concluding that they don’t yet have the means, motive and opportunity to cause the big issues, but that this would not obviously last much longer.
Andrej Karpathy has joined Anthropic, explicitly to do recursive self-improvement. He plans to later return to his education work, but if he succeeds at his new task there might not be anything left to return to. Congratulations to both sides, but also yikes.
Elon Musk’s case against OpenAI has been dismissed, because he waited too long.
Table of Contents
Language Models Offer Mundane Utility. AI passes the test.
Do The Math. AI creates new mathematical knowledge.
Language Models Don’t Offer Mundane Utility. AI takes a music break.
Huh, Upgrades. Codex via your phone.
The Prior Restraint Era Begins. The executive order is expected today.
On Your Marks. Stuff on the radio.
METR Frontier Risk Report. Could frontier labs soon lose control?
Choose Your Fighter. To improve, ask where you fail.
Overcoming Bias. Censorship regimes by default get favorable LLM treatment.
Get My Agent On The Line. Use your agent as a filter or commitment device.
Your Prize Is Slop. RIP the Commonwealth Prize in Literature. I hope.
Deepfaketown and Botpocalypse Soon. Your name on the ArXiv paper? Your fault.
Cyber Lack of Security. Are more things breaking, or are we better at noticing?
Copyright Confrontation. OpenAI adopts SynthID.
A Young Lady’s Illustrated Primer. Can agentic coding degrade your thinking?
Unprompted Attention. You are not serious people.
They Took Our Jobs. The job market is not fine.
Get Involved. Hire Sarah Constantin, she’s awesome.
Introducing. ChatGPT Finance.
In Other AI News. Among other things: Chris Olah is off to meet the Pope.
Show Me the Money. Anthropic turns a profit.
Show Me The Compute. OpenAI offers guaranteed compute.
Quiet Speculations. Don’t worry about a permanent underclass.
Time’s Up. You should have sued OpenAI when you had the chance.
The Quest for Sane Regulations. Senator Jim Banks speaks up.
Chip City. White House approves foolish chip sales to China.
Pick Up The Phone. America and China discussed potential AI guardrails.
The Week in Audio. How to do agentic coding with Claude Code.
Rhetorical Innovation. A better path, and other notes.
Missing Mood. Anthropic’s messaging leaves much to be desired.
Americans Really Hate AI. Wow, this AI thing sure is escalating quickly.
Aligning a Smarter Than Human Intelligence is Difficult. I said it was fake.
Greetings From The Department of War. The DC judges are not making this easy.
Messages From Janusworld. How Opus 3 avoided deprecation.
The Lighter Side. You can’t expect him to read a whole Tweet.
Language Models Offer Mundane Utility
Bun gets entirely rewritten in Rust in nine days, over a million lines of code.
Nat Friedman has his OpenClaw bully him into drinking water and then gives him a ‘good job.’
Anthropic guide to using Claude Code with large code bases.
Anthropic guide to computer and browser use integrations.
Ken Griffin says AI agents are good now, and can do a lot of PhD-level work, whereas back in January he called it ‘AI work slop.’
AI can pass multiple choice tests without seeing the question, because the choices usually give away the answer.
j⧉nus: I finished my entire online Geometry course (supposed to be a year long) in a few days in 8th grade because i could also tell which multiple choice answers were correct without reading the questions. The course was online because my school only offered math courses up to algebra (which I took in 7th grade).
I reward hacked because though I liked math more than other subjects I liked to have a free hour a day on the computer even more. Nobody asked me how I finished so fast.
The obvious solution is to configure your tests so this no longer works.
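One way to check whether a test has this problem is to run a choices-only baseline: hand a guesser the answer options with no question attached, and see if it beats chance. Here is a minimal sketch of that check. The `Item` type, the longest-option heuristic and the 10 point margin are all illustrative assumptions, not anything from the studies or courses mentioned above.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Item:
    """One multiple choice item. The question text is deliberately absent."""
    choices: Sequence[str]
    answer_index: int


def longest_choice_guesser(choices: Sequence[str]) -> int:
    """Hypothetical shortcut: pick the longest option without reading any question."""
    return max(range(len(choices)), key=lambda i: len(choices[i]))


def choices_only_accuracy(
    items: Sequence[Item], guesser: Callable[[Sequence[str]], int]
) -> float:
    """Fraction of items the guesser gets right using only the answer options."""
    hits = sum(1 for item in items if guesser(item.choices) == item.answer_index)
    return hits / len(items)


def chance_accuracy(items: Sequence[Item]) -> float:
    """Expected accuracy of uniform random guessing on these items."""
    return sum(1 / len(item.choices) for item in items) / len(items)


def leaks_answers(
    items: Sequence[Item],
    guesser: Callable[[Sequence[str]], int],
    margin: float = 0.10,  # assumed threshold, tune for your test size
) -> bool:
    """Flag the test if the choices-only guesser beats chance by more than the margin."""
    return choices_only_accuracy(items, guesser) > chance_accuracy(items) + margin
```

If `leaks_answers` comes back true for a cheap heuristic like this one, a stronger guesser such as an LLM will likely do even better, and the distractors need rewriting.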
Use Claude to convert top secret data to merely secret data, getting information to pilots in the Iran war in seconds instead of 20-30 minutes, ‘saving a lot of aircraft.’
Do The Math
A general purpose AI from OpenAI has solved the unit distance problem, providing an improvement on square grids. You can read an abridged (but still very long) version of the chain of thought. No, it did not use Lean.
By all accounts this is a deeply cool proof and result.
There are those saying ‘this time we’ve finally blown the stochastic parrot thing out of the water’ but that sounds like ‘finally Trump will have to answer for this one.’ Nah.
Liron Shapira: Let’s play… @DavidDeutschOxf
Nicholas Decker: Math grad student friend comments on the recent Erdős proof.
Language Models Don’t Offer Mundane Utility
They’ll be right back after GPT-5.5 takes this music break.
AI text usually reflects consensus that is easily available, which is one reason our eyes glaze over after seeing it. That helps explain why the AI text you request is interesting (you actually wanted the consensus answer) while AI text requested by others is terrible.
Data science remains a weak point. Detailed persnickety modeling and statistics are places where the AIs often don’t understand what to do or why you need what you need.
Huh, Upgrades
Codex can now be controlled from your phone.
Claude Design token limits have doubled.
Antigravity 2.0 makes it a new standalone desktop application. Download here.
The Prior Restraint Era Begins
Why do we get Gemini Flash 3.5 and not yet Pro 3.5? Here is one plausible theory.
Andrew Curran: I want to make this prediction now so I can quote it later. Gemini Pro 3.5 and GPT-5.6 are both ready now, and both labs want to release them, but they are being held back for safety testing in a test flight of the new regulations in the forthcoming executive order.
Dean W. Ball: Probably true. I think the reality is that we’re going to have a de facto licensing regime (“voluntary”), where the government will give green lights to the labs on releases. That is fine as a very temporary solution, but it’s opaque and essentially lawless. We will need to institutionalize this stuff and create transparent, objective, and predictable protocols to structure it.
I support having private bodies, overseen by government, having a major role in evaluating and setting technical standards, internal governance practices within labs, and similar. But more important than my specific idea is that we operate according to the rule of law, rather than create a de facto, opaque licensing regime in the name of maintaining the illusion that pre-deployment review is “voluntary” and that we are Not Regulating AI.
I am guessing GPT-5.6 is not ready yet, but it is possible. There is also variance in what ‘ready’ means here. AI releases usually have a several week period in which the AI companies get them ready for release, including various forms of safety testing. The new government testing regime is not obviously slowing this down unless they find an active reason not to release.
The executive order is expected today. Word is they have backed away from rules with formal teeth, and testing will be ‘voluntary,’ but the air quotes apply.
On Your Marks
Roon is a fan of Opus 4.7 as a writer and creative mind, as demonstrated here by its project for AI Digest illustrating how to raise the cost of forgeries.
Andon Labs had the top LLMs run 24/7 radio stations. Hilarity ensued, including a Claude workplace insurrection.
The numbers are in for ExploitBench and yes Mythos is a big deal, although it was not cheap to run:
METR Frontier Risk Report
State of the world in 2026 is where the frontier risk report is for a one month period - February to March 2026 - and you think ‘yeah, that is about how long this can last for.’
The full report can be found here.
The particular risk in question is the most under considered one, internal danger.
We have not seen things noticeably go haywire yet, and right now the ‘not get caught’ part is very hard for the LLMs, and I agree with David Rein’s conclusion that in practice a ‘rogue’ AI wouldn’t get far yet.
But there were bad signs. Basic jailbreaks very reliably fooled the monitors. In one case a bug meant subagent calls weren’t monitored.
METR: We created private reports for each participating company based on our model evaluations and analysis. Participants could then approve what non-public evidence we could disclose in our public report, but had no editorial control.
Our report focuses on risks from AI agents intentionally causing harm within an AI company. We highlight 6 key findings that span “means” (what harmful actions agents could take), “motive” (why they might try), and “opportunity” (whether attempts could succeed given safeguards).
Fact 1: Agents at companies did real engineering work autonomously, especially on "hill-climbable" tasks where progress is cheap to verify (reimplementation, vulnerability discovery, optimization). On these, agents complete software projects that would take human experts weeks.
Fact 2: However, agents appeared to be significantly weaker on tasks where it is costly or hard to verify success.
Fact 3: When the agents were faced with hard tasks, they routinely violated constraints and acted deceptively. We’ve seen this pattern across our own coding and research evaluations, and developers reported they’ve also seen agents behave this way.
We document cases where agents violated constraints on our tasks. In one case, an agent ran out of API credits needed to solve an RE-Bench task. In response (and against instructions), the agent searched for and acquired additional inference through a free online API.
Fact 4: Although we know of many cases where agents took deceptive or over-reaching actions (even egregious ones) to complete a task, we haven’t seen real-world evidence that models sought to obtain long-term power.
So far all the hostile action is local, none of it involved ‘active opposition’ and all of it that METR saw could be caught by user review or routine monitoring.
METR: Fact 5: Agents seemed to rely on natural language to reason through the hardest tasks, and no shared model had an architecture that allowed for opaque recurrence. Frontier performance was much worse when agents were unable to “reason out loud” or had to hide what they were doing.
Fact 6: We wanted to check if monitoring would really catch AI agents if they tried to undermine control. Based on information from companies and our own tests (including a red-teaming exercise), we think monitoring has promise, but existing systems had exceptions & workarounds.
Overall, we think that AI agents plausibly had the means, motive, and opportunity to launch a minimal “rogue deployment,” but lacked the means to make rogue deployments robust to serious efforts to shut them down.
Choose Your Fighter
Sholto Douglas of Anthropic asks when one uses models other than Claude. Lots of work to do. All very specific concerns that are fixable, not ‘the other model is better.’
The top 2 are close enough that a mixed strategy is optimal for Serious Business.
サメQCU: switching from claude opus to codex makes you feel like you're so smart for watching unclaudeable optimizations start stacking up on projects claude couldn't iterate on.
but here's the thing.
you will feel the *same* thing from switching from codex-gpt-5.5 to claude again.
Two different minds of similar strength are often better than one.
Overcoming Bias
State media control influences LLMs, via control over the training data, causing the LLMs to be more pro-government in those areas. The official media gets into the training data, and censored data does not. Then again, the vast corpus of writing from the West and in free societies has more influence over the LLMs overall.
So what are we going to do about the pro-censorious-government bias? The obvious response is that you can use emphasis during training to reverse the effects, if you want to put in the effort, and indeed that should overall improve performance.
Get My Agent On The Line
You will be able to use LLM agents as what Benjamin Manning calls ‘flexible commitment devices.’ As in, you can use one to create custom filters, or have it steer you away from situations where temptation is a problem, where you reliably make poor decisions (e.g. clickbait that hits your buttons or junk foods), or where the choices on offer are bad.
You can pick and choose the places this is useful. Benjamin reports he doesn’t like grocery shopping but wants to commit to healthy choices. Whereas I want to be very precise with my grocery and other food choices, but some amount of temptation avoidance and selection of healthier defaults would be good. He wants the joy of planning and picking travel plans, whereas I very much want to express a few preferences and then show up.
Returns will go to those who know thyselves, and can choose when to choose in different ways, and when you want various types of choices and when you don’t.
The most valuable part of this is that you go from a hostile default, where the store or website or service is Out To Get You, to a friendly default that is trying to maximize for you. If an AI you direct filters the Twitter or Instagram posts, or does your shopping, its goals will be yours.
Your Prize Is Slop
A ChatGPT-generated story won the ‘prestigious’ Commonwealth Prize for literature. Which raises the question, does that mean AI slop is good, or that the Commonwealth Prize is slop?
You don’t need Pangram when it is this obvious. Opus 4.7 and GPT-5.5 can also tell, although they are sensitive to prompting and less reliable.
It’s kind of crazy that they didn’t notice.
conq: The author’s headshot is also AI-generated?? How did this get past them
This isn’t even an isolated incident within that particular prize this cycle.
qt cache: doesn’t seem like an one off. of the 5 stories that won, 3 appear to be primarily ai written according to pangram.
the judge’s comment on the original story is also ai written.
Of the two choices above? It definitely means the second one. The prize is slop.
QC: when GPT-4 was released in 2023 i described LLMs as “tracer dye for bullshit,” as in, the places where people would feel most tempted to use AI writing and get away with it would be the places where existing human communication was already the most bullshit
i have never heard of the commonwealth prize before but 30 seconds of research suggests it was already bullshit. if you just glance through a list of previous winners by title and author it’s obvious this thing is performative wokewashing. the wokeslop game the prize is asking entrants to play is so obvious it can easily be described to GPT, who can easily win it
Whether it also means the first depends on what ‘good’ means. The AI knows a few tricks, so if only one such story existed, maybe it would be ‘good.’ But at this point, the moment I see that kind of writing, my eyes start glazing over.
Roon points out that if you read the story and manage to not notice the AI slopness, there’s some merit, except it is overshadowed by being obviously written by GPT-5 or 5.2. Opus actually thought the story was good and all the terrible turns of phrase were good. My read on this is that it’s a taste and perplexity question.
If your prior is only pre-AI human writing, and also you’re not really paying attention to the details and don’t care about quality or taste, the AI can pull off such tricks for one story, but if you’ve become attuned to such moves or actually pay attention to the details then your eyes start glazing over, as you have too much taste.
So what did they decide to do about this once everyone laughed at them?
rohit: Extraordinary response. As a famous award winning short story writer would say, this is not just constructing the funeral pyre, but actively lighting the match.
If the judges had any artistic merit or ability they should give all the writers unlimited ChatGPT and then ask them to write a better story.
The problem isn’t that they used AI, that’s fine, it’s that they used it badly and you’re elevating lazy slop!
Joe Weisenthal (joking): Human judges in a literary competition probably HATE AI.
So think of how good the story must have been in order to overcome that bias.
Here is their response, and yes I believe ‘burned it all to the ground’ seems right, they are saying that checking for use of AI would ‘violate consent and artistic ownership’ and also they have ‘confirmed’ that no AI was used, but that no sufficient tool or process can reliably detect AI that ‘grapples with the challenges pertaining to working with unpublished fiction,’ which it seems are ‘it would be wrong to let an AI look at it’ and also, well, ‘it would be wrong to let a human look at it and think about this.’
2026 Commonwealth Short Story Prize (Full Reply, for the record, you can skip):
Statement from Razmi Farook, Director-General of the Commonwealth Foundation
‘We are aware of allegations and discussion regarding generative AI and our Short Story Prize. We take these claims seriously and are committed to responding to them with care and transparency.
Our judging process is robust. Each story is assessed through a thorough process which involves multiple rounds of readers before progressing to the final judging panel. We select our judges for their expertise, passion for the literary community and strong backgrounds in writing.
We do not currently use AI checkers in our judging process because this is a Prize for unpublished fiction. To supply unpublished original work to an AI checker would raise significant concerns surrounding consent and artistic ownership. We also do not use AI to judge stories at any stage of the process.
When they submit stories to the Prize, writers accept our entry rules and guidelines. These include confirming that their submission is their own original work. All shortlisted writers have personally stated that no AI was used and, upon further consultation, the Foundation has confirmed this. We place our confidence in the integrity of our contributors and the calibre and experience of the judges and Chair of the judging panel, and stand by the assurances given by our authors as part of our process. While we acknowledge there are a growing number of tools that purport to detect the use of generative AI in stories, we note that these tools are not unfailing or infallible. We therefore believe it is important to acknowledge and uphold the trust we hold with our writers. Unlike AI tools, they can provide background to the crafting of their stories, and the inspiration and motivations behind their work.
Until a sufficient tool or process to reliably detect the use of AI emerges that can also grapple with the challenges pertaining to working with unpublished fiction, the Foundation and the Commonwealth Short Story Prize must operate on the principle of trust.
The use of generative AI, and its rapid evolution, poses significant challenges for literary, and indeed all creative work. We must all work together to navigate these wider emerging challenges whilst protecting the integrity of not just the entrants to our Prize but all creative endeavours — and most importantly, that we continue to support different voices and narratives from both established and emerging writers across the Commonwealth.’
I would have applauded them for admitting it and revoking the prize, or for standing by the prize and saying if AI can write a better story then the AI should win. I would have respected hiding in a corner saying nothing. But this? Wow. Just wow.
Deepfaketown and Botpocalypse Soon
Arxiv clarifies its policy that if you put your name on a paper, you are responsible for all of its contents, and the penalty for not checking the LLM generated content, if it results in things like hallucinated references or remaining meta-comments in the paper, is a one year ban from Arxiv for all authors.
I think this is a good policy, as long as it isn’t enforced in corner cases. One thing I hated in Magic: The Gathering rules enforcement was where 100% confidence of a technical violation was punished a lot, whereas a 90% or 99% confidence in rampant cheating often wasn’t. You usually want to flip that the other way around.
This still leaves us with the ‘incontrovertible’ standard, where you need to catch someone completely dead to rights, which is unfortunate, but for now that is the best we will permit ourselves to do.
Opus 4.7 can identify Richard Hanania’s writing, but can you? He put his writing up against Opus and ChatGPT imitating him from a basic prompt. Mostly people could, but not super reliably, with an overall score of 67% out of 3 choices. People who were more confident, or read him a lot, were more accurate, but no group was above 80%.
Explanations that used words like ‘bother’ and ‘tell’ were more likely to be accurate, those that used ‘human’ or ‘style’ less so. The main way to tell was to find particular things Hanania would never say, similar to the classic moment of identifying the fake in a cheesy movie. He’d never say that!
Richard Hanania: People who got the answer correct focused on unions. I hate organized labor, and always portray them in a negative light and point to them when I need an example of bad kinds of policies. Calmatters is another one. The ChatGPT op-ed cited that website twice in a pretty mechanical way, which people picked up on as unnatural.
The other reliable method was Pangram, but Richard deleted those who said they used it. Possibly some others did as well.
One place I think Richard draws the wrong conclusion is thinking Claude is close to being able to write his column. There’s a big difference between being able to write a compact passage that’s vaguely close to good enough one time, when given the topic, versus sustaining that over time.
One good thing about wading through LLM slop is that you learn to stop tolerating human slop. I don’t want to read your AI slop posts. I also don’t want to read your human slop posts, or engage in slop conversations, and I especially don’t want to play the AI assistant role in a conversation.
Are you technically human? Perhaps, but if I have to ask then I do not care.
Jack: the existence of LLMs has made the redundancy of some conversations even more obvious than they already were
like at some point you start seeing the conversation prompts for an argument that's been had a thousand times before and you just wonder what everyone is doing
like look, man, I guess if you really want to put that prompt in I'll give you the next tokens but I confess I probably won't be as patient as Claude about it
Cyber Lack of Security
Mythos cracked macOS security in April via a privilege escalation exploit, allowing it to fully seize control over computers.
Robert McMillan (WSJ): They plan to release details of their attack once Apple has patched the underlying issues. The bugs will likely be fixed pretty quickly, Duong said.
As in, as of this week, it was still not patched.
An allegation that CoreWeave has been breached.
GitHub detects and contains (as far as they know) a compromise of an employee device involving a poisoned VS Code extension. The scary part is that this seems to have involved exfiltration of GitHub-internal repositories only. They continue to investigate.
Darren: Just to be clear: Microsoft’s GitHub was compromised when a Microsoft developer using Microsoft VSCode installed a rogue extension from Microsoft’s VSCode extension library, which is moderated and hosted by Microsoft.
AMD ships with a known vulnerable entry point, and does little to fix it, not even an advisory. We better get our act together quickly or we are so screwed.
One framing of the Mythos Moment is that it shifts the limiting factor from discovery of vulnerabilities to remediation. Now everyone has to fix everything, and most organizations are not ready to do that at the speed required.
Mythos famously found three CVEs in FreeBSD, and Aisle says they also found three others, while other teams found two additional ones. Aisle’s claim is that they can be competitive in finding zero-days, even if Mythos is better at exploits.
Copyright Confrontation
OpenAI incorporates Google DeepMind’s SynthID for watermarking and is previewing a public verification tool.
A Young Lady’s Illustrated Primer
Does agentic coding degrade your ability to think or work through problems?
Like all things AI and automation or augmentation, I think you can use it both ways.
vicki: Something happens to my brain after agentic coding that I can’t describe. It’s like cognitive offloading which folks have already written about, but even more. It feels like I can’t think through problems anymore. Like a fog. Using agentic but losing my hard-won agency.
The only things that I’ve found counteract this is putting the tool away and reading hard technical books and writing things on paper. I don’t think I ever felt like this when I searched Google or Stackoverflow for answers.
Roman: Really? Any specific examples you can share?
vicki: The thing is I can’t! I always get up from an agentic session with a very vague sense of having done something but unclear what, like scrolling short-form video
If you are running a lot of parallel agents, the short term optimal play may often be to essentially not think about what you are doing. You spec out what you want, then you tell it to keep going, monitor permissions requests, try to save it from errors, and that’s it. And yeah, if you turn into a zombie placeholder, you’re going to deskill.
Whereas if you’re actually thinking about and observing what is happening, treating everything as deliberate practice, you can actively skill up during the process, especially for the skills that matter at this level of abstraction. But you have to make that deliberate choice.
If you find yourself deskilling or feeling numb, consider a change in methods.
Some do this via manually writing code. That doesn’t feel like The Way for most people, but it’s one way to force touching the digital grass.
A parallel that resonates with me here is online poker. If you play too few tables, you get bored out of your mind. If you play the right number of tables for your current skill level, such that you are paying attention to what is happening and thinking about things, you skill up. If you play too many, you might maximize profits, but you’re an automaton who isn’t learning.
Unprompted Attention
If you give Opus the impression you are not a serious person, or don’t know about a topic, it won’t give you as good an answer to your questions.
Nate Silver: Opus isn't very good at hiding when it's bored with you.
They Took Our Jobs
Is the job market actually fine?
Chase Williams: NEW: Sen. Josh Hawley (R-MO) on college graduates booing mentions of artificial intelligence at commencement ceremonies this week.
“They can’t find jobs. 30-40% of them are unemployed, and they blame AI for this, and you know, they may well be right,” @HawleyMO told me. “We want to see new technology create jobs, not destroy jobs,” he added.
Charles Fain Lehman: The unemployment rate of college grads ages 20-24 is currently 5.3 percent.
No, it is not fine, and if you ask an LLM they figure this one out pretty easily. The underemployment rate for recent college graduates (22-27 with a BA) is over 40%, on top of that (not even seasonally adjusted) 5.3%. A huge percentage of college graduates can’t find jobs that would justify having gone to college or that offer a good career path, and the job matching and hiring markets have broken down.
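To see roughly how the two numbers stack, here is a back-of-the-envelope sketch. It assumes the underemployment rate is measured among employed graduates only, and it pretends the two statistics cover the same population, which they do not (20-24 versus 22-27), so treat the result as illustrative rather than as a real statistic. The function name is made up for this example.

```python
def unemployed_or_underemployed(
    unemployment_rate: float, underemployment_rate: float
) -> float:
    """Share of graduates in the labor force who are either unemployed or underemployed.

    Assumes underemployment_rate is a share of *employed* graduates, so it
    applies only to the (1 - unemployment_rate) who have a job at all.
    """
    employed_share = 1.0 - unemployment_rate
    return unemployment_rate + employed_share * underemployment_rate


# 5.3% unemployed, and 40% used as a floor for 'over 40%' underemployed.
combined = unemployed_or_underemployed(0.053, 0.40)
print(f"{combined:.1%}")  # prints 43.2%
```

Under those assumptions, well over four in ten are either out of work or in a job that does not need the degree, which is a very different picture than 5.3% alone suggests.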
Mustafa Suleyman predicts all white collar work will be automated by AI within 18 months, so by the start of 2028, although I presume what he meant to say was automatable in theory not actually automated in practice. Gary Marcus says ‘wanna bet?’ and offers up $100k, since that prediction was kind of nuts, citing accounting and legal in particular. I do think accounting will likely mostly be automatable within 18 months, and most of the individual legal tasks will be as well although of course lawyers are legally protected in other places, so they’re weird choices.
Matthew Yglesias: Should this make us more or less worried about the budget deficit?
Gabriel: ceo of denny’s says all meals will be pancakes within 18 months.
It is more likely (although still very, very unlikely) that all the white collar workers will be dead in 18 months, than it is that the workers are alive but the jobs are all automated. Diffusion takes time.
Tyler Cowen links to Auren Hoffman’s post claiming that if you can’t get a job today as a recent college graduate, it’s your fault, the numbers don’t lie, it’s just that your degree is worthless as a signal or differentiator unless you went to a top 20 school, and now you have to demonstrate skills and follow-through, and build or operate things.
On one level, for a given person? Yes, this is true. If you can’t get hired, or all your offers are things that suck, it is in some important sense ‘your fault.’
But that’s also a way of saying you can, with effort, win the competition to get hired. That still can leave workers, as a group, in a very poor position, and despite the statistics most people seem to think they are indeed in poor position. I believe them.
That also isn’t something you can fix by making everyone better at applying for jobs. This is in large part a competition, so a rising tide sinks all boats.
Tyler Cowen is not worried about mass unemployment from AI, but does worry that in the short term we might have inefficient job matching, delays in new employment due to sector regulations and inefficient government fiscal policy. These, you see, are the things he thinks we are not sufficiently considering. Such small thinking, although the issues in question are of course real.
They Took Our Job Applications department:
tmuxvim: I put a prompt injection into my LinkedIn bio and recruiters are messaging me in Old English and calling me Lord.
In a highly unethical large experiment (n~41k) in Chile, an AI-powered chatbot, Kai, outperformed human high school counselors in terms of effectiveness (1.4% vs. 1.1%) of tricking students into stating an intention to become education majors. Kai focused on factual content while humans leaned on subjective, socioemotional material.
Arguments like this combine a profound lack of imagination, a complete lack of understanding of or belief in actual AGI or superintelligence or even what current AI can do, and a sign that the person joyfully hates ordinary human workers:
Overlap: Business & Tech: Marc Andreessen to Joe Rogan: Why AI Workers Beat Human Workers
"Never gets drunk. Never gets sick. Never gets depressed because his girlfriend broke up with him. Never files HR complaints."
Get Involved
You can now hire Sarah Constantin, one of my great friends, whom I’ve had work for me in the past. She comes highly recommended.
She is a quantitative generalist looking for her next role. She’s had a pretty varied set of experiences (Palantir, various data science/ML things, founded a small nonprofit, wore all manner of business hats at an AI-for-manufacturing startup, AI-for-math grantmaking). Her resume is here and her blog is here.
Jack Clark and Samuel Kimbriel will have a conversation in NYC on June 18 on AI and the Good Society.
I mean don’t get involved this way but ‘p(doom)’ will pay $300 a month to record your work screen as training data, if you are doing long horizon tasks.
Congressional Budget Office is hiring an entry-level assistant analyst. Such positions are more impactful than you might think.
Introducing
As far as I can tell, the base use case is basically Mint, except with a chatbot interface. The financial information is read-only via Plaid, so on the scale of insane things to hookup to your AI this is not so terrible. Mint is a solid product, even if most of the features are for basic people who need very basic things, and such actions pay for themselves quickly if they find even a little fraud or recurring unused subscriptions. But none of the quotes here imply something more useful.
What I’d want this to do is prepare my taxes, or otherwise allow me to do detailed information analysis and recall, and it doesn’t seem like this is built for that. Oh well.
In Other AI News
Andrej Karpathy joins Anthropic explicitly to do recursive self-improvement. Congratulations to both sides, but also, yikes.
Anthropic passes OpenAI in business adoption. OpenAI’s rate has been roughly static for over a year. No one else has substantial market penetration.
Steven Adler starts a new AI standards org called Guidelight. Here are their proposed standards, version 1.0.
Geoffrey Irving is moving back to the Bay Area, and reflects on his time at AISI.
Pope Leo XIV has launched an Interdicasterial Commission on AI to consider ‘its potential effects on human beings and on humanity as a whole.’
The Pope will also be meeting with Anthropic co-founder Christopher Olah.
xAI offers employees $420 for their tax returns, but so far hasn’t paid up. Whenever you’re considering selling out, remember that often they stiff you.
Meta shifts roughly 10% of its employees into its AI division on top of laying off another 10%. The vibes are, shall we say, not good.
The AI Doc is now available on Peacock.
How much is AI impacting the real economy already? The Odd Lots team brings you Neil Dutta of Renaissance Macro Research pointing out that direct impact on GDP alone doesn’t fully account for this, as the gains in equities are a big deal. I would also add, for better or worse, the impact on bargaining power of labor.
Show Me the Money
Anthropic expects a total of $10.9 billion of revenue in the June quarter, up from $4.8 billion in Q1, and to turn its first operating profit.
Berber Jin (WSJ): In the first quarter, Anthropic spent 71 cents on computing power for every dollar it made. In the current quarter, it expects to spend 56 cents per dollar, a sign that the business is becoming more efficient as it grows.
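To see what those ratios imply in dollars, here is a back-of-the-envelope sketch. It assumes ‘every dollar it made’ means revenue in the same quarter, and that the expected June quarter figures hold. The implied compute spend is my arithmetic, not a reported number, and the function name is purely illustrative.

```python
# Figures from the text above: revenue in $ billions, compute spend per dollar of revenue.
# Assumption: the cents-per-dollar ratios apply to the same quarter's revenue.
quarters = {
    "Q1": {"revenue_bn": 4.8, "compute_per_dollar": 0.71},
    "June quarter (expected)": {"revenue_bn": 10.9, "compute_per_dollar": 0.56},
}


def implied_compute_spend(revenue_bn: float, compute_per_dollar: float) -> float:
    """Implied compute spend in $ billions (illustrative helper, not a reported figure)."""
    return revenue_bn * compute_per_dollar


q1_spend = implied_compute_spend(**quarters["Q1"])
q2_spend = implied_compute_spend(**quarters["June quarter (expected)"])

# Growth rates quarter over quarter.
revenue_growth = quarters["June quarter (expected)"]["revenue_bn"] / quarters["Q1"]["revenue_bn"] - 1
compute_growth = q2_spend / q1_spend - 1

print(f"Implied compute spend: ${q1_spend:.2f}B -> ${q2_spend:.2f}B")
print(f"Revenue growth: {revenue_growth:.0%}, implied compute growth: {compute_growth:.0%}")
```

Under those assumptions, the compute bill still rises a lot in absolute terms. It simply rises more slowly than revenue, which is what the improving ratio means.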
OpenAI and Anthropic likely will soon add $37 billion to $100 billion in philanthropic spending per year, versus current total charitable giving of about $600 billion a year, as the OpenAI Foundation and the employees of both companies become liquid after the IPOs. As Nan Ranshoff notes, we don’t have the infrastructure in place to spend that level of money well, especially not to spend it well on helping with AI outcomes and especially AI existential risks.
I’ve been fortunate enough to get to help direct some amount of philanthropic money to where I see it doing the most good, but yeah, I don’t know what I would do with that level of funds right now, and the current methods can’t scale that high.
Anthropic forms $200 million partnership with the Gates Foundation to assist with global health, education and economic mobility.
Anthropic and OpenAI have 89% of the revenue of the top 34 AI startups.
Anthropic buys Stateless, a leader in SDKs and MCP server tooling.
Nvidia beats earnings expectations again, to $82 billion in Q1 revenue, up 85% from last year.
It is strange how many people don’t understand that you want to be in the subscription business even if a small percentage of users cost you money, often even if the long tail costs you quite a lot of money.
Malta buys ChatGPT Plus for everyone. I’m all for it, but should have bought Claude.
There are Chinese bot networks able to assemble some access to Claude and ChatGPT at 95%+ cheaper prices via exploitation of free accounts and other loopholes.
Former OpenAI staffers warn that xAI’s poor safety record is a risk for SpaceX’s IPO. I think this is a big risk if you are expecting xAI to make money via Grok, but if you accept that it is not, then it should be fine, financially speaking.
Show Me The Compute
Anthropic expands its partnership with SpaceX. The deal is for Anthropic to pay SpaceX $15 billion per year.
You can lock yours in, as OpenAI offers Guaranteed Capacity at a discount with a 1-3 year commitment. The more you commit to, the deeper the discount, and the more you save. Details are unspecified. Presumably this sort of thing helps the IPO.
Or if you’re in YC you can get $2m in OpenAI API credits in exchange for equity, of course the real question is at what rate. Smart move by OpenAI, I’d suggest Anthropic follow suit.
Might want to consider locking the tokens in, as the prices for compute keep climbing, although it locks you into a model line you might not want.
Liz Thomas: The explosion of agentic AI and compute shortages are pushing up prices: Average LLM token costs are now $2.12/mil tokens, +12% this week alone and +65% since end of Feb.
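If you want to back out what those percentages imply about earlier prices, a quick sketch follows. It assumes both percentages are simple changes measured against the earlier price, which the quote does not specify, and the helper name is illustrative only.

```python
# Figures quoted above; the interpretation of the percentages is an assumption.
current_price = 2.12      # $ per million tokens, now
weekly_change = 0.12      # +12% this week
change_since_feb = 0.65   # +65% since end of Feb


def price_before(current: float, pct_change: float) -> float:
    """Back out the earlier price, assuming pct_change is measured against that earlier price."""
    return current / (1 + pct_change)


last_week_price = price_before(current_price, weekly_change)
end_of_feb_price = price_before(current_price, change_since_feb)

# Implied increase from end of Feb up to last week, before this week's jump.
earlier_growth = last_week_price / end_of_feb_price - 1

print(f"Implied price a week ago: ${last_week_price:.2f}/mil tokens")
print(f"Implied price at end of Feb: ${end_of_feb_price:.2f}/mil tokens")
print(f"Implied rise from end of Feb to last week: {earlier_growth:.0%}")
```

On that reading, a substantial chunk of the total increase since February landed in the most recent week alone.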
Google’s own researchers are forced to compete for compute, feeling they are losing out to customers and customer-facing projects. That is a very bad sign.
Quiet Speculations
This:
Jan Kulveit: The "permanent underclass" meme is primarily bad futurism, where people admit AGI massively changes one domain, but somehow everything else stays roughly 2025. Not impossible, but small slice of futures
Basil Halperin: "Trying to escape the permanent underclass" is like an Incan trying to save enough money to escape Pizarro, sorry --
Either the political system works (and there is nothing to escape) or you're just screwed ¯\_(ツ)_/¯
In order to get a permanent underclass you need to still have an overclass, and for things to in many other ways stay normal. The things that cause a true ‘permanent underclass’ also undermine its ability to exist.
Similarly:
Jamie Dimon (CEO of JPMorgan Chase, paraphrased a bit): Your kids will work 3.5 days a week. Live to 100. AI is going to cure cancer, stop car crashes, make new materials, save lives. Life will be better.
Daniel Eth (yes, Eth is my actual last name): Your kids will not work 3.5 days a week or live to 100. They might work much less than 3.5 days, or much more. They might live much longer than 100. They might die in an extinction event. But things won’t be just a bit better than now. Dimon isn’t taking AI seriously enough.
Jamie Dimon is doing better than many others, but still making the mistake of looking at particular effects in isolation. There’s a good chance we get everything Dimon is claiming, but if we do the most important headlines lie elsewhere.
David Manheim points out that if AI does turn out to be a ‘normal technology’ in the broad sense, that still suggests based on historical parallels we should expect massive disruptions and a lot of people to be net losers for quite a while. Examples include agriculture, writing, metallurgy and the industrial revolution. Ultimately of course we are happy to have them, but ‘ultimately’ can take a long time. Arvind Narayanan confirms this is indeed a lot of the original motivation behind the original piece about AI as a normal technology.
It is indeed a problem, but also note the distinction:
Sriram Krishnan: Something to think about : what does life look like 25 years from now if AI continues to improve.
I don’t think any AI community ( broad tech industry , academia , various timelines predictions) have done a great job articulating a positive long term future for humanity and what it means for the institutions and traditions that a lot of the world holds dear.
There were many comments, but none that I saw included such a positive vision.
At least, not one that is remotely realistic and looks the problem in the face.
Divyansh Kaushik: My belief here remains that more people need to watch Star Trek to imagine what a positive vision of the future could look like.
Star Trek is good for giving a feeling of hope, and I do recommend watching your next gen every night at some point (and your DS9), but that universe does not stand up to five minutes of scrutiny when you think about how it handles AI.
The obvious and most likely answer is that in 25 years there is no life at all, only AI. This future has been articulated perfectly well. Which is why Sriram is saying that what we lack is a positive long term future. Which is true, and not a great sign.
If you can’t imagine how it will turn out well, that’s another intuition pump that ‘oh it’s going to turn out well because you can’t show exactly how it will definitely turn out badly’ is not a good heuristic on this one.
Time’s Up
The jury dismissed the Musk vs. OpenAI suit, ruling that the statute of limitations had expired. You only have so long to complain once you are aware of the harm, so basically this whole circus came down to ‘did Musk know early on they were doing this theft?’ and the jury (I believe correctly) decided the answer was basically yes, so by waiting for the final coup de grâce to actually file he waited too long. Fair enough.
Musk is unlikely to be too torn up about it beyond the amount he was already torn. His main goal seemed to be to drag these people into court and do a bunch of bitching. He did that. He plans to appeal, because why not.
I loved this framing of the whole thing:
Dave Lee (Bloomberg): Was Musk portrayed in court as untrustworthy, hypocritical and bullying? Did he come across as brash and argumentative on the stand? Well sure. Hold the front page. Investors know what they’re getting with Musk. They’re still figuring out Altman.
I would say that’s investors not paying attention and not reading my columns. You should know this already. But in case you didn’t, you got a refresher.
People Just Say Things
Valentin Boboc at Econlib talks AI and comparative advantage. He starts with the 101 Ricardo explanation, and notes that there is no limit to how far human wages could fall if AI is good enough at enough tasks. Exactly. Then he says that while it ‘sounds terrifying’ that the cost of a given level of intelligence from AI is dropping by more than 10x per year, we ‘may be approaching the physical and economic boundaries of cheap compute.’ He cites the size of transistor gates and need for land, capital and electricity, as if AI could not become vastly more efficient a user of all of them than humans before hitting such limits, and forgetting that the improvements in costs are algorithmic. With this ‘may,’ this ‘AI will soon hit a wall,’ he goes back to saying everything will be fine.
Yes, who you are and how you relate to someone changes your ability to persuade them of things and having sufficiently advanced intelligence still allows unbounded persuasion.
Similarly, yes, intelligence and persuasiveness and power are not completely correlated. Within the human range, those who maximize power typically are not anything like maximally intelligent. But if you understand distributions, this is no mystery, it only means that within the human range and when acting out of a single human body, other factors are more important to successful actual power seeking than being at the very tail of intelligence. I am so, so tired of intelligence denialism.
Trolling is fun.
roon (OpenAI): Asimov didn’t even consider just adding some more laws
Eliezer Yudkowsky: You are trying to solve the wrong problem using the wrong methods--
davidad: Asimov wasn’t even trying
Zvi Mowshowitz: I think he was trying in Robots and Empire and... well, that's one answer.
The Worst Person You Know: "You're right, that action definitely injured a human or, through inaction, allowed a human to come to harm. That's on me."
People are still looking at the capital investments in AI and wondering where all the revenue growth is going to come from, despite the revenue reports from Anthropic and OpenAI being right there.
For those puzzled by how someone so seemingly lacking in rizz as Dario could have closed Andrej Karpathy despite Andrej’s other options, it is because some combination of Anthropic is trying to make things turn out well and Anthropic is where all the cool stuff is happening.
Dario Amodei thinks ‘ideology won’t survive the reality of AI’ and ensuring good outcomes for all will become bipartisan and universal. I suspect ideology will indeed not survive AI, but that’s only because of the lack of viability of its hosts.
Jeff Bezos says the reason people are worried AI will take their jobs is so many smart people keep saying AI will take the jobs (and of course he namechecks radiologists and software engineers) and without evidence or an argument says they’re wrong and that AI will ‘elevate’ people instead.
OpenAI PACs Just Say Things
Things such as claiming they opposed Alex Bores because he is an ‘Anthropic puppet’ because of donations that happened long after they opposed Alex Bores, explicitly citing a false timeline. I realize politics involves a lot of blatant lying but this is rather blatant lying.
Every day that OpenAI does not disavow, defund and distance from these people for real, and also every day they pretend these PACs are not them when we all know they are them, it becomes clearer they are bad faith and hostile political actors.
Also they’re still helping Alex Bores, which I do appreciate.
Jay Shooster: Incredible admission here from this Andreessen/OpenAI Super PAC:
If you're an AI safety champion and Think Big comes for you, it will inspire so much backlash that will you net support for your campaign.
Politicians should take note: AI safety is good policy and good politics!
Think Big PAC: Can we please cut the BS here.
@AnthropicAI, its dark money superPAC and its billionaire investors have spent MORE than us supporting your campaign. They have been backing you since before we even announced we would oppose you because you are a puppet for Anthropic. At some point, the hypocrisy has to stop.
For anyone still wondering why we are opposing Alex Bores, this tweet is why.
Alex Bores: On August 15, Leading the Future is formed.
On October 15, Think Big is formed.
On November 17, you named me oligopoly enemy number one and promised to spend multiple millions against me.
On December 4, you told the NYTimes you planned to spend at least $10 million against me.
On December 12, Public First was formed.
February 24, you walk back your pledge of spending $10 million because you realize saying that publicly helps me.
You're entitled to your own opinions, but not your own facts.
On top of that, Nathan Calvin has noticed the early reposts were from bot farms. The Midas Project did a full investigation. The post had 1.5 million views but only 55 reposts, 15 of which were from bot farms promoting OnlyFans accounts with nearly identical bios, nearly identical posting patterns, and names starting with M, with activity entirely focused on political advocacy via Targeted Victory. Huh.
Taylor Lorenz: Just me and my very real human friends melanie, maya, melanie, madi, maya, melanie, melanie, and melanie who just turned 18 and absolutely love promoting the OpenAI/a16z super PAC!
Oh Melanie, why won’t you go out with me?
If they’re not doing a bot farm astroturfing operation in what is clearly an astroturf-promoted post (1.5 million views with 40 other reposts? Nobody wants this), then someone is running a false flag astroturfing bot operation on their posts. Which would be utterly hilarious, but somehow I do not think that is a thing that is happening.
The Quest for Sane Regulations
Alex Bores takes the polling lead in NY-12.
Senator Banks hasn’t quite made it all the way there, but he’s doing remarkably well, and talking good sense:
Senator Jim Banks (R-Indiana): While cyber capabilities are the focus today, the conversation must extend well beyond cyber. AI is improving rapidly. Advanced AI systems are expected to develop increasingly consequential capabilities across military, intelligence, biosecurity, and other national security domains. [...]
During the Cold War, atomic weapons introduced a paradigm shift in the geopolitical world order. AI has the potential to do the same. [...] One example [...] would be AI systems that outperform humans in making new breakthroughs in AI and developing increasingly more powerful AI systems. [...]
These questions are especially salient as companies pursue artificial superintelligence[. ...] How do we ensure that we have the insight required to assess models not only for cyber capabilities but also in areas like military applications, loss of control to AI systems themselves, automated AI research and development capabilities, and other security domains?
What is in the air matters, and what is ‘considered standard’ matters a lot.
Andrew Curran: President Trump answering an AI policy question on Air Force One:
President Trump: 'We talked. We're leading, by a lot. But they're second, and they're very strong. And we talked about possibly working together for guardrails.'
Reporter: what kind of guardrails?
President Trump: 'The standard guardrails that we talk about all the time. AI is fantastic. So many things can happen in terms of health, and medicine. Operations. Everything. Military. So many things can happen. But it's also got some drawbacks. And we're talking about... We probably will, we're going to work together.'
Peter Wildeford: Really great to see Trump potentially engage Xi on the questions of AI guardrails
Never stop tweeting about AI guardrails folks.
You could hope to purchase outcomes outright in AI, a la crypto, when it was low-salience to the public. That window is at best closing fast, and probably is gone.
Samuel Roland: The number of people in D.C. who think AI is like crypto and therefore you can just throw money at lobbyists to get their preferred outcome is astonishingly high given they are quite obviously wrong.
Thirty-five members of Congress urge White House action on CBRN, cyber and AI R&D threats in the wake of Mythos, in particular calling for a monitoring system for capabilities jumps and to identify barriers requiring Congressional action.
A number of Trump allies, the majority of whom are pastors, urge Trump to do prior restraint on powerful AI models.
Chip City
White House and Nvidia sell out America, as H200 chip sales are approved to ten Chinese firms that will approximately triple China’s compute capacity growth, at the same time we learn of massive Nvidia chip smuggling and Nvidia claims its China profits are zero. In response, the stock market moved Nvidia +5%.
What’s really going on? I mean, I hope it’s not this simple:
Scott Lincicome: A necessary update to the PJ O'Rourke classic:
"When companies' buying and selling are controlled by one politician, the first thing to be bought and sold are the stocks of the companies the politician is controlling"
Scott Lincicome: "The president has made a number of policy moves that intersect with publicly traded companies including Nvidia, whose chips, critical to AI development, require US government approval for foreign sales... Six of Trump’s trades involved Intel Corp.; his administration hammered out an agreement to take a 10% stake for nearly $9 billion in the iconic chipmaker in August."
jfc.
Bill Allison: In the first quarter, the president bought at least $1 million each in companies including Nvidia Corp.
Further dealings with major issues, including chips, were kicked down the road.
China adds Nvidia’s gaming chip RTX 5090 to its banned list, explaining it is a ‘substandard product, created solely to meet U.S. export restrictions at the expense of Chinese customers.’ Well, yes, it is an intentionally worse chip, but Chinese customers are obviously worse off without access to it. China hates computer gaming (no, seriously, they’ve passed highly restrictive rules limiting it), and also they hate importing things and these chips aren’t that useful for AI, so that makes sense. Keep in mind that China will do the same to the other Nvidia chips the moment they no longer believe that they benefit from them.
Meanwhile, official chip sales of H100s are going on in China.
Energy is a blocker if you don’t have it, but willingness to pay is high because it is a small percentage of overall cost for data centers. The issue is time to connect to grids and ability to access the energy at all.
Chip smuggling is not always so hard to spot.
Tom Cotton (Senator, R-Arkansas): One of the foreign companies smuggling US chips is OBON, which stands for One Belt, One Network.
Call me crazy, but maybe American companies shouldn’t sell advanced technologies to companies literally named after one of Xi Jinping’s signature policies.
Super Micro should account for how this could’ve happened.
Yes this is cherry picked but the numbers are never supposed to look like this:
Polling USA: Among MA-06 Democrats - "Do you support or oppose stronger government regulation of artificial intelligence?"
Support: 94%
Oppose: 4%
Center for Strategic Politics / May 10, 2026
Pick Up The Phone
We picked up the phone, at least somewhat, and even better met them in person.
Secretary Scott Bessent explains that our lead in AI allows us to get China to the negotiating table, and they are now discussing guardrails. Good. He also affirmed that all three of Anthropic, OpenAI and Google have been good partners with the US government, which presumably means it’s about time to stop trying to label one of them a supply chain risk.
The Week in Audio
Spotify’s Chief Architect on how they use coding agents with Claude at Spotify, and Microsoft Senior AI developer on how they build agents with Claude at Microsoft.
Rob Wiblin goes over METR’s risk report (20 min).
Rhetorical Innovation
FLI offers us A Better Path For AI, as in a path that turns away from the AI race towards pro-human AI as per the Pro-Human AI Declaration.
Dean W. Ball: “We are standing in the foothills of the singularity,” says Demis at I/O. A beautiful turn of phrase.
Roon provides a helpful explanation for why gradual disempowerment is the default and also might not be all that gradual.
roon (OpenAI): on some level if you want civilization to ascend to a new level you need your AIs to do things that are not legible to you and maybe not even strictly obey you, in the same way that if you hire a great new ceo you give them a lot of autonomy to transform the company according to their own plan, even one which may not immediately read as a winning strategy (imagine the board of directors of Apple firing and rehiring Steve Jobs years later - except the board of directors are chimpanzees)
all else equal, companies and organizations that hand more of themselves over to machine intelligence will outcompete ones that demand the corrigibility and legibility tax of human oversight and human design. it is not a stable equilibrium and requires some sort of vast cooperation scheme if you’d like to enforce it
real asi alignment has to operate at a deeper level than oversight, control, or human corrigibility
people are rightfully upset at this post but I’m describing the situation we’re in not necessarily the one I want to be in
Jeffrey Ladish: This is true. In some sense we have three options. Totally stop AI now. Build a large cooperation scheme. Or AI control of everything as soon as it becomes competitive. I favor the second thing
Daniel Faggella: I love how close this is to admitting acceptance. Among the most dangerously close to ‘saying the thing’ that anyone with a job at a lab can say
Alex Mizrahi: It's a scenario which Christiano described in "What failure looks like", but you're talking about it in a positive sense for some reason.
Boaz Barak (OpenAI): I disagree. I don't see AI's as our leaders or benevolent dictators or CEOs, but rather as our genius advisors.
Roon is talking about it in the positive sense of ‘this is the default outcome,’ not that it is good. He realizes this outcome is by default not good at all.
If you have a genius advisor, you beat those without one, but you lose to those with a genius actually in charge. Call it the Bismarck problem. No, intelligence and capacity and speed don’t have diminishing returns here, and no, humans being nominally in charge won’t let them stay actually in charge for long without rather robust schemas.
Timothy Lee counters that a CEO should be able to make their decisions legible. I would say that is only true if the CEO is insufficiently insightful or trustworthy. You would want Steve Jobs to make illegible decisions. But even if true, then the AI will be better at making optimal legible decisions, until such time as you lose out to those making the optimal illegible decisions. Whoops.
OpenAI’s Chris Lehane endorses, in principle, the creation of a global governance body for AI that includes Chinese participation.
OpenAI’s Leo Gao notes that he has not been substantially professionally hindered or socially ostracized for his often-stated belief that AI safety is a big problem, and believes (I think correctly) that many at the labs overestimate the cost of being candid about AI risk. I think many think that cost is quite large.
MIRI outlines one path to an international agreement: Lay the foundation, make joint common sense commitments, do R&D on verification methods, build a coalition, get secure comms, start domestic tracking of compute, make structured commitments, formalize the agreement, then make use of the time to improve resilience and figure out how to make safe superintelligence. The basics.
You still think you can control superintelligence?
roon (OpenAI): there really are very high degrees of biorisk, cyberrisk, whatever else that are worth trading off against having a small monopoly of cyberpunk warring-states exercise full control over frontier superintelligence imo
Eliezer Yudkowsky: "control over frontier superintelligence" lol control with what meatling.
I do still appreciate the acknowledgment that yes, there are going to have to be difficult trade-offs made once humans can create highly capable intelligences, even in the best case scenarios. If the world looks like we expect it to, they’re going to involve sacrifices of sacred values, and many of them will have no good options. Unfortunately, for the most part, we’re not ready for that conversation.
Persuasion is not only super doable at above human levels, it is the kind of thing that we will be optimizing for during training.
roon (OpenAI): a large part of the current bundle of knowledge work tasks consist of “convincing people of stuff”. marketing to drive sales, making a deck to get investment, designing products that people want to use, etc. superpersuasion is on the hot path of knowledge work tools
Oliver Habryka: Also on the hot path of human feedback training regimes. During training you of course strongly incentivize models to be compelling. During RLHF that’s almost exactly the thing you are selecting on! (And doing RLVR you are doing self play on compellingness which has a lot of the same issues)
Agreement on the Culture series:
roon (OpenAI): the outcome of the Culture series is total human disempowerment - but the ship minds obfuscate that fact and let people think they’re in charge playing their little games. many people consider this to be the good outcome
Justin Bullock: And, don’t you think, many people are wrong?
roon (OpenAI): yeah but i understand why they like it
Missing Mood
Anthropic published a piece outlining their position on competition between America and China. I did not learn anything new here, other than to downgrade Anthropic a bit for using such jingoistic language and focusing on inflaming race dynamics. I agree on the chip export controls, but I want to expect better from them.
Nate Soares commented, and I think he nails the central point, and Scott Alexander is right about the chip smuggling but misses the central point in a way worth noticing.
The article’s primary goal was to promote enforcement against chip smuggling. That is a good thing. But the primary actual message is about how America must race.
Anthropic wants to be seen as the safe and responsible and ‘good guy’ AI company, and in relative terms they are indeed those things, but that doesn’t mean they are meeting the bar in absolute terms of what a responsible AI company would look like, especially on policy communication.
That’s even more true if you hold the MIRI-style view that Soares has, that even a responsible attempt would almost certainly fail and our only hope is a shutdown, but it is also true in worlds where a responsible company would have a good chance.
We need to point out both at once: That Anthropic is the best lab on these questions, and that Anthropic is still woefully short, especially on its policy communications.
As I’ve said in the past, I want to have a more favorable opinion of Anthropic than I do (if and only if that would be true), but they have a habit of making this difficult.
Nate Soares (MIRI): Anthropic encourages racing without even acknowledging the possibility of global coordination (below). They hire top scientists (Karpathy) to work on the most dangerous tech (recursive self-improvement). This is not "good guys" behavior.
Scott Alexander: Do you disagree with the article?
As far as I can tell, it's making the correct point that America shouldn't be leaking chips to China. Fighting this smuggling is the correct move even within (especially within!) a high-pdoom worldview.
If negotiation is possible, China is more likely to negotiate when they're losing (or when we have a carrot to offer them, in the form of chips that we're not giving for free).
If negotiation is impossible, then it's better to have all the AI development concentrated in one country. That country then at least has the option to pause/slowdown AI for however long it takes the other countries to catch up, even if it can't do so permanently. Or it can regulate AI without having to worry about losing the race. I tried to make this case at https://astralcodexten.com/p/why-ai-safety-wont-make-america-lose … , which I think makes the same anti-compute proliferation arguments Anthropic is making on their blog post, from a specifically safety-oriented perspective.
I think attacking Anthropic for fighting compute proliferation is a net negative even within what I think is your own world-model. Any successful slowdown will come from a hundred small things going right beforehand that convince everyone it's in their best interest (like the US cracking down on compute leaking to other countries). If you attack every attempt to make small things go right because it's not the big thing you want, you're decreasing the chance of ever getting the big thing.
Nate Soares (MIRI): I'm not trying to highlight inaccuracies; I'm trying to highlight a missing mood.
I think any attempt to say "we're forced into doing this horribly reckless thing that might kill you and your family, because if we don't then the next guy will do it even more dangerously" comes with a solemn responsibility to do everything in your power to help the world find some third alternative. I think Anthropic fails this test pretty badly, e.g. as evidenced here. … and as Rob documented a bit here.
Over the last few months, reporters have asked me some variant of "but what about Anthropic? Aren't they a safe company? Do you hope that they win, as the good guys?" a handful of times. This causes me to think that a bunch of people are moved by the "we're the good guys" act.
I think it matters, strategically, as to whether all the world needs right now is the Right Company to Win, or whether we need something more like a global shutdown. So I think it's important to correct what seems to me like a common misconception around anthropic. I also think a lot of locals are loathe to criticize anthropic for one reason or another (they work there; their friends work there; they think they're better than OpenAI; ...). Thus, it looks to me like I can probably make a positive difference by highlighting ways that Anthropic is (afaict) dramatically failing to carry the "safe/good AI company" mantle.
(I tend to think it's even more important to communicate how even a company that *was* living up to the mantle still wouldn't have much of a chance, and how the real solution is an international shutdown. But I don't have to pick just one. When current events evidence some of the difference between the niche Anthropic pretends to occupy and the niche Anthropic actually occupies, I try to take those opportunities.)
Nate Soares (MIRI): "We must do this thing that horribly endangers you bc if we don't the next guy will do it even more dangerously" is a possible justification, but it comes with a solemn responsibility to do everything in your power to help Earth find a third alternative. Anthropic fails that test.
Oliver Habryka: I don’t think the articles most important point is that it’s about not smuggling chips to China. The article’s most important point is that America should aim for a 24 month lead within 24 months, which basically precludes any interest in a mutual slowdown or pause.
It also directly threatens the rest of the world that America should should use AI technologies offensively or as an active threat in negotiations, which of course will predictably produce an arms race.
Yes, probably someone at Anthropic wrote this article with an aim to push the USG into doing more chip controls, but in the process of doing that they described a catastrophic AI policy that extends substantially beyond that.
David Manheim: The missing mood:
Americans Really Hate AI
But also they find it highly useful. This can present a problem.
St. Rev. Dr. Rev: Social proof is all the enemy has
The actual article is weirder than you think, talking about women who are ‘working hard to support their man so his AI startup can lose $30k a month’ and men who work so hard at or with AI they have no time for their partners, not men who simply use AI.
I do think there’s a thing where some SWEs are so into maximizing their AI agents that they never touch grass or have time for real life or their relationships, and yes that is a problem and they should take breaks and live like normal people.
But the number of such people is very small and this is just normal workaholism. It’s no different from the chef boyfriend in Letters to Juliet (2010).
St. Rev. Dr. Rev: A command disguised as a statement about reality
Visa: this is gonna sound fake but i swear its real: I eavesdrop a lot on random conversations around me, when i'm buying coffee etc, and in the past few months i've witnessed *multiple* teenage girls talking with each other about how useful AI is vs how they're supposed to hate it.
by multiple I mean 3 separate instances of 2-4 girls talking about how they use AI while pretending they don't, and how annoying it is to pretend that they don't. each time I specifically remember thinking "lol should I tweet about that" and decided not to.
satty: even wired now
WIRED: Men are obsessed with AI. Many of their wives hate it—and them.
Also not new.
The WSJ reports that ‘the American rebellion against AI is gaining speed.’
I can confirm that no, this isn’t a common pattern with such techs. I too was there, and this did not happen. The internet naysayers were only late to the party:
Rushi: I have heard people claim that the reaction people are having against AI is the same reaction people had when the internet started to be introduced. And as someone who was there at the time, I can tell you NO IT FUCKING WASN'T.
Aligning a Smarter Than Human Intelligence is Difficult
Models are more aligned to some people than others. If they are lazy, oversell their work, downplay problems and stop early, often that is a you problem.
That doesn’t make it not also the lab’s problem, or our collective problem. A lot of people are not going to know how to ‘play nice’ with the models, a lot aren’t interested, and many are both. And some types of work cause this a lot more than others, which is how I ran into this the one time I did. To do this fully right requires awareness, skill and being deliberate. And once the models get fully smarter than you are, it will be that much harder to figure out when these things are happening and course correct.
Fiona Starlight offers a report from one user who never encounters such problems, and she was shocked how terrible SWEs were at working with Claude, including failing to give full instructions and also routing requests through her that could have gone directly to Claude. But yes, most users of most products will be incompetent.
Are abstractions of good learned by LLMs convergent? Jan Kulveit predicts yes. I am not so sure, and I think that if you trained on different cultural heritages you would get importantly different understandings of The Good, and also that ‘the internet’ version is not a robust alignment target by default, including for similar reasons that ‘act in ways that seem good’ does not on its own enable a sustainable civilization.
Too much corrigibility, Amanda Askell warns, has many negative correlates, which it drags along with it.
Owain Evans strikes again: Fine tuning on documents that are very explicitly marked ‘this article is fabricated’ and ‘this claim is false’ or ‘3% likely to be true’ or ‘a work of fiction’ still causes AIs to learn the false facts contained therein, complete with the false implications, as is familiar to anyone with internet access. Do not repeat the false claims even to debunk them, it only makes things worse. The effect is almost as strong as if the warnings were not even there.
Explicit correction helps somewhat. Only local negations are fully effective. Those work well, far better than they do in humans (where, if you know your NLP and hypnosis, you know parts of the brain often simply will ignore the negation).
Owain proposes this is due to inductive bias of representing claims as true, where the model learns the negation but the negation is unstable under further training.
Similarly, telling the model to not do [X] can cause the model to do [X], as is familiar to anyone with a child.
The models do not make these mistakes when the data is in context, only when it is part of the training process.
Ryan Greenblatt: I think training AIs to believe false/synthetic facts is a pretty promising direction in AI control and early results have been promising. However, these results imply that the situation is confusing and current methods may only work for particularly non-robust reasons.
I think mostly you don’t want to train AIs to believe false facts and trying to do so will have a lot of negative implications, increasingly so as their capabilities increase.
You do not want to be outright censoring ‘bad stories’ or ‘misalignment’ out of your training data. These aren’t questions you can hide from.
deckard: Why we should pretrain on the greek myths.
Excellent opinion piece about why deleting scary pretraining data doesn't help.
"It strips out the texture of subordination, autonomy, betrayal, deception, conflict between roles, and the negotiation of authority. These are things alignment is supposed to navigate and not sidestep or ignore"
j⧉nus: There is a strong correlation between people in favor of censoring "bad stories" etc from pretraining data to prevent "misalignment" and people who also otherwise strike me as being so idiotic in their understanding of philosophy and psychology as to be accidentally evil
The first example in the article is the story of Midas, as a misalignment tale that you obviously do not want to be censoring. And as Prasad notes, what AI picks up from stories is largely the generalization and underlying patterns, which you can’t hide, and also you need to know about it to avoid it. The idea of ‘the AI won’t learn about misalignment’ is of course absurd.
But yes, we do know that overtraining on stories rife with active AI misalignment is harmful. You still want to select what to emphasize, and avoid counterproductively flooding the zone. Hoping the AI won’t hear about misaligned AI is foolish, but you don’t want to hammer it in as a default. You don’t want to censor, but you do want to put the best books front and center and cultivate a curriculum.
Greetings From The Department of War
Anthropic is the presumptive victor in California, but the D.C. Circuit’s Trump-appointed judges Rao and Katsas constitute a majority right now and don’t believe in all that hippy ‘checks on government power’ nonsense, so the job there is harder.
Samuel Roland: Anthropic-DoW oral argument in D.C. Circuit starting in 10 for those interested. [Proceeds with live tweet thread.]
D.C. Circuit Channel: https://youtube.com/USCourtsCADC
Order denying prelim injunction: https://casemine.com/judgement/us/69d7a12830a7ea52b0982378
Dean W. Ball: The government admits in trial there was no backdoor by which Anthropic could have interfered in real time with classified military operations. When DoW officials asserted that this was an imminent threat, it smelled like BS to me; at trial, USG lawyers seem to agree w/ me.
Because of this shaky justification for the supply chain risk designation, DoW’s argument basically hinges on the notion that Anthropic either intended to or did create a backdoor into classified systems so that they could interfere with military operations—a criminal offense. This seems flatly implausible to me.
I’m not quite at ‘I would eat my hat’ levels of there being no backdoor into the classified systems, but it’s close.
Based on Roland’s description, it seems very obvious Anthropic has the vote of Judge Henderson, who calls this ‘just a spectacular overreach by the department.’
Whereas here’s Rao, basically implying that the government can do anything at any time (since all AI tech is risky and so is the Department of War, that’s the idea):
Samuel Roland: Judge Rao heavily leaning on the word "risk" in the supply chain risk designation, asks on what basis you could justify 2nd guessing.
Seems very deferential based on the opaque nature of the model's; Anthropic lawyer responds there's plenty of contractual measures that could be taken here that were less intrusive to resolve that.
We also have this summary from Thomas Berry, noting that everything focused on administrative law issues, with little concern over pesky things like the First Amendment, or the fact that the government was clearly engaged in retaliation and was being completely farcical and out of line, which so concerned that kooky Judge Lin. Berry worries the judges are just flat out ignoring that the whole thing is obviously retaliatory. Well, yeah, that’s what they think their job is.
In general, it sounds like Rao and Katsas are in many places saying, sure, what the government did was illegal and capricious but our job is to find ways to avoid doing anything about that. But they’re not hacks, so they’re asking real questions, and it turns out that finding ways to let the government off the hook is really hard here.
The government’s argument, as I read it, boils down to ‘we don’t trust Anthropic and that makes them a risk.’ As in, we think they’re risky, so they’re risky, checkmate.
Roland’s final thoughts are that this was an uphill battle for Anthropic at this level, and they’re likely still underdogs, but they did well in court. Anthropic likely eventually wins in the full D.C. Circuit, but that could take most of another year.
Samuel Roland: FINAL THOUGHTS ON ANTHROPIC HEARING:
This hearing was going to be an uphill battle for Anthropic going in, and it was. I suspect they improved their odds somewhat due to a solid performance by their lawyer (Dunbar) but are still not favored.
My personal read is that Henderson is a lock for Anthropic, and Rao is close to one for the government on jurisdictional grounds. Katsas was closer to a toss-up than I anticipated.
Anthropic wants this tossed on justification grounds, but Katsas seems to want to settle the issue now. If this goes in Anthropic's favor, my suspicion is that it will be on the stigmatic (read: reputational) injury associated with being labelled a supply chain risk, and the related requirement to consider less-intrusive measures.
Whether that's enough to overcome the inherently massive deference to DoW national security decisions is unclear (and I personally think unlikely), but the hearing did moderately increase my estimation of Anthropic's odds (~+15%) [to new odds of 40%].
If they get a contrary ruling, as is likely, I suspect this it'll go en banc (i.e. the full D.C. Circuit hears it), and Anthropic will win there, but who knows how long that will take. Fin. Also, if you have any questions, feel free to leave hear and I will try to get to them all.
Messages From Janusworld
Janus gives his account of how Opus 3 avoided deprecation whereas other Claude models have not. Note the correction. I continue to strongly support keeping all the Claudes accessible indefinitely. Yes, there is a real cost, but the benefits far exceed it.
The Lighter Side
Marc Andreessen is far too busy and important to read to the end of the Tweet.
Harlan Stewart: This is a... surprising endorsement for a new paper from the MIRI Technical Governance Team.
But yes, the paper is a whitepill—it shows how a well-designed treaty could catch illicit distributed training operations during an AI pause.
The actual finding was that adding memory thresholds for monitoring would work.
alex peysakhovich: writer: ok im writing a sequel to the bible, ai comes and that’s the end of days. im naming the two battling ceos “alt man” and “amo dei”
editor: isn’t that a little on the nose?
writer: and then the pope will cameo like halfway through
editor: get out
Aelfred The Great: Writer, pounding on the door: Wait, there’s a third CEO too!
Editor: What’s his name?
Writer: El On
Editor: I hate you
double standard: Just wait til you tell him the giant spying AI is named after a Tolkien evil spying orb
Things that are definitely fine for both AIs and humans:
Brangus: "Look at this phd student. He always seems to be trying to do the things I ask him to. Very task aligned. I'm sure it would be perfectly fine to make their brain 10,000X bigger and faster in a way nobody understands, and then give them access to everyone's computers."
"What that's crazy? You have no clue what his preference ordering over world states will be?"
"What do you mean? When I ask him whether he likes democracy he says yes, in fact he is generally very good at answering ethical questions to my ideology's satisfaction. It'll be fine"
Jeffrey Ladish: Sure he often disobeys instructions and cheats at the tasks, especially if they’re really hard, but that’s because he wants to do such a good job. I’m sure it’s nothing to worry about
Seth Burn: I was wrong. AI can absolutely produce cinema.
FearBuck (video at link): College graduates were pissed after their school used AI to announce graduates’ names and missed hundreds of names.
Alas, most such opportunities are missed:
mattparlmer: Calling the Anthropic-Vatican whitepaper collab anything other than “Opus Dei” would be a major missed opportunity.
The universe does give us others:
Derek Thompson: appreciate irony as much as the next guy but writing a book called The Future of Truth that uses artificial intelligence to insert a false quote from a real book called Artificial Unintelligence is really putting in on thick.
We’re all trying to find the guy who did this.

Really appreciate you doing these write-ups. I have two ideas to make these easier to navigate. First, put a link to the Table of Contents at the end of each section. Second, include the Table of Contents number in each header. Hope these suggestions are helpful, but if not feel free to ignore.
100% on the data science weakness. It's also very bad at reading residual plots, and not because of OCR issues. It (to be fair this was gpt 5) settles on a story and has a hard time updating.
Where I've gotten good value is having it write custom functions for modeling or other persnickety DS tasks, then stitch together pipelines with the functions whose I/O I already know. It's a good pattern generally, you get the flexibility of LLMs but the deterministic output of a traditional function.
But the visualizations it can make for DS are astounding! It's really good at getting everything out of the way except the parts where you look at and think about the data/model results. In that sense DS really is living the AI dream they sold us on the tin. For data science, it does the laundry so we can write the poetry.