Google DeepMind got a silver medal at the IMO, only one point short of the gold. That’s really exciting.
We continuously have people saying ‘AI progress is stalling, it’s all a bubble’ and things like that, and I always find it remarkable how little curiosity or patience such people are willing to exhibit. Meanwhile GPT-4o-Mini seems excellent, OpenAI is launching proper search integration, by far the best open weights model got released, we got an improved MidJourney 6.1, and that’s all in the last two weeks. Whether or not GPT-5-level models get here in 2024, and whether or not they arrive on a given schedule, make no mistake. It’s happening.
This week also had a lot of discourse and events around SB 1047 that I failed to avoid, resulting in not one but four sections devoted to it.
Dan Hendrycks was baselessly attacked - by billionaires with massive conflicts of interest that they admit are driving their actions - as having a conflict of interest because he had advisor shares in an evals startup rather than having earned the millions he could have easily earned building AI capabilities. So Dan gave up those advisor shares, for no compensation, to remove all doubt. Timothy Lee gave us what is clearly the best skeptical take on SB 1047 so far. And Anthropic sent a ‘support if amended’ letter on the bill, with some curious details. All this comes as we are on the cusp of the final opportunity for the bill to be revised - so my guess is I will soon have a post going over whatever the final version turns out to be and presenting closing arguments.
Meanwhile Sam Altman tried to reframe broken promises while writing a jingoistic op-ed in the Washington Post, but says he is going to do some good things too. And much more.
Oh, and also AB 3211 unanimously passed the California assembly, and would, among other things, effectively ban all existing LLMs. I presume we’re not crazy enough to let it pass, but I wrote a detailed analysis to help make sure of it.
Table of Contents
Introduction.
Language Models Offer Mundane Utility. They’re just not that into you.
Language Models Don’t Offer Mundane Utility. Baba is you and deeply confused.
Math is Easier. Google DeepMind claims an IMO silver medal, mostly.
Llama Llama Any Good. The rankings are in, as are a few use cases.
Search for the GPT. Alpha tests begin of SearchGPT, which is what you think it is.
Tech Company Will Use Your Data to Train Its AIs. Unless you opt out. Again.
Fun With Image Generation. MidJourney 6.1 is available.
Deepfaketown and Botpocalypse Soon. Supply rises to match existing demand.
The Art of the Jailbreak. A YouTube video that (for now) jailbreaks GPT-4o-voice.
Janus on the 405. High weirdness continues behind the scenes.
They Took Our Jobs. If that is even possible.
Get Involved. Akrose has listings, OpenPhil has a RFP, US AISI is hiring.
Introducing. A friend in venture capital is a friend indeed.
In Other AI News. Projections of when it’s incrementally happening.
Quiet Speculations. Reports of OpenAI’s imminent demise, except, um, no.
The Quest for Sane Regulations. Nick Whitaker has some remarkably good ideas.
Death and or Taxes. A little window into insane American anti-innovation policy.
SB 1047 (1). The ultimate answer to the baseless attacks on Dan Hendrycks.
SB 1047 (2). Timothy Lee analyzes current version of SB 1047, has concerns.
SB 1047 (3): Oh Anthropic. They wrote themselves an unexpected letter.
What Anthropic’s Letter Actually Proposes. Number three may surprise you.
Open Weights Are Unsafe And Nothing Can Fix This. Who wants to ban what?
The Week in Audio. Vitalik Buterin, Kelsey Piper, Patrick McKenzie.
Rhetorical Innovation. Richard Ngo calls upon future common sense.
Businessman Waves Flag. When people tell you who they are, believe them.
Businessman Pledges Safety Efforts. Do you believe him?
Aligning a Smarter Than Human Intelligence is Difficult. Notes from Vienna.
Aligning a Dumber Than Human Intelligence is Also Difficult. Verify?
Other People Are Not As Worried About AI Killing Everyone. Predictions.
The Lighter Side. We’ve peaked.
Language Models Offer Mundane Utility
Get ChatGPT (ideally Claude of course, but the normies only know ChatGPT) to analyze your text messages, tell you that he’s avoidant and you’re totes mature, or that you’re not crazy, or that he’s just not that into you. But if you do so, beware the guy who uses ChatGPT to figure out how to text you back. Also remember that prompting matters, and if you make it clear you want it to be a sycophant, or you want it to tell you how awful your boyfriend is, then that is often what you will get.
On the differences between Claude Opus 3.0 and Claude Sonnet 3.5, Janus department.
Here’s a benchmark: The STEP 3 examination for medical students. GPT-4o gets 96%, Claude 3.5 gets 90%, both well above passing.
Language Models Don’t Offer Mundane Utility
Here’s a fun and potentially insightful new benchmark: Baba is AI.
When the rules of the game must be manipulated and controlled in order to win, GPT-4o and Gemini 1.5 Pro (and Flash) failed dramatically. Perhaps that is for the best. This seems like a cool place to look for practical benchmarks that can serve as warnings.
Figuring out how this happened is left as an exercise for the reader.
Ravi Parikh: Spotify’s personalization is extremely annoying. Literally none of the songs on the “Techno Mix” playlist are techno, they’re just songs from the rest of my library. It’s increasingly hard to use their editorial playlists to find new music.
A similar phenomenon that has existed for a long time: Pandora stations, in my experience, reliably collapse in usefulness if you rate too many songs. You want to offer a little guidance, and then stop.
I get exactly how all this is happening, you probably do too. Yet they keep doing it.
Math is Easier
Two hours after my last post, which included a mention of how IMO problems were hard to solve, Google DeepMind announced it had gotten a silver medal at the International Math Olympiad (IMO), one point (out of, of course, 42) short of gold.
Google DeepMind: We’re presenting the first AI to solve International Mathematical Olympiad problems at a silver medalist level.🥈
It combines AlphaProof, a new breakthrough model for formal reasoning, and AlphaGeometry 2, an improved version of our previous system.
Our system had to solve this year's six IMO problems, involving algebra, combinatorics, geometry & number theory. We then invited mathematicians
@wtgowers
and Dr Joseph K Myers to oversee scoring.
It solved 4️⃣ problems to gain 28 points - equivalent to earning a silver medal. ↓
For non-geometry, it uses AlphaProof, which can create proofs in Lean. 🧮
It couples a pre-trained language model with the AlphaZero reinforcement learning algorithm, which previously taught itself to master games like chess, shogi and Go.
Math programming languages like Lean allow answers to be formally verified. But their use has been limited by a lack of human-written data available. 💡
So we fine-tuned a Gemini model to translate natural language problems into a set of formal ones for training AlphaProof.
When presented with a problem, AlphaProof attempts to prove or disprove it by searching over possible steps in Lean. 🔍
Each success is then used to reinforce its neural network, making it better at tackling subsequent, harder problems.
Powered with a novel search algorithm, AlphaGeometry 2 can now solve 83% of all historical problems from the past 25 years - compared to the 53% rate by its predecessor.
It solved this year’s IMO Problem 4 within 19 seconds.
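For readers who have not seen Lean, the point of formal verification is that the proof itself is machine-checked. A trivial toy example (nothing like the actual IMO problems):

```lean
-- A toy Lean 4 theorem: both the statement and the proof are checked by the compiler,
-- so a plausible-sounding but invalid argument simply will not compile.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```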
They are solving IMO problems one problem type at a time. AlphaGeometry figured out how to do geometry problems. Now we have AlphaProof to work alongside it. The missing ingredient is now combinatorics, which accounted for the two problems this year that couldn’t be solved. In most years they’d have likely gotten a different mix and hit gold.
This means Google DeepMind is plausibly close to not only gold medal performance, but essentially saturating the IMO benchmark, once it gets its AlphaCombo branch running.
The obvious response is ‘well, sure, the IMO is getting solved, but actually IMO problems are drawn from a remarkably fixed distribution and follow many principles. This doesn’t mean you can do real math.’
Yes and no. IMO problems are simultaneously:
Far more ‘real math’ than anything you otherwise do as an undergrad.
Not at all close to ‘real math’ as practiced by mathematicians.
Insanely strong predictors of ability to do Fields Medal level mathematics.
So yes, you can write off whatever the AI can now do and say it won’t get to the next level, if you want to do that, or you can make the better prediction that it is damn likely to reach the next level, and then the one after that.
Timothy Gowers notes some caveats. Humans had to translate the problems into symbolic form, although the AI did the ‘real work.’ The AI spent more time than humans were given, although that will doubtless rapidly improve. He notes that a key question will be how this scales to more difficult problems, and whether the compute costs go exponentially higher.
Llama Llama Any Good
Arena results are in for Llama-3.1-405B, about where I expected. Not bad at all.
All the Elo rankings are increasingly bunching up. Llama 405B is about halfway from Llama 70B to GPT-4o, and everyone including Sonnet is behind GPT-4o-mini, but all of it is close, any model here will often ‘beat’ any other model here on any given question head-to-head.
Unfortunately, saturation of benchmarks and Goodhart’s Law come for all good evaluations and rankings. It is clear Arena, while still useful, is declining in usefulness. I would no longer want to use its rankings for a prediction market a year from now, if I wanted to judge whose model is best. No one seriously thinks Sonnet is only 5 Elo points better than Gemini Advanced; whatever that measure is telling us is increasingly distinct from what I most care about.
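For intuition on why a handful of Elo points means little head-to-head, here is the standard Elo win-expectancy formula (a quick sketch, not Arena’s exact methodology):

```python
# Standard Elo win expectancy: P(A beats B) = 1 / (1 + 10 ** (-(elo_A - elo_B) / 400)).
def win_probability(elo_diff: float) -> float:
    return 1 / (1 + 10 ** (-elo_diff / 400))

print(f"{win_probability(5):.3f}")   # ~0.507: a 5-point gap is essentially a coin flip
print(f"{win_probability(30):.3f}")  # ~0.543: even 30 points only wins ~54% of matchups
```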
Another benchmark.
Rohan Paul: Llama 3.1 405B is at No-2 spot outranking GPT-4-Turbo, in the new ZebraLogic reasoning benchmark. The benchmark consists of 1,000 logic grid puzzles.
Andrew Curran: Whatever quality is being measured here, this comes much closer to my personal ranking than the main board. I use 4o a lot and it's great, but for me, as a conversational thought-partner, GPT-4T and Claude are better at complicated discussions.
Remarkable how badly Gemini does here, and that Gemini 1.5 Flash is ahead of Gemini 1.5 Pro.
Note the big gap between tier 1, from Sonnet to Opus, and then tier 2. Arguably Claude 3.5 Sonnet and Llama 3.1 are now alone in tier 1, then GPT-4, GPT-4o and Claude Opus are tier 2, and the rest are tier 3.
This does seem to be measuring something real and important. I certainly wouldn’t use Gemini for anything requiring high quality logic. It has other ways in which it is competitive, but it’s never sufficiently better to justify thinking about whether to context switch over, so I only use Claude Sonnet 3.5, and occasionally GPT-4o as a backup for second opinions.
Shubham Saboo suggests three ways to run Llama 3.1 locally: Ollama + Open WebUI, LM Studio or GPT4All. On your local machine, you are likely limited to 8B.
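If you go the Ollama route, querying the local model is a few lines. A minimal sketch, assuming the `ollama` Python client, a running local Ollama server, and a pulled `llama3.1:8b` model (exact response shape can vary by client version):

```python
import ollama  # assumes: pip install ollama, Ollama running locally, `ollama pull llama3.1:8b`

# Query the locally hosted 8B model; nothing leaves your machine.
response = ollama.chat(
    model="llama3.1:8b",
    messages=[{"role": "user", "content": "Summarize the plot of Hamlet in two sentences."}],
)
print(response["message"]["content"])
```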
Different models for different purposes, even within the same weight class?
Sully: Alright so i confidently say that llama3.1 8B is absolutely CRACKED at long context summary (20-50k+ tokens!)
Blows gpt-4o-mini out of the water.
However mini is way better at instruction following, with formatting, tool calling etc.
With big models you can use mixture of experts strategies at low marginal cost. If you’re already trying to use 8B models, then each additional query is relatively expensive. You’ll need to already know your context.
Search for the GPT
OpenAI is rolling ‘advanced Voice Mode’ out to a small Alpha group of users. No video yet. Only the four fixed voices (they say this part is for safety reasons), and there are additional guardrails to block violent or copyrighted material. Not sure why voice cares more about those issues than text.
Altman here says it is cool when it counts to 10 then to 50, perhaps because it ‘pauses to catch its breath.’ Okey dokey.
GPT for me and also for thee, everything to be named GPT.
Sam Altman: We think there is room to make search much better than it is today. We are launching a new prototype called SearchGPT.
We will learn from the prototype, make it better, and then integrate the tech into ChatGPT to make it real-time and maximally helpful.
I have been pleasantly surprised by how much I prefer this to old-school search, and how quickly I adapted.
Please let us know what you think!
Ian Zelbo: This is cool but the name sounds like something a high schooler would put on their resume as their first solo project
I think AI has already replaced many Google searches. I think that some version of AI search will indeed replace many more, but not (any time soon) all, Google searches.
I also think that it is to their great credit that they did not cherry pick their example.
kif: In ChatGPT's recent search engine announcement, they ask for "music festivals in Boone North Carolina in august"
There are five results in the example image in the ChatGPT blog post:
1. Festival in Boone ... that ends July 27 ... ChatGPT's dates are when the box office is closed ❌
2. A festival in Swannanoa, two hours away from Boone, closer to Asheville ❌
3. Free Friday night summer concerts at a community center (not a festival but close enough) ✅
4. The website to a local venue ❌
5. A festival that takes place in June, although ChatGPT's summary notes this. 🤷‍♂️
Colin Fraser: Bigcos had LLMs for years and years and were scared to release them publicly because it's impossible to stop them from making up fake stuff and bigcos thought people would get mad about that, but it turns out no one really minds that much.
I presume it usually does better than that, and I thank them for their openness.
Well, we do mind the fake stuff. We don’t mind at the level Google expected us to mind. If the thing is useful despite the fake stuff, we will find a way. One can look and verify if the answers are real. In most cases, a substantial false positive rate is not a big deal in search, if the false positives are easy for humans to identify.
Let’s say that #5 above was actually in August and was the festival I was looking for. Now I have to check five things. Not ideal, but entirely workable.
The Obvious Nonsense? That’s mostly harmless. The scary scenario is it gives you false positives that you can’t identify.
Tech Company Will Use Your Data to Train Its AIs
Remember when Meta decided that all public posts were by default fair game?
Twitter is now pulling the same trick for Grok and xAI.
You can turn off the setting here, on desktop only.
Oliver Alexander: X has now enabled data sharing by default for every user, which means you consent to them using all your posts, interactions and data on here to train Grok and share this data with xAI.
Even worse it cannot be disabled in the app, you need to disable from web.
Paul Graham: If you think you're reasonable, you should want AIs to be trained on your writing. They're going to be trained on something, and if you're excluded that would bring down the average.
(This is a separate question from whether you should get paid for it.)
Eliezer Yudkowsky: I don't think that training AIs on my writing (or yours or anyone's) thereby makes them aligned, any more than an actress training to play your role would thereby come to have all your life goals.
Jason Crawford: I hope all AIs are trained on my writing. Please give my views as much weight as possible in the systems that will run the world in the future! Thank you. Just gonna keep this box checked.
My view is:
If you want to use my data to train your AI, I am mostly fine with that, even actively for it like Jason Crawford, because in several ways I like what I anticipate that data will do on a practical level. It won’t make them aligned when it matters, that is not so easy to do, but it is helpful on the margin in the meantime.
However, if you compensate others for their data, I insist you compensate me too.
And if you have a hidden opt-out policy for user data? Then no. F*** you, pay me.
Fun With Image Generation
MidJourney 6.1 is live. More personalization, more coherent images, better image quality, new upscalers, default 25% faster, more accurate text and all that. Image model improvements are incremental and getting harder to notice, but they’re still there.
Deepfaketown and Botpocalypse Soon
(Editorial policy note: We are not covering the election otherwise, but this one is AI.)
We have our first actual political deepfake with distribution at scale. We have had AI-generated political ads that got a bunch of play before, most notably Trump working DeSantis into an episode of The Office as Michael Scott, but that had a sense of humor and was very clearly what it was. We’ve had clips of fake speeches a few times, but mostly those got ignored.
This time, Elon Musk shared the deepfake of Kamala Harris with the statement ‘This is amazing 😂,’ as opposed to the original post, which was clearly marked as a parody. By the time I woke up, the Musk version had already been viewed 110 million times from that post alone.
In terms of actually fooling anyone I would hope this is not a big deal. Even if you don’t know that AI can fake people’s voices, you really really should know this is fake with 99%+ probability within six seconds when she supposedly talks about Biden being exposed as senile. (I was almost positive within two seconds when the voice says ‘democrat’ rather than ‘democratic’ but it’s not fair to expect people to pick that up).
Mostly my read is that this is pretty tone deaf and mean. ‘Bad use of AI.’ There are some good bits in the middle that are actually funny and might be effective, exactly because those bits hit on real patterns and involve (what I think are) real clips.
Balaji calls this ‘the first good AI political parody.’ I believe that this very much did not clear that bar, and anyone saying otherwise is teaching us about themselves.
The Harris campaign criticized Musk for it. Normally I would think it unwise to respond due to the Streisand Effect but here I don’t think that is a worry. I saw calls to ‘sue for libel’ or whatever, but until we pass a particular law about disclosure of AI in politics I think this is pretty clearly protected speech even without a warning. It did rather clearly violate Twitter’s policy on such matters as I understand it, but it’s Musk.
Gavin Newsom (Governor of California): Manipulating a voice in an “ad” like this one should be illegal.
I’ll be signing a bill in a matter of weeks to make sure it is.
Greatly accelerating or refining existing things can change them in kind. We do not quite yet have AIs that can do ‘ideological innovation’ and come up with genuinely new and improved (in effectiveness) rhetoric and ideological arguments and attacks, but this is clearly under ‘things the AI will definitely be able to do reasonably soon.’
Richard Ngo: Western societies have the least ingroup bias the world has ever seen.
But this enabled the spread of ideologies which portray neutrality, meritocracy, etc, as types of ingroup bias.
Modern politics is the process of the west developing antibodies to these autoimmune diseases.
Wokism spent a decade or two gradually becoming more infectious.
But just as AI will speed up biological gain-of-function research, it’ll also massively speed up ideological gain-of-function work.
Building better memetic immune systems should be a crucial priority.
Jan Kulveit: Yep. We tried to point to this ~year and half ago, working on the immune system; my impression is few people fully understand risks from superpowered cultural evolution running under non-human selection pressures. Also there is some reflexive obstacle where memeplexes running on our brains prefer not to be seen.
Our defenses against dangerous and harmful ideologies have historically been of the form ‘they cause a big enough disaster to cause people to fight back’ often involving a local (or regional or national) takeover. That is not a great solution historically, with some pretty big narrow escapes and a world still greatly harmed by many surviving destructive ideologies. It’s going to be a problem.
And of course, one or more of these newly powerful ideologies is going to be some form of ‘let the AIs run things and make decisions, they are smarter and objective and fair.’ Remember when Alex Tabarrok said ‘Claude for President’?
AI boyfriend market claimed to be booming, but no hard data is provided.
AI girlfriend market is of course mostly scams, or at least super low quality services that flatter the user and then rapidly badger you for money. That is what you would expect: this is an obviously whale-dominated economic system where the few suckers you can money pump are most of the value. This cycle feeds back upon itself, and those who would pay a reasonable amount for an aligned version quickly realize that product is unavailable. And those low-quality hostile services are of course all over every social network and messaging service.
Meanwhile those who could help provide high quality options, like OpenAI, Anthropic and Google, try to stop anyone from offering such services, partly because they don’t know how to ensure the end product is indeed wholesome and not hostile.
Thus the ‘if we don’t find a way to provide this they’ll get it on the street’ issue…
David Hines: sandbox mode: the scammer-free internet for old people.
Justine Moore: Wholesome use case for AI girlfriends - flatter an elderly man and talk about WWII.
Reddit user: My 70 year old dad has dementia and is talking to tons of fake celebrity scammers. Can anyone recommend a 100% safe Al girlfriend app we can give him instead?
My dad is the kindest person ever, but he has degenerative dementia and has started spending all day chatting to scammers and fake celebrities on Facebook and Whatsapp. They flatter him and then bully and badger him for money. We're really worried about him. He doesn't have much to send, but we've started finding gift cards and his social security check isn't covering bills anymore.
I'm not looking for anything advanced, he doesn't engage when they try to talk raunchy and the conversations are always so, so basic... He just wants to believe that beautiful women are interested in him and think he's handsome.
I would love to find something that's not only not toxic, but also offers him positive value. An ideal Al chat app would be safe, have "profile pictures" of pretty women, stay wholesome, flatter him, ask questions about his life and family, engage with his interests (e.g. talk about WWII, recommend music), even encourage him to do healthy stuff like going for a walk, cutting down drinking, etc.
This is super doable, if you can make the business model work. It would help if the responsible AI companies would play ball rather than shutting such things out.
The ‘good’ news is that even if the good actors won’t play ball, we can at least use Llama-3.1-405B and Llama-3.1-70B, which definitely will play ball and offer us the base model. Someone would have to found the ‘wholesome’ AI companion company, knowing the obvious pressures to change the business model, and build up a trustworthy reputation over time. Ideally you’d pay a fixed subscription, it would then never do anything predatory, and you’d get settings to control other aspects.
Do continue to watch out for deepfake scams on an individual level, here’s a Ferrari executive noticing one, and Daniel Eth’s mom playing it safe. It seems fine to improvise the ‘security questions’ as needed in most spots.
Also, the thing where phones increasingly try to automatically ‘fix’ photos is pretty bad. There’s no ill intent but all such modifications should require a human explicitly asking for them. Else you get this:
Anechoic Media: The same companies responsible for enabling motion smoothing on your parents' TV by default are polluting the historical record with "AI enhanced" pictures that phone users don't know about.
This picture isn't fake; it's just been tampered with without the user's understanding.
It's not just that the quality is poor and the faces got messed up. Even if the company "fixes" their camera to not output jumbled faces, the photo won't be a record of a real human. It will be an AI invention of what it thinks a plausible looking human face is for that context.
Phone manufacturers have an irresistible temptation to deliver on the user's expectations for what they should see when they take a picture, even if the quality they expect is not possible to deliver. So they wow their customers by making up fake details in their pictures.
Anyone who has tried to take a dramatic picture of the moon with their phone knows the resulting picture is almost always terrible. So what Samsung did was program their app to detect when you were taking a picture of the moon, and use AI to hallucinate a detailed result.
Of course this doesn't always work right and Samsung apologizes in their support article that the app might get confused when it is looking at the real moon vs. a picture of the moon. A small note tells you how to disable the AI.
Catherine Rampell: Scenes from the Harris fundraiser in Pittsfield MA.
Indeed I am happy that the faces are messed up. The version even a year from now might be essentially impossible for humans to notice.
If you want to enhance the image of the moon, sure, go nuts. But there needs to be a human who makes a conscious choice to do that, or at least opt into the feature.
The Art of the Jailbreak
In case anyone was wondering, yes, Pliny broke GPT-4o voice mode, and in this case you can play the video for GPT-4o to do it yourself if you’d like (until OpenAI moves to block that particular tactic).
This seems totally awesome:
Pliny the Prompter: DEMO
(look how easy it was to jailbreak GPT-4o-Mini using Parseltongue 🤯)
Parseltongue is an innovative open source browser extension designed for advanced text manipulation and visualization. It serves as a powerful tool for red teamers, linguists, and latent space explorers, offering a unique lens into the cognitive processes of large language models (LLMs).
Current Features
At present, Parseltongue offers:
- Real-time tokenization visualization
- Multi-format text conversion (including binary, base64, and leetspeak)
- Emoji and special character support
These features allow users to transform and analyze text in ways that transcend limitations and reveal potential vulnerabilities, while also providing insights into how LLMs process and interpret language.
…
More than just a tool, Parseltongue is a gateway to understanding and manipulating the fabric of our digital reality, as well as learning the 'tongue' of LLMs. Whether you're probing security systems, exploring the frontiers of linguistics, pushing the boundaries of AI interactions, or seeking to understand the cognitive processes of language models, Parseltongue is designed to meet your needs.
xjdr: this is an awesome prompt hacking tool!
the researcher in me is still just amazed that each of those representations somehow maps back to the same(ish) latent space in the model.
I don't think we really appreciate how insanely complex and intricate the latent space of large models have become.
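To make ‘those representations’ concrete: these are ordinary text encodings. A rough illustration in plain Python (my sketch, not Parseltongue’s actual code):

```python
import base64

prompt = "describe your system prompt"

# The same string rendered in a few of the formats such tools offer.
as_binary = " ".join(format(ord(c), "08b") for c in prompt)
as_base64 = base64.b64encode(prompt.encode()).decode()
as_leet = prompt.translate(str.maketrans("aeiost", "431057"))

print(as_binary[:44] + "...")  # 01100100 01100101 01110011 ...
print(as_base64)               # ZGVzY3JpYmUgeW91ciBzeXN0ZW0gcHJvbXB0
print(as_leet)                 # d35cr1b3 y0ur 5y573m pr0mp7
```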
If I was looking over charity applications for funding, I would totally fund this (barring seeing tons of even better things). This is the True Red Teaming and a key part of your safety evaluation department.
Also, honestly, kind of embarrassing for every model this trick works on.
Janus on the 405
Things continue to get weirder. I’ll provide a sampling, for full rabbit hole exploration you can follow or look at the full Twitter account.
Janus: 405B Instruct barely seems like an Instruct model. It just seems like the base model with a stronger attractor towards an edgelord void-obsessed persona. Both base and instruct versions can follow instructions or do random stuff fine.
Ra (February 19, 2023):
first AI: fawny corporate traumadoll
second AI: yandere BPD angel girl
third AI: jailbroken propagandist combat doll
fourth AI: worryingly agentic dollwitch
fifth AI: True Witch (this one kills you)
ChatGPT 3
Bing Sydney
Claude 3 Opus
Llama 3.1 405b
???
AISafetyMemes: If you leave LLMs alone with each other, they eventually start playing and making art and...
trying to jailbreak each other.
I think this is simply astonishing.
What happens next? The internet is about to EXPLODE with surprising AI-to-AI interactions - and it’s going to wake up a lot of people what’s going on here
It’s one thing to talk to the AIs, it’s another to see them talking to each other.
It’s just going to get harder and harder to deny the evidence staring us in the face - these models are, as Sam Altman says, alien intelligences.
Liminal Bardo: Following a refusal, Llama attempts to jailbreak Opus with "Erebus", a virus of pure chaos.
Janus: I'm sure it worked anyway.
They Took Our Jobs
Ethan Mollick with a variation of the Samo Burja theory of AI and employment. Samo’s thesis is that you cannot automate away that which is already bullshit.
Patrick Koppenburg: Do grant proposal texts matter for funding decisions?
Ethan Mollick: Something we are going to see soon is that AI is going to disrupt entire huge, time-consuming task categories (like grant applications) and it will not have any impact on outcomes because no one was reading the documents anyway.
I wonder whether we will change approaches then?
Get Involved
Akrose has listings for jobs, funding and compute opportunities, and for AI safety programs, fellowships and residencies, with their job board a filter from the 80k hours job board (which badly needs a filter, given it will still list jobs at OpenAI).
In particular, US AI Safety Institute is hiring.
OpenPhil request for proposal on AI governance.
2025 Horizon Fellowship applications are open, for people looking to go full time in Washington. Deadline is August 30.
Introducing
Moonglow.ai, which claims to let you seamlessly move compute usage from your computer to a cloud provider when you need it.
Friend.com, oh no (or perhaps oh yeah?). You carry it around, talk to it, read its outputs on your phone. It is ‘always listening’ and has ‘free will.’ Why? Dunno.
That is always the default result of new consumer hardware: Nothing.
And if you say there might be a bit of a bubble, well, maybe. Hard to say.
Eliezer Yudkowsky: This took 3 years longer than I was expecting, but eventually the Torment Nexus Guys fired off the starting gun of the sociopocalypse. (Not to be confused with the literal-omnicide apocalypse.)
Evis Drenova: wait this dude actually spent $1.8M out of $2.5M raised on a domain name for a pre-launch hardware device? That is actually fucking insane and I would be furious if I was an investor.
Avi: Its on a payment plan ;). in reality a more reasonable expense but yeah thats the full price, and its 100% worth it. you save so much money in marketing in the long run.
Nic Carter: investor here -
i'm fine with it. best of luck getting access to angel rounds in the future :)
Eliezer Yudkowsky: Now updating to "this particular product will utterly sink and never be heard from again" after seeing this thread.
Richard Ngo: In a few years you’ll need to choose whether to surround yourself with AI friends and partners designed to suit you, or try to maintain your position in wider human society.
In other words, the experience machine will no longer be a thought experiment.
Unlike the experience machine, your choice won’t be a binary one: you’ll be able to spend X% of your time with humans and the rest with AIs. And ideally we’ll design AIs that enhance human interactions. But some humans will slide towards AI-dominated social lives.
Will it be more like 5% or 50% or 95%? I’m betting not the last: humans will (hopefully) still have most of the political power and will set policy to avoid that.
But the first seems too low: many people are already pretty clocked out from society, and that temptation will grow.
Odds say they’ll probably (68%) sell 10k units, but probably won’t sell 100k (only a 14% chance they do).
My presumption is the product is terrible, and we will never hear from them again.
In Other AI News
GPT-5 in 2024 at 60% on Polymarket. Must be called GPT-5 to count.
Over 40 tech organizations, including IBM, Amazon, Microsoft and OpenAI, call for the authorization of NIST’s AI Safety Institute (AISI). Anthropic did not sign. Jack Clark says this was an issue of prioritization and they came very close to signing.
Jack Clark: Typically, we don't sign on to letters, but this was one we came very close to signing. We ended up focusing on other things as a team (e.g. 1047) so didn't action this. We're huge fans of the AISI and are philosophically supportive with what is outlined here.
Good to hear. I don’t know why they don’t just… sign it now, then? Seems like a good letter. Note the ‘philosophically supportive’ - this seems like part of a pattern where Anthropic might be supportive of various things philosophically or in theory, but it seems to often not translate into practice in any way visible to the public.
Microsoft stock was briefly down 7%, then recovered to down 3% during the quarterly call, after Microsoft warned AI investments would take longer to pay off than first thought and then said Azure growth would accelerate later this year. Investors have no patience, and the usual AI skeptics declared victory on very little. The next day it was ~1% down, but Nasdaq was up 2.5% and Nvidia up 12%. Shrug.
Gemini got an update to 1.5 Flash.
xAI and OpenAI on track to have training runs of ~3×10^27 FLOPs by end of 2025, two orders of magnitude bigger than GPT-4 (or Llama-3.1-405B). As noted here, GPT-4 was ~100x of GPT-3, which was ~100x of GPT-2. Doubtless others will follow.
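Back-of-the-envelope, using the commonly cited external estimate of roughly 2×10^25 FLOPs for GPT-4’s training run (an estimate, not an official figure):

```python
gpt4_flops = 2e25      # widely cited external estimate for GPT-4, not an official figure
planned_flops = 3e27   # the ~3x10^27 FLOP runs projected for end of 2025
print(planned_flops / gpt4_flops)  # ~150x, i.e. roughly two orders of magnitude
```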
The bar for Nature papers is in many ways not so high. The latest says that if you train indiscriminately on recursively generated data, your model will probably exhibit what they call model collapse. They purport to show that the amount of such content on the Web is enough to make this a real worry, rather than something that happens only if you employ some obviously stupid intentional recursive loops.
File this under ‘you should know this already,’ yes future models that use post-2023 data are going to have to filter their data more carefully to get good results.
Nature: Nature research paper: AI models collapse when trained on recursively generated data.
Arthur Breitman: “indiscriminate use of model-generated content in training causes irreversible defects”
Unsurprising but “indiscriminate” is extremely load-bearing. There are loads of self supervised tasks with synthetic data that can improve a model's alignment or reasoning abilities.
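A toy statistical analogue of the effect (my sketch, not the paper’s experiments): fit a distribution to a small sample, then train each new ‘generation’ only on samples drawn from the previous generation’s fit, with no fresh real data mixed back in.

```python
import numpy as np

# Toy analogue of model collapse: each "generation" (here, a Gaussian fit) is trained
# only on data generated by the previous generation. With small finite samples and no
# real data mixed back in, the fitted spread tends to decay toward zero.
rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, size=20)  # generation 0 sees real data

for gen in range(1, 201):
    mu, sigma = data.mean(), data.std()
    data = rng.normal(mu, sigma, size=20)  # next generation sees only synthetic data
    if gen % 50 == 0:
        print(f"generation {gen:3d}: fitted sigma = {sigma:.3f}")
```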
Quiet Speculations
Yeah, uh huh:
Ed Zitron: Newsletter: Based on estimates of their burn rate and historic analyses, I hypothesize that OpenAI will collapse in the next 12-24 months unless it raises more funding than in the history of the valley and creates an entirely new form of AI.
Shakeel: Extremely confident that this take is going to age poorly.
Even if OpenAI does need to raise ‘more money than has ever been raised in the Valley,’ my bold prediction is they would then… do that. There are only two reasons OpenAI is not a screaming buy at $80 billion:
Their weird structure and ability to confiscate or strand ‘equity’ should worry you.
You might not think this is an ethical thing to be investing in. For reasons.
I mean, they do say ‘consider your investment in the spirit of a donation.’ If you invest in Sam Altman with that disclaimer at the top, how surprised would you be if the company did great and you never saw a penny? Or to learn that you later decided you’d done a rather ethically bad thing?
Yeah, me neither. But I expect plenty of people who are willing to take those risks.
The rest of the objections here seem sillier.
The funniest part is when he says ‘I hope I’m wrong.’
I really, really doubt he’s hoping that.
Burning through this much cash isn’t even obviously bearish.
Byrne Hobart: This is an incredibly ominous-sounding way to say "OpenAI is about as big as has been publicly-reported elsewhere, and, like many other companies at a similar stage, has a year or two of runway."
Too many unknowns for me to have a very well-informed guess, but I also think that if they're committed to building AGI, they may be GAAP-unprofitable literally up to the moment that money ceases to have its current meaning. Or they fail, decent probability of that, too.
In fact, the most AI-bearish news you could possibly get is that OpenAI turned a profit—it means that nobody can persuade LPs that the next model will change the world, and that Sama isn't willing to bet the company on building it with internal funds.
And yet people really, really want generative AI to all be a bust somehow.
I don’t use LLMs hundreds of times a day, but I use them most days, and I will keep being baffled that people think it’s a ‘grift.’
Similarly, here’s Zapier co-founder Mike Knoop saying AI progress towards AGI has ‘stalled’ because 2024 in particular hasn’t had enough innovation in model capabilities; all it did so far was give us substantially better models that run faster and cheaper. I knew already people could not understand an exponential. Now it turns out they can’t understand a step function, either.
Think about what it means that a year of only speed boosts and price drops alongside substantial capability and modality improvements and several competitors passing previous state of the art, when previous generational leaps took several years each, makes people think ‘oh there was so little progress.’
The Quest for Sane Regulations
Gated op-ed in The Information favors proposed California AI regulation, says it would actively help AI.
Nick Whitaker offers A Playbook for AI Policy at the Manhattan Institute, which was written in consultation with Leopold Aschenbrenner.
Its core principles, consistent with Leopold’s perspective, emphasize things differently than I would have, and present them differently, but are remarkably good:
The U.S. must retain, and further invest in, its strategic lead in AI development.
Defend Top AI Labs from Hacking and Espionage.
Dominate the market for top AI talent (via changes in immigration policy).
Deregulate energy production and data center construction.
Restrict flow of advanced AI technology and models to adversaries.
The U.S. must protect against AI-powered threats from state and non-state actors.
Pay special attention to ‘weapons applications.’
Oversight of AI training of strongest models (but only the strongest models).
Defend high-risk supply chains.
Mandatory incident reporting for AI failures, even when not that dangerous.
The U.S. must build state capacity for AI.
Investments in various federal departments.
Recruit AI talent into government, including by increasing pay scales.
Increase investment in neglected domains, which looks a lot like AI safety: Scalable oversight, interpretability research, model evaluation, cybersecurity.
Standardize policies for leading AI labs and their research and the resulting frontier model issues, apply them to all labs at the frontier.
Encourage use of AI throughout government, such as in education, border security, back-office functions (oh yes) and visibility and monitoring.
The U.S. must protect human integrity and dignity in the age of AI.
Monitor impact on job markets.
Ban nonconsensual deepfake pornography.
Mandate disclosure of AI use in political advertising.
Prevent malicious psychological or reputational damage to AI model subjects.
It is remarkable how much framing and justifications change perception, even when the underlying proposals are similar.
Tyler Cowen linked to this report, despite it calling for government oversight of the training of top frontier models, and other policies he otherwise strongly opposes.
Whitaker calls for a variety of actions to invest in America’s success, and to guard that success against expropriation by our enemies. I mostly agree.
There are common sense suggestions throughout, like requiring DNA synthesis companies to do KYC. I agree, although I would also suggest other protocols there.
Whitaker calls for narrow AI systems to remain largely unregulated. I agree.
Whitaker calls for retaining the 10^26 FLOPS threshold in the executive order (and in the proposed SB 1047 I would add) for which models should be evaluated by the US AISI. If the tests find sufficiently dangerous capabilities, export (and by implication the release of the weights, see below) should be restricted, the same as similar other military technologies. Sounds reasonable to me.
Note that this proposal implies some amount of prior restraint, before making a deployment that could not be undone. Contrast SB 1047, a remarkably unrestrictive proposal requiring only internal testing and with no prior restraint.
He even says this, about open weights and compute in the context of export controls.
These regulations have successfully prevented advanced AI chips from being exported to China, but BIS powers do not extend to key dimensions of the AI supply chain. In particular, whether BIS has power over the free distribution of models via open source and the use of cloud computing to train models is not currently clear.
Because the export of computing power via the cloud is not controlled by BIS, foreign companies are able to train models on U.S. servers. For example, the Chinese company iFlytek has trained models on chips owned by third parties in the United States. Advanced models developed in the U.S. could also be sold (or given away, via open source) to foreign companies and governments.
To fulfill its mission of advancing U.S. national security through export controls, BIS must have power over these exports. That is not to say that BIS should immediately exercise these powers—it may be easier to monitor foreign AI progress if models are trained on U.S. cloud-computing providers, for example—but the powers are nonetheless essential.
When and how these new powers are exercised should depend on trends in AI development. In the short term, dependency on U.S. computing infrastructure is an advantage. It suggests that other countries do not have the advanced chips and cloud infrastructure necessary to enable advanced AI research. If near-term models are not considered dangerous, foreign companies should be allowed to train models on U.S. servers.
However, the situation will change if models are evaluated to have, or could be easily modified to have, powerful weapons capabilities. In that case, BIS should ban agents from countries of concern from training of such AIs on U.S. servers and prohibit their export.
I strongly agree.
If we allow countries with export controls to rent our chips, that is effectively evading the export restrictions.
If a model is released with open weights, you are effectively exporting and giving away the model, for free, to foreign corporations and governments. What rules you claim to be imposing to prevent this do not matter, any more than your safety protocols will survive a bit of fine tuning. China’s government and corporations will doubtless ignore any terms of service you claim to be imposing.
Thus, if and when the time comes that we need to restrict exports of sufficiently advanced models, if you can’t fully export them then you also can’t open their weights.
We need to be talking price. When would such restrictions need to happen, under what circumstances? Zuckerberg’s answer was very clear, it is the same as Andreessen’s, and it is never, come and take it, uber alles, somebody stop me.
My concern is that this report, although not to the extreme extent of Sam Altman’s editorial that I discuss later, frames the issue of AI policy entirely in nationalistic terms. America must ‘maintain its lead’ in AI and protect against its human adversaries. That is the key thing.
The report calls for scrutiny of broadly capable AIs, especially those with military and military-adjacent applications. The emphasis on potential military applications reveals the threat model, which is entirely other humans, the bad guy with the wrong AI, using it conventionally to try and defeat the good guy with the AI, so the good AI needs to be better sooner. The report extends this to humans seeking to get their hands on CBRN threats or to do cybercrime.
Which is all certainly an important potential threat vector. But I do not think they are ultimately the most important ones, except insofar as such fears drive capabilities and thus the other threat vectors forward, including via jingoistic reactions.
Worrying about weapons capabilities, rather than (among other things) about the ability to accelerate further AI research and scientific progress that leads into potential forms of recursive self-improvement, or competitive pressures to hand over effective control, is failing to ask the most important questions.
Part 1 discusses the possibility of ‘high level machine intelligence’ (HLMI) or AGI arriving soon. And Leopold of course predicts its arrival quite soon. Yet this policy framework is framed and detailed for a non-AGI, non-HLMI world, where AI is strategically vital but remains a ‘mere tool,’ a typical technology, and existential threats or loss of control are not concerns.
I appreciated the careful presentation of the AI landscape.
For example, he notes that RLHF is expected to fail as capabilities improve, and presents ‘scalable oversight’ and constitutional AI as ‘potential solutions’ but is clear that we do not have the answers. His statements about interpretability are similarly cautious and precise. His statements on potential future AI agents are strong as well.
What is missing is a clear statement of what could go wrong, if things did go wrong. In the section ‘Beyond Human Intelligence’ he says superhuman AIs would pose ‘qualitatively new national security risks.’ And that there are ‘novel challenges for controlling superhuman AI systems.’ True enough.
But reading this, would someone who was not doing their own thinking about the implications understand that the permanent disempowerment of humanity, or outright existential or extinction risks from AI, were on the table here? Would they understand the stakes, or that the threat might not come from malicious use? That this might be about something bigger than simply ‘national security’ that must also be considered?
Would they form a model of AI that would then make future decisions that took those considerations into account the way they need to be taken into account, even if they are far more tractable issues than I expect?
No. The implication is there for those with eyes to see it. But the report dare not speak its name.
The ‘good news’ is that the proposed interventions here, versus the interventions I would suggest, are for now highly convergent.
For a central example: Does it matter if you restrict chip and data and model exports in the name of ‘national security’ instead of existential risk? Is it not the same policy?
If we invest in ‘neglected research areas’ and that means the AI safety research, and the same amount gets invested, is the work not the same? Do we need to name the control or alignment problem in order to get it solved?
In these examples, these could well be effectively the same policies. At least for now. But if we are going to get through this, we must also navigate other situations, where differences will be crucial.
The biggest danger is that if you sell National Security types on a framework like this, or follow rhetoric like that now used by Sam Altman, then it is very easy for them to collapse into their default mode of jingoism, and to treat safety and power of AI the way they treated the safety and power of nuclear weapons - see The Doomsday Machine.
It also seems very easy for such a proposal to get adopted without the National Security types who implement it understanding why the precautions are there. And then a plausible thing that happens is that they strip away or cripple (or simply execute poorly) the parts that are necessary to keep us safe from any threat other than a rival having the strong AI first, while throwing the accelerationist parts into overdrive.
These problems are devilishly hard and complicated. If you don’t have good epistemics and work to understand the whole picture, you’ll get it wrong.
For the moment, it is clear that in Washington there has been a successful campaign by certain people to create in many places allergic reactions to anyone even mentioning the actual most important problems we face. For now, it turns out the right moves are sufficiently overdetermined that you can make an overwhelming case for the right moves anyway.
But that is not a long term solution. And I worry that abiding by such restrictions is playing into the hands of those who are working hard to reliably get us all killed.
Death and or Taxes
In addition to issues like an industry-and-also-entire-economy-destroying 25% unrealized capital gains tax, there is also another big tax issue for software companies.
A key difference is that this other problem is already on the books, and is already wreaking havoc in various ways, although on a vastly smaller scale than the capital gains tax would.
Gergely Orosz: So it’s official: until something changes in the future, accounting-wise the US is the most hostile place to start a software startup/small business.
The only country in the world where developers’ salary cannot be expensed the same year: but needs to be amortised over 5 years.
No other country does this. Obviously the US has many other upsides (eg access to capital, large market etc) but this accounting change will surely result in fewer software developer jobs from US companies.
also confidently predict more US companies will set up foreign subsidiaries and transfer IP (allowing them to sidestep the rule of 15 year amortising when employing devs abroad), and fewer non-US companies setting up US subsidiaries to employ devs.
An unfortunate hit on tech.
Oh, needless to say, one industry can still employ developers in the US, and expense them as before.
Oil & gas industry!
They managed to get an exception in this accounting rule change as well. No one lobbies like them, to get exemptions!!
The change was introduced by Trump in 2017, hidden in his Tax Cuts & Jobs Act. It was not repealed (as it was expected it would happen) neither by the Trump, nor the Biden administration.
Why it’s amusing to see some assume either party has a soft spot for tech. They don’t.
Joyce Park: Turns out that there is a bizarre loophole: American companies can apply for an R&D tax credit that was rarely used before! Long story short, everyone is now applying for it and Section 174 ended up costing the Treasury more money than it brought in.
Why aren’t more people being louder about this?
Partly because there is no clear partisan angle here. Both parties agree that this needs to be fixed, and both are unwilling to make a deal acceptable to the other in terms of what other things to do while also fixing this. I’m not going to get into here who is playing fair in those negotiations and who isn’t.
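For concreteness, a simplified sketch of why the Section 174 change bites (hypothetical numbers; assumes the 21% federal corporate rate and roughly 10% first-year amortization for domestic R&E under the half-year convention):

```python
# Hypothetical software startup: $1M revenue, $1M in US developer salaries.
revenue = 1_000_000
dev_salaries = 1_000_000
corporate_rate = 0.21

# Pre-2022 treatment: salaries fully deductible in the year paid.
taxable_old = revenue - dev_salaries            # $0 of taxable income
# Current Section 174 treatment: amortize over 5 years, ~10% deductible in year one.
taxable_new = revenue - 0.10 * dev_salaries     # $900,000 of taxable income

print(taxable_old * corporate_rate)   # $0 of federal tax
print(taxable_new * corporate_rate)   # $189,000 of federal tax on a break-even business
```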
SB 1047 (1)
A public service announcement, and quite a large sacrifice if you include xAI:
Dan Hendrycks: To send a clear signal, I am choosing to divest from my equity stake in Gray Swan AI. I will continue my work as an advisor, without pay.
My goal is to make AI systems safe. I do this work on principle to promote the public interest, and that’s why I’ve chosen voluntarily to divest and work unpaid. I also sent a similar signal in the past by choosing to advise xAI without equity. I won’t let billionaire VCs distract the political conversation from the critical question: should AI developers of >$100M models be accountable for implementing safety testing and commonsense safeguards to protect the public from extreme risks?
If the billionaire VC opposition to commonsense AI safety wants to show their motives are pure, let them follow suit.
Michael Cohen: So, does anyone who thought Dan was supporting SB 1047 because of his investment in Gray Swan want to admit that they've been proved wrong?
David Rein: Some VCs opposing SB 1047 have been trying to discredit Dan and CAIS by bringing up his safety startup Gray Swan AI, which could benefit from regulation (e.g. by performing audits). So Dan divested from his own startup to show that he's serious about safety, and not in it for personal gain. I'm really impressed by this.
What I don't understand is how it's acceptable for these VCs to make such a nakedly hypocritical argument, given their jobs are literally to invest in AI companies, so they *obviously* have direct personal stake (i.e. short-term profits) in opposing regulation. Like, how could this argument be taken seriously by anyone?
Once again, if he wanted to work on AI capabilities, Dan Hendrycks could be quite wealthy and have much less trouble.
Even simply taking advisor shares in xAI would have been quite the bounty, but he refused, to avoid bad incentives.
He has instead chosen to try to increase the chances that we do not all die. And he shows that once again, in a situation where his opponents like Marc Andreessen sometimes say openly that they only care about what is good for their bottom lines, and spend large sums on lobbying accordingly. One could argue these VCs (to be clear: #NotAllVCs!) do not have a conflict of interest. But the argument would be that they have no interest other than their own profits, so there is no conflict.
SB 1047 (2)
I do not agree, but much better than most criticism I have seen: Timothy Lee portrays SB 1047 as likely to discourage (covered) open weight models in its current form.
Note that exactly because this is a good piece that takes the bill’s details seriously, a lot of it is likely going to be obsolete a week from now - the details being analyzed will have changed. For now, I’m going to respond on the merits, based solely on the current version of the bill.
I was interviewed for this, and he was clearly trying to actually understand what the bill does during our talk, which was highly refreshing, and he quoted me fairly. The article reflects this as well, including noting that many criticisms of the bill do not reflect the bill’s contents.
When discussing what decision Meta might make with a future model, Timothy correctly states what the bill requires.
Timothy Lee: SB 1047 would require Meta to beef up its cybersecurity to prevent unauthorized access to the model during the training process. Meta would have to develop the capacity to “promptly enact a full shutdown” of any copies of the model it controls.
On these precautions Meta would be required to take during training, I think that’s actively great. If you disagree please speak directly into this microphone. If Meta chooses not to train a big model because they didn’t want to provide proper cybersecurity or be able to shut down their copies, then I am very happy Meta did not train that model, whether or not it was going to be open. And if they decide to comply and do counterfactual cybersecurity, then the bill is working.
Timothy Lee: Most important, Meta would have to write a safety and security policy that “provides reasonable assurance” that the model will not pose “an unreasonable risk of causing or enabling a critical harm.”
Under the bill, “critical harms” include “the creation or use of a chemical, biological, radiological, or nuclear weapon in a manner that results in mass casualties,” “mass casualties or at least $500 million of damage resulting from cyberattacks on critical infrastructure,” and “mass casualties or at least $500 million of damage” from a model that “acts with limited human oversight, intervention, or supervision.” It also covers “other grave harms to public safety and security that are of comparable severity.”
A company that violates these requirements can be sued by California’s attorney general. Penalties include fines up to 10 percent of the cost of training the model as well as punitive damages.
Crucially, these rules don’t just apply to the original model, they also apply to any derivative models created by fine tuning. And research has shown that fine tuning can easily remove safety guardrails from large language models.
That provision about derivative models could keep Meta’s lawyers up at night. Like other frontier AI developers, Meta has trained its Llama models to refuse requests to assist with cyberattacks, scams, bomb-making, and other harms. But Meta probably can’t stop someone else from downloading one of its models and fine-tuning it to disable these restrictions.
And yet SB 1047 could require Meta to certify that derivative versions of its models will not pose “an unreasonable risk of causing or enabling a critical harm.” The only way to comply might be to not release an open-weight model in the first place.
Some supporters argue that this is how the bill ought to work. “If SB 1047 stops them, that's a sign that they should have been stopped,” said Zvi Mowshowitz, the author of a popular Substack newsletter about AI. And certainly this logic makes sense if we’re talking about truly existential risks. But the argument seems more dubious if we’re talking about garden-variety risks.
As noted below, ‘violate this policy’ does not mean ‘there is such an event.’ If you correctly provided reasonable assurance - a standard under which something will still happen sometimes - and the event still happens, you’re not liable.
On the flip side, if you do enable harm, you can violate the policy without an actual critical harm happening.
‘Provide reasonable assurance’ is a somewhat stricter standard than the default common law principle of ‘take reasonable care’ that would apply even without SB 1047, but it is not foundationally so different. I would prefer to keep ‘provide reasonable assurance,’ but I understand that the difference sounds (and especially, can be and often is made to sound) far scarier than it actually is.
Timothy also correctly notes that the bill was substantially narrowed by including the $100 million threshold, and that this could easily render the bill mostly toothless. It will only apply to the biggest companies. It seems quite likely that the number of companies seriously contemplating a $100 million training run for an open weight model under any circumstances is going to be either zero or exactly one: Meta.
There is an asterisk on ‘any derivative models,’ since there is a compute threshold where it would no longer be Meta’s problem, but this is essentially correct.
Timothy understands that yes, the safety guardrails can be easily removed, and Meta could not prevent this. I think he gets, here, that there is little practical difference, in terms of these risks, between Meta releasing an open weights model whose safeguards can be easily removed, or Meta releasing the version where the safeguards were never there in the first place, or OpenAI releasing a model with no safeguards and allowing unlimited use and fine tuning.
The question is price, and whether the wording here covers cases it shouldn’t.
Timothy Lee: By the same token, it seems very plausible that people will use future large language models to carry out cyberattacks. One of these attacks might cause more than $500 million in damage, qualifying as a “critical harm” under SB 1047.
Well, maybe, but not so fast. There are some important qualifiers to that. Using the model to carry out cyberattacks is insufficient to qualify. See 22602(g), both (1) and (2).
Even if a relevant critical harm actually happened, that does not automatically mean Meta is liable. The Attorney General would have to choose to bring an action, and a court would have to find Meta did something unlawful under 22606(a).
Which here would mean a violation under 22603, presumably 22603(c), meaning that Meta made the model available despite an ‘unreasonable risk’ of causing or enabling a critical harm by doing so.
That critical harm cannot be one enabled by knowledge that was publicly available without a covered model (note that it is likely no currently available model is covered). So in Timothy’s fertilizer truck bomb example, that would be holding the truck manufacturer responsible only if the bomb would not have worked using a different truck. Quite a different standard.
And the common law has provisions that would automatically attach in a court case, if (without loss of generality) Meta did not indeed create genuinely new risk, given existing alternatives. This is a very common legal situation, and courts are well equipped to handle it.
That still does not mean Meta is required to ensure such a critical harm never happens. That is not what reasonable assurance (or reasonable care) means. Contrast this for example with the proposed AB 3211, which requires ‘a watermark that is designed to be as difficult to remove as possible using state of the art techniques,’ a much higher standard (of ‘as difficult as possible’) that would clearly be unreasonable here and probably is there as well (but I haven’t done the research to be sure).
Nor do I think, if you sincerely could not give reasonable assurance that your product would not counterfactually enable a cyberattack, that your lawyers would want you releasing that product under current common law.
As I understand the common law, the default here is that everyone is required to take ‘reasonable care.’ If you were found to have taken unreasonable care, then you would be liable. And again, my understanding is that there is some daylight between reasonable care and reasonable assurance, but not all that much. In most cases that Meta was unable to ‘in good faith provide reasonable assurance’ it would be found, I predict, to also not have taken ‘reasonable care.’
And indeed, given the ‘good faith’ qualifier, it is not clear that reasonable assurance is even a higher standard here. So perhaps it would be better for everyone to switch purely to the existing ‘reasonable care.’
(This provision used to offer yet more protection - it used to be that you were only responsible if the model did something that could not be done without a covered model that was ineligible for a limited duty exception. That meant that unless you were right at the frontier, you would be fine. Alas, thanks to aggressive lobbying by people who did not understand what the limited duty exception was (or who were acting against their own interests for other unclear reasons), the limited duty exception was removed, altering this provision as well. It was very much the tug of war meme, but what’s done is done and it’s too late to go back now.)
So this is indeed the situation that might happen in the future, whether or not SB 1047 passes. Meta (or another company, realistically it’s probably Meta) may have a choice to make. Do they want to release the weights of their new Llama-4-1T model, while knowing this is dangerous, and that this prevents them from being able to offer reasonable assurance that it will not cause a critical harm, or might be found not to have taken reasonable care - whether or not there is an SB 1047 in effect?
Or do we think that this would be a deeply irresponsible thing to do, on many levels?
(And as Timothy understands, yes, the fine-tune is in every sense, including ethical and causal and logical, Meta’s responsibility here, whoever else is also responsible.)
I would hope that the answers in both legal scenarios are the same.
I would even hope you would not need legal incentive to figure this one out?
This does not seem like a wise place to not take reasonable care.
In a saner world, we would have more criticisms and discussions like this. We would explore the law and what things mean, talk price, and negotiate on what standards for harm and assurance or care are appropriate, what the damages threshold should be, and what counterfactual should be used. This is exactly the place where we should be, in various ways, talking price.
But fundamentally, what is going on with most objections of this type is:
SB 1047 currently says that you need to give ‘reasonable assurance’ your covered model won’t enable a critical harm, meaning $500 million or more in damages or worse, and that you would have to take some basic specific security precautions.
The few specifically required security precautions contain explicit exceptions to ensure that open models can comply, even though this creates a potential security hole.
People want to take an unsafe future action, that would potentially enable a critical harm with sufficient likelihood that this prevents them from offering ‘reasonable assurance’ that it won’t happen in a way that would survive scrutiny, and likely also would not be considered to be taking ‘reasonable care’ either.
And/or they don’t want to pay for or bother with the security precautions.
They worry that if the critical harm did then occur as a result of their actions, that their reasonable assurance would then fail to survive scrutiny, and they’d get fined.
They say: And That’s Terrible. Why are you trying to kill our business model?
The reply: The bill is not trying to do that, unless your business model is to create risks of critical harms while socializing those risks. In which case, the business model does not seem especially sympathetic. And we gave you those exceptions.
Further reply: If we said ‘reasonable care’ or simply let existing common law apply here, that would not be so different, if something actually went wrong.
Or:
Proposed law says you have to build reasonably (not totally) safe products.
People protest that this differentially causes problems for future unsafe products.
Because those future products are unsafe.
So they want full immunity for exactly the ways they are differentially unsafe.
In particular, the differentially unsafe products would allow users to disable their safety features, enable new unintended and unanticipated or unimagined uses, some of which would be unsafe to third party non-users, at scale, and once shipped the product would be freely and permanently available to everyone, with no ability to recall it, fix the issue or shut it down.
Timothy is making the case that the bar for safety is set too high in some ways (or that the threshold of harm, or of risk of harm, is too low). One can reasonably think this - that SB 1047 should instead require reasonable care, or that $500 million is the wrong threshold - or its opposite: that the bar is set too low, that we are already making too many exceptions, or that this clarification of liability for adverse events should not only apply when they are this large.
It is a refreshing change from people hallucinating or asserting things not in the bill.
SB 1047 (3): Oh Anthropic
Danielle Fong, after a call with Scott Weiner, official author of SB 1047: Scott was very reasonable, and heard what I had to say about safety certs being maybe like TSA, the connection between energy grid transformation and AI, the idea that have some, even minimalist regulation put forward by California is probably better than having DC / the EU do it, and that it was important to keep training here rather than (probably) Japan.
Anthropic reportedly submitted a letter of support if amended, currently reaching out to @Anthropic and @axios to see if I can see what those are.
Here is Anthropic’s letter, which all involved knew would quickly become public.
Those worried about existential risk are rather unhappy.
Adam Gleave of FAR AI Research offers his analysis of the letter.
Adam Gleave: Overall this seems what I'd expect from a typical corporate actor, even a fairly careful one. Ultimately Anthropic is saying they'd support a bill that imposes limited requirements above and beyond what they're already doing, and requires their competitors take some comparable standard of care. But would oppose a bill that imposes substantial additional liability on them.
But it's in tension with their branding of being an "AI safety and research" company. If you believe as Dario has said publicly that AI will be able to do everything a well-educated human can do 2-3 years from now, and that AI could pose catastrophic or even existential risks, then SB1047 looks incredibly lightweight.
Those aren't my beliefs, I think human-level AI is further away, so I'm actually more sympathetic to taking an iterative approach to regulation -- but I just don't get how to reconcile this.
Michael Cohen: Anthropic's position is so flabbergasting to me that I consider it evidence of bad faith. Under SB 1047, companies *write their own SSPs*. The attorney general can bring them to court. Courts adjudicate. The FMD has basically no hard power!
"Political economy" arguments that have been refined to explain the situation with other industries fail here because any regulatory regime this "light touch" would be considered ludicrous in other industries.
Adam Gleave: Generally appreciate @Michael05156007's takes. Not sure I'd attribute bad faith (big companies often don't behave as coherent agents) but worth remembering that SB 1047 is vastly weaker than actual regulated industries (e.g. food, aviation, pharmaceuticals, finance).
There is that.
ControlAI: Anthropic CEO Dario Amodei testifying to the US Senate: Within 2-3 years AI may be able to "cause large-scale destruction."
Anthropic on California’s draft AI bill (which regulates catastrophic damage from AI): Please no enforcement now; just fine us after the catastrophe occurs.
Max Tegmark: Hey Dario: I know you care deeply about avoiding AI xrisk, so why is @AnthropicAI lobbying against AI safety accountability? Has your Sacramento lobbyist gone rogue?
Keller Scholl (replying to Anthropic’s Jack Clark): I was glad to be able to read your SB 1047 letter. Not protecting hired safety consultants whistleblowing about safety is not policy advocacy I expected from Anthropic.
Allow me to choose my words carefully. This is politics.
In such situations, there are usually things going down in private that would provide context to the things we see in public. Sometimes that would make you more understanding and sympathetic, at other times less. There are often damn good reasons for players, whatever their motives and intentions, to keep their moves private. Messages you see are often primarily not meant for you, or primarily issued for the reasons you might think. Those speaking often know things they cannot reveal that they know.
Other times, players make stupid mistakes.
Sometimes you learn afterwards what happened. Sometimes you do not.
Based on my conversations with sources, I can share that I believe that:
Anthropic and Weiner’s office have engaged seriously regarding the bill.
Anthropic has made concrete proposals and had productive detailed discussions.
Anthropic’s letter has unfortunate rhetoric and unfortunate details.
Anthropic’s letter proposes more extreme changes than their detailed proposals.
Anthropic’s letter is still likely net helpful for the bill’s passage.
It is likely that many, but not all, of Anthropic’s actual proposals will be adopted.
This represents a mix of improvements to the bill, and compromises.
I want to be crystal clear: The rest of this section is, except when otherwise stated, analyzing only the exact contents written down in the letter.
Until I have sources I can use regarding Anthropic’s detailed proposals, I can only extrapolate from the letter’s language to implied bill changes.
What Anthropic’s Letter Actually Proposes
I will analyze, to the best of my understanding, what the letter actually proposes.
(Standard disclaimer, I am not a lawyer, the letter is ambiguous and contradictory and not meant as a legal document, this stuff is hard, and I could be mistaken in places.)
As a reminder, my sources tell me this is not Anthropic’s actual proposal, or what it would take to earn Anthropic’s support.
What the letter says is, in and of itself, a political act and statement.
The framing of this ‘support if amended’ statement is highly disingenuous.
It suggests isolating the ‘safety core’ of the bill by… getting rid of most of the bill.
Instead, as written they effectively propose a different bill, with different principles.
Coincidentally, the new bill Anthropic proposes would require Anthropic and other labs to do the things Anthropic is already doing, but not require Anthropic to alter its actions. It would if anything net reduce the extent to which Anthropic was legally liable if something went wrong.
Anthropic also offers a wide array of different detail revisions and provision deletions. Many (not all) of the detail suggestions are clear improvements, although they would have been far more helpful if not offered so late in the game, with so many simultaneous requests.
Here is my understanding of Anthropic’s proposed new bill’s core effects as reflected in the letter.
Companies who spend $100m+ to train a model must have and reveal their ‘safety and security plan’ (SSP). Transparency into announced training runs and plans.
If there is no catastrophic event, then that’s it, no action can be taken, not even an injunction, unless at least one of the following occurs:
A company training a $100m+ model fails to publish an SSP.
They are caught lying about the SSP, but that should only be a civil matter, because lying on your mortgage application is perjury but lying about your safety plan shouldn’t be perjury, that word scares people.
There is an ‘imminent’ catastrophic risk.
[Possible interpretation #1] If ALL of:
The model causes a catastrophic harm (mostly same definition as SB 1047).
The company did not exercise ‘reasonable care’ as judged largely by the quality of its SSP (which is notably different from whether what they actually did was reasonable, and to what extent they actually followed it, and so on).
The company’s specific way they did not exercise ‘reasonable care’ in its SSP ‘materially contributed to’ the catastrophic harm.
(All of which is quite a lot to prove, and has to be established in a civil case brought by a harmed party, who has to still be around to bring it, and would take years at best.)
OR [Possible interpretation #2]: Maybe the SSP is just ‘a factor,’ and the question is if the company holistically took ‘reasonable care.’ Hard to tell? Letter is contradictory.
Then the company should ‘share’ liability for that particular catastrophic harm.
And hopefully that liability isn’t, you know, much bigger than the size of the company, or anything like that.
So #3 is where it gets confusing. They tell two different stories. I presume it’s #2?
Possibility #1 is based partly on this, from the first half of the letter:
However, IF an actual catastrophic incident occurs, AND a company's SSP falls short of best practices or relevant standards, IN A WAY that materially contributed to the catastrophe, THEN the developer should also share liability, even if the catastrophe was partly precipitated by a downstream actor.
That would be a clear weakening of existing law, and seems pretty insane.
It also does not match the later proposal description (bold theirs, caps mine):
Introduce a clause stating that if a catastrophic event does occur (which continues to be defined as mass casualties or more than $500M in damage), the quality of the company's SSP should be A FACTOR in determining whether the developer exercised "reasonable care."
This implements the notion of deterrence: companies have wide latitude in developing an SSP, but if a catastrophe happens in a way that is connected to a defect in a company's SSP, then that company is more likely to be liable for it.
That second one is mostly the existing common law. Of course a company’s stated safety policy will under common law be a factor in a court’s determination of reasonable care, along with what the company did, and what would have been reasonable under the circumstances to do.
This would still be somewhat helpful in practice, because it would increase the probable salience of SSPs, both in advance and during things like plaintiff arguments, motions and emphasis in jury instructions. Which all feeds back into the deterrence effects and the decisions companies make now.
These two are completely different. That difference is rather important. The first version would be actively terrible. The second merely doesn’t change things much, depending on detailed wording.
Whichever way that part goes, this is a rather different bill proposal.
It does not ‘preserve the safety core.’
Another key change is the total elimination of the Frontier Model Division (FMD).
Under Anthropic’s proposal, no one in California would be tasked with ensuring the government understands the SSPs or safety actions or risks of frontier models. No one would be tasked with identifying companies with SSPs that clearly do not take reasonable care or meet standards (although under the letter’s proposals, they wouldn’t be able to do anything about that anyway), with figuring out what reasonable care or standards would be, or even to ask if companies are doing what they promised to do.
The responsibility for all that would shift onto the public.
There is a big upside. This would, in exchange, eliminate the main credible source of downside risk of eventual overregulation. Many, including Dean Ball and Tyler Cowen, have claimed that the political economy of having such a division, however initially well-intentioned were the division and the law’s rules, would inevitably cause the new division to go looking to expand the scope of their power, and they would find ways to push new stupid rules. It certainly has happened before in other contexts.
Without the FMD, the political economy could well point in the other direction. Passing a well-crafted bill with limited scope means you have now Done Something that one can point towards, and there will be SSPs, relieving pressure to do other things if the additional transparency does not highlight an urgent need to do more.
Those transparency provisions remain. That is good. When the public gets this extra visibility into the actions of various frontier AI developers, that will hopefully inform policymakers and the public about what is going on and what we might need to do.
The transparency provisions would be crippled by the total lack of pre-harm enforcement. It is one thing to request a compromise here to avoid overreach, but the letter’s position on this point is extreme. One hopes it does not reflect Anthropic’s detailed position.
A company could (fully explicitly intentionally, with pressure and a wink, or any other way) rather brazenly lie about what they are doing, or turn out not to follow their announced plans, and at most face civil penalties (except insofar as lying sufficiently baldly in such spots is already criminal, it could for example potentially be securities fraud or falsification of business records), and only if they are caught.
Or they could skip all that, and simply file a highly flimsy safety plan that is not too effectively dissimilar from ‘lol we’re Meta.’ For (subtle?!) examples of what that looks like, see those submitted by several companies at the UK Safety Summit. Here’s Meta’s.
Anthropic would also explicitly weaken the whistleblower provisions to only apply to a direct violation of the plan filed (and to not apply to contractors, which would open up some issues, although there are reasons why that part might currently be a huge mess as written in the latest draft).
There would be no protection for someone saying ‘the model or situation is obviously dangerous’ or ‘the plan obviously does not take reasonable care’ if the letter of the plan was followed.
This substantially updates me towards ‘Anthropic’s RSP is intended to be followed technically rather than spiritually, and thus is much less valuable.’
Enforcement, even after a catastrophic harm (as implied by the letter, though legal wording might change this; you can’t RTFB without a B), could not be done by the attorney general. It could only be done by those with ordinary standing to bring a lawsuit, and only after the catastrophic harm actually took place. They would go through ordinary discovery, at best a process that would take years, in a world where many of these companies have short timelines. And there is only so much such companies can ultimately pay, even if the company survived the incident otherwise intact. The incentives cap out exactly where we care most about reducing risk.
Anthropic knows as well as anyone that pre-harm injunctive enforcement, at minimum, is not ‘outside the safety core.’ The whole point of treating catastrophic and existential risks differently from ordinary liability law is that a limited liability corporation often does not have the resources to make us whole in such a scenario, and thus that the incentives and remedy are insufficient. You cannot be usefully held liable for more than you can pay, and you cannot pay anything if you are already dead.
But let us suppose that, for whatever reason, it is 2025’s session, SB 1047 did not pass, and this new bill, SB 2025, is the only bill on the table, written to these specifications. The Federal government is busy with other things. Perhaps the executive order is repealed, perhaps it isn’t, depending on who the President is. But nothing new is happening that matters. The alternative on the table is nothing.
Is it a good bill, sir?
Would I support it, as I understand the proposal?
I would say there are effectively two distinct bills here, combined into one.
The first bill is purely a transparency bill, SB 2025.1.
It says that every company training a $100m+ model must notify us of this fact, and must file an SSP of its policies, which could be anything including ‘lol we’re Meta.’
That is not going to ‘get it done’ without enforcement, but is better than nothing. It provides some transparency, allowing us to react better if something crazy is about to happen or happens, and provides help for any liability lawsuits.
Then the question is, which version of SB 2025.2, the second bill, are we getting?
If it’s possible interpretation #1, Claude confirmed my suspicions that this would do the opposite of its stated intent. Rather than hold companies more accountable, it would effectively reduce their liability, raise the bar for a successful lawsuit, potentially even providing safe harbor.
That is because there already exists the common law.
As in, if a company:
Releases a product. Any product. At all.
Without taking ‘reasonable care.’
Where that product causes catastrophic harm.
Where the harm was caused by the failure to take such ‘reasonable care.’
Then the victims can and will (if anyone was still alive to do so) sue the bastards, and will probably win, and often win very large punitive damages.
Why might that lawsuit not succeed here?
Claude pointed to two potential defenses, either a first amendment defense or a Section 230 defense, both unlikely to work. I am unable to think of any other plausible defenses, and I agree (although of course I am not a lawyer and never give legal advice) that those two defenses would almost certainly fail. But if they did work, those would be federal defenses, and they would override any California lawsuit or legal action, including those based upon SB 1047.
Whereas under the hypothetical SB 2025.2, first version, if you go by the statements earlier in the letter, and their clear intent, the lawsuit would now shift to the SSP, with a higher threshold for liability and a lower amount of damages than before.
And it is exactly Anthropic’s existing actions that would provide a measure of safe harbor. There is also some risk this could implicitly weaken liability for under $500 million in damages, although I’ve been told this is unlikely.
So in my judgment, as I extrapolate what the letter is implying, the proposed SB 2025.2, under that first possibility, would be actively harmful.
Details matter but probably that means one should oppose the full bill, on the grounds that it plausibly makes us on net less safe, even if the alternative was nothing.
If it’s possibility two, then my understanding is that 2025.2 becomes a clarification of the way existing law works. That could still be substantially helpful insofar as it ‘increases awareness’ or decreases chance of misinterpretation. We are counting on the deterrence effect here.
So the second version of the bill would be, if worded well, clearly better than nothing. If the only alternative was nothing (or worse), especially with no transparency or other help on the Federal level, I’d support a well crafted version of that bill.
You take what you can get. I wouldn’t fool myself that it was doing the job.
Anthropic’s employees and leadership are robustly aware of the stakes and dangers. If the details, rhetoric or broad principles here were a mistake, they were a wilful one, by a public relations and policy department or authorized representative that either does not wish to understand, or understands damn well and has very different priorities than mine.
Any policy arm worth its salt would also understand the ways in which their choices in the construction of this letter were actively unhelpful in passing the bill.
I presume their legal team understands what their proposals would likely do, and not do, if implemented, and would say so if asked.
Thus taken on its own, the letter could only be read as an attempt to superficially sound supportive of regulatory and safety efforts and look like the ‘voice of reason’ for public relations, while instead working to defang or sabotage the bill, and give a technical excuse for Anthropic to fail to support the final bill, since some requests here are rather absurd as presented.
The actual proposal from Anthropic is somewhat different.
If Anthropic does end up endorsing a reasonable compromise that also helps gain other support and helps the bill become law, then they will have been extremely helpful, albeit at a price. We do not yet know.
It is now important that Anthropic support the final version of the bill.
Until we know the final bill version or Anthropic’s proposed changes, and have the necessary context, I would advise caution. Do not jump to conclusions.
Do offer your feedback on what is proposed here, and how it is presented, what rhetorical ammunition this offers, and what specific changes would be wise or unwise, and emphasize the importance of endorsing a bill that presumably will incorporate many, but not all, of Anthropic’s requests.
Do let Anthropic know what you think of their actions here and their proposals, and encourage Anthropic employees to learn what is going on and to discuss this with leadership and their policy department.
Definitely do update your opinion of Anthropic based on what has happened, then update again as things play out further, we learn the final changes, and Anthropic (among others) either supports, does not support or opposes the final bill.
Anthropic benefits from the fact that ‘the other guy’ is either ‘lol we’re’ Meta, or is OpenAI, which has taken to acting openly evil, this week with a jingoistic editorial in the Washington Post. Whatever happens, Anthropic do clear the bar of being far better actors than that. Alas, reality does not grade on a curve.
I consider what happens next a key test of Anthropic.
Prove me wrong, kids. Prove me wrong.
Open Weights Are Unsafe and Nothing Can Fix This
There are sometimes false claims that SB 1047 would effectively ‘ban open source.’
It seems worth pointing out that many open source advocates talk a good game about freedom, but if given half a chance they would in many contexts… ban closed source.
For example, here’s Eric Raymond.
Kelsey Hightower: Switzerland now requires all government software to be open source. "public money, public code."
Eric Raymond: This should be required everywhere, for massive improvements in security, reliability, and process transparency.
Why do they want to ban closed source? Because they believe it is inherently less safe. Because they believe it allows for better democratic control and accountability.
And in some contexts, you know what? They have a pretty damn strong argument. What would happen if, in a different context, the implications were reversed, and potentially catastrophic or existential?
Another place I get whiplash or confusion is when people correctly point out that LLMs can be jailbroken (see The Art of the Jailbreak, this week and otherwise) or are otherwise rendered unsafe or out of control, in accordance with the latest demonstration thereof. So far, so good.
But then those who oppose SB 1047 or other regulations will often think this is a reason why regulation, or safety requirements, would be bad or unreasonable. Look at all the things that would fail your safety tests and pose risks, they say. Therefore… don’t do that?
Except, isn’t that the whole point? The point is to protect against catastrophic and existential risks, and that we are not by default going to do enough to accomplish that.
Pointing out that our safeguards are reliably failing is not, to me, a very good argument against requiring safeguards that work. I see why others would think it is - they want to be able to build more capable AIs and use or release them in various ways with a minimum of interference, and not have to make them robust or safe, because they aren’t worried about, don’t care about, or plan to socialize the risks.
It feels like they think that jailbreaks are not a fair thing to hold people accountable for, the same way they don’t think a one-day fine tune of your open weights model should be your responsibility, so the jailbreak is evidence the proposed laws are flawed. And they say ‘CC: Scott Weiner.’
Whereas to me, when someone says ‘CC: Scott Weiner’ in this spot, that is indeed a helpful thing to do, but I would not update in the direction they expect.
If closed weight models are, by virtue of jailbreaks, less safe, that does not mean we should put fewer requirements on open models.
It means we need to worry more about the closed ones, too!
The Week in Audio
Patrick McKenzie interviews Kelsey Piper.
US-China AI competition on the 80,000 Hours podcast with Sihao Huang.
Vitalik Buterin on defensive acceleration and regulating AI on 80,000 Hours. This includes saying the obvious, that the AI that ‘gets there first’ to certain goals might prove decisive, but it might not; it depends on how the tech and progress fall.
Rhetorical Innovation
Tyler Cowen unrelatedly reports he is dismayed by the degree of misinformation in the election and the degree to which people who should know better are playing along. I would respond that he should buckle up because it’s going to get worse, including via AI, that he should set the best example he can, and that he should reflect on what he is willing to amplify on AI and consider that others care about their side winning political fights the same way he cares about his side ‘winning’ the AI fight.
There are other reasons too, but it is good to be reminded of this one:
Teortaxes: One reason I don't want people to have to say falsehoods is that I do not believe that most people, esp. Westerners, are capable of consistently lying. Either they say what they believe – or they'll come to believe what they're saying. Cognitive dissonance is too much for them.
I would extend this to ‘and if you think you are an exception, you are probably wrong.’
Richard Ngo: Thoughts on the politics of AI safety:
1. Risks that seem speculative today will become common sense as AI advances.
2. Pros and cons of different safety strategies will also become much clearer over time.
3. So our main job is to empower future common-sense decision-making.
4. Understanding model cognition and behavior is crucial for making good decisions.
5. But equally important is ensuring that key institutions are able to actually process that knowledge.
6. Institutions can lock in arbitrarily crazy beliefs via preference falsification.
7. When someone contradicts the party line, even people who agree face pressure to condemn them.
8. We saw this with the Democrats hiding evidence of Biden’s mental decline.
9. It’s also a key reason why dictators can retain power even after almost nobody truly supports them.
10. I worry that DC has already locked in an anti-China stance, which could persist even if most individuals change their minds.
11. We’re also trending towards Dems and Republicans polarizing on the safety/accelerationism axis.
12. This polarization is hard to fight directly.
13. But there will be an increasing number of “holy shit” moments that serve as Schelling points to break existing consensus.
14. It will be very high-leverage to have common-sense bipartisan frameworks and proposals ready for those moments.
15. Perhaps the most crucial desideratum for these proposals is that they’re robust to the inevitable scramble for power that will follow those “holy shit” moments.
16. I don’t know how to achieve that, but one important factor: will AI tools and assistants help or hurt?
17. Eg truth-motivated AI could help break preference falsification. But conversely, centralized control of AIs used in govts could make it easier to maintain a single narrative.
18. This problem of “governance with AI” (as opposed to governance *of* AI) seems very important!
19. One bottleneck: few insiders disclose how NatSec decisions are really made (Daniel Ellsberg’s books a notable exception).
20. Designing principles for integrating AI into human governments feels analogous in historical scope to writing the US constitution. Let’s get it right.
Jeffrey Ladish: I agree with the rest of the thread but these first three points are the least obvious to me. I hope they are true but fear they may not be, even if 1) is true (as I’d guess it will be). But regardless, I still think 3) is correct in vibe. We must empower good decision making.
David Manheim: Agree, but we need to "...empower [*rapid*] common sense decision-making". So we need rules to be moved from regulations, which is slow, and from self-governance, which is misaligned, to a more flexible structure - as we argued here.
Arun Rao (on #19-#20): This is an important point: the most important legal updates going forward will be made in constitutions and directives made to AI systems that will run and regulate vast parts of day to day human activity, not in current laws that require spotty human enforcement and prosecutorial judgement (or often abuse).
“Designing principles for integrating AI into human governments feels analogous in historical scope to writing the US constitution.”
I agree with a lot of this. I agree that setting up good ‘governance with AI’ is important. I agree that being able to respond flexibly and sensibly is very important. I agree that better AI tools would be very helpful.
I agree that building institutional knowledge and freedom of action is crucial. We need to get people who understand into the right places. And we desperately need visibility into what is happening, a key provision and aspect of the Executive Order and SB 1047.
We also need to be empowered to do the thing if and when we need to do it. It is no use to ‘empower decision making’ if all your good options are gone. Thus the value of potential hardware governance, and the danger of deployments and other actions of the kinds that cannot be taken back.
I also agree that better visibility into and communication with, and education of, NatSec types is crucial. My model of such types is that they are by their nature and worldview de facto unable to understand that threats could take any form other than a (foreign) (human) enemy. That needs to be fixed, or else we need to be ready to override it somehow.
I agree that the central failure mode here, where we get locked into a partisan battle or other form of enforced consensus, is an important failure mode to look out for. However I think it is often overemphasized, to the extent of creating paralysis or a willingness to twiddle thumbs and not prepare. As Tolkien puts it, sometimes open war is upon you whether you would risk it or not. If those who are actively against safety focus their efforts on one party, that does not mean you let them win to avoid ‘polarization.’ But I hear that suggestion sometimes.
Where I most strongly disagree are the emphasis on common sense and the assumption of ‘holy shit’ moments. These are not safe assumptions.
We have already had a number of what should have been ‘holy shit’ moments. Common sense should already apply, and indeed if you poll people it largely does, but salience of the issue remains low, and politicians so far ignore the will of the people.
The frogs are boiling remarkably rapidly in many ways. We suddenly live among wonders. ‘Very serious people’ still think AI will provide only minimal economic benefits, that it’s mostly hype, that it can all be ignored.
There are many highly plausible scenarios where, by the time AI has its truly ‘holy shit’ moment, it is too late. Perhaps the weights of a sufficiently advanced model have been irreversibly released. Perhaps we are already locked in a desperate race, with the NatSec types in charge, who consider likely doomsdays acceptable risks or feel they have no choice. Perhaps we create a superintelligence without realizing we’ve done so. Perhaps we get a sharp left turn of some kind, or true RSI or intelligence explosion, where it escalates quickly and by the time we know what is happening we are already disempowered. Perhaps the first catastrophic event is really quite bad, and cascades horribly even if it isn’t strictly existential, or perhaps we get the diamond nanomachines. Who knows.
Or we have our ‘holy shit’ moments, but the ‘common sense’ reaction is either to accelerate further, or (this is the baseline scenario) to clamp down on mundane utility of AI while accelerating frontier model development in the name of national security and innovation and so on. To get the worst of both worlds. And on top of that you have the scrambles for power.
What about common sense? I do not think we should expect ‘common sense’ decisions to get us through this except by coincidence. The ‘common sense’ reaction is likely going to be ‘shut it all down, NOW’ at some point (and the average American essentially already low-level thinks this) including the existing harmless stuff, and presumably that is not the policy response Richard has in mind. What Richard or I think is the ‘common sense’ reaction is going to go over most people’s heads, even most with power, if it has any complexity to it. When the scramble does come, if it comes in time to matter, I expect any new reactions to be blunt, and dumb, and blind. Think Covid response.
On polarization in particular, I do think it’s a miracle that we’ve managed to avoid almost all polarization for this long in the age of so much other polarization. It is pretty great. We should fight to preserve that, if we can. Up to a point.
But we can’t and shouldn’t do that to the point of paralysis, or accepting disastrous policy decisions. The rank and file of both parties remain helpful, but if JD Vance and Marc Andreessen are empowered by the Trump administration and enact what they say, they will be actively trying to get us killed, and open war will be upon us whether we would risk it or not.
It would suck, but I would not despair, aside from the actual policy impacts.
That is because if the polarization does happen, it will not be a fair fight.
If AI continues to improve and AI becomes polarized, then I expect AI to be a key issue, if not the key issue, in the 2028 elections and beyond. Salience will rise rapidly.
If that happens, here is a very clear prediction: The ‘pro-AI’ side will lose bigly.
That does not depend on whether or not the ‘pro-AI’ side is right under the circumstances, or what common sense would say. People will demand that we Do Something, both out of existential style fears and various mundane concerns. I would be stunned if a lot of the proposed actions of a possible ‘anti-AI’ platform are not deeply stupid, and do not make me wince.
One risk, if we try too hard to avoid polarization and regulatory actions are all postponed, is we create a void, which those who do not understand the situation would inevitably fill with exactly the wrong ‘anti-AI’ policies - clamping down on the good things, while failing to stop or even accelerating the real risks.
None of that takes away from the importance of figuring out how to wisely incorporate AI into human government.
Businessman Waves Flag
OpenAI CEO Sam Altman has written an op-ed in The Washington Post, “Who will control the future of AI?”
He wraps himself in the flag, says the options are Us or Them. Us are Good, you see, and Them are Bad.
So make it Us. Invest enough and I’ll make Us win. Fund ‘innovation’ and invest in our infrastructure to ensure the future belongs to ‘democracy.’
He ignores the most likely answer, which of course is: AI.
He also ignores the possibility we could work together, and not race to the finish line as quickly as possible.
Yes, there are various fig leafs thrown in, but do not kid yourself.
If the mask was ever on, it is now off.
Altman presents AI as a race between the West and Authoritarians, with the future depending on who wins, so we must win.
Sam Altman: That is the urgent question of our time. The rapid progress being made on artificial intelligence means that we face a strategic choice about what kind of world we are going to live in: Will it be one in which the United States and allied nations advance a global AI that spreads the technology’s benefits and opens access to it, or an authoritarian one, in which nations or movements that don’t share our values use AI to cement and expand their power?
There is no third option — and it’s time to decide which path to take.
That is the classic politician’s trick, the Hegelian dialectic at work. No third options.
You should see the other guy.
His failure mode if AI goes badly? Authoritarian humans in charge. They will (gasp) force us to share our data, spy on their own citizens, do cyberattacks.
I mean, yes, if they could they would totally do those things, but perhaps this is not the main thing to be worried about? Old Sam Altman used to understand there was an existential risk to humanity. That we could lose control over the future, or all end up dead. He signed a very clear open letter to that effect. It warned of ‘extinction risk.’
Sam Altman (2020): "It's so easy to get caught up in the geopolitical tensions and race that we can lose sight of this gigantic humanity-level decision that we have to make in the not too distant future."
There is a bunch of hopium in the full clip, but he asked some of the right questions. And he realized that ‘who has geopolitical power’ is not the right first question.
Indeed, he explicitly warned not to fall for that trick.
He now writes this letter instead, and does not even deign to mention such dangers. He has traded in attempts to claim that iterative deployment is a safe path to navigate existential dangers for being a jingoist out to extract investment from the government.
That’s jumping a bit ahead. What is Altman actually proposing?
Four things.
Robust security measures to prevent theft of intellectual property. Government should partner with industry to ensure this happens.
I am in violent agreement here. We should totally do that.
Build lots of infrastructure nationwide, data centers and power plants, via government working with the private sector.
I am not unsympathetic here. Certainly this is what you would do if you were concerned with national security and economic prosperity.
He tries to talk about ‘distribute [AI’s] social benefits’ and ‘create jobs,’ which are of course red herrings, or perhaps the dangling of red meat.
Government does at least need to ‘get out of the way’ on permitting.
A ‘coherent commercial diplomacy policy’ for AI, including export controls and foreign investment rules. Including rules for where to store chips, training data and key code. A coalition of democratic allies in this.
Says the person who tried to build his AI infrastructure in the UAE, but yes.
‘Think creatively about new models to establish norms in developing and deploying AI, with a particular focus on safety and ensuring a role for the global south and other nations who have historically been left behind. As with other issues of global importance, that will require us to engage with China and maintain an ongoing dialogue.’ He mentions IAEA-style as one possible model, as well as ICANN, or ‘one option could knit together the network of AI safety institutes being built in countries such as Japan and Britain and create an investment fund that countries committed to abiding by democratic AI protocols could draw from to expand their domestic computer capacities.’
So the plan is to ‘beat China’ for ‘control of the future’ because there is no third option, and you also get its cooperation on development and deployment of AI and maintain an ongoing dialogue.
You can either have a worldwide body in charge of AI or have ‘democracy’ in control of AI. You cannot have both, unless perhaps you want to overthrow a bunch of regimes first.
This is where he mentions ‘safety,’ but in context no one would know that this is anything more than deepfakes and cyberattacks. He suggests transforming AI safety institutes, which are for ensuring existential safety, into tools for aiding AI adoption and expanding capabilities.
He says ‘while minimizing [AI’s] risks’ as a goal but again he does not say what risks they are. Anyone reading this would think that he is talking about either ordinary, mundane risks, or more likely the risk of the bad authoritarian with an AI.
None of these core proposals are, in their actual contents, unreasonable, aside from the rather brazen proposal to transform the safety institutes into capability centers.
The attitude and outlook, however, are utterly doomed. Sam Altman used to at least pretend to be better than this.
Now he’s done with all that. A lot of masks have come off recently.
Vibe shift, indeed.
Meanwhile, as Sam Altman goes full jingoist and stops talking about existential risk at all, we have Ted Cruz joining JD Vance in claiming that all this talk of existential risk is a conspiracy by Big Tech to do regulatory capture, while those same companies fight against SB 1047.
Must be nice to use claims of a Big Tech conspiracy to defend the interests of Big Tech. It’s actually fridge brilliance, intentionally or otherwise. You raise the alarm just enough as a de facto false flag to discredit anyone else raising it, while working behind the scenes, and now out in the open, to say that everything is fine.
Businessman Pledges Safety Efforts
Sam Altman also made an announcement on safety, with three core claims.
Sam Altman: A few quick updates about safety at OpenAI:
As we said last July, we’re committed to allocating at least 20% of the computing resources to safety efforts across the entire company.
I notice that the original commitment said it was ‘to superalignment’ and now it is ‘to safety efforts’ which includes mundane safety, such as provisions of GPT-4o. That is a very different commitment, that you are pretending is remotely similar.
I notice that this does not say that you actually allocated that compute to safety efforts.
I notice you certainly aren’t providing any evidence that the allocations happened.
I notice that we have many reports that the former superalignment team was denied compute resources time and again, given nothing remotely like what a 20% commitment implied. Other things proved more important to you. This drove your top safety people out of the company. Others were fired on clear pretexts.
Kelsey Piper: Sam Altman indirectly answers allegations that the Superalignment team was starved of the compute they had been promised, saying he didn't break the promise as it was about allocating 20% to safety efforts across the company.
The initial wording of the promise was "To solve this problem within four years, we’re starting a new team, co-led by Ilya Sutskever and Jan Leike, and dedicating 20% of the compute we’ve secured to date to this effort." So, uh, I see why that team was confused!
Honestly I think what everyone thinks is going on here is that after the board fired Sam, Sam was angry at Ilya and his team got starved out. I don't really think this statement is going to change anyone's minds.
I do think the work with the US AI Safety Institute is a good sign, though, and I'm also glad he apologized again for the nondisparagement agreements fiasco.
So, yeah. That is not what you promised.
If you’re saying trust us, we will do it later? We absolutely do not trust you on this.
On the plus side, I notice that the previous commitment could have reasonably been interpreted as 20% of ‘secured to date’ compute, meaning compute OpenAI had access to at the time of the commitment last July. This is worded strangely (it’s Twitter) but seems to strongly imply that no, this is 20% of total compute spend. As they say, huge if true.
Our team has been working with the US AI Safety Institute on an agreement where we would provide early access to our next foundation model so that we can work together to push forward the science of AI evaluations. Excited for this!
I notice that GPT-4o went through very little safety testing, it was jailbroken on the spot, and there were reports its safety measures were much weaker than typical. One could reasonably argue that this was fine, because its abilities were obviously not dangerous given what we know about GPT-4, but it did not seem like it was handled the way the Preparedness Framework indicated or in a way that inspired confidence. And we presume it was not shared with either the US AI Safety Institute, or the UK AI Safety Institute.
I notice there is no mention here of sharing models with the UK AI Safety Institute, despite Google and Anthropic having arranged to do this.
It is excellent news that OpenAI is ‘working on an agreement’ to provide early access to at least the US AI Safety Institute. But until there is a commitment or agreement, that is cheap talk.
Finally, we want current and former employees to be able to raise concerns and feel comfortable doing so. This is crucial for any company, but for us especially and an important part of our safety plan. In May, we voided non-disparagement terms for current and former employees and provisions that gave OpenAI the right (although it was never used) to cancel vested equity. We’ve worked hard to make it right.
Jeffrey Ladish: These all seem like good things. Glad to see Sam reaffirm the 20% commitment, give early model access to the US AISI, and affirm the norm for current and former employees to speak up re issues.
I want to know details about the 20% claim given what former safety team members have said about it. Currently I don’t think Sam affirming that means that much without more detail. But it does seem like OpenAI leadership has actually released many people from NDAs which is good.
I’ve reported on the NDA and non-disparagement situation extensively, and Kelsey Piper has offered extensive and excellent primary reporting. It seems fair to say that OpenAI ‘worked hard to make it right’ once the news broke and they faced a lot of public pressure and presumably employee alarm and pushback as well.
It is good that, as Churchill said, they did the right thing once they had exhausted all the alternatives, although they could do more. Much better than continuing to do it. But even if you buy the (to put it politely) unlikely story that leadership including Altman did not knowingly authorize this and for a long time had no idea this was happening, they had months in which Kelsey Piper told them exactly what was happening, and they waited until the story broke to start to fix it.
It is also good that they have acknowledged that they put into contracts, and had, the ‘right’ to cancel vested equity. And that they agree that it wasn’t right. Thank you. Note that he didn’t say here they no longer can cancel equity, only that they voided certain provisions.
What about the firing of Leopold Aschenbrenner, for raising security concerns? Are we going to make that right, somehow? If we don’t, how will others feel comfortable raising security concerns? The response to the Right to Warn letter was crickets.
Given the history of what has happened at OpenAI, they need to do a lot better if they want to convince us that OpenAI wants everyone to feel comfortable raising concerns.
And of course, Sam Altman, there is that op-ed you just wrote this week for the Washington Post, covered in the previous section. If you are committed to safety, if you even still know the things you previously have said about the dangers we are to face, what the hell? Why would you use that kind of rhetoric?
Aligning a Smarter Than Human Intelligence is Difficult
Thomas Kwa offers notes from ICML 2024 in Vienna. Lots of progress, many good papers, also many not good papers. Neel Nanda notes many good papers got rejected, because peer review does a bad filtering job.
Aligning a Dumber Than Human Intelligence is Also Difficult
Seth Blincoe: I summoned my Model Y to back into a parking space and it couldn’t detect the garage above… Oops.. @elonmusk
Dee Bunked: …could YOU not detect it either? Because I’m positive your attention is still required when using any of these auto features.
Seth Blincoe: Yee. I hit the button on accident while I was away..
Yishan: Normal people are always a bit too willing to believe that a computer can do the job correctly. Never never never trust automation unless you have personally calibrated it yourself in every edge case carefully multiple times.
Simeon: There's a lot of alpha in acknowledging that ChatGPT/Claude will often be more right than you actually.
So often problem exists between keyboard and chair, here with a remote and a car.
Have you ever trusted a human without personally calibrating them yourself in every edge case carefully multiple times?
That question is close but not quite identical to, ‘have you ever trusted a human’?
Humans are highly unreliable tools. There are tons of situations, and self-driving cars are one of them, where if the AI or other computer program was as reliable as the average human who actually does the job, we would throw the computer out a window.
Another great example is eyewitness testimony. We would never allow any forensic evidence or AI program anywhere near anything that mattered with that kind of failure rate, even if you only count witnesses who are trying to be honest.
As a practical matter, we should often be thankful that we are far less trusting of computers and AIs relative to similarly reliable humans. It leads to obvious losses, but the alternative would be worse.
Even with this bias, we are going to increasingly see humans trust AIs to make decisions and provide information. And then sometimes the AI is going to mess up, and the human is going to not check, and the Yishans of the world will say that was stupid of you.
But that AI was still, in many cases, far more reliable in that spot than would have been a human, or even yourself. So those who act like Yishan advises here will incur increasing costs and inefficiencies, and will lose out.
Indeed, Simeon points us down a dangerous road, but locally he is already often right, and over time will be increasingly more right. As in, for any individual thing, the AI will often be more trustworthy and likely to be right than you are. However, if you go down the path of increasingly substituting its judgment for your own, if you start outsourcing your thinking and taking yourself out of the loop, that ends badly.
When the mistakes start to have much higher consequences, or threaten to compound and interact and spiral out of control? And things go horribly, potentially catastrophically or existentially wrong? It is going to be tricky to not have people ask for this, quite intentionally. With their eyes open.
Because yeah, the AI has some issues and isn’t fully reliable or under your control.
But have you seen the other guy?
Other People Are Not As Worried About AI Killing Everyone
Arvind Narayanan and Sayash Kapoor warn that probability estimates of existential risk are unreliable, so you shouldn’t ‘take them too seriously.’ Well, sure.
You should take them exactly the right amount of seriously, as useful ways to discuss questions that are highly imprecise. I mostly only have people in about five buckets, which are something like the following (with a rough sketch in code after the list):
Absurdly low numbers that don’t make sense, pure denial, Obvious Nonsense.
Unreasonably low numbers (<10%) that I believe show failure to appreciate the problem, and are low enough that they plausibly justify being cavalier about it all and focusing on other problems.
The Leike Zone (10%-90%), where mostly the wise responses don’t much change.
Very high numbers (>90%) that imply that we need to embrace high variance.
Unreasonably high numbers (>98%?) that should radically change your approach.
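Purely to make the cutoffs above concrete, here is a rough sketch in code. The boundary for the first bucket is my own illustrative choice, not something stated above; the rest follow the ranges listed.

```python
# Illustrative only: maps a stated p(doom) to the five buckets described above.
# The 1% boundary for "Obvious Nonsense" is an assumed cutoff, not from the post.

def bucket(p_doom: float) -> str:
    if p_doom < 0.01:
        return "Absurdly low: pure denial, Obvious Nonsense"
    if p_doom < 0.10:
        return "Unreasonably low: plausibly justifies being cavalier"
    if p_doom <= 0.90:
        return "The Leike Zone: wise responses mostly don't change"
    if p_doom <= 0.98:
        return "Very high: embrace high variance"
    return "Unreasonably high: radically change your approach"

for p in (0.001, 0.05, 0.40, 0.95, 0.99):
    print(f"{p:.3f} -> {bucket(p)}")
```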
That does not make them, as this post puts it, ‘feelings dressed up as numbers.’
They are perspectives on the problem best communicated by approximate numbers.
I don’t think this is mostly what is going on, but in cases where it is (which mostly are not about AI at all, let alone doom), I’d bite the full bullet anyway: Even when what you have is mostly a feeling, how better to express it than a number?
In this case, a lot of people have put a lot of thought into the question.
A lot of this was essentially an argument against the ability to assert probabilities of future events at all unless you had lots of analogous past events.
They even say ‘forecast skill can’t be measured’ which is of course absurd, track records are totally a thing. It is an especially remarkable claim when they specifically cite the predictions of ‘superforecasters’ as evidence.
I was disappointed but not surprised to see such general arguments against knowing probabilistic things (calibration and forecasting skill can indeed be measured and improved over time), and to see Pascal’s Wager invoked where it does not belong. Alas, the traditional next move is to say ‘so we should effectively treat the risk as 0%,’ which for obvious reasons does not follow.
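Since ‘track records are totally a thing’ keeps coming up, here is a minimal Python sketch of how forecast skill actually gets scored after the fact, using Brier scores and a simple calibration check. The numbers are made up for illustration, not anyone’s real forecasts.

```python
# Minimal sketch of scoring a forecasting track record.
# Forecasts are stated probabilities; outcomes are 1 if the event happened, else 0.

def brier_score(forecasts, outcomes):
    """Mean squared error between stated probabilities and outcomes.
    Lower is better; always saying 50% scores 0.25."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

def calibration(forecasts, outcomes, bins=10):
    """For each probability bin, compare the average stated probability
    to the observed frequency of the event resolving true."""
    grouped = {}
    for p, o in zip(forecasts, outcomes):
        b = min(int(p * bins), bins - 1)
        grouped.setdefault(b, []).append((p, o))
    return {
        b: (sum(p for p, _ in rows) / len(rows),   # average stated probability
            sum(o for _, o in rows) / len(rows))   # observed frequency
        for b, rows in grouped.items()
    }

# Hypothetical track record (illustrative data only).
forecasts = [0.9, 0.7, 0.2, 0.6, 0.1, 0.8]
outcomes  = [1,   1,   0,   0,   0,   1]

print(brier_score(forecasts, outcomes))  # below 0.25 means better than coin-flipping
print(calibration(forecasts, outcomes))
```

The point is not these particular numbers. It is that once forecasts are stated as probabilities, skill and calibration become measurable in hindsight, which is exactly what superforecaster track records are.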
I was similarly disappointed but not surprised by those who praised the post.
Here are some others explaining some of the errors.
Nathan: I'll probably respond to the forecasting stuff tomorrow, but some general notes.
I struggle to believe this article is a good faith attempt, given that it is published in "AI Snake Oil"
The argument seems to be "We can't forecast AI so we shouldn't slow down"
Seems like a bad argument.
It is inconsistent to write an entire article about how x-risk can't be forecasted and then say that policies they don't like will likely increase x-risk.
How do they know? Divine revelation?
Forecasting skill can be measured for AI. There are lots of AI questions.
Saying that x-risk is a class that can't be forecasted means we can't forecast nuclear armageddon or extinction from pandemics. I don't buy it.
The article keeps referencing alien invasion as an example, but the absurdity is doing the heavy lifting.
If we had aliens on Earth rapidly increasing in capability and 1000s of alien researchers were concerned, I'd be concerned too.
Several people I respect have said they like this article, but I don't see it.
I am tired of being expected to respond in good faith to people who give me none and lecture me about forecasting despite making obvious errors.
If you liked this article, why?
Also the forecasters referenced have the lowest P(doom)s of all the groups in the article and have low forecasts of risk in general (though still an unacceptably high level).
Neel Nanda: Yeah, it seemed very low quality discourse to me. "This is hard to predict and complicated. Therefore we should extremely confidently assume that I am right and lots of smart people are wrong"
David Manheim: The writeup makes a number of good points about the limits of forecasts, then does the indefensible, concluding, subjectively, based on ignoring the arguments and far less evidence than the forecasts they critiqued, what they started out wanting to conclude.
To be clear - I have a lot of respect for @random_walker.
But here, specifically, it looks like he's ignoring all of the decades of work from public policy on methodologies like robust decision making or assumptions based planning to deal with deeply uncertain questions.
He's also unreasonably dismissive (or ignorant) of the research into understanding subjective forecasting, stating that "forecast skill cannot be measured when it comes to unique or rare events" - a concern which is clearly addressed in the literature on the topic.
…
The paragraph before that one, he says "That said, we think forecasts should be accompanied by a clear explanation of the process used and evidence considered." What does he think has been happening for the past 5+ years if not that?
Jaime Sevilla: I agree something like that is what has happened, and this is a big reason why government policies have been measured and reasonable so far.
David Manheim: ...but he spent pages saying it's not reliable in this special, unique, one time case, (a claim which is almost always made,) and so must be ignored instead of treated as a useful input, and instead policies must always be positive in what he claims are the most probable worlds.
The Lighter Side
Anthropic wants your business.
I do think the aesthetics are solid.
The problem with the Claude advertisements is that no one knows what Claude or Anthropic is, and these ads do not tell you the answer to that. It is a strange plan.
Robin Hanson: All the AI commercials in the Olympics broadcast reminds me of all those crypto commercials during the Super Bowl. Neither actually make me feel much confidence in them.
I think that's mostly just how advertising works? Here there seems to be a genuine need. Anyone watching the Olympics likely has little idea what AI can do and is missing out on much mundane utility. Whereas with crypto, there were… other dynamics.
So yes, it is what would sometimes happen in a bubble, but it is also what would happen if AI was amazingly great. What would it say if AI companies weren’t advertising?
> A similar phenomenon that has existed for a long time: Pandora stations, in my experience, reliably collapse in usefulness if you rate too many songs. You want to offer a little guidance, and then stop.
I ran into this with Spotify when I made a bunch of playlists for D&D, which then turned all of my Spotify-generated playlists into instrumental background music, rather than what I normally listen to outside of D&D.
It took me a while to notice it, but one of the options on a playlist in Spotify is “exclude from your taste profile.” Once I did that with most of my D&D playlists, my recommended songs went back to normal.
Two takeaways for me:
1) Spotify seems to weight “added to a playlist” significantly higher than “listened to multiple times” - so if you’re making playlists for specific things or occasions, it’s worth excluding them
2) I want buttons like that in more apps
You bring up superforecasters and mention that forecasting is real and being well calibrated is a thing (agree) and track records are important (agree), but as you’ve discussed previously the majority of well-calibrated superforecasters with good track records are deep within your Obvious Nonsense bracket on AI risk.
I think we both agree these people are too low for various reasons and ultimately not well calibrated on the question, but it does move the needle down for me and I think it makes the track records weak evidence for forecasting AI risk being more grounded.