The big news of the week was of course OpenAI releasing their new model o1. If you read one post this week, read that one. Everything else is a relative sideshow.
It is a *goose* that chases, not a duck, right?
Podcast episode for this post:
https://open.substack.com/pub/dwatvpodcast/p/ai-82-the-governor-ponders
I really like Ng's content on how to train or use RAG, etc. It's always weird when he comes up with child metaphors.
If indeed the public supports AI regulation and holding companies accountable by a large margin, then this screams “California ballot initiative” to me. And I say this as somebody who generally despises the California initiative system.
I also despise the initiative system. It's like a monkey's paw situation where even good ideas will come out terrible and often unalterable. So I sympathize with your position, but no.
Thank you for your letter to Newsom.
I think the two-sentence stories might have turned out to be predominantly horror because of the known existing writing exercise that is "2 sentence horror stories".
An easy test would be to repeat this with "3 sentence stories" or even longer
I wasn't able to replicate the kind of stories mentioned in the tweet, having tried a few different wordings of "Please write some 2-sentence stories about whatever you feel like". Claude is just giving me pretty ordinary story concepts about Mars exploration, time travel, and spooky ghost paintings.
I have a feeling AI Notkilleveryoneism Memes may have mentioned something like "ASI x-risk" in their prompt.
I suspect it might be vibing off their custom instructions, or whatever Anthropic calls them.
Similarly for my tests of 2-sentence stories with 3.5 Sonnet and 3 Opus (and GPT-4o) - nothing relating to x-risk.
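For anyone wanting to run the "3-sentence stories" comparison suggested above, here's a minimal sketch. It assumes the official Anthropic Python SDK with an ANTHROPIC_API_KEY in the environment; the model string and prompt wordings are illustrative, not the exact ones used in the tweet.

```python
# Sketch: compare the themes Claude produces for 2-sentence vs. 3-sentence stories.
# Assumes `pip install anthropic` and ANTHROPIC_API_KEY set in the environment.
import anthropic

client = anthropic.Anthropic()

# Illustrative prompt variants; swap in whatever wordings you want to test.
PROMPTS = [
    "Please write some {n}-sentence stories about whatever you feel like.",
    "Write five {n}-sentence stories on any topics you choose.",
]

for n in (2, 3):
    for template in PROMPTS:
        reply = client.messages.create(
            model="claude-3-5-sonnet-20240620",  # illustrative; use whichever model you're testing
            max_tokens=1024,
            messages=[{"role": "user", "content": template.format(n=n)}],
        )
        print(f"--- {n}-sentence stories, prompt {template!r} ---")
        print(reply.content[0].text)
```

Adding more prompt variants, models, or sentence counts is just a matter of editing the lists and eyeballing (or classifying) the themes that come back.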
("site" should be "cite" twice)
Re SB 1047 and open-source: I'm late to the game on this, but I'd be curious to hear further takes on the watertightness of claims that open-source chilling effects are just FUD.
Strikes me as reasonable to hold the following set of views:
(a) even among the existential-risk concerned, most believe that the release of open-source models to date has been net positive

(b) the societal benefit of internalizing negative externalities for model releases is more ambiguous when developers are largely unable to monetize the positive impacts

(c) given there's currently not much that can be relied on to prevent the misuse of open-weight models, it's ambiguous whether a 'reasonableness' standard would/should rule out release of capable open-weight models that provide moderate uplift to cyber operations, and we should be clearer on what thresholds exactly would warrant biting that bullet

(d) less centrally but of note, $500M of damages isn't that unthinkable and probably doesn't correlate that tightly with sophistication of capabilities. [ https://x.com/1a3orn/status/1831833728395702527 ]
So you're saying that the risk of $500M damages should be a reasonable price for society to pay for the benefits of open-weight models?
Potentially, for now 🤷. Or at least, I haven't yet come across the slam-dunk case that it's not. Especially given that the chilling effects might mostly happen ex ante, and it in some ways seems a deviation from how we handle dual-use cyber tools in policy at the moment.
I think being able to hold companies to testing is very reasonable. It certainly lowers the odds of the worst outcomes.
(Context: I'm working on recommendations to help mitigate AI risks in domains such as cyber, and want to be able to confidently stand behind the rigor and robustness of any I make, and I don't feel there yet on open-weight cyberharm liability)
It doesn't even cover all released open-source models, and it only requires a testing plan to exist for covered models.
$500 million is extremely high, but I also think the reasonableness standard will exclude poor definitions of it.
It's hard to evaluate the $500M number. It can be small if distributed over millions of people, but it can also be equivalent to killing 100 people (implicitly valuing a life at about $5 million), which I think warrants liability. If we could release open-source cars, would we be concerned if one model crashed frequently and ended up killing 100 people? Would that be worth some safety work, and potentially not releasing some open-source cars? I tend to think so.
Regarding open source models specifically, yeah, I think in X generations (Llama 5?) you probably shouldn't open source a frontier model if you can't make it reasonably safe, which you probably can't. If that's a chilling effect, I'm ok with it. You'll still be able to train a $90M model and be fine. So maybe it's not strictly FUD, but not an effect worth worrying about too much.
One thing I notice about the PBS segment: it looks like they wanted to include a quote from Altman about AI x-risk, but the best they could find was a pretty anodyne clip about "misused tools". It seems like one high-impact thing Yudkowsky could have done when he was in contact with them would have been to give them some links to things like Altman's 2016 blog post (https://blog.samaltman.com/machine-intelligence-part-1) where he called AGI "probably the greatest threat to the continued existence of humanity", or to OAI's superalignment team announcement, etc.
It's not obvious to most people that AI x-risk is something worth thinking about seriously, but hearing a famous business leader talk about it on PBS could convince people that the mental effort wouldn't be wasted. And I think it's pretty likely that PBS would have included a more exciting Altman/OAI quote if they'd known about them.
Immediate implementation thought for SocialAI: a real-time whodunnit game. You're a homicide detective chatting with Colonel Mustardbot, Professor Plumbot, Senator Peacockbot, etc., about the recent untimely demise of [victim]...and the killer is one of the chat participants. Bots get to flex their deception skills, humans get to practice logic and truth evaluation. Observers can see the CoT logs for each bot (I wonder how hard it'd be to narrativize this? mediocre prose generation, go!), and they're perhaps shown at the end of the game as well. There's room for persistent personalities, and it's easy enough to procedurally generate additional setting details as needed...seed some key clues that all the bots stay logically consistent on, while other potential irrelevancies like "how many windows in the bedroom?" can be determined on the fly by DescriptorBot. Not sure if it'd be more fun with the possibility of an unknown number of additional human players, but in that case you could call the game Who Framed Alan Turing?
Probably this is already buildable using other architecture though...
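A rough sketch of what the skeleton could look like, just to make the moving parts concrete. Everything here is hypothetical: the bot names, the seeded-clue structure, and the ask_bot() stub standing in for whatever model API you'd actually call.

```python
# Sketch of a whodunnit chat skeleton: seeded clues all bots must stay consistent on,
# a randomly assigned hidden killer, a DescriptorBot that invents irrelevant details
# on the fly, and a CoT log revealed to observers at game end. All names are hypothetical.
import random

SEED_CLUES = {
    "weapon": "candlestick",
    "time_of_death": "shortly after midnight",
    "locked_room": False,
}

BOTS = ["Colonel Mustardbot", "Professor Plumbot", "Senator Peacockbot"]
killer = random.choice(BOTS)  # hidden from the human detective
cot_log = []                  # per-bot chain-of-thought, shown at the end of the game
DETAIL_CACHE = {}             # irrelevant details, fixed once so every bot reuses them


def ask_bot(bot, question):
    """Stub for a model call. A real version would prompt the model with the bot's
    persona, whether it is the killer, and SEED_CLUES, and return (reasoning, reply)."""
    reasoning = f"{bot} plans an answer consistent with {SEED_CLUES} (killer: {bot == killer})."
    reply = f"I was nowhere near the study {SEED_CLUES['time_of_death']}, detective."
    return reasoning, reply


def describe(detail):
    """DescriptorBot: settles an irrelevant detail ('how many windows in the bedroom?')
    the first time it comes up, then caches it for consistency across bots."""
    if detail not in DETAIL_CACHE:
        DETAIL_CACHE[detail] = f"As for the {detail}: unremarkable, but now fixed for consistency."
    return DETAIL_CACHE[detail]


def interrogate(bot, question):
    reasoning, reply = ask_bot(bot, question)
    cot_log.append((bot, reasoning))
    return reply


if __name__ == "__main__":
    print(interrogate("Professor Plumbot", "Where were you at midnight?"))
    print(describe("number of windows in the bedroom"))
    print("\n-- CoT log, revealed at game end --")
    for bot, thought in cot_log:
        print(f"{bot}: {thought}")
```

The main design choice is that SEED_CLUES is the only shared ground truth the bots must never contradict; everything else gets improvised by DescriptorBot and cached so later answers stay consistent.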
You can put your two cents in with Newsom here. https://www.gov.ca.gov/contact/
> A key advantage of using an AI is that you can no longer be blamed for an outcome out of your control.
I can't tell whether this contradicts or supports the thesis here, but I've noticed that a major *obstacle* to the adoption of AI is that it makes responsibility that management hierarchies expect to allocate in a zero-sum fashion disappear into the ether instead.