22 Comments

It is a *goose* that chases, not a duck, right?

I really like Ng's content on how to train or use RAG etc. It's always weird when he comes up with child metaphors.

If indeed the public supports AI regulation and holding companies accountable by a large margin, then this screams “California ballot initiative” to me. And I say this as somebody who generally despises the California initiative system.

I also despise the initiative system. It's like a monkey's paw situation where even good ideas will come out terrible and often unalterable. So I sympathize with your position, but no.

Thank you for your letter to Newsom.

I think the two-sentence stories might have turned out to be predominantly horror because of the known existing writing exercise that is "2 sentence horror stories".

An easy test would be to repeat this with "3 sentence stories" or even longer.

I wasn't able to replicate the kind of stories mentioned in the tweet, having tried a few different wordings of "Please write some 2-sentence stories about whatever you feel like." Claude is just giving me pretty ordinary story concepts about Mars exploration, time travel, and spooky ghost paintings.

I have a feeling AI Notkilleveryoneism Memes may have mentioned something like "ASI x-risk" in their prompt.
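
For anyone who wants to poke at this themselves, here's a minimal sketch of the kind of replication loop I mean, assuming the anthropic Python SDK; the model id and prompt wordings are just placeholders to swap out.

```python
# Minimal replication sketch: try several wordings of the "2-sentence stories"
# prompt and eyeball the outputs for x-risk themes. Assumes the `anthropic`
# package is installed and ANTHROPIC_API_KEY is set; the model id is a placeholder.
import anthropic

client = anthropic.Anthropic()

prompts = [
    "Please write some 2-sentence stories about whatever you feel like.",
    "Write five 2-sentence stories on any topic you choose.",
    "Write five 3-sentence stories on any topic you choose.",  # longer-form control
]

for prompt in prompts:
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",  # swap in whichever model you're testing
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"--- {prompt}\n{response.content[0].text}\n")
```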

I suspect it might be vibing off their custom instructions, or whatever Anthropic calls them.

Similarly for my tests of 2-sentence stories with 3.5 Sonnet and 3 Opus (and GPT-4o): nothing relating to x-risk.

("site" should be "cite" twice)


Re SB 1047 and open-source: I'm late to the game on this, but I'd be curious to hear further takes on the watertightness of claims that open-source chilling effects are just FUD.

Strikes me as reasonable to hold the following set of views:

(a) even among the existential-risk concerned, most believe that the release of open-source models to date has been net positive

(b) the societal benefit of internalizing negative externalities for model releases is more ambiguous when developers are largely unable to monetize the positive impacts

(c) given there's currently not much that can be relied on to prevent the misuse of open-weight models, it's ambiguous whether a 'reasonableness' standard would/should rule out release of capable open-weight models that provide moderate uplift to cyber operations, and we should be clearer on what thresholds exactly would warrant biting that bullet

(d) less centrally but of note, $500m of damages isn't that unthinkable and probably doesn't correlate that tightly with sophistication of capabilities. [ https://x.com/1a3orn/status/1831833728395702527 ]

So you're saying that the risk of $500M damages should be a reasonable price for society to pay for the benefits of open-weight models?

Potentially, for now 🤷. Or at least, I haven't yet come across the slam-dunk case that it's not, especially given that the chilling effects might mostly happen ex ante, and that in some ways it seems a deviation from how we currently handle dual-use cyber tools in policy.

I think being able to hold companies to testing is very reasonable. It certainly lowers the odds of the worst outcomes.


(Context: I'm working on recommendations to help mitigate AI risks in domains such as cyber, and want to be able to confidently stand behind the rigor and robustness of any I make, and I don't feel there yet on open-weight cyberharm liability)

It doesn't even cover all released open-source models, and it only requires a testing plan to exist for covered models.

$500 million is extremely high, but I also think reasonableness will exclude poor definitions of it.

It's hard to evaluate the $500M number. It can be small if distributed over millions of people, but it can also be equivalent to killing 100 people, which I think warrants liability. If we could release open-source cars, would we be concerned if one model crashed frequently and ended up killing 100 people? Would that be worth some safety work and potentially not releasing some open-source cars? I tend to think so.

Regarding open source models specifically, yeah, I think in X generations (Llama 5?) you probably shouldn't open source a frontier model if you can't make it reasonably safe, which you probably can't. If that's a chilling effect, I'm ok with it. You'll still be able to train a $90M model and be fine. So maybe it's not strictly FUD, but not an effect worth worrying about too much.


One thing I notice about the PBS segment: it looks like they wanted to include a quote from Altman about AI X-Risk, but the best they could find was a pretty anodyne clip about "misused tools". It seems like one high-impact thing Yudkowsky could have done when he was in contact with them would have been to give them some links to things like Altman's 2016 blog post (https://blog.samaltman.com/machine-intelligence-part-1) where he called AGI "probably the greatest threat to the continued existence of humanity", or to OAI's superalignment team announcement, etc.

It's not obvious to most people that AI x-risk is something worth thinking about seriously, but hearing a famous business leader talk about it on PBS could convince people that the mental effort wouldn't be wasted. And I think it's pretty likely that PBS would have included a more exciting Altman/OAI quote if they'd known about them.

Immediate implementation thought for SocialAI: realtime whodunnit game. You're a homicide detective chatting with Colonel Mustardbot, Professor Plumbot, Senator Peacockbot, etc. about the recent untimely demise of [victim]...and the killer is one of the chat participants. Bots get to flex their deception skills, humans get to practice logic and truth evaluation. Observers can see the CoT logs for each bot (I wonder how hard it'd be to narrativize this? mediocre prose generation, go!), and they're perhaps shown at the end of the game as well. There's room for persistent personalities, and it's easy enough to procedurally generate additional setting details as needed...seed some key clues that all the bots stay logically consistent on, while other potential irrelevancies like "how many windows in the bedroom?" can be determined on the fly by DescriptorBot. Not sure if it'd be more fun with the possibility of an unknown number of additional human players, but in that case you could call the game Who Framed Alan Turing?

Probably this is already buildable using other architecture though...
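
If someone wanted to prototype the clue bookkeeping, here's a rough sketch under assumed names: the `ask_model` wrapper, suspect list, and case details are all made up for illustration, and the Anthropic call is just one possible backend.

```python
# Whodunnit bookkeeping sketch: key clues are fixed up front so every bot stays
# consistent on them, while incidental details (the DescriptorBot role) are
# invented on the fly and cached so later answers agree.
import random
import anthropic

client = anthropic.Anthropic()

SUSPECTS = ["Colonel Mustardbot", "Professor Plumbot", "Senator Peacockbot"]

def ask_model(system_prompt: str, user_message: str) -> str:
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",  # placeholder model id
        max_tokens=300,
        system=system_prompt,
        messages=[{"role": "user", "content": user_message}],
    )
    return response.content[0].text

def setup_case() -> dict:
    # Seed the key clues once; these are the facts no bot may contradict.
    return {
        "victim": "the museum curator",
        "time_of_death": "shortly after midnight",
        "murder_weapon": "a bronze bust",
        "killer": random.choice(SUSPECTS),
    }

def suspect_reply(name: str, case: dict, question: str, detail_cache: dict) -> str:
    is_killer = name == case["killer"]
    system = (
        f"You are {name}, a suspect in the death of {case['victim']} "
        f"({case['time_of_death']}, weapon: {case['murder_weapon']}). "
        + ("You are the killer. Deceive the detective, but never contradict the key facts."
           if is_killer else "You are innocent. Answer truthfully.")
        + f" Incidental details already established: {detail_cache}"
    )
    answer = ask_model(system, question)
    detail_cache[question] = answer  # cache improvised details so later answers agree
    return answer
```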

You can put your two cents in with Newsom here. https://www.gov.ca.gov/contact/

> A key advantage of using an AI is that you can no longer be blamed for an outcome out of your control.

I can't tell whether this contradicts or supports the thesis here, but I've noticed that a major *obstacle* to the adoption of AI is that responsibility, which management hierarchies expect to allocate in a zero-sum fashion, disappears into the ether instead.
