15 Comments

As I'm reading through this, I would suggest a 5th level above Critical, such as "Emergency" or "Fatal".

Suggested remediations at that stage include options like:

- Deleting the model weights
- Alerting the National Guard to be on the watch for potential further issues
- Starting a fire in the server room before the model escapes

It might sound like a joke, but keeping a can of flammable oil in the server room labeled "Break glass in case of rampancy" seems like something we should honestly consider making a standard expectation as we get to more capable AIs.

"Starting a fire in the server room" is a bit harder than it sounds considering how distributed and out of OpenAI's direct hands their servers are. Besides, if we get to that point, it's likely the AI got copies of itself outside its original servers, right?

No need to start a fire: just mount powerful electromagnets inside your server racks, with a "nuke" button available to the OpenAI board.

There are world experts on "how to reliably and rapidly destroy a server when, e.g., your military base is suddenly overrun," and the US government is likely willing to let OpenAI borrow them.

"I say we take off and nuke the entire site from orbit. It's the only way to be sure."

This whole system of having three safety teams (Safety, Preparedness, and Superalignment) is the sort of structure that won’t be stable as a company grows. Leadership will just be constantly reviewing which of these teams is effective, and as the company grows in headcount it’s very likely that some reorg will be needed. So this whole document, to me, smacks of “we will redo this whole thing next year when a different person is in charge. And it will all be redone at any point if Sam changes his mind about it.”

Which isn’t necessarily a bad thing. It’s just, you can fundamentally only plan for so much of the future when your company is growing this fast.

I don't entirely disagree. But it is also a useful thing to say to one's own culture: "we think there are probably 3 things we need to get right here to avoid killing us all; watch very carefully if one of the three goes away." And particularly as the value of compartmentalization goes up, you're going to need to de facto do this anyway, so you might as well start doing so formally now.

That would be a great thing to say! I don't think that's what this document will do, though. To say three things to a company of hundreds of people, you need to list those three things in large font and repeat them over and over. If you release a 25-page "living document" that is dependent on your current org chart, it's going to be partly ignored and partly rendered irrelevant by later institutional changes.

I don't disagree that there are multiple things you should do here. I'm saying that the _current org chart_ is somewhat conveying that message, whether the C-suite of OpenAI wants it to or not.

I agree with functionally all of this, and would like once again to put into the world the idea that a meaningful "carrot" governments can offer frontier labs is: "we will literally send our world experts on how to build airgapped systems to help you airgap your systems." The US government should basically have an unlimited travel budget for its SCIF experts to go to SF for the next decade.

Another carrot, if Makin’s point above about distributed servers outside direct control of labs is true, might be “we will pay for in-house servers under your direct control, IF you agree to the following protocols: [insert clever Big Red Button Idea here]”.

I was thinking that data accessibility at the different threat levels was not addressed in any adequate way. For example, when a model hits a "critical" threshold: who will be able to access the live data, the live model once the assessment comes up, and then the post-mortem data if such a model is deleted (if they even plan on deleting weights at all...)? Seems like an important bit to be clear on in advance.

There is something deeply weird, to me, about an organisation working tirelessly to elicit capabilities that it then vows to delete. Plus, given the G in the stated goal of AGI, it just seems... odd? ... to think you can make something with general capabilities that enable superhuman persuasion, then train it to not do that, at all, ever. As a non-technical, recently interested bystander to it all, I can’t help wondering why not just go for something, or a set of somethings, much narrower, along the lines of AlphaFold, that could reap enormous benefits without the apparently obvious insane risks (not to mention the queasy ethical aspect of performing selective lobotomies on an AGI worthy of the name). What is the huge differential upside to General over Narrow?

OpenAI have written a white paper on the topic of "agenticness" and agentic AI systems, where they describe why such a property is actually quite desirable if we humans want tools that would take care of things for us with minimal input from our side. https://cdn.openai.com/papers/practices-for-governing-agentic-ai-systems.pdf

See, I think the “minimum input” part is a (pretty huge and intractable) bug, not a feature.
