As I'm reading through this, I would suggest a 5th level above Critical, such as "Emergency" or "Fatal".

Suggested remediations at that level could include:

- Deleting the model weights

- Alerting the National Guard to be on the watch for potential further issues

- Starting a fire in the server room before the model escapes

It might sound like a joke, but keeping a can of flammable oil in the server room labeled "Break glass in case of rampancy" seems like something we should honestly consider making a standard expectation as we get to more capable AIs.

This whole system of having three safety teams (safety, preparedness, and superalignment) is the sort of structure that won’t be stable as a company grows. Leadership will be constantly reviewing which of these teams is effective, and as headcount grows it’s very likely that some reorg will be needed. So this whole document, to me, smacks of “we will redo this whole thing next year when a different person is in charge.” And it will all be redone at any point if Sam changes his mind about it.

Which isn’t necessarily a bad thing. It’s just that you can fundamentally only plan for so much of the future when your company is growing this fast.

I agree with functionally all of this, and would like once again to put into the world the idea that a meaningful "carrot" governments can offer frontier labs is: "we will literally send our world experts on how to build airgapped systems to help you airgap your systems." The US government should basically have an unlimited travel budget for its SCIF experts to go to SF for the next decade.

Data accessibility for the different threat levels was not addressed in any adequate way. For instance, if a model hits a "critical" threshold, who will be able to access the live data, the live model once the assessment comes in, and the post-mortem data if such a model is deleted (if they even plan on deleting weights at all...)? Seems like an important bit to be clear on in advance.

There is something deeply weird, to me, about an organisation working tirelessly to elicit capabilities that it then vows to delete. Plus, given the G in the stated goal of AGI, it just seems... odd? ... to think you can make something with general capabilities that enable superhuman persuasion, then train it to not do that, at all, ever. As a non-technical, recently interested bystander to it all, I can’t help wondering why not just go for something, or a set of somethings, much narrower, along the lines of AlphaFold, that could reap enormous benefits without the apparently obvious insane risks (not to mention the queasy ethical aspect of performing selective lobotomies on an AGI worthy of the name). What is the huge differential upside of General over Narrow?
