On DeepMind's Frontier Safety Framework

Jun 18, 2024

5 Comments

The security mitigation and capabilities threshold tables are messed up: the security mitigation table is missing the first four levels, and the capabilities threshold table is missing its first big chunk. It looks like the formatting may have cut off everything before the page breaks in the Framework doc.

valencia_o

Jun 18

Otherwise looks good, I always appreciate the summaries. It's really helpful when you say whether or not something is worth reading in full.

Zvi Mowshowitz

Jun 18

Yep, thanks, it was a pasting issue (I was editing in Google Docs and they failed to copy back). Should be fixed now.

Askwho Casts AI

Jun 18

Podcast episode for this post:

https://askwhocastsai.substack.com/p/on-deepminds-frontier-safety-framework

radio

Jun 18

I think the Mary Phuong talk at EAG London 2024 (https://youtube.com/watch?v=ZTmRT2Hg1oM) can add some detail on their method and considerations. In particular, they are exploring persuasion and misalignment as well, but hadn't got far enough with those to include them in the first publication (5:15-6:10, more on persuasion at 13:05-18:00). The audience questions also bring up where DeepMind's comparative advantage lies for evals (23:15-25:00) and missing threat models (26:05-26:45).

Don't Worry About the Vase