On DeepMind’s Frontier Safety Framework

Previously: On OpenAI’s Preparedness Framework, On RSPs.

The First Two Frameworks

To first update on Anthropic and OpenAI’s situation here: Anthropic’s RSP continues to miss the definitions of the all-important later levels, in addition to other issues, although it is otherwise promising. It has now been a number of months, and it is starting to be concerning that nothing has changed. They are due for an update.
I think the Mary Phuong talk at EAG London 2024 (https://youtube.com/watch?v=ZTmRT2Hg1oM) can add some detail on their method and considerations. In particular, they are exploring persuasion and misalignment as well, but hadn't gotten far enough with those to include them in the first publication (5:15-6:10, more on persuasion at 13:05-18:00). The audience questions also bring up where DeepMind's comparative advantage lies for evals (23:15-25:00) and missing threat models (26:05-26:45).
The security mitigation and capabilities threshold tables are messed up: security mitigation is missing the first four levels, and capabilities threshold is missing the first big chunk. It looks like the formatting may have cut off everything before the page breaks in the Framework doc.
Otherwise looks good, I always appreciate the summaries. It's really helpful when you say whether or not something is worth reading in full.
Yep, thanks, it was a pasting issue (I was editing in Google Docs and the tables failed to copy back). Should be fixed now.
Podcast episode for this post:
https://askwhocastsai.substack.com/p/on-deepminds-frontier-safety-framework