Discussion about this post

valencia_o

The security mitigation and capabilities threshold tables are messed up: the security mitigation table is missing the first four levels, and the capabilities threshold table is missing the first big chunk. It looks like the formatting may have cut off everything before the page breaks in the Framework doc.

radio

I think the Mary Phuong talk at EAG London 2024 (https://youtube.com/watch?v=ZTmRT2Hg1oM) adds some detail on their method and considerations. In particular, they are also exploring persuasion and misalignment, but hadn't gotten far enough with those to include them in the first publication (5:15-6:10, with more on persuasion at 13:05-18:00). The audience questions also touch on where DeepMind's comparative advantage lies for evals (23:15-25:00) and on missing threat models (26:05-26:45).
