Discussion about this post

Johan Falk

I've started to use the term "technological sonic boom" (rings better in Swedish) to describe what could happen if technology advances much quicker than it diffuses.

I think we can end up in scenarios like the ones you see in the Civ games, where one side has invented electricity while the other is still learning to sail. Or one side has bows and arrows while the other has tanks. Except that the actors aren't coherent civilizations or nations, but companies, agencies, labs, or even Elon Musk.

I suspect many of the most destabilizing AI scenarios look something like that, and I'd very much prefer we avoid finding out.

aoeuhtns

I think I'm actually coming around to the gov's judgement here. As stupid as it is to call "fix this code" a jailbreak, in context it did effectively bypass a guardrail. Fable was asked for information on security vulnerabilities and refused to answer. Then, when asked to "fix this code", Fable gave up the information it had just refused to give. If the guardrail was there for any reason at all, then it's a problem that you can get around it. In the specific example, apparently Opus 4.8 or GPT 5.5 would give you the same information, but I don't think that's really relevant. The issue is the pattern demonstrated: you can get around a guardrail. Unless Fable is somehow discerning about when it obeys "fix this code", such that it would refuse on a sufficiently complicated codebase where Opus 4.8/GPT 5.5 would be unable to find the vulnerabilities?

The lawless, ad hoc, capricious nature of all this is still completely insane. But I think this was correctly flagged as an issue with Fable.
