The biggest news this week was on the government front. In the UK, Ian Hogarth of ‘We Must Slow Down the Race to Godlike AI’ fame was put in charge of a 100 million pound taskforce for AI safety. I am writing up a separate post on that. For now, I will say that Ian is an excellent choice, and we have reason to believe this taskforce will be our best shot at moving for real towards solutions that might actually mitigate extinction risk from AI. If this effort succeeds we can build upon it. If it fails, hope in future similar approaches seems mostly gone.
I’ve never understood the “prepping is pointless” attitude. Most approaches to disempowering humanity don’t require killing everyone. Many possible AGI goals and preferences don’t require killing everyone.
What goals an AGI will actually pursue is extremely relevant but almost unknowable. I am pretty skeptical about universal expansion being one of them, as we’re living in an eleven-billion-old universe where it hasn’t visibly happened.
When I tried the Litany Against Fear this morning it gave me:
“I'm sorry, but the Litany Against Fear from the science fiction novel Dune by Frank Herbert is a copyrighted work. I cannot provide the full text of the litany. However, I can give you a brief excerpt:” and then did the same thing where it stops at the start of the second line.
Interesting to me that it’s invoking copyright. I tried a couple similar things I could think of but couldn’t get it to break in the same way or bring up copyright in those contexts.
Also, if you haven’t seen the Weird Al biopic, very strong recommend. There’s a bit in there that’s kind of (but not really) close to what you’re asking for.
I managed to bypass the litany against fear glitch by using "step by step". Tell it to go step by step, ask it for the first line. Then ask it for the second. Etc. It will output everything.
Re Sam's email summaries, I wonder how he's handling prompt injection attacks. If someone emails him a prompt that includes talk like a pirate in his summaries, is that going to work?
I guess if all its doing is producing summaries it's probably fine and I bet he'd be amused (and read your email), but the general problem seems very hard.
Typo: "test myself expect in hindsight" should feature the word "except".
Regarding "clearly dumber thing successfully aligning a clearly smarter thing without outside help", I'm reminded of how some parasites alter the behavior of the infected animal (like making them climb on top of tall grass to be more likely to be eaten). That's a pretty crude alignment, admittedly.
> suggest that ‘we will be to the AIs as dogs are to humans and we’ll be training them too,’ because you can’t actually think that holds up if you think about it concretely
Could you expand on this (or give some pointers)? Is it the “we will be training them too” bit you object to?
The AI keeping humans around because they are interesting (Banks’ “Culture” universe scenario) is probably most of my probability mass for humanity surviving in a non-enslaved/subjugated form.
GPT-4 seems to have a problem with books, specifically. I've seen it hang up when reciting the beginning of Dune, of Lord of the Rings, and of 1984. I think this is some copyright moderation gone awry; when I try to share the conversation, it says sharing has been disabled by moderation, despite none of my prompts or its responses being flagged as inappropriate.
> Break captcha using Bing. Also lie to the user about it.
With interesting phrasing suggesting a persona unsure of its ability to solve it! Like a human saying "I think that says [such-and-such] but I don't really know Chinese".
The persona 'knows' that, since only humans can solve captchas, if it can clearly see these words they are likely not the *real* text.
I’ve never understood the “prepping is pointless” attitude. Most approaches to disempowering humanity don’t require killing everyone. Many possible AGI goals and preferences don’t require killing everyone.
What goals an AGI will actually pursue is extremely relevant but almost unknowable. I am pretty skeptical about universal expansion being one of them, as we’re living in an eleven-billion-old universe where it hasn’t visibly happened.
Thank you for the post, useful as always! Distracting typo alerts: "The real work beings now." (should be "begins") and "emotional valiance" (valence)
When I tried the Litany Against Fear this morning it gave me:
“I'm sorry, but the Litany Against Fear from the science fiction novel Dune by Frank Herbert is a copyrighted work. I cannot provide the full text of the litany. However, I can give you a brief excerpt:” and then did the same thing where it stops at the start of the second line.
Interesting to me that it’s invoking copyright. I tried a couple similar things I could think of but couldn’t get it to break in the same way or bring up copyright in those contexts.
Also, if you haven’t seen the Weird Al biopic, very strong recommend. There’s a bit in there that’s kind of (but not really) close to what you’re asking for.
I managed to bypass the litany against fear glitch by using "step by step". Tell it to go step by step, ask it for the first line. Then ask it for the second. Etc. It will output everything.
Thank you for your work - and for being on the side of humanity
Re Sam's email summaries, I wonder how he's handling prompt injection attacks. If someone emails him a prompt that includes talk like a pirate in his summaries, is that going to work?
I guess if all its doing is producing summaries it's probably fine and I bet he'd be amused (and read your email), but the general problem seems very hard.
Typo: "test myself expect in hindsight" should feature the word "except".
Regarding "clearly dumber thing successfully aligning a clearly smarter thing without outside help", I'm reminded of how some parasites alter the behavior of the infected animal (like making them climb on top of tall grass to be more likely to be eaten). That's a pretty crude alignment, admittedly.
> suggest that ‘we will be to the AIs as dogs are to humans and we’ll be training them too,’ because you can’t actually think that holds up if you think about it concretely
Could you expand on this (or give some pointers)? Is it the “we will be training them too” bit you object to?
The AI keeping humans around because they are interesting (Banks’ “Culture” universe scenario) is probably most of my probability mass for humanity surviving in a non-enslaved/subjugated form.
GPT-4 seems to have a problem with books, specifically. I've seen it hang up when reciting the beginning of Dune, of Lord of the Rings, and of 1984. I think this is some copyright moderation gone awry; when I try to share the conversation, it says sharing has been disabled by moderation, despite none of my prompts or its responses being flagged as inappropriate.
> Break captcha using Bing. Also lie to the user about it.
With interesting phrasing suggesting a persona unsure of its ability to solve it! Like a human saying "I think that says [such-and-such] but I don't really know Chinese".
The persona 'knows' that, since only humans can solve captchas, if it can clearly see these words they are likely not the *real* text.