21 Comments
deletedJun 22, 2023·edited Jun 22, 2023
Comment deleted
Expand full comment

I’ve never understood the “prepping is pointless” attitude. Most approaches to disempowering humanity don’t require killing everyone. Many possible AGI goals and preferences don’t require killing everyone.

What goals an AGI will actually pursue is extremely relevant but almost unknowable. I am pretty skeptical about universal expansion being one of them, as we’re living in an eleven-billion-old universe where it hasn’t visibly happened.

Expand full comment
Jun 22, 2023·edited Jun 22, 2023

Thank you for the post, useful as always! Distracting typo alerts: "The real work beings now." (should be "begins") and "emotional valiance" (valence)

Expand full comment

When I tried the Litany Against Fear this morning it gave me:

“I'm sorry, but the Litany Against Fear from the science fiction novel Dune by Frank Herbert is a copyrighted work. I cannot provide the full text of the litany. However, I can give you a brief excerpt:” and then did the same thing where it stops at the start of the second line.

Interesting to me that it’s invoking copyright. I tried a couple similar things I could think of but couldn’t get it to break in the same way or bring up copyright in those contexts.

Also, if you haven’t seen the Weird Al biopic, very strong recommend. There’s a bit in there that’s kind of (but not really) close to what you’re asking for.

Expand full comment

I managed to bypass the litany against fear glitch by using "step by step". Tell it to go step by step, ask it for the first line. Then ask it for the second. Etc. It will output everything.

Expand full comment

Thank you for your work - and for being on the side of humanity

Expand full comment

Re Sam's email summaries, I wonder how he's handling prompt injection attacks. If someone emails him a prompt that includes talk like a pirate in his summaries, is that going to work?

I guess if all its doing is producing summaries it's probably fine and I bet he'd be amused (and read your email), but the general problem seems very hard.

Expand full comment

Typo: "test myself expect in hindsight" should feature the word "except".

Regarding "clearly dumber thing successfully aligning a clearly smarter thing without outside help", I'm reminded of how some parasites alter the behavior of the infected animal (like making them climb on top of tall grass to be more likely to be eaten). That's a pretty crude alignment, admittedly.

Expand full comment

> suggest that ‘we will be to the AIs as dogs are to humans and we’ll be training them too,’ because you can’t actually think that holds up if you think about it concretely

Could you expand on this (or give some pointers)? Is it the “we will be training them too” bit you object to?

The AI keeping humans around because they are interesting (Banks’ “Culture” universe scenario) is probably most of my probability mass for humanity surviving in a non-enslaved/subjugated form.

Expand full comment

GPT-4 seems to have a problem with books, specifically. I've seen it hang up when reciting the beginning of Dune, of Lord of the Rings, and of 1984. I think this is some copyright moderation gone awry; when I try to share the conversation, it says sharing has been disabled by moderation, despite none of my prompts or its responses being flagged as inappropriate.

Expand full comment

> Break captcha using Bing. Also lie to the user about it.

With interesting phrasing suggesting a persona unsure of its ability to solve it! Like a human saying "I think that says [such-and-such] but I don't really know Chinese".

The persona 'knows' that, since only humans can solve captchas, if it can clearly see these words they are likely not the *real* text.

Expand full comment