21 Comments
deletedJun 22, 2023·edited Jun 22, 2023
Comment deleted
Expand full comment

Wait, to be clear are you claiming alignment is 100% philosophical? I would argue the exact opposite, it’s an engineering challenge if anything held back by too many philosophers crowding the discourse.

Or am I misunderstanding you?

Expand full comment
Comment deleted
Expand full comment
author

I would say there is an engineering challenge of how to align a system at all to anything at all, and a philosophical problem of what we should want, and a general problem of what criteria will result in what we want, and also a few more like it.

If you are thinking of this as 'raising it the way you would a human' and 'make it nice' you are about as already dead as the two units sent to take down Trinity.

Expand full comment
Jun 25, 2023·edited Jun 25, 2023

Can’t believe nobody has liked your response yet. We are surely doomed. But then, I’m the guy who doesn’t give two shits about that.

Expand full comment

I’ve never understood the “prepping is pointless” attitude. Most approaches to disempowering humanity don’t require killing everyone. Many possible AGI goals and preferences don’t require killing everyone.

What goals an AGI will actually pursue is extremely relevant but almost unknowable. I am pretty skeptical about universal expansion being one of them, as we’re living in an eleven-billion-old universe where it hasn’t visibly happened.

Expand full comment
author

The Grabby Aliens response by Hanson here seems correct - humanity wouldn't arise inside areas that have gotten grabbed or in a universe already grabbed, and the grabbing expands at close to light speed so there is very low chance we would have any sign that it happened even if that was the way the universe inevitably goes, whether or not we are the first civilization to have the stars/AGI within reach.

Most goals involve a universal expansion, if only because other civilizations/aliens might arise in the future that have conflicting goals, and you'd want to contain/prevent that (or if you wanted that, cause it). Almost all local configurations of atoms don't involve humans, and there is nothing inherently optimal about humans or a reason why we would be part of most goals.

If humanity is disempowered and does not care about humans such that you would have died without the bunker, it is going to grab all the resources/atoms shortly, the additional life you get if any is short and you will not leave anything behind.

Expand full comment

That's very interesting! I hadn't considered that humanity's appearance could be seen as early, but the paper's model does seem very plausible. I suppose that makes an expand-for-self-defense imperative more plausible too, although I don't think I'd bet my life on it.

Expand full comment
Jun 22, 2023·edited Jun 22, 2023

Thank you for the post, useful as always! Distracting typo alerts: "The real work beings now." (should be "begins") and "emotional valiance" (valence)

Expand full comment

And in „If you are high IQ and low EQ, you will use your EQ to help GPT-4 help you with the EQ side of things“ I believe the second „EQ“ should be „IQ“.

Expand full comment

When I tried the Litany Against Fear this morning it gave me:

“I'm sorry, but the Litany Against Fear from the science fiction novel Dune by Frank Herbert is a copyrighted work. I cannot provide the full text of the litany. However, I can give you a brief excerpt:” and then did the same thing where it stops at the start of the second line.

Interesting to me that it’s invoking copyright. I tried a couple similar things I could think of but couldn’t get it to break in the same way or bring up copyright in those contexts.

Also, if you haven’t seen the Weird Al biopic, very strong recommend. There’s a bit in there that’s kind of (but not really) close to what you’re asking for.

Expand full comment
author

It does seem like this has been patched, asking anew seemed to work. Others are reporting the same problem with the ring litany from LotR and that also seemed to go fine. I don't know.

Expand full comment

I tried the exact same prompt again just now and it's still getting stuck:

"I'm sorry, but the Litany Against Fear from Frank Herbert's novel Dune is as follows:

"I must not fear.

Fear"

poor dude knows something's not working

Forgot to mention in case you missed it (might have been too late to make it in this week anyways) but Marvel's new show Secret Invasion has an opening sequence that's very not-subtly AI generated and people are Big Mad about it.

Expand full comment

I managed to bypass the litany against fear glitch by using "step by step". Tell it to go step by step, ask it for the first line. Then ask it for the second. Etc. It will output everything.

Expand full comment

Thank you for your work - and for being on the side of humanity

Expand full comment

Re Sam's email summaries, I wonder how he's handling prompt injection attacks. If someone emails him a prompt that includes talk like a pirate in his summaries, is that going to work?

I guess if all its doing is producing summaries it's probably fine and I bet he'd be amused (and read your email), but the general problem seems very hard.

Expand full comment

Typo: "test myself expect in hindsight" should feature the word "except".

Regarding "clearly dumber thing successfully aligning a clearly smarter thing without outside help", I'm reminded of how some parasites alter the behavior of the infected animal (like making them climb on top of tall grass to be more likely to be eaten). That's a pretty crude alignment, admittedly.

Expand full comment

Who says cordyceps are dumber than individual ants?

Expand full comment

> suggest that ‘we will be to the AIs as dogs are to humans and we’ll be training them too,’ because you can’t actually think that holds up if you think about it concretely

Could you expand on this (or give some pointers)? Is it the “we will be training them too” bit you object to?

The AI keeping humans around because they are interesting (Banks’ “Culture” universe scenario) is probably most of my probability mass for humanity surviving in a non-enslaved/subjugated form.

Expand full comment
author

We might or might not be interesting. The chance that humans are the most interesting thing available to do with the relevant atoms and energy, once they've got a bunch of data already and can create a wide range of other things, is epsilon, we're not optimized for this so well. Why do we have dogs? Because we can't engineer something that fills those needs better, and force of habit. An AGI totally could engineer something better, and such habits don't last forever.

I can see an explicit hack working if it gets preserved, although that path does not seem remotely safe to take. I can see a form of decision theory working potentially, a form of trade, although again that is highly not comforting as a path. I don't see how it happens from something open ended, or from 'natural' preferences on things like interestingness.

Expand full comment

Gotcha, thanks. I now grok what you meant there. Strongly agree that it’s not a coherent plan to target that outcome, even if you think you have ways of making it more likely to occur. The things an agent finds interesting are probably the most fickle and undetermined aspects of cognition.

Expand full comment

GPT-4 seems to have a problem with books, specifically. I've seen it hang up when reciting the beginning of Dune, of Lord of the Rings, and of 1984. I think this is some copyright moderation gone awry; when I try to share the conversation, it says sharing has been disabled by moderation, despite none of my prompts or its responses being flagged as inappropriate.

Expand full comment

> Break captcha using Bing. Also lie to the user about it.

With interesting phrasing suggesting a persona unsure of its ability to solve it! Like a human saying "I think that says [such-and-such] but I don't really know Chinese".

The persona 'knows' that, since only humans can solve captchas, if it can clearly see these words they are likely not the *real* text.

Expand full comment