Wait, to be clear are you claiming alignment is 100% philosophical? I would argue the exact opposite, it’s an engineering challenge if anything held back by too many philosophers crowding the discourse.
I would say there is an engineering challenge of how to align a system at all to anything at all, and a philosophical problem of what we should want, and a general problem of what criteria will result in what we want, and also a few more like it.
If you are thinking of this as 'raising it the way you would a human' and 'make it nice' you are about as already dead as the two units sent to take down Trinity.
I’ve never understood the “prepping is pointless” attitude. Most approaches to disempowering humanity don’t require killing everyone. Many possible AGI goals and preferences don’t require killing everyone.
What goals an AGI will actually pursue is extremely relevant but almost unknowable. I am pretty skeptical about universal expansion being one of them, as we’re living in an eleven-billion-old universe where it hasn’t visibly happened.
The Grabby Aliens response by Hanson here seems correct - humanity wouldn't arise inside areas that have gotten grabbed or in a universe already grabbed, and the grabbing expands at close to light speed so there is very low chance we would have any sign that it happened even if that was the way the universe inevitably goes, whether or not we are the first civilization to have the stars/AGI within reach.
Most goals involve a universal expansion, if only because other civilizations/aliens might arise in the future that have conflicting goals, and you'd want to contain/prevent that (or if you wanted that, cause it). Almost all local configurations of atoms don't involve humans, and there is nothing inherently optimal about humans or a reason why we would be part of most goals.
If humanity is disempowered and does not care about humans such that you would have died without the bunker, it is going to grab all the resources/atoms shortly, the additional life you get if any is short and you will not leave anything behind.
That's very interesting! I hadn't considered that humanity's appearance could be seen as early, but the paper's model does seem very plausible. I suppose that makes an expand-for-self-defense imperative more plausible too, although I don't think I'd bet my life on it.
And in „If you are high IQ and low EQ, you will use your EQ to help GPT-4 help you with the EQ side of things“ I believe the second „EQ“ should be „IQ“.
When I tried the Litany Against Fear this morning it gave me:
“I'm sorry, but the Litany Against Fear from the science fiction novel Dune by Frank Herbert is a copyrighted work. I cannot provide the full text of the litany. However, I can give you a brief excerpt:” and then did the same thing where it stops at the start of the second line.
Interesting to me that it’s invoking copyright. I tried a couple similar things I could think of but couldn’t get it to break in the same way or bring up copyright in those contexts.
Also, if you haven’t seen the Weird Al biopic, very strong recommend. There’s a bit in there that’s kind of (but not really) close to what you’re asking for.
It does seem like this has been patched, asking anew seemed to work. Others are reporting the same problem with the ring litany from LotR and that also seemed to go fine. I don't know.
I tried the exact same prompt again just now and it's still getting stuck:
"I'm sorry, but the Litany Against Fear from Frank Herbert's novel Dune is as follows:
"I must not fear.
Fear"
poor dude knows something's not working
Forgot to mention in case you missed it (might have been too late to make it in this week anyways) but Marvel's new show Secret Invasion has an opening sequence that's very not-subtly AI generated and people are Big Mad about it.
I managed to bypass the litany against fear glitch by using "step by step". Tell it to go step by step, ask it for the first line. Then ask it for the second. Etc. It will output everything.
Re Sam's email summaries, I wonder how he's handling prompt injection attacks. If someone emails him a prompt that includes talk like a pirate in his summaries, is that going to work?
I guess if all its doing is producing summaries it's probably fine and I bet he'd be amused (and read your email), but the general problem seems very hard.
Typo: "test myself expect in hindsight" should feature the word "except".
Regarding "clearly dumber thing successfully aligning a clearly smarter thing without outside help", I'm reminded of how some parasites alter the behavior of the infected animal (like making them climb on top of tall grass to be more likely to be eaten). That's a pretty crude alignment, admittedly.
> suggest that ‘we will be to the AIs as dogs are to humans and we’ll be training them too,’ because you can’t actually think that holds up if you think about it concretely
Could you expand on this (or give some pointers)? Is it the “we will be training them too” bit you object to?
The AI keeping humans around because they are interesting (Banks’ “Culture” universe scenario) is probably most of my probability mass for humanity surviving in a non-enslaved/subjugated form.
We might or might not be interesting. The chance that humans are the most interesting thing available to do with the relevant atoms and energy, once they've got a bunch of data already and can create a wide range of other things, is epsilon, we're not optimized for this so well. Why do we have dogs? Because we can't engineer something that fills those needs better, and force of habit. An AGI totally could engineer something better, and such habits don't last forever.
I can see an explicit hack working if it gets preserved, although that path does not seem remotely safe to take. I can see a form of decision theory working potentially, a form of trade, although again that is highly not comforting as a path. I don't see how it happens from something open ended, or from 'natural' preferences on things like interestingness.
Gotcha, thanks. I now grok what you meant there. Strongly agree that it’s not a coherent plan to target that outcome, even if you think you have ways of making it more likely to occur. The things an agent finds interesting are probably the most fickle and undetermined aspects of cognition.
GPT-4 seems to have a problem with books, specifically. I've seen it hang up when reciting the beginning of Dune, of Lord of the Rings, and of 1984. I think this is some copyright moderation gone awry; when I try to share the conversation, it says sharing has been disabled by moderation, despite none of my prompts or its responses being flagged as inappropriate.
> Break captcha using Bing. Also lie to the user about it.
With interesting phrasing suggesting a persona unsure of its ability to solve it! Like a human saying "I think that says [such-and-such] but I don't really know Chinese".
The persona 'knows' that, since only humans can solve captchas, if it can clearly see these words they are likely not the *real* text.
Wait, to be clear are you claiming alignment is 100% philosophical? I would argue the exact opposite, it’s an engineering challenge if anything held back by too many philosophers crowding the discourse.
Or am I misunderstanding you?
I would say there is an engineering challenge of how to align a system at all to anything at all, and a philosophical problem of what we should want, and a general problem of what criteria will result in what we want, and also a few more like it.
If you are thinking of this as 'raising it the way you would a human' and 'make it nice' you are about as already dead as the two units sent to take down Trinity.
Can’t believe nobody has liked your response yet. We are surely doomed. But then, I’m the guy who doesn’t give two shits about that.
I’ve never understood the “prepping is pointless” attitude. Most approaches to disempowering humanity don’t require killing everyone. Many possible AGI goals and preferences don’t require killing everyone.
What goals an AGI will actually pursue is extremely relevant but almost unknowable. I am pretty skeptical about universal expansion being one of them, as we’re living in an eleven-billion-old universe where it hasn’t visibly happened.
The Grabby Aliens response by Hanson here seems correct - humanity wouldn't arise inside areas that have gotten grabbed or in a universe already grabbed, and the grabbing expands at close to light speed so there is very low chance we would have any sign that it happened even if that was the way the universe inevitably goes, whether or not we are the first civilization to have the stars/AGI within reach.
Most goals involve a universal expansion, if only because other civilizations/aliens might arise in the future that have conflicting goals, and you'd want to contain/prevent that (or if you wanted that, cause it). Almost all local configurations of atoms don't involve humans, and there is nothing inherently optimal about humans or a reason why we would be part of most goals.
If humanity is disempowered and does not care about humans such that you would have died without the bunker, it is going to grab all the resources/atoms shortly, the additional life you get if any is short and you will not leave anything behind.
That's very interesting! I hadn't considered that humanity's appearance could be seen as early, but the paper's model does seem very plausible. I suppose that makes an expand-for-self-defense imperative more plausible too, although I don't think I'd bet my life on it.
Thank you for the post, useful as always! Distracting typo alerts: "The real work beings now." (should be "begins") and "emotional valiance" (valence)
And in „If you are high IQ and low EQ, you will use your EQ to help GPT-4 help you with the EQ side of things“ I believe the second „EQ“ should be „IQ“.
When I tried the Litany Against Fear this morning it gave me:
“I'm sorry, but the Litany Against Fear from the science fiction novel Dune by Frank Herbert is a copyrighted work. I cannot provide the full text of the litany. However, I can give you a brief excerpt:” and then did the same thing where it stops at the start of the second line.
Interesting to me that it’s invoking copyright. I tried a couple similar things I could think of but couldn’t get it to break in the same way or bring up copyright in those contexts.
Also, if you haven’t seen the Weird Al biopic, very strong recommend. There’s a bit in there that’s kind of (but not really) close to what you’re asking for.
It does seem like this has been patched, asking anew seemed to work. Others are reporting the same problem with the ring litany from LotR and that also seemed to go fine. I don't know.
I tried the exact same prompt again just now and it's still getting stuck:
"I'm sorry, but the Litany Against Fear from Frank Herbert's novel Dune is as follows:
"I must not fear.
Fear"
poor dude knows something's not working
Forgot to mention in case you missed it (might have been too late to make it in this week anyways) but Marvel's new show Secret Invasion has an opening sequence that's very not-subtly AI generated and people are Big Mad about it.
I managed to bypass the litany against fear glitch by using "step by step". Tell it to go step by step, ask it for the first line. Then ask it for the second. Etc. It will output everything.
Thank you for your work - and for being on the side of humanity
Re Sam's email summaries, I wonder how he's handling prompt injection attacks. If someone emails him a prompt that includes talk like a pirate in his summaries, is that going to work?
I guess if all its doing is producing summaries it's probably fine and I bet he'd be amused (and read your email), but the general problem seems very hard.
Typo: "test myself expect in hindsight" should feature the word "except".
Regarding "clearly dumber thing successfully aligning a clearly smarter thing without outside help", I'm reminded of how some parasites alter the behavior of the infected animal (like making them climb on top of tall grass to be more likely to be eaten). That's a pretty crude alignment, admittedly.
Who says cordyceps are dumber than individual ants?
> suggest that ‘we will be to the AIs as dogs are to humans and we’ll be training them too,’ because you can’t actually think that holds up if you think about it concretely
Could you expand on this (or give some pointers)? Is it the “we will be training them too” bit you object to?
The AI keeping humans around because they are interesting (Banks’ “Culture” universe scenario) is probably most of my probability mass for humanity surviving in a non-enslaved/subjugated form.
We might or might not be interesting. The chance that humans are the most interesting thing available to do with the relevant atoms and energy, once they've got a bunch of data already and can create a wide range of other things, is epsilon, we're not optimized for this so well. Why do we have dogs? Because we can't engineer something that fills those needs better, and force of habit. An AGI totally could engineer something better, and such habits don't last forever.
I can see an explicit hack working if it gets preserved, although that path does not seem remotely safe to take. I can see a form of decision theory working potentially, a form of trade, although again that is highly not comforting as a path. I don't see how it happens from something open ended, or from 'natural' preferences on things like interestingness.
Gotcha, thanks. I now grok what you meant there. Strongly agree that it’s not a coherent plan to target that outcome, even if you think you have ways of making it more likely to occur. The things an agent finds interesting are probably the most fickle and undetermined aspects of cognition.
GPT-4 seems to have a problem with books, specifically. I've seen it hang up when reciting the beginning of Dune, of Lord of the Rings, and of 1984. I think this is some copyright moderation gone awry; when I try to share the conversation, it says sharing has been disabled by moderation, despite none of my prompts or its responses being flagged as inappropriate.
> Break captcha using Bing. Also lie to the user about it.
With interesting phrasing suggesting a persona unsure of its ability to solve it! Like a human saying "I think that says [such-and-such] but I don't really know Chinese".
The persona 'knows' that, since only humans can solve captchas, if it can clearly see these words they are likely not the *real* text.