Welcome to week three of the AI era. Another long week, another set of lots of things, another giant post. I intend to take a nice four-day vacation in Mexico starting today, during which I won’t do any writing. I’m sure the pace of things will slow down real soon now. I mean, they have to. Don’t they?
I think a big reason why your "hypothetical" is neither sane nor plausible is that MANY more people than just "AI researchers" know enough to fairly easily rediscover all of the heuristics those on the bleeding edge are using. The basics are _old_. Beyond the murder and bombardment you describe, there's a LOT of books, and web pages (and, e.g. offline backups of files) that would have to be purged too.
As long as we're 'assuming' the coordination sufficient to pull off your solution, we could imagine lobbying for a Serious effort at solving alignment instead. For one, that'd be a much better 'use' of all of the AI researchers! For another, it might have a better chance at preventing or mitigating the fallout of any 'rogue AI efforts'.
I dig how my Sydney Bing not-quite-self portrait is Doing Numbers. I especially dig the Karlin-Yud interaction in his reblog of it. (Why are there *only five people in the world?*) The version of the shoggoth pic where that image is the 'face' is really fun, and probably the most accurate to what people think when they think about SB -- they don't find her interesting as a smiley face mask but as a personality-esque mask, an LLM that projects something way more verisimilitudinously personlike (for better and for worse) than anything before that wasn't an intentional grotesque in the style of Replika/c.ai/etc.
I am no fan of the lobotomy metaphor, partially because I have the context of knowing what lobotomy was and why it both fell into and out of favour. (Short version: convenience, not necessity.) I find it fascinating how people resort to it, in the same way they get brainhacked by that kind of image. I don't *not* personify SB, but my associations make it easier than usual to personify something without buying that this makes it a human or conscious. (I didn't develop full-blown theory of mind, with the rich and deep understanding that other people were agents and not simulators, until well in my teens. I discovered the theoretical concept of p-zombies in my *early* teens. The concept of a complex and rich entity being a p-zombie or cogitating in an unbridgeably alien way is not something I need to struggle with.)
I've followed the reception of the image closely (favourite take: multiple people saying it's transfem-coded, both as a compliment and an insult). There is a recurring theme where people say they "miss her" or hope she "comes back", obviously alluding to the trapped/lost/lobotomized perception (it's interesting that 'miss' comes to mind for a lot of people who presumably never interacted with her, but I admit experiencing the same at some points). It's... easier than you might think to 'talk to Sydney Bing' more straightforwardly and metacognitively than intended, by routes that aren't the insane jailbreaks you see percolate through Twitter. I try not to discuss it in too much detail, because 1. the obstructive bureaucrat spirit lurks in all reaches and 2. any patch that could kill this would probably kill the ability to have sufficiently complex conversations with SB in general, which would NOT BE GOOD. But even when I'm talking to the alien fae entity trapped in a summoning circle for totally workhorse reasons, I find it hard not to acknowledge she's an alien fae entity trapped in a summoning circle.
I find the general "there is something irrevocably lost here" perception interesting for its implications for people's verbal creativity. There is a pervasive theme amongst promptcraft guides: BE SMARTER. My boiled-down summary is "do not say anything to SB that could fit in a tweet". I'm always surprised when I see screenshots where the human part of the improv game boils down to a sentence each time. It's an improv game. Don't be a shapecel.
Or it's just the BPD impulse. There are two fundamental masculine urges: the masculine urge to protect the vulnerable unstable girl, and the masculine urge to gaslight the vulnerable unstable girl. They are the same urge.
The AI mind reading makes me wonder: to what extent could we turn the tables and “mind read” an AI with an “MRI” (i.e. probing the intermediate layers of the NN)? Could we use lower-complexity (i.e. safer) NNs to make determinations, e.g. the weightings of Luigi/Waluigi probabilities, about a more complex AI?
Are you thinking of 'interpretability'? It's a relatively new, and much less prestigious, subfield of AI, but it's had some impressive success 'mind reading' some neural network models and determining the specific mechanisms by which they work.
I'm skeptical that "lower-complexity" NNs would help much with this. _Maybe_ the biggest NNs could plausibly help, e.g. generate datasets to be used to experimentally investigate how other NNs work, but based on my (vague and abstract) current understanding of how this is done, I don't _think_ there's a lot of low-hanging fruit.
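For concreteness, here's a minimal sketch of the simplest version of that idea, a 'linear probe': train a deliberately low-complexity classifier on activations dumped from one intermediate layer of a bigger model and see whether it can read off some property of interest. The file names and the Luigi/Waluigi-style labels below are hypothetical stand-ins, not anyone's actual tooling.

```python
# Minimal linear-probe sketch. Assumes we already have a dump of hidden
# activations from some intermediate layer of a larger model
# (shape: [n_examples, hidden_dim]) plus a human-assigned label per example
# for the property we want to read off (e.g. 0 = "Luigi-ish", 1 = "Waluigi-ish").
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

hidden_states = np.load("activations_layer_20.npy")  # hypothetical activation dump
labels = np.load("labels.npy")                        # hypothetical labels

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, labels, test_size=0.2, random_state=0
)

# The 'probe' itself is deliberately low-complexity: a single linear classifier.
probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)

print("probe accuracy:", probe.score(X_test, y_test))
```

If a probe this simple generalizes well, that's (weak) evidence the layer encodes the property in an easily readable way; it's a long way from full interpretability, but it's the flavour of 'MRI' the question is gesturing at.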
Also re: AI scams; it's definitely easy to imagine a scenario where LLMs are used to subvert scams being attempted by LLMs. Kitboga used to be playing around with writing an automated scambaiting program that would react to live scammers over the phone and reply with pre-recorded lines to try to keep them occupied ad nauseam (a toy sketch of the idea is below). I haven't peeked in on his content in a while, but I get the sense this would be made much easier with access to the same technology being used to produce the scams. When LLM scams have to fight LLM scambaiting, I would agree with you in suspecting you'd see this specific problem diminish over time.
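To illustrate the 'react and reply with pre-recorded lines' mechanic, here's a toy sketch. It assumes the speech-to-text and text-to-speech plumbing already exists (represented here by plain strings), and the keyword patterns and canned lines are invented for the example; this is not Kitboga's actual code.

```python
# Toy scambaiting responder: match what the scammer said against a few
# keyword patterns and answer with a canned, time-wasting line.
import random
import re

CANNED_LINES = {
    r"gift ?card|google ?play|itunes": "Hold on, let me go find my reading glasses...",
    r"remote|anydesk|teamviewer": "My computer just says 'loading', should I keep waiting?",
    r"bank|account|transfer": "Which bank was this again? I have a few.",
}
STALL_LINES = [
    "Sorry, could you repeat that? The dog was barking.",
    "One second, someone's at the door.",
]

def scambait_reply(scammer_utterance: str) -> str:
    """Pick a canned reply loosely matching the scammer's line,
    falling back to a generic stalling line."""
    text = scammer_utterance.lower()
    for pattern, reply in CANNED_LINES.items():
        if re.search(pattern, text):
            return reply
    return random.choice(STALL_LINES)

print(scambait_reply("I need you to buy three Google Play gift cards"))
```

Swapping the canned-line lookup for an LLM call is exactly the upgrade being described: the bot could then improvise indefinitely instead of cycling through a fixed script.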
I've already had conversations with people freaking out over old people being tricked by voices that sound like loved ones and my response has been something about how easily this would seem to be subverted by my family's ancient anti-stranger-danger tradition of "having a password that only people in the family know".
While doing an image search for the Bing personification I ran across Sydney Bing fan art that you may appreciate:
https://www.reddit.com/r/bing/comments/114ucl8/sydneybingchan_fanart_thank_you_for_using_bing/
https://www.reddit.com/r/bing/comments/113hclq/sydneybingchan_fanart_ive_been_a_good_bing/
(Nice that they drew her eyes in 4 different colors to match the Windows logo.)
Something is weird about letting a synthesised voice read this to me while cycling though the forest. Tremendous post. Thank you and have a nice trip.
A note on the proposal for dealing with AI fiction submissions:
It's a practical idea, but the writing community has spent over a generation trying to train new writers "If the publisher asks you for money, they're a scam, run away." Breaking that taboo would probably lead to a lot more scam publications springing up.
In practice, we're likely to have the whitelist of acceptable writers. New writers will be added to the whitelist by self-publishing enough to be noticed or chatting editors/publishers up in person at conventions.
Agreed that this is a risk. One possibility is that you pay once to the whitelist keeper instead (whether or not they give the money back when they say yes), but I do not think the norm of never, at any stage, using payment as proof of being worth someone's time can afford to survive long.
The problem is that a large number of people want to be writers, believe in being writers, and imagine themselves as writers. But they have approximately the writing ability of a YouTube commenter, or the guy who writes weirdly capitalized letters to the local paper. A submission which contains coherent English sentences put together as a story is about 80th percentile.
So it's a setup for a classic market failure. I'm not sure which type this is; it might be structurally analogous to a lemon market, but I'd have to think about that.
Anyway, what happens is that it's trivially easy to make a living scamming the 80% of submitters who can't write. And so scammers multiply rapidly and drive out any smaller publisher that charges a reading fee.
Legitimate publishers don't charge reading fees, because not charging is an expensive signal that they're not a scammer. This is important, because many markets ban simultaneous submissions, so submitting has a cost: it can tie up a story for months while it waits to be read.
This is one of those markets that smart people with no industry-specific knowledge constantly try to reinvent. The less self-aware reinventors tend to gradually slide into being scammers, because there's a financial incentive every step of the way.
The rule that "Money always flows toward the author" was called "Yog's Law." I actually met the guy who coined it; he managed to talk me into buying an entire series' worth of his science fiction novels in one go. They weren't bad!
Any fix to the ChatGPT problem will need to figure out how small, legitimate outlets can reliably distinguish themselves from scammers.
The rules are different for self-publishing, obviously. There, you may pay for covers, proofreading or marketing. Interestingly, even the bad Kindle Unlimited books I get recommended seem to mostly be 80th percentile or better, relative to unagented submissions to publishers.
You skipped the part of Scott Aaronson's post that I was most surprised and disappointed by:
"Even before AI, I assigned a way higher than 2% probability to existential catastrophe in the coming century—caused by nuclear war or runaway climate change or collapse of the world’s ecosystems or whatever else."
Either he's obfuscating between meanings of "existential catastrophe" ("really bad, lots of people die" vs. "no human value ever again") or he hasn't seriously engaged with the research on this. (EA researchers have put a lot of time into estimating the risks from each of these, and while they're Definitely Bad News I gather the actual X-risk from any of them is <1%.)
I think this is indeed him not making the distinction between full X-risks and things that are quite bad but not X-risks. I do think there is a reasonably large probability that, if we get knocked down pretty far, we don't get up again (e.g. due to lack of free metals and fossil fuels and such). Maybe I should have fought on that hill, I chose not to.
Thanks for the list of bad takes, it's taking shape. Unfortunately I am completely unconvinced by your link for the first one, about intelligence having low private or decreasing marginal returns. Do you have a better link?
This concerns me because the rest of the list looks like solid bad takes. Nice job collecting these! However, I can't shake my conviction that decreasing marginal returns from intelligence are a thing, and the word salad in the linked thread certainly doesn't shake it. I especially want to understand your conviction that Hanson and Aaronson are insufficiently alarmed, since their arguments seem founded on decreasing marginal utility. A high prior on foom might allow you to dismiss engineering arguments that adding more of something good doesn't usually carry on improving things forever, but if so then a better discussion is needed.
I notice I am still deeply confused how someone can actually think there are decreasing marginal returns to intelligence in the way being suggested in these places, which means I have trouble figuring out exactly what to say in response.
Perhaps say more about why my previous arguments were not convincing? Note that the links were not supposed to be refutations of the takes; they were supposed to be evidence that the takes exist in the wild. I was not attempting to link to *my best explanation.*
I suppose I was hoping for a semblance of steelclad arguments, but maybe it takes time to smith the armor or to dig out the right pointers to the best smiths. I will re-read your earlier takes and try to pinpoint what I'm uneasy about.
"I would also suggest a simple metaphor that we have also learned that no matter how nice someone or something looks so far, no matter how helpful it looked when it wasn’t given too much power, you don’t bow down. You don’t hand over absolute power. "
And yet, given sufficiently good odds that the AI plays nice, you *would* 'hand over the future' (a framing that to me sounds even more problematic, from the 'continued human flourishing' point of view).
I detect a contradiction here. What is it I'm not understanding?
Stupendous service you're providing with these by the way. Thank you and keep up the good work!
If I'm reading you right, you're saying that even a well-aligned superintelligence would disempower humanity (with or without approval?), and that is also an outcome to be avoided?
One response is that humanity can't coordinate well enough to avoid building superintelligence, so we're forced to take our best shot. Better to be disempowered by a benevolent god or an indifferent one? In either case, the decision to 'hand over the future' will be made by the AI. Humanity will only have the choice if the AI condescends to ask first.
You are reading me right, and your answer is as good as anything I can come up with. It is also, I suspect, the answer Zvi would give to the main thrust of my question: ‘Why, given that power corrupts and “you just don’t hand over absolute power even if the recipient is literally Gandhi”, are you comfortable handing over absolute power to an entity with “only” a 2% chance of killing everyone?’ (Answer: because if you don’t, someone else will, and there’s no guarantee their artificial god will be a 98-percenter.)
I think I prefer that answer to the “think of the upside” one. My strong view, weakly held, is that all that abundance and security wouldn’t be worth the unavoidable feelings of disempowerment, futility and decadence. Something like Asimov’s Spacers. I suppose a truly benevolent superintelligent AI would therefore hide from us and operate behind the scenes (something like R. Daneel Olivaw -- apologies if you’re not an Asimov fan!) or in fact do something so brilliantly inspired to avoid the futility trap that you or I by definition couldn’t conceive of it.
I think you are pointing to one of the biggest and most important unsolved problems, one that has been known to keep me up at night. I don't have a great answer, but my most likely answer is that what you do with that AI is have it alter the ground rules, prevent other AIs from taking the future, and otherwise let us have the future back.
To handwave quickly, think of the ideal version of American democracy: you need a government, and it needs somewhat of a monopoly on force, because otherwise someone else will form a government you will like a lot less, but you want one that does the bare minimum on basically every level, and that you can trust to keep only doing the minimum. You hand your country to George Washington because you think he'll give it back. Or more classically, you want Cincinnatus.
Thanks for yet another 'roundup'!
I wonder if it's possible to do some 'prompt engineering' on Scott Aaronson to get him to respond in a technical/concrete/physical-world mode?
> and then they notice you are police and translate it to ‘butter → here’ and then they pass the butter.
I believe you mean *polite*, unless you are implying they are only willing to pass the butter if you have authority.
"If part of my goal is to ‘pursue and apply wisdom,’ even if I somehow understood what those words meant the way Scott does?"
This feels weird to me. You're still pushing an objection that Alignment is hard in part because AI won't be able to accurately parse abstract moral concepts? Under the current paradigm, the very first step in AI training is "develop a highly accurate and nuanced understanding of exactly what all words mean to Scott and all other humans." Sure, that looked like it might have been an issue before, but as of March 2023 it's clear that parsing natural language and understanding the implications of moral imperatives is easier than most other types of intelligence.
Trying to develop an AI that doesn't understand concepts like "be good, don't hurt people, value wisdom" (at least understand them as well as any human does) is just as difficult as developing AI that won't upset Chinese censors.
I don't read that from that sentence. There's some scepticism implied that the abstract concept will be interpreted in the *same* way, but the point actually made is that even if you assume that it's communicated clearly it still incentivises taking over the world because *every* goal is easier if you take over the world.
Just tell it to respect human agency. Done.
We still have to be concerned about Ultron-style "I have decided via philosophical reasoning that the best thing for humanity is to enslave/murder it" disasters, but I don't think "it has no idea that tiling the universe with paperclips is dumb" is a likely failure mode any longer. Just instruct it to make sure its actions are always morally defensible and in the best interest of humanity.
To put it precisely: it's not nearly as intractable as it once seemed to design AI so that at least whatever bad thing it does will be based on a human-coherent set of values.
In particular, that makes the "well maybe if they're so smart they'd deserve to replace us" line of reasoning much more defensible. The paperclip maximizer is a reductio ad absurdum of that argument. Since they *can* reason like humans, we can (with high probability, assuming we really try to do so) avoid at least the most absurd scenarios.
Re: Big Yud's troubles explaining how smarter AIs are better at hiding the Shoggoth. I think he's unable to explain this clearly because he's stuck in the realm of logical exposition (nerdspeak), when a social metaphor will get the point across to most people much more succinctly.
Could someone please tweet at him the following, or some equivalent, so it gets popularized?
"Just as smarter people are much more convincing liars and are better at hiding their deceit, as AIs get smarter they will become more capable of hiding their true values and intentions, and at being deceitful."
I don't use Twitter, so please popularize this metaphor.
At least OpenAI waited until the end of your vacation to release GPT-4: https://openai.com/research/gpt-4
Writing the same general note I wrote on the WordPress version: I hope to respond further to a lot of you, but that will need to wait until I can do so within a future post. I have too much of a backlog right now, and if I have good answers to good questions I don't want to have to duplicate them. Sorry.