Maybe important to note that Joshua Achiam seems to be conflating the gray goo story with Eliezer's nanotech AI takeover scenario.
"even if it does not create exotic nanotech gray goo Eliezerdoom"
Eliezer's scenario is not the same as gray goo; it seems like many people hear some corrupted version of Eliezer's arguments and then conclude the whole thing must be false. For the record, Eliezer believes an AGI might initially bootstrap from something like protein folding, but he also believes it would go on to build a Dyson sphere, capture all the light from the sun, and spread through the universe. Gray goo is just self-replicating nanotech that is not intelligent and simply consumes everything on Earth.
Podcast episode for this post:
https://open.substack.com/pub/dwatvpodcast/p/joshua-achiam-public-statement-analysis
Thanks as always for the deep dive.
Paragraph copied twice: "I think it is a tangible obstacle to success for the agenda of many alignment researchers. you often seem like you don't know what you are actually trying to protect. this is why so many alignment research agendas come across as incredibly vague and underspecified."
I'm confused that you don't discuss the tweets that kicked this whole storm off: P(Misaligned AGI doom by 2032): <1e-6%
I discuss his clarification of it, which reiterates the claim. The original post was a little vague and was in the context of an FTX thread, so I figured it was better to use the version that made his exact claims clearer.
The claims are somewhat different - for a start, one is <1e-6 while the other is two orders of magnitude more confident at <1e-6%. But that's fair.
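Spelling out the percent-sign arithmetic, in case it trips anyone up:

$$
10^{-6}\% \;=\; \frac{10^{-6}}{100} \;=\; 10^{-8}
$$

which is indeed a factor of 100, i.e. two orders of magnitude, below a plain $10^{-6}$.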
> Do people ‘notice’ that you are insufficiently focused on these questions? Oh, sure. They notice that you are not focused on those political fights and arguments.
The political fight over values isn't what I read Joshua as pointing at in the excerpted thread. He seems to be making a more vibes-based assertion that alignment people aren't out there living life, or at least aren't living it in a way most people think is groovy.
(I feel the shadow of this point, personally. One of the reasons I like DWATV is that you take the time to continue to care about Magic, and games generally, and film, and other hobby pursuits; it feels in some lizard-brain sense like you "remember how to have fun."
This is a lazy conclusion; even glancing at other big Rat authors, Eliezer and Scott obviously have fun with it in their fiction, but they spend their weirdness points more conspicuously.)
For example:
> A lot of alignment researchers seem to live highly out-of-distribution lives, with ideas and ideals that reject much of what "human values" really has to offer. Feels incongruous. People notice this.
I think the OOD-ness is a real thing, and I think probably some people do consider it incongruous; I disagree about the incongruity and think the line of argument he pursues regarding it is too vibes-based for its own good.
(Counterargument sketch: "the veteran stoner and the author of the Great American Novel each know a great truth about human values, but their perspectives are myopic, just as an expert pilot can't drive a race car and a chemist might flounder in quantum physics. The thing we need to know to align AI is not learned via the traditional human ways of understanding our values better.")
But I think he at least makes the line of argument, and it's worth explicitly objecting to.
Loved the discussion about CEV. Been reading your work for a long time but the Values sequence was before my time.
I feel like you might be overrating "general best safety practices" in other industries. Consider nuclear submarines, for example. General Dynamics more or less has a monopoly here. A simple bug in the software could be disastrous - accidentally launching a nuclear attack is the obvious case, but there are also bugs elsewhere in the system: bugs in communication systems that feed human decisions, systems that can be hacked by an adversary, or privilege escalation by a Snowden-type internal adversary.
But it's not like General Dynamics uses very high quality safety practices. They use just about the same "government software engineering" safety practices used elsewhere. And do those practices actually make the software any safer? Many software engineers think government safety policies are a net negative overall: they can lead to things like reliance on CrowdStrike, and to practices that check the boxes but detract from real safety.
To me, the most underrated category of existential risk is that a nuclear war is started, more or less unintentionally. Another case like the Stanislav Petrov incident, except this time we don't get so lucky.
"Indeed, I would go further. The market wants the AIs to be given as much freedom and authority as possible, to send them out to compete for resources and influence generally, for various ultimate purposes. And the outcome of those clashes and various selection effects and resource competitions, by default, dooming us."
I agree, and share this intuition, but I think you should make it as explicit and clear as possible, ideally in its own post that can be linked to later.
This is not great phrasing, but as a first pass I would put it like this: you have replicators that are both (a) more effective than humans at everything relevant and (b) trivially replicable (vs. humans taking decades to do extremely lossy replication). The first part means people lose meaningful control, because the AI makes better decisions ~everywhere. The second part creates a Malthusian situation where any resources that might be used for people are better used (from the perspective of competing with other AIs for resources) on further replication, unlike human reproduction, which is slow enough to be outpaced by post-industrial economic growth.
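As a toy illustration of that Malthusian selection effect (my own sketch with made-up numbers, not anything from the post): suppose each AI population produces output in proportion to its size and chooses what fraction of that output to reinvest in further copies. Any population that diverts even a modest share to supporting people gets outcompeted.

```python
# Toy model of the selection dynamic above; all numbers are arbitrary
# assumptions, only the direction of the effect matters.

STEPS = 50
GROWTH = 0.5        # output per unit of population per step (made up)
REINVEST_A = 1.0    # population A reinvests all output in more copies
REINVEST_B = 0.8    # population B diverts 20% of output to human welfare

pop_a = pop_b = 1.0
for _ in range(STEPS):
    pop_a *= 1 + GROWTH * REINVEST_A
    pop_b *= 1 + GROWTH * REINVEST_B

# The full reinvestor ends up with roughly 97% of the total population.
print(f"A's share after {STEPS} steps: {pop_a / (pop_a + pop_b):.3f}")
```

Human reproduction can't even enter this race, which is the other half of the point.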
I didn't have any particular issue with the original statements - it's an easy mistake to make, given the company involved - but appreciate the thoroughness of making even apologetic corrections and retractions heroic-strength. If only other mea culpas were as thorough.
Some irony at "basic safety practices are all you need!" -> OpenAI helps kill anodyne basic safety bill SB 1047 + RSP handwaviness + grudging cooperation with AISI etc. I'm sure moving away from nonprofit status will help correct these flaws.