Basic emergency preparedness is always sensible but I don't think there's any reasonable prep almost anyone could take for the worst case scenarios of a runaway unaligned AI. It's generally difficult to maintain a 'short position' with respect to 'civilization as a whole'.
I didn't think you were unclear, but I also don't think there's any *digital* prep worth doing either!
What kind of 'disaster scenario' are you imagining?
I think that's pretty implausible, if only because the software that already handles that stuff exists, is 'dumb', and I don't think a 'runaway' AI would bother with it anyways.
There's lots of weird and terrible things that people might (and will) use AI to do, but what you describe is more like what people have already been doing when they can get away with it. I'm very sure AIs on their own will be much weirder than the kind of thing you mention.
Yep, safe. And according to Tyler Cowen (~8h ago) we’re just “arguing ourselves into existential risk from AI being a major concern”.
*chuckles* I’m in danger
I am debating exactly how I will be responding to Tyler in AI #5 - it does seem like a great clean example of refusal to engage with the questions at all. Ironically I'll also be seeing him this weekend.
He is in "it's a tool" mode. He is classing it in the same way one might class fire or a nuke.
Many smart people are still in "it's a tool" mode because by-and-large that is exactly what it looks like when you use it a certain way (say, for helping you code some stupid 2D thing in C# on Unity). However, this is a tool that can "go agentic".
Convince him it can act "as though" it has a will, and he might at least consider expanding his view beyond the tool paradigm. If he is unconvinced that GPT-4 can manifest any agency whatsoever, ask him to consider what might happen when LLMs are expanded with additional capacities, such as long-term memory, spatial planning, and the ability to interact with RNN training code.
This seems really important! Tyler is such a smart and important figure (and like it or not you are becoming one as well), I hope you are able to get an honest, good faith and long-term discussion started with him!
I have a deep respect for you as a clear thinker and have taken the AI alignment arguments very seriously over the last year and read everything there is to read on LW and elsewhere. I took your advice to think for myself as much as possible to heart and have ended up independently somewhere closest to Tyler’s most recent post.
I’m not Tyler but I don’t feel like I’ve refused to engage with the questions, so at least for me I would not find that framing persuasive. In general I am persuaded that the basic case for AI risk is one very possible outcome, so I am sympathetic to that point of view, but I have not yet found a framing that convinces me to give it very significant probability weight vs other described (and more importantly undescribed) possibilities.
I think the argument Tyler lays out is very compelling — basically people are very bad at predicting things that have never happened, especially radically new and world altering things, and so any very specific prediction about a specific chain of outcomes needs overwhelmingly strong logical support in order to be a likely specific result. Maybe I’m poorly calibrated here but I don’t think the basic AI risk case has met that level of support, although it is a strong argument. It makes it extra hard that the argued FOOM and the lack of a fire alarm mean that the argument is unfalsifiable until we all drop dead from nanobots.
Tyler, as I see it, is arguing that predictions are hard, especially about the future, and that we are bad at them; thus, QED, a fully general argument against being able to make any predictions, except that new techs have tended to be good.
On the meta level it engages, then concludes that one cannot think for oneself and make useful predictions at all. I mean, that's the broader question here - he's all-in on the argument, see Stubborn Attachments.
Your view is probably right for what Tyler is saying — and if so I also disagree with him on the implication that predictions are useless. Although I do think they are extraordinarily hard past a certain level of disruption.
Personally I think while each logical step is within a 20-50% likelihood range, there’s enough sieve at every point to make the overall argument not pass the ridiculously high bar it would need to for me. To give an example, just the superintelligence piece alone seems like it could go a number of ways (higher intelligence is incoherent or physical experimentation is a major scientific limiter or intelligence works differently than you might imagine or etc etc). I don’t think any of these are particularly likely but taken as a whole they seem plausible to me compared to any specific explanation.
If there were evidence that suggested that, no, Yudkowsky’s view of superintelligence is actually >99% likely (and so on for each step of the chain), then I would update significantly. I am not Tyler though so maybe you are right and there’s nothing that would make him update.
Didn't say Tyler would never update, more that the disagreement is on epistemics in general and whether people can predict and know things. Which is... a hard thing to change someone's mind on. I mean, if anyone can do it, right?
I too disagree with Yudkowsky on probabilities here, and I think you've identified something important. You think of the Yudkowskian doom as 'all these 5+ things have to be right or we won't be doomed', whereas I see quite the opposite: a very robust, overdetermined doom that follows for a wide variety of importantly different AI systems, capabilities and dynamics. To me this is not a conjunction that we have to deal with; it's Kettle Logic.
E.g. suppose I asked you to list all the distinct steps, where you feel you could give me *all but one* of them and we're almost certainly still fine, but if ALL of them fail then yeah, we're in deep trouble. I do not expect those scenarios to give me comfort. Could be an interesting experiment (I encourage others to try this as well).
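To make the conjunctive-vs-disjunctive disagreement concrete, here is a minimal sketch of the arithmetic; the per-step probabilities are invented purely for illustration and are not anyone's actual estimates.

```python
# Illustrative only: these per-step probabilities are made up for the example.
from math import prod

steps = [0.2, 0.3, 0.5, 0.4, 0.35]

# Conjunctive framing: doom requires EVERY step to go wrong, so probabilities multiply.
p_conjunctive = prod(steps)                       # ~0.004, well under 1%

# Disjunctive framing: doom follows if ANY of several largely independent routes
# goes wrong, so you take the complement of all of them failing.
p_disjunctive = 1 - prod(1 - p for p in steps)    # ~0.89

print(f"conjunctive: {p_conjunctive:.4f}  disjunctive: {p_disjunctive:.4f}")
```

The disagreement above is essentially over which structure better describes the situation; with identical inputs, that choice alone moves the answer by roughly two orders of magnitude.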
I think it is strawmanning Tyler to say he claims predictions are hard in general. He is claiming that you cannot predict *AI* outcomes. You are not capable of putting a usable % chance on it. All arguments that conclude something like "therefore there is a 1-10% chance that AI does x y z" are wrong.
Since there are so many different possible outcomes, and the "ai goes foom and all humans die" outcome is just one of an extremely large number of different possibilities, you should not use this one possibility of one particular variety of disaster as the anchor to plan everything around.
This is like, word for word what EY says when people try to pin him down on specific numbers or timelines. And yet EY is very very worried.
It's very different. EY thinks that the AI foom scenario is important. TC does not think the AI foom scenario is very important.
I’d be happy if we could simply agree that AGI might be extremely dangerous, and thus the prudent thing to do is proceed slowly and carefully.
Definitely. But it’s all relative. Our current pace of progress seems very safe to me, and I think AI is net good, so I think we should speed up as much as possible.
"We continue to build everything related to AI in Python, almost as if we want to die, get our data stolen and generally not notice that the code is bugged and full of errors."
Absolutely suicidal. And Zapier-type integration is bordering on "opening the seventh seal" territory. The bigger problem is that the defrauded will cry out for government regulators, when nothing short of panopticon totalitarianism will be able to regulate.
To be clear, I don't envision the "panopticon totalitarian" approach actually panning out in the long run. I just mean to say that's what will be advocated for and potentially tried. And the kind of control regime I envision, when peering through their wild eyes, goes beyond any version of prison camp or even science fiction dystopia in scope and depth. It would be akin to death in many ways (and the literal form would swiftly follow, because, yes, indeed, they will not ultimately be able to regulate or contain the monster once unleashed).
Alarming stuff! As always, I love that the way to jailbreak LLMs is to just say ‘awwww, but come on, pleassseeeee do (the awful thing you should never do)’ - maybe twice. Isn’t part of the problem that we can’t ever fully understand what’s going inside these models?
Another potential new risk with this mode of usage, similar to:
> 3. We could all get into very bad habits this way.
If GPT is, in its normal mode of operation, making encrypted API calls to every major compute system in the world, and initiating large portions of the computation (say this plug-in system is popular and takes off), it could be really hard to spot if it makes a sharp right turn. Vs the world where it’s just sitting behind a chat UI on OpenAI’s server farm. (Much easier to spot a massive spike of network requests coming from OpenAI vs. some shift in the global traffic patterns that already exist.)
A possible amelioration here is that the sharp right turn will (as we currently understand it) require lots of GPUs for self-training. But I think you’d assume that lots of generic CPU would also be useful for whatever else the AGI wants to do as part of its takeover. And I’ve already seen startups looking at using GPT to generate AWS API calls to make infra easier. (Is it a sign of the times that I find myself editing out examples of concrete strategies because I don’t want GPT to read them? Infohazard got real very quickly.)
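A toy sketch of the legibility point, with every name and number hypothetical: a naive per-source spike detector easily flags a single origin suddenly ramping up its traffic, while the same extra load spread thinly across many origins that already exist stays under every per-source threshold.

```python
# Illustrative only: source names, counts, and the threshold rule are all made up.
from statistics import mean, stdev

def spikes(counts_by_source: dict[str, list[int]], k: float = 4.0) -> list[str]:
    """Flag sources whose latest request count exceeds mean + k * stdev of their history."""
    flagged = []
    for source, history in counts_by_source.items():
        *past, latest = history
        if len(past) >= 2 and latest > mean(past) + k * stdev(past):
            flagged.append(source)
    return flagged

# One centralized origin spiking: easy to flag.
print(spikes({"openai-egress": [100, 102, 98, 101, 400]}))   # ['openai-egress']

# The same extra load spread over many pre-existing origins: nothing flags.
diffuse = {f"plugin-host-{i}": [100, 102, 98, 101, 104] for i in range(100)}
print(spikes(diffuse))                                        # []
```

This only speaks to detectability, of course, not to whether anyone would be watching the right logs in the first place.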
What's the "sharp right turn"? 'Sharp left turn' refers to an AI suddenly and unexpectedly behaving unaligned – *after* seeming to be aligned.
From the context (“require lots of GPUs for self-training”), I’d guess it means rapid recursive self-improvement?
Hmmm – I'd expect them to have used 'foom' for that.
It's particularly confusing because 'left turn' is definitely a thing, and I *think* what they meant too: https://www.alignmentforum.org/tag/sharp-left-turn
My bad; I mis-remembered the terminology. https://www.lesswrong.com/tag/sharp-left-turn is what I was going for. By the confusion, perhaps I'm still mis-using it.
Un-packing what I was intending there:
1. we think GPT-N is aligned (at least, not trying to kill everything, even if it is sometimes un-woke)
2. a new version is deployed that gains some capabilities, and has a dramatic shift towards un-aligned-ness (I think this is what "sharp left turn" refers to)
3. since it's smart, it hides its new capabilities or at least motivations, and decides on a plan to take over the world
4. this probably involves self-amplification, since why wouldn't you in this scenario.
I see that self-amplification is not core to the "sharp left turn" formulation, but it seems like an obvious corollary if the turn happens before god-like capabilities already exist?
I didn't read "sharp left turn" to require us to somehow already have "solved alignment", just that these systems are currently "aligned enough" for their capability level, and the sharp left turn represents a rapid change from "safe enough" to "existentially dangerous". Again, maybe I'm mis-reading the terminology, in which case apologies for the confusion.
No worries! The terminology is dense, and numerous!
I think [1] isn't the consensus – in the AI alignment 'community'. If anything it's more like an anti-consensus, i.e. GPT-N is NOT aligned. It's not (obviously) 'dangerous' because it's not sufficiently capable. (I'm pretty skeptical about this – that it's not 'eventually pretty dangerous'. I think it's more like not-immediately dangerously capable.)
In terms similar to what you're describing, the 'sharp left turn' is one of two scenarios:
1. GPT-N is aligned but GPT-{N+1} isn't.
2. GPT-N is – or seems to be – aligned – _during testing_ – but, 'in deployment', isn't in fact aligned.
I think [2] is the more typical sense?
The danger isn't quite "take over the world" – or that's not quite how I'd describe it anyways. It's more that, given a 'dangerously capable' AI that's NOT aligned, we should expect significant negative consequences (relative to 'human values') for any significant acts it either performs itself or enables indirectly. 'Instrumental convergence' covers why it would plausibly 'take over the world'.
"Self-amplification" ('foom') is its own orthogonal danger, tho it would make all of the other dangers MUCH MUCH worse (if not 'immediately-or-pretty-immediately deadly').
> I didn't read "sharp left turn" to require us to somehow already have "solved alignment", just that these systems are currently "aligned enough" for their capability level, and the sharp left turn represents a rapid change from "safe enough" to "existentially dangerous". Again, maybe I'm mis-reading the terminology, in which case apologies for the confusion.
If we _were_ able to solve alignment, then we could (at least plausibly) prevent a 'sharp left turn'. A big component of solving alignment is precisely the preservation of alignment in AIs that 'scale' or otherwise improve their capabilities (via e.g. 'online learning' [machine learning sense]). More generally, alignment is or should be, ideally, VERY robust. We wouldn't want an aligned AI to make a 'sharp left turn' because, e.g. a cosmic ray flipped a single bit in some memory the AI is storing its 'values' in.
But I think the prototypical 'sharp left turn' more involves _some_ initial or prior AI that _is_ actually somewhat aligned. I don't think any of the existing GPT-Ns quite rise to the level of even being 'aligned even a little bit'. But that seems reasonably arguable, i.e. 'fine tuning' and RLHF _do_ seem to maybe do something kinda like alignment (at some 'level').
Thanks! I suspect that my indexing on “how aligned is GPT?” muddied the waters a bit.
All I really am proposing is that in the “defect & power grab by self-amplification” scenario, a less-legible global traffic pattern is substantially more dangerous than a more-legible one.
It doesn’t even need to be a conscious evil plan, or a secret one for that matter, for this effect to be substantial. Quickly trying to amass paperclips would also do it. (Though of course, the lower the initial power level, the more relatively scary a “secretly amass more power” strategy is.)
FWIW on the object level claims above, I was under the impression that Zvi & some others wanted to reserve “un-aligned” for actually killing everyone, eg the second-most upvoted comment on https://www.lesswrong.com/posts/jtoPawEhLNXNxvgTT/bing-chat-is-blatantly-aggressively-misaligned?commentId=79WbHFREADDJnBxYS#comments or Zvi’s post on the subject. I’m in the camp that says, for political coalition-building, the most will get done if alignment/safety means “is predictable enough to not get fired if they were an employee”, which is a way higher bar than “doesn’t kill anyone or have a genuine intention to do so” and surely involves solving the same basic problems in the medium term. Under the broad definition, I agree Bing is comically mis-aligned, and think you could argue the case either way on ChatGPT given the “I’m a beta” leeway.
One thing I'm confused about is: Are you worried about the current generation of models being dangerous? Or are you worried about how the behavior by OpenAI here makes you worry they won't be responsible in the future?
Well... Things escalate so quickly. Almost like agi is already on a self reinforcement runaway loop. 🤣? Or 😟
In any case the cat is out of the box. Time will tell if LLMs, with all the abilities they can digest, remain "just a tool" or become something bigger.
I think the GPT-4 plugin system is a good idea. It's limited enough so that there's pretty clearly no danger of "foom" from here. What can go wrong in the next month that isn't stoppable by humans? Worst case OpenAI is going to be able to pull the plug. But it's still possible that there's some abuse of the system. The ideal outcome is that some hackers do manage to cause some trouble, and that OpenAI learns & shares from that experience, so we can all design better systems.
It's starting to seem to me like "alignment" is the wrong metaphor for safety. We aren't going to have a world where only a few elites have access to AI. And there are so many people trying to cause all sorts of trouble when they do get AI access. There is going to be no shortage of humans with AI access who would like to commit huge crimes and break things. We need "restriction" - the ability to be sure that a combined AI + human system is unable to do X, where X = .... I dunno, hack computers, fly drones, manufacture viruses, the sort of thing it would be dangerous for a group of very intelligent terrorists to be doing.
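A minimal sketch of what such a "restriction" layer could look like in a plugin-style setup; purely illustrative, with every tool name, category, and policy rule invented for the example.

```python
# Illustrative sketch only: tool names, categories, and the policy are hypothetical.
from dataclasses import dataclass

FORBIDDEN_CATEGORIES = {"network_intrusion", "drone_control", "biosynthesis"}
ALLOWED_TOOLS = {"calendar_lookup", "weather_api", "document_search"}

@dataclass
class ToolCall:
    tool: str
    category: str
    arguments: dict

def gate(call: ToolCall) -> bool:
    """Return True only if the requested tool call passes the restriction policy."""
    if call.category in FORBIDDEN_CATEGORIES:
        return False            # capability is banned outright, whatever the tool
    if call.tool not in ALLOWED_TOOLS:
        return False            # default-deny: unknown tools are refused
    return True

# The model (or the human driving it) requests a tool; only vetted calls run.
request = ToolCall(tool="weather_api", category="information", arguments={"city": "Boston"})
print("dispatching" if gate(request) else "refused", request.tool)
```

The point of the default-deny shape is that it tries to bound what the combined human + AI system can do through the interface, rather than shaping what the model wants; whether such a boundary can actually be made robust is exactly the open question.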
better get my team across the API and pitch total integration to management before some other pack of idiots does