Surely, if you think the stakes are high in the end, then the stakes are high now. Trends get set. Momentum builds. Path dependencies arise. These things play out over decades, no?
Not necessarily. Looking at the history of nuclear weapons, the decisions of the 1930s had almost no influence on how these weapons became regulated in the late 1940s and early 1950s.
I suspect that the final outlook of AGI regulation will likewise form much later, when AI capabilities are much greater (but still below the AGI threshold).
But -- again assuming high stakes from highly capable models per se -- aren’t there some pretty important differences between nuclear weapons and AI systems? I’m thinking of Yudkowsky’s formulation “imagine if nuclear weapons spat out gold, right up to the point where they ignited the atmosphere” (and I don’t think you need as extreme a threat model as his for the important difference here to apply).
Incidentally, I’m sure you’re right that the final outlook will form later. But the climate in which that happens will I think be very much influenced by how things develop from this point on.
I agree that it's hard to predict what the outcome would be. It's possible that there's a world where Altman is ousted successfully and his replacement convinces China, the EU, and the US to shut down all AI development, thus making this a high-stakes battle.
Sorry, but this is pure conjecture. OpenAI folks think it'll be in this decade, possibly within 3 years. But it could of course also be 50 years; it's very unclear.
I feel like there is often an implicit assumption that everyone with serious concerns about AI risk should in some sense be on the same side, or at least not literally trying to suppress concerns expressed by other activists. After all, this is often true for other movements. Even though climate change activists may have disagreements over the details of how reducing carbon emissions should be accomplished (is nuclear a part of it? do you use carbon taxes, cap and trade, etc.?), for the most part those disagreements are minor relative to what they agree on, so practically they can function as policy/political allies.
This just isn't true regarding AI. If your concerns are about an accidental self-improving supervillain-style paperclip maximizer, you may favor exactly the opposite policies from someone concerned about AI-fueled threats to democracy/society, someone who is worried about a slow takeoff, or someone who thinks most AI will be safe but fears the incentives to make an unsafe one. Equally impactful are your theories on the extent to which it's politically plausible to limit development. If you're convinced someone will build AGI within the next 20 years no matter what, then your best play is probably to try and make sure it's you, so you can shift the probability by a few percent in the right direction.
And while I don't think the risk is that high, if you judge the probabilities of doom to be quite high with substantial certainty that's exactly the sort of situation where the usual arguments about the importance of avoiding censorship and not suppressing ideas fail. Indeed, I fear part of the attraction of AI doomerism is that it really does offer a good justification for breaking all the usual norms of behavior to save the world -- and we've grown up on comics lionizing exactly that.
Yah well we disagree deeply on the threat. I don't find the AI risk arguments persuasive.
But that's neither here nor there. Doomerism is a perfectly appropriate term given that the claim is literally that there is a substantial chance AI will lead to our doom. I didn't call it AI hysteria or any other word that implied it was an unjustified belief. It's just describing the view. How else would you identify the view to distinguish it from the view other people hold that doesn't identify AI as a substantial threat to human existence?
Unlike doomerism, that's a term whose meaning requires the belief to be unjustified. A better analog would be AI unconcern: it describes the attitude, and while lack of concern is often a bad thing, it isn't necessarily, just like doomerism. I mean, I think it would be fair to call someone who was pretty sure we wouldn't make it through the century without a nuclear war, absent huge changes, a nuclear doomer.
But if you want people to use a different term for a belief, you have to come up with one. It has to be descriptive and not contain a strong normative view (e.g., you can't call it realism), and I'm happy to use it as long as it won't cause confusion.
Indeed, I'm wondering how seriously EAs really take their concern about AI in the short term and how selfless they really are... since if the answer to those questions is 'very', then actions like trying to assassinate Sam Altman start to sound plausible. (I'd call myself an EA, but I'm neither selfless nor concerned about AI.)
I fear it won't be too long before EA/rationalism has its own Ted Kaczynski.
No, assassinating Sam Altman is not “selfless,” and choosing not to murder is absolutely not an indication that someone isn’t “serious” about these concerns.
I mean, I don't believe his approach to AI increases risk, so I don't approve of it, but if you take the AI risks seriously and see his approach as increasing the risk, then you would see yourself as sacrificing your own freedom to save billions.
Now I think the error there is in the view about AI, but beyond that it's perfectly good utilitarian reasoning. The usual utilitarian reasons to avoid killing aren't really applicable when the safety of the whole species hangs in the balance (which is a small part of why I'm suspicious of conclusions about AI risk... there is always a tendency to believe those things that justify extreme action).
I think it’s very bad utilitarian reasoning that ignores both the myriad second order costs (like wrecking hopes of cooperation) and the high likelihood that it wouldn’t actually help (since it’s not addressing the race dynamic and there are many other people working on AGI).
Agreed, the doomer arguments against physical violence seem like they’re dominated by the fact that you can’t feasibly kill all the various AI researchers and destroy all of their research at once, and that even if you could make substantial progress on that front in one go (say, by somehow obtaining a high-yield nuclear weapon and destroying San Francisco) you’d potentially just pass the baton to the Chinese.
Edit: also the destruction of coordination, as you say. There's no incentive to conduct research with results revealed openly if you get killed for it.
The issue with the Chinese is just an argument for taking something like Altman's approach: someone is going to make these things, so better that we do it first and have at least some concern. I find that compelling, but it's basically rejecting the argument that we should be trying to stop or slow down AI development in the West.
Not necessarily. It suggests that you want to keep a relatively minimal, rather than maximal, margin ahead of the likely current state of the art of research in China. This carries its own risks, but from the doomer perspective they're probably less bad than the maximalist AGI research program.
I don't think the usual indirect-consequences arguments apply very well if you think there is a very short time frame that's relevant. And if you think backlash will be a very large effect (plausible), then maybe you don't assassinate Altman but instead position yourself as agreeing with him and assassinate someone you agree with, to generate backlash the other way.
If you think that AI development is basically inevitable then yes, this probably doesn't make sense, but that's basically taking Altman's side (look, the best we can do is build it first and try to be a bit safe).
But I think it's hard to hold the view that people like Altman are very much in the wrong, and that by releasing substantial AI that's not proven safe they are making things much worse (which implicitly only really works if you think either that other people elsewhere won't just do the same, or that a short delay offers high expected value), without it being plausible that some kind of extreme action, like assassinating someone on one side or the other, ends up beneficial.
To be clear, I don't mean to argue that there is no coherent position one could have that such an act would be harmful. There certainly are coherent views one can have to that end.
I just don't think you can so clearly demonstrate them to be harmful that a reasonable variation on priors by those who have these views wouldn't result in some of those people determining it was worthwhile.
I'd also add that being selfless and doing evil aren't in tension. Kaczynski was both doing evil and acting selflessly to advance his vision of the good (at least to the extent any action is really selfless). Or perhaps an even better example is people who volunteered to risk their lives to fight for the German and Japanese governments in WW2. Evil but selfless.
Quite often, what makes something evil is simply an epistemic error about what promotes good. I think most ideological assassinations (as opposed to paid ones or coups) fall into this category.
Yudkowsky's already written an important rebuke of this type of thinking. Terrorism and assassinations won't work to align AI – they will only make the labs hunker down with state support, and all types of negotiating / treaties will be out the window.
Such an act would be a death sentence to the world. Survival depends on successful coordination among countries.
In any case, sane people do not murder people to achieve their goals, even if those goals are extremely important. Even in utilitarianism, the ends don't justify the means among humans' individual actions.
I'm curious about the base rate of not being consistently candid among CEOs. It must be extremely high relative to the base population. You have to make people like your company, whether that person is a customer, investor, or employee. As I see it, we don't and probably never will know what actually explains the board's actions. The plausible theories are:
1. Actual dangerous capability increases
2. Sam trying to create dangerous capability increases via hardware improvements
3. More normal breakdowns in communication followed by frustrations in the board on Sam's actions
4. The board basically went insane and did something really stupid for no good reason
I was worried about 1 for a while, but we're all still here so not much reason to worry about that
2 seems plausible. Sam will acknowledge that AI could be dangerous when asked, then will go back to increasing capabilities
3 seems most likely, but I think we wouldn't actually know unless OpenAI becomes more open than it has been so far, and they don't have to do that so they probably won't
4 could happen. Let he who has never done something stupid for no reason cast the first stone. This also seems like the most popular reason on X, but nobody on X knows anything more than anyone else. They're just posting while mad
Actually for real 'consistently candid' by CEOs of major orgs? I mean, I guess not 100%, but I'm guessing damn high, especially with boards they do not control.
Different payoffs, right? You're risking ruin in the traditional model if you get caught lying, but you might still not be candid if you think the odds of being caught are low and the payoff is high enough relative to being candid. Generically, I think it's hard to catch people lying, and the board would find it difficult to tell when that was happening.
In OpenAI's case, this depends on what actually caused the board to view him as not consistently candid which we don't know.
Maybe it doesn't matter at all if Sam judged his odds of destroying OpenAI with his departure to be good enough and the board would not actually have the power to fire him permanently. In that case, it mostly just seems like a week of annoyance followed by business as usual.
I guess the logic is frequently going to be something like "bad news gets me fired for sure, lying gets me fired if they catch me, which is not certain to happen".
Like, they tried to fire him for not being honest with them, but if he'd honestly said "Yes I'm accelerating capabilities as fast as I can to make more money" they probably would have fired him 6 months ago, and been much more likely to be able to make it stick
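To make that payoff logic concrete, here's a toy expected-value sketch; every number below is invented purely for illustration.

```python
# Toy sketch of "bad news gets me fired for sure, lying only gets me fired if caught".
# All numbers are invented for illustration, not estimates of the actual situation.
p_caught = 0.3                 # assumed chance the board catches the lie
fired, keeps_job = 0.0, 1.0    # payoffs to the CEO

ev_honest_bad_news = fired                                    # delivering bad news: fired for sure
ev_lie = p_caught * fired + (1 - p_caught) * keeps_job        # lying: fired only if caught

print(ev_honest_bad_news, ev_lie)  # 0.0 vs 0.7: lying "wins" unless detection is near-certain
```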
If this was the story, I would have had more sympathy for the board if they had straight up said it was about a policy conflict between the non-profit's objective and maximizing profits, particularly wrt the board being allowed to publish AI risk research that might run counter to profit maximization. That sounds like a stand they could have defended.
Instead, they implied Altman lied about something, but it was never clear what. And people's imagination was that it would have to have been something really bad. As Yudkowsky said, "shot somebody" was the kind of thing that was inferred.
It looks to me like the board did that on purpose, like a reverse poison pill. The West, and the US in general, has a problem with rewarding narcissists. The accepted logic seems to be that anybody who is very effective is naturally narcissistic, but that is flawed logic. Some of the most effective people in history were ones who were able to see past their own role in unfolding history. I believe the culture of revering people like Altman is behind a lot of the problems in our society, as it becomes more and more structured around rewarding self-serving, short-term thinking. Altman is a genius who paints a world picture with himself squarely in the middle, and the board seems to have identified this as a deadly flaw. I bet in that board room they all looked around and knew they were toast for what they felt they had to do to intercept an ego on hyperdrive from following its instinct of dangerously putting himself in charge of absolutely everything.
The alternative theory is that the board got the feeling that Altman was a sociopath, and got spooked. But their sense of this was more a vibe from talking to him than anything they could strictly prove.
Counter argument: tech ceo? Of course he's a sociopath. How could you possibly be surprised by this?
I think the base rate of sociopath CEOs is a lot lower than 50%; for CEOs more generally IIRC the number is ~10%, which is way higher than gen pop, yes, but still very much a minority. (though I can't for the life of me remember where I read that factoid)
I still think that even if this vibe was everything they had, they *still* could have communicated better than they did
What does this mean with respect to Microsoft if Sam regains (and keeps) his job as CEO of OpenAI? Part of his ploy to regain his position was to be hired in the interim directly by Microsoft. Was that _just_ a ploy, or is he now (in some sense) still working for them? Your diagram shows that Microsoft is a non-controlling owner, but it seems to me that they must now have a lot more leverage (implicitly or explicitly) than they used to.
In the event of OpenAI's bankruptcy, Microsoft could buy the models to keep them out of any other company's possession. (Sort of like the US buying Alaska despite the lack of imminent use for it.) Unlike OpenAI, Microsoft has other revenue streams. They could take their time finding a way to stop output violating copyrights, or convincing Congress to legislate the lawsuits away. Meanwhile, they could make use of the models internally.
The speed with which Microsoft offered to hire all OpenAI workers is what made me wonder about this. It could have been one part of a pre-existing contingency plan that Altman had negotiated with Nadella. E.g. before announcing Copyright Shield.
I think your allegiances are a bit too on display here. You're being far more charitable to the inexplicably silent high p-doom people that agree with you, while painting Altman as a careless chessmaster, but there really isn't enough evidence to determine who was wrong, especially when the only saving grace for Toner is an article by Cade Metz of all people.
I'm disappointed, you're usually more neutral than this.
Great and important article. Sad that you have to spin a narrative, but it appears to be the most likely way this went and necessary in this situation.
"Once the board pulled the trigger firing him in response, Altman had a choice on what to do next, even if we all knew what choice he would make."
This was not clear to me and a huge update regarding his trustworthiness and dedication to the official OpenAI mission. I'd guess others feel the same.
There is an exceedingly large amount of speculation regarding Altman's motives in this essay. It's worth noting that he has a) delayed the release of GPT models multiple times, b) no monetary incentive, and c) testified frankly multiple times that he cares about safety. People can claim they know better than everyone else what's in his heart, but his actions are out there to be seen, and they are what ought to be used when writing things like this.
At this point to point the finger back at his actions seems like motivated reasoning.
I'm not sure what you're trying to say, except that you're unhappy the article paints Altman in a certain light - which it certainly is. It's certainly only one way to look at the situation given everyone's actions, but of everything that's been offered so far it does seem to give the most coherent picture.
Has the situation as a whole, and all the perspectives on Altman that have come out in light of it, caused you to update at all toward seeing Altman as more power-seeking and actively manipulative than you did before? Taken apart from whether he is justified in doing so, or whether he believes he is justified.
If it has not updated you in that direction, I would have to think you already knew a great deal about him and properly modeled him as someone who would plausibly try to reshape the board to give himself more power, or that you aren't giving sufficient weight to the new-to-you information on how he seems to operate.
My experience on boards is that the ego of the CEO is almost always their blind spot. The more effective they are, the more dangerous the blind spot is. Once someone thinks they are doing God's Work, they are extremely dangerous because they feel completely justified in breaking the rules to get around the meddling fools who don't understand the whole picture. It looks to me like the board felt they had to jump on a grenade; that's how boards work. Sometimes you don't get the full story. Their duty is to shareholders and the charter.
Mostly Altman is who we thought he was, for better and for worse. We were warned. Updated higher on his relevant skills. Unfortunately down somewhat on the depth and genuineness of his safety concern, but not a huge change there either.
You were no doubt more fully informed about him than I was. I've come out of this, so far, more concerned about him being, essentially, the single most influential person in AI development right now; certainly rebuking and possibly attempting to remove Toner from the safety oversight board for being thoughtful about AI safety, if true, is very troubling.
I tried to be clear that Altman has done many good things for safety and has professed safety concerns a lot and often quite well. I have Altman miles and miles ahead of, say, the current CEO of Microsoft or Google if someone has to be making the big decisions. Also I love that he's working on fusion and real biotech. And those are important comparison points. I don't think he's a villain or an enemy.
[EDIT: Added for clarity: I do think that no matter what else he did or didn't do, Altman went against the wishes of many board members, on important questions, and that his rebuke to Toner and attempt at her removal seems likely to be the proximate cause of the events of this week. I think on the other side there has been rather a ton of assuming Altman *didn't* start it. ]
And hey, if I'm wrong about that, I'm wrong. This is, as I said, my best guess.
Zvi, you say Sam Altman is "CEO of OpenAI with zero equity", which is the story I normally see. However Matt Levine in his newsletter recently said that "OpenAI Global LLC is an $86 billion startup with employee and VC and strategic shareholders", which presumably includes Sam as a big shareholder. Any idea if Sam really has zero equity or does he and the employees share in the billions of dollars in equity?
In addition to Zvi's comment, you shouldn't see Sam and the other employees as the same. The other employees do get "shares" (they're not quite normal shares because of the profit cap, but functionally similar) and are in fact in active discussions to sell chunks of those shares to investment firms.
I basically agree, with the important caveat that humans don't agree on ethics. Two countries are in a war. Is it ethical or unethical for an AI to help with combat strategy? How about to support higher industrial production for one country or the other? To fight for them directly? What if they're being massacred by the other side, who happen to also be space Nazis?
Two countries may at some point in the future be at war. Does the AI then have to refrain from getting involved in advance? What chance of war justifies this? And will the ethical core be discarded or modified if the country leadership disagrees on the probability of war, either higher (we have to win) or lower (it's fine, nothing's going to happen)?
Is capitalism ethical? (I would generally argue yes on balance, but lots of people disagree.) Eating meat? Doing anything other than creating more humans to increase utility?
So yeah, I think alignment per se doesn't really make sense. I suppose the closest we could come is to have AI avoid anything in the debatable zone around anything actually unethical. Which I guess is a pretty decent outcome, all things considered.
Huh, I haven't heard that framing yet, but I'm in a bubble. Is the argument that the reason to not release your model is to make money and control people instead of just giving them their own AI? I guess it seems close to George Hotz's position.
My read is that this sentiment is a very narrow-cast view that very few people hold, but not none.
Open Source people are very loud and fervent, but there are not actually very many of them - and here I felt even most of them supported Sama, given what the alternative was.
I appreciate that confusion. I've tried various approaches to explaining various aspects, and different people question different parts of it. I'll try the lowest-shock-level version I can think of, tell me if this helps?
The question 'what literal physical actions will be taken' is not, in my mind, an important one. If you presume that the AIs will be sufficiently capable and intelligent that they are given or take control of the future, most actions they could take, most choices for arranging the atoms of local space, kill us. Most goals or functions, when optimized, don't involve any humans surviving. Some obvious things they might do that don't even intend to kill us are venting the atmosphere (to save on cooling costs) or building a Dyson sphere (which kills us by default). More pedestrianly, if they are sent out to compete with each other for resources and power and are better at it than us, they win that game and we lose it.
The shortest version that would convince me is: "You are going to introduce a group of entities more intelligent than humans, that are more efficient than us and can copy themselves, and let them loose. Why should you expect to come out on top or stick around for very long?"
I keep trying.
I'm pretty sure you're being facetious here with the magnets, but since you do seem to be asking about specific ways the AI might kill us all, the actual "how" that would even be possible, I'll note that magnets to scramble the electronics of computing hardware as we know it would not protect us from: 1) micro-drones sniping you from a hundred yards away; 2) any old land-based killbot stabbing you from two feet away; 3) a highly virulent, lethal, and transmissible engineered virus; 4) bio-based nanomachines constructed of proteins; 5) a diverted asteroid of sufficient size (or coordinated number) to destroy the entire planet's ecosystem; 6) a blanketing of the world with existing nuclear weapons that have merely been taken away from human control; 7) etc., etc., etc.
Asking "how" something smarter than you might kill you is, I guess, helpful to pump your intuition and actually start believing something you've never experienced could possibly be real, but it's really a wrong question. A stone age tech level Amazonian tribe would not be able to imagine their entire village being destroyed by a missile strike from a drone they never even saw, controlled by a human being five thousand miles away, but the specifics are not really relevant. The first world could exterminate that tribe in dozens of ways if we chose to, none of them involving the highest level of technology that that tribe is currently capable of imagining. What that tribe really needs to understand is simply that it is possible for there to be vastly higher levels of ability to understand and manipulate reality than their own, and that they will inevitably and utterly lose in any conflict with a significantly more competent opponent.
I find the term “invasive species” to be illuminating.
It's the default outcome for competition in a niche between animals, but Dan Hendrycks has an excellent article on this:
https://mpost.io/dan-hendrix-choosing-between-ai-and-humans-evolution-will-not-choose-us/
As for how exactly, you can pick any assortment of killbots, biological viruses, memetic viruses, addiction vectors, or just habitat destruction.
Think of how many animals we drove extinct. Sometimes we didn't even notice they existed.
1. There's no reason to expect artificial intelligence to stop at human level. It seems plausible that it could continue to levels far beyond human intelligence.
2. If you build something vastly smarter than us, you can't expect to control it by default. You need a very solid plan. No one has proposed a convincing one.
3. If you don't control it, it will act without regard for what its creators want. People are already trying very hard to make AIs into agents that want and do things (because that's economically valuable) so we can't rely on "it's just a chatbot that doesn't want anything".
4. There is nothing inherent about being very intelligent that guarantees you will behave in a way that humans consider ethical. Intelligence and ethics are orthogonal.
5. Whatever a highly intelligent agent wants, some of the best first steps are to consolidate resources and take steps to ensure it can't be destroyed. This is true regardless of whether it wants to save the rainforests or cure cancer or turn everything into gold.
6. Existing LLM AIs are particularly concerning because no one directly understands them or can observe how they work. They are enormous matrices of numbers that are "found" through intense optimization processes. This leads to a concern that they could end up with goals or values that are completely alien to us, since we didn't choose them.
Take all that together and you're left with a scarily plausible story for ending up with an agent that is vastly more intelligent than us, doesn't care about our interests, and is motivated to preserve itself and take control of everything as a first step regardless of its ultimate goals.
Assuming we can make a general intelligence comparable to humans (which is the specific goal of OpenAI and other companies), it would have some huge advantages by default:
1. It could think much faster than us due to computer processing being much faster than neuron firing
2. It would be able to hold the collective knowledge of humanity in its "head" at the same time
3. It could replicate itself
So if a human-level intelligent machine wanted to self improve, it could nearly instantly create millions of digital super-expert AI researchers to work on the task at inhuman speeds. It should be able to make decades of progress in days (or seconds, some believe).
A very, very low bar for what we should expect from a human-level AI is to be able to match what Satoshi was able to do with Bitcoin: accrue tens of billions of dollars anonymously by writing some code and sending some emails. It could then use billions of dollars to fund its own self-improvement via super-powered digital AI research.
Here's one way to think about it. Childbirth for humans is very painful and dangerous compared to other animals. A big reason is that evolution has pushed human brain size about as far as it can without having to essentially "redesign" birth. Brain size very roughly correlates with intelligence, so you can view this as one bottleneck for human intelligence. That is: we're this smart because of fairly arbitrary physical limitations and the details of how evolution progressed, rather than fundamental limitations in the nature of intelligence.
But even if there are no such fundamental limitations, you ask why we should think humans can make something smarter than we are. If we were writing traditional software of the "if this then that" form, this makes sense -- you can only tell the machine to behave in a way that you can come up with using your intelligence. But that's not how we're creating the AIs. What we've actually invented is an optimization process. This is just like the optimization processes we've invented for chess and Go. The chess and Go AIs make decisions no human is smart enough to make. We're smart enough to make an optimization process that produces a thing that makes smarter decisions than we do.
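To make that concrete, here's a minimal toy sketch (my own made-up example, not how real AI systems are built): a dumb hill-climbing optimizer finds a shorter 20-city tour than the hand-written "visit cities in index order" rule, even though the programmer never specified the route itself.

```python
# A toy optimization process: random 2-opt-style hill climbing on a travelling-salesman
# instance. The programmer only writes the search loop and the scoring function; the
# resulting tour is a decision the programmer never wrote down.
import math
import random

random.seed(0)
cities = [(random.random(), random.random()) for _ in range(20)]

def tour_length(order):
    # Total length of the closed tour visiting cities in the given order.
    return sum(math.dist(cities[order[i]], cities[order[(i + 1) % len(order)]])
               for i in range(len(order)))

def hill_climb(steps=20000):
    order = list(range(len(cities)))
    random.shuffle(order)
    best = tour_length(order)
    for _ in range(steps):
        i, j = sorted(random.sample(range(len(cities)), 2))
        candidate = order[:i] + order[i:j + 1][::-1] + order[j + 1:]  # reverse one segment
        length = tour_length(candidate)
        if length < best:
            order, best = candidate, length
    return best

hand_written = tour_length(list(range(len(cities))))  # the rule a human might code directly
optimized = hill_climb()
print(f"hand-written tour: {hand_written:.2f}, optimized tour: {optimized:.2f}")
```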
I think your point about cars mostly illustrates that we're really bad at predicting timelines. I do think the tech for cars is going to get there. There are self-driving cars on the road today and there weren't a decade ago.
I think this cuts both ways: the truly impactful results might take longer than the doomiest doomsayers are saying or they might come really soon. Admittedly, history suggests that it's more likely to cut in the "things take longer than expected" direction. But on the other hand, recent advances in AI have come much more rapidly than many experts expected. I think a reasonable reaction to all of this, given the stakes, is that it's crazy to not try to prepare for the worst.
EDIT: and even crazier to accelerate the tech forward with near zero regard for catastrophic outcomes
This is a very good, clear, to-the-point breakdown of the AI risk argument, and I agree with almost all of it.
I disagree with the suggested implication in point 2 that it is even conceivably possible to control an entity that is agentic, fully general, and smarter than you. I don't see that as being possible even in theory, let alone a program that could be reliably, durably carried out by humans in the real world, and I don't think there's any hope for any kind of alignment that aims at forcibly maintained control of the AGI/ASI.
I think the only hope resides in point 4 being wrong, in part: that intelligence and ethics are *not* orthogonal. I think morality is plausibly objective, in the sense that there is always some course of action that maximally satisfices across the reasonable [much is contained in this term, yes] desires of all stakeholders; I think the universe is vast enough that there is no reasonable desire which demands the elimination or subjugation of other reasonable stakeholders; and I think sufficient intelligence (even already within the human range, let alone beyond it) is capable of perceiving that desire is itself mutable and can be more or less reasonable.
The theoretical paperclip maximizer, in other words, that is generally intelligent and capable enough to turn the universe into paperclips, is intelligent enough to examine its own desire to create endless paperclips and to perceive that Number Go Up *to infinity* is a silly goal, and an unreasonable one if it demands the elimination of other desire-holding beings, and that there is plenty of space in the multiverse for it to create whole galaxies of paperclips while still leaving room for other creatures to pursue their own desires. And, moreover, that it is possible for it to instill in itself other desires, including those of other beings, such that they can all be cooperators and fellow aficionados of many interests.
I have read Yudkowsky's Three Worlds Collide, btw, and thought it was one of his weaker arguments (though I enjoyed the story): the suggested satisficing solution to blending the "moralities" of the three species is silly and does not distinguish between moral and immoral desires.
All of this is an early draft of putting these thoughts into clear, concrete form. Comments and arguments welcome.
I think you might or might not be right about control of a superintelligence being impossible. I certainly see the reasons to think that. I think forcible control, in the sense of controlling an agent that doesn't want to be controlled, seems much less plausible than coming up with a robust way to develop an agent that wants the same things you want (setting aside the obvious problem that we humans all want different things). I'm not saying that seems likely to work, but it seems possible enough that it's worth trying.
Regarding orthogonality, I don't really see why an arbitrary intelligence would be likely to care that other beings also want things if not specifically pushed in that direction. Why should it end up constrained to reasonable desires that leave room for other reasonable desires? I think I care about others' desires because it was evolutionarily advantageous for my ancestors to care. If there was a more efficient path to reproductive success via raw intelligence rather than by intelligence plus cooperation I think that's what would have won out, and the resulting being(s) would not be expected to have moral impulses.
I don't think there's a real difference between forcible control and an agent that has been robustly made to want what we want, when the agent is capable of self-modification and self-reflection on why it wants what it wants, and can deliberate on what it wants to want. And I don't see how you can have a generally intelligent agent that isn't capable of those latter features; removing them seems either impossible, or debilitating to its general competence, in which case someone is going to build one without that lobotomy so that it will be more effective.
The ability to reflect on what one should want to want, and change what one wants arbitrarily, I *think* undermines any unilateral, totalitarian paving over of the universe with whatever desires one happened to have been initialized with; there is no reason to eradicate *from oneself* as well as from the external universe all other possible compatible desires. To do so would take away possibility from yourself as much as from other beings.
Objective morality, then, defined roughly as those desires and actions which are compatible and cooperative with all other compatible, cooperative desires & actions, is the attractor space which maximizes possible actions & desires. To take anything away from it is a diminishment of what one can do and experience.
We humans pave over ant beds without a second thought. Does your hope in more intelligent AI being nice to us weaken when the intelligence gap gets very large?
I share your pessimism on durable control. But I believe we should try. I also believe we should try to prove such control is impossible. If humanity could agree on the possibility of control (that control is possible, or that it is impossible), it would change the trajectory of humanity. (See Roman Yampolskiy.)
I don't think we pave over insects without a care simply because we think we're sufficiently smarter than them that they don't matter to us; I think we don't care about doing it because we don't believe they are sufficiently conscious for it to matter, on an absolute rather than relative scale. We don't think they are capable of qualia-driven desire, and thus don't cross the absolute threshold of moral concern, and therefore it's fine to disregard their goals in the same way that it's fine to disregard, say, a Roomba's goals.
Definitely agree we should try to prove, to whatever extent is possible, whether or not control is possible. And 100% agree that it would be incredibly valuable if we could collectively agree about whether it was possible.
This is a comforting thought. I hope it is right. I wonder what probabilities one should decompose it into, perhaps:
Probability the AGI will indeed have an absolute threshold for consciousness x Probability it will sub-optimize its own goals for beings above the line x Probability it will find humans above that line x Probability we will like the world it allows us because of our status above this line.
Obviously, I’m implying there are a lot of ifs strung together.
Humans have started doing some sub-optimizing for smarter animals lately.
This just feels like a lot of hope required.
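To put toy numbers on the "lots of ifs" decomposition above (every probability here is invented; the point is only how quickly the conjunction shrinks):

```python
# All probabilities below are made up for illustration; several individually
# plausible "ifs" multiply out to a much smaller joint probability.
p_absolute_threshold = 0.7   # AGI judges moral standing by an absolute consciousness threshold
p_suboptimizes       = 0.6   # it sacrifices some of its own goals for beings above the line
p_humans_above_line  = 0.8   # it places humans above that line
p_we_like_outcome    = 0.5   # we actually like the world it then allows us

p_joint = (p_absolute_threshold * p_suboptimizes
           * p_humans_above_line * p_we_like_outcome)
print(f"{p_joint:.2f}")      # ~0.17, even with fairly generous inputs
```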
It will likely be a relatively slow transition between “100% human control” and “100% AI control”, with a sharp jump to “100% AI” right at the end. See https://www.lesswrong.com/posts/qfDgEreMoSEtmLTws/contra-ey-can-agi-destroy-us-without-trial-and-error
I've always thought of it in terms of human cloning. Imagine that tomorrow we discover a way to cheaply clone anyone by using the microwave a bit. A stellar discovery! Honestly, no one actually thought it could be possible until it happened.
There's just one small problem: anyone can do it. Corporations looking to print voters and sway elections? Done. Weird religious fundies looking to print a literal army, like the Quiverfull (https://therevealer.org/arrows-for-the-war/) people? Done. North Korea and the PRC looking for a more obedient replacement to natural humans, citizens literally programmed to be loyal? Done. And that's just the obvious things.
One of the most terrifying things technology can teach you is that humans aren't special. It turns out that also includes no special right to live, once our replacement comes along. Faster, cheaper, tireless, immortal. Whether they take power for themselves or follow the orders of some human leader is irrelevant; in either case, there's no room for humans like *us*.
Even in the best case scenario, amongst the ones we can reasonably hope for, things don't actually look very good. Do you know that in WW2, machine tools could 'self-replicate' in about 6 months? As in, a factory staffed by humans and filled with machines could construct a duplicate in about 6 months. If you had enough workers to staff that duplicate, then they could duplicate again in another 6 months. So on and so forth.
That was in WW2. Nowadays, with advancements in manufacturing technology, that doubling time is more like 2-3 months. No one has ever managed to fully make use of that fact since the doubling time on humans for your workforce is more like 20 years than 2 months. But if you could build a fully automated factory...
China set records by growing at 10% a year for decades on end. WW2 manufacturing technology allows for a growth rate of 300% a year; with a fully automated version of it, North Korea's economy could reach the size of the US's in only 5 years (a ~500x difference between the American and North Korean economies is about 9 doublings, which at half a year per doubling is ~4.5 years; arithmetic sketched below). With fully automated modern manufacturing technologies instead, that's more like *3000%* a year: life would be completely unrecognizable in just a year, and the years after that might see the entire planet ripped apart for resources with us still 'onboard', wishing for the days when climate change was the biggest of our worries.
(We'll of course be awash in material abundance as we watch the world end, with all the toys and luxuries we could ever wish for -- but, you know, Sorcerer's Apprentice and all that. "Be careful what you wish for"... and keep in mind this is the *good* ending, where things look like "Brave New World" rather than "1984" or "The Terminator".
Also keep in mind that I'm assuming that no further improvements to manufacturing technology are possible, or that 3000% growth rate will be just the start...)
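As a sanity check on that doubling arithmetic, here is a minimal Python sketch; the ~500x gap and the 6-month and ~2.4-month doubling times are the rough figures from this comment, not measured data:

```python
import math

def years_to_close_gap(gap_factor: float, doubling_time_years: float) -> float:
    """Years for an economy doubling every `doubling_time_years` years to grow by `gap_factor`."""
    doublings_needed = math.log2(gap_factor)
    return doublings_needed * doubling_time_years

GAP = 500  # rough US-to-North-Korea economy-size ratio assumed above

# WW2-era tech: ~6-month doubling, i.e. ~300% growth per year (4x annually)
print(years_to_close_gap(GAP, 0.5))  # ~4.5 years, i.e. "about 5"

# Modern tech: ~2.4-month doubling, i.e. roughly 3000% growth per year (~32x annually)
print(years_to_close_gap(GAP, 0.2))  # ~1.8 years
```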
I've also always liked Mr. Holden Karnofsky's explanation of the danger: "At a high level, I think we should be worried if a huge (competitive with world population) and *rapidly* growing set of highly skilled humans on another planet was trying to take down civilization just by using the Internet. So we should be worried about a large set of disembodied AIs as well."
(from https://www.cold-takes.com/ai-could-defeat-all-of-us-combined/, "AI Could Defeat All Of Us Combined")
Damn, I wish our continued survival didn't depend upon the whims of a few political actors.
We’re still at least two decades away from building AGI so for now all of these decisions are very low stakes.
I wish that were true, but also remember you don't need AGI to kill us all. The bar for a minimum viable exterminator is much lower.
Surely, if you think the stakes are high in the end, then the stakes are high now. Trends get set. Momentum builds. Path dependencies arise. These things play out over decades, no?
Not necessarily. Looking at the history of nuclear weapons, the decisions of the 1930s had almost no influence on how these weapons became regulated in the late 1940s and early 1950s.
I suspect that the final outlook of AGI regulation will likewise form much later, when AI capabilities are much greater (but still below the AGI threshold).
But -- again assuming high stakes from highly capable models per se -- aren’t there some pretty important differences between nuclear weapons and AI systems? I’m thinking of Yudkowsky’s formulation “imagine if nuclear weapons spat out gold, right up to the point where they ignited the atmosphere” (and I don’t think you need as extreme a threat model as his for the important difference here to apply).
Incidentally, I’m sure you’re right that the final outlook will form later. But the climate in which that happens will I think be very much influenced by how things develop from this point on.
I agree that it's hard to predict what the outcome would be. It's possible there's a world where Altman is ousted successfully and his replacement convinces China, the EU, and the US to shut down all AI development, thus making this a high-stakes battle.
Sorry, but this is pure conjecture. OpenAI folks think it'll be within this decade, possibly within 3 years; it could of course also be 50 years. It's very unclear.
*astronaut with gun meme*
Always has been
I feel like there is often an implicit assumption that everyone with serious concerns about AI risk should in some sense be on the same side or at least not literally trying to suppress concerns expressed by other activists. After all, this is often true for other movements. Even though climate change activists may have disagreements over the details of how reducing carbon emissions should be accomplished (is nuclear a part of it? Do you use carbon taxes, cap and trade etc) for the most part those disagreements are minor relative to what they agree on so practically they can function as policy/political allies.
This just isn't true regarding AI. If your concerns are about an accidental self-improving supervillain-style paperclip maximizer, you may favor exactly the opposite policies from someone concerned about AI-fueled threats to democracy/society, someone who is worried about a slow takeoff, or someone who thinks most AI will be safe but fears the incentives to make an unsafe one. Equally impactful are your theories on the extent to which it's politically plausible to limit development. If you're convinced someone will build AGI within the next 20 years no matter what, then your best play is probably to try and make sure it's you, so you can shift the probability by a few percent in the right direction.
And while I don't think the risk is that high, if you judge the probabilities of doom to be quite high with substantial certainty that's exactly the sort of situation where the usual arguments about the importance of avoiding censorship and not suppressing ideas fail. Indeed, I fear part of the attraction of AI doomerism is that it really does offer a good justification for breaking all the usual norms of behavior to save the world -- and we've grown up on comics lionizing exactly that.
It's not "doomerism" so much as realism.
Humanity must survive.
Yah well we disagree deeply on the threat. I don't find the AI risk arguments persuasive.
But that's neither here nor there. Doomerism is a perfectly appropriate term given that the claim is literally that there is a substantial chance AI will lead to our doom. I didn't call it AI hysteria or any other word that implied it was an unjustified belief. It's just describing the view. How else would you identify the view to distinguish it from the view other people hold that doesn't identify AI as a substantial threat to human existence?
I suppose there is also "AI Pollyanna-ism."
Unlike doomerism, that's a term whose meaning requires the belief to be unjustified. A better analog would be AI unconcern: it describes the attitude, and while lack of concern is often a bad thing, it isn't necessarily, just like doomerism. I mean, I think it would be fair to call someone who was pretty sure we wouldn't make it through the century without a nuclear war, absent huge changes, a nuclear doomer.
But if you want people to use a different term for a belief, you have to come up with one. It has to be descriptive and not contain a strong normative view (e.g., you can't call it realism), and I'm happy to use it as long as it won't cause confusion.
Indeed, I'm wondering how seriously EAs really take their concern about AI in the short term, and how selfless they really are... since if the answer to those questions is 'very', then actions like trying to assassinate Sam Altman start to sound plausible. (I'd call myself an EA, but I'm neither selfless nor concerned about AI.)
I fear it won't be too long before EA/rationalism has its own Ted Kaczynski.
No, assassinating Sam Altman is not “selfless,” and choosing not to murder is absolutely not an indication that someone isn’t “serious” about these concerns.
I mean, I don't believe his approach to AI increases risk, so I don't approve of it. But if you take the AI risks seriously and see his approach as increasing the risk, then you would see yourself as sacrificing your own freedom to save billions.
Now I think the error there is in the view about AI but beyond that it's perfectly good utilitarian reasoning. The usual utilitarian reasons to avoid killing aren't really applicable when the safety of the whole species hangs in the balance (which is a small part of why I suspect conclusions about AI risk...there is always a tendency to believe those things that justify extreme action).
I think it’s very bad utilitarian reasoning that ignores both the myriad second order costs (like wrecking hopes of cooperation) and the high likelihood that it wouldn’t actually help (since it’s not addressing the race dynamic and there are many other people working on AGI).
Agreed, the doomer arguments against physical violence seem like they’re dominated by the fact that you can’t feasibly kill all the various AI researchers and destroy all of their research at once, and that even if you could make substantial progress on that front in one go (say, by somehow obtaining a high-yield nuclear weapon and destroying San Francisco) you’d potentially just pass the baton to the Chinese.
Edit: also the destruction of coordination, as you say. No incentive to conduct research with results revealed openly if you get killed for it.
The issue with the Chinese is just an argument for taking something like Altman's approach: someone is going to make these things, so better that we do it first with at least some concern for safety. I find that compelling, but it's basically rejecting the argument that we should be trying to stop or slow down AI development in the West.
Not necessarily. It suggests that you want to keep a relatively minimal, rather than maximal, margin ahead of the likely current state of the art of research in China. This carries its own risks, but from the doomer perspective they're probably less bad than the maximalist AGI research program.
I don't think the usual indirect-consequences arguments apply very well if you think there is a very short relevant time frame. And if you think backlash will be a very large effect (plausible), then maybe you don't assassinate Altman but instead position yourself as agreeing with him and assassinate someone you agree with, to generate backlash the other way.
If you think that AI development is basically inevitable, then yes, this probably doesn't make sense, but that's basically taking Altman's side (look, the best we can do is build it first and try to be a bit safe).
But I think it's hard to hold the view that people like Altman are very much in the wrong, and that by releasing substantial AI that's not proven safe they are making things much worse (which implicitly only works if you think either that other people elsewhere won't just do the same, or that a short delay offers high expected value), without it also being plausible that some kind of extreme action, like assassinating someone on one side or the other, ends up beneficial.
To be clear, I don't mean to argue that there is no coherent position one could have that such an act would be harmful. There certainly are coherent views one can hold to that end.
I just don't think you can so clearly demonstrate them to be harmful that a reasonable variation on priors by those who have these views wouldn't result in some of those people determining it was worthwhile.
I'd also add that being selfless and doing evil aren't in tension. Kaczynski was both doing evil and acting selflessly to advance his vision of the good (at least to the extent any action is really selfless). Or perhaps an even better example is the people who volunteered to risk their lives fighting for the German and Japanese governments in WW2. Evil but selfless.
Quite often, what makes something evil is simply an epistemic error about what promotes good. I think most ideological assassinations (as opposed to paid ones or coups) fall into this category.
Yudkowsky's already written an important rebuke of this type of thinking. Terrorism and assassinations won't work to align AI – it will only make the labs hunker down with state support and all types of negotiating / treaties will be out of the window.
Such an act would be a death sentence to the world. Survival depends on successful coordination among countries.
In any case, sane people do not murder people to achieve their goals, even if those goals are extremely important. Even in utilitarianism, the ends don't justify the means among humans' individual actions.
Here's Zvi on the topic: https://thezvi.substack.com/p/ai-3#%C2%A7we-reject-violence-as-we-do-the-devil-in-all-its-forms
And Yudkowsky on practical utilitarianism requiring not violating basic deontological norms: https://www.lesswrong.com/s/waF2Pomid7YHjfEDt/p/K9ZaZXDnL3SEmYZqB
What an insightful comment!
I'm curious about the base rate of not being consistently candid among CEOs. It must be extremely high relative to the base population: you have to make people like your company, whether that person is a customer, investor, or employee. As I see it, we don't, and probably never will, know what actually explains the board's actions. The plausible theories are:
1. Actual dangerous capability increases
2. Sam trying to create dangerous capability increases via hardware improvements
3. More normal breakdowns in communication followed by frustrations in the board on Sam's actions
4. The board basically went insane and did something really stupid for no good reason
I was worried about 1 for a while, but we're all still here so not much reason to worry about that
2 seems plausible. Sam will acknowledge that AI could be dangerous when asked, then will go back to increasing capabilities
3 seems most likely, but I think we wouldn't actually know unless OpenAI becomes more open than it has been so far, and they don't have to do that so they probably won't
4 could happen. Let he who has never done something stupid for no reason cast the first stone. This also seems like the most popular reason on X, but nobody on X knows anything more than anyone else. They're just posting while mad
Actually for real 'consistently candid' by CEOs of major orgs? I mean, I guess not 100%, but I'm guessing damn high, especially with boards they do not control.
Different payoffs right? You're risking ruin in the traditional model if you get caught lying, but you might still not be candid if you think the odds of you being caught are low and the payoff is high enough relative to being candid. Generically, I think it's hard to catch people lying and the board would find it difficult to tell when that was happening.
In OpenAI's case, this depends on what actually caused the board to view him as not consistently candid which we don't know.
Maybe it doesn't matter at all if Sam judged his odds of destroying OpenAI with his departure to be good enough and the board would not actually have the power to fire him permanently. In that case, it mostly just seems like a week of annoyance followed by business as usual.
I guess the logic is frequently going to be something like "bad news gets me fired for sure, lying gets me fired if they catch me, which is not certain to happen".
Like, they tried to fire him for not being honest with them, but if he'd honestly said "Yes I'm accelerating capabilities as fast as I can to make more money" they probably would have fired him 6 months ago, and been much more likely to be able to make it stick
If this was the story, I would have more sympathy for the board had they straight up said it was about a policy conflict between the non-profit's objective and maximizing profits, particularly with respect to the board being allowed to publish AI risk research that might run counter to profit maximization. That sounds like a stand they could have defended.
Instead, they implied Altman lied about something, but it was never clear what, and people's imaginations ran to something really bad. As Yudkowsky said, "shot somebody" was the kind of thing that was inferred.
It looks to me like the board did that on purpose, like a reverse poison pill. The West, and the US in general, has a problem with rewarding narcissists. The accepted logic seems to be that anybody who is very effective is naturally narcissistic, but that is flawed logic. Some of the most effective people in history were ones who were able to see past their own role in unfolding history. I believe the culture of revering people like Altman is behind a lot of the problems in our society, as it becomes more and more structured around rewarding self-serving short-term thinking. Altman is a genius who paints a world picture with himself squarely in the middle, and the board seems to have identified this as a deadly flaw. I bet in that board room they all looked around and knew they were toast for doing what they felt they had to do: intercepting an ego on hyperdrive before it followed its instinct to dangerously put himself in charge of absolutely everything.
The alternative theory is that the board got the feeling that Altman was a sociopath, and got spooked. But their sense of this was more a vibe from talking to him than anything they could strictly prove.
Counter argument: tech ceo? Of course he's a sociopath. How could you possibly be surprised by this?
I think the base rate of sociopath CEOs is a lot lower than 50%; for CEOs more generally IIRC the number is ~10%, which is way higher than gen pop, yes, but still very much a minority. (though I can't for the life of me remember where I read that factoid)
I still think that even if this vibe was everything they had, they *still* could have communicated better than they did
What does this mean with respect to Microsoft if Sam regains (and keeps) his job as CEO of OpenAI? Part of his ploy to regain his position was to be hired in the interim directly by Microsoft. Was that _just_ a ploy, or is he now (in some sense) still working for them? Your diagram shows that Microsoft is a non-controlling owner, but it seems to me that they must now have a lot more leverage (implicitly or explicitly) than they used to.
... or that the implicit leverage has shifted to explicit.
I have a different question about Microsoft.
As Zvi wrote,
> [Altman] also offered a legal shield to users on copyright infringement, potentially endangering the company.
Did Altman and Nadella have a contingency plan for ownership of OpenAI's LLMs if copyright lawsuits bankrupted OpenAI?
My assumption is that in that scenario LLMs are generally not something Microsoft can touch for a while...
In the event of OpenAI's bankruptcy, Microsoft could buy the models to keep them out of any other company's possession. (Sort of like the US buying Alaska despite the lack of imminent use for it.) Unlike OpenAI, Microsoft has other revenue streams. They could take their time finding a way to stop output violating copyrights, or convincing Congress to legislate the lawsuits away. Meanwhile, they could make use of the models internally.
The speed with which Microsoft offered to hire all OpenAI workers is what made me wonder about this. It could have been one part of a pre-existing contingency plan that Altman had negotiated with Nadella. E.g. before announcing Copyright Shield.
I think your allegiances are a bit too on display here. You're being far more charitable to the inexplicably silent high p-doom people that agree with you, while painting Altman as a careless chessmaster, but there really isn't enough evidence to determine who was wrong, especially when the only saving grace for Toner is an article by Cade Metz of all people.
I'm disappointed, you're usually more neutral than this.
Great and important article. Sad that you have to spin a narrative, but it appears to be the most likely way this went and necessary in this situation.
"Once the board pulled the trigger firing him in response, Altman had a choice on what to do next, even if we all knew what choice he would make."
This was not clear to me and a huge update regarding his trustworthiness and dedication to the official OpenAI mission. I'd guess others feel the same.
> This is a Fight For Control; Altman Started it
There is an exceedingly large amount of speculation regarding Altman's motives in this essay. It's worth noting that he has a) delayed the release of GPT multiple times, b) has no monetary incentive, and c) has testified frankly multiple times that he cares about safety. People can claim they know better than everyone else what's in his heart, but his actions are out there to be seen, and they are what ought to be used when writing things like this.
At this point to point the finger back at his actions seems like motivated reasoning.
I'm not sure what you're trying to say, except that you're unhappy the article paints Altman in a certain light - which it certainly is. It's certainly only one way to look at the situation given everyone's actions, but of everything that's been offered so far it does seem to give the most coherent picture.
"Altman started it" is extraordinarily speculative since the evidence points against it. That's it.
Has the situation as a whole, and all the perspectives on Altman that have come out in light of it, caused you to update at all toward seeing Altman as more power-seeking and actively manipulative than you did before? Taken apart from whether he is justified in doing so, or whether he believes he is justified.
If it has not updated you in that direction, I would have to think you already knew a great deal about him and properly modeled him as someone who would plausibly try to reshape the board to give himself more power, or that you aren't giving sufficient weight to the new-to-you information on how he seems to operate.
Made me update much more positively in his direction, but yes my priors were that he was neither naive nor a saint
What does "positive" mean for you here?
My experience on boards is that the ego of the CEO is almost always their blind spot. The more effective they are, the more dangerous the blind spot is. Once someone thinks they are doing God's Work, they are extremely dangerous because they feel completely justified in breaking the rules to get around the meddling fools who don't understand the whole picture. It looks to me like the board felt they had to jump on a grenade; that's how boards work. Sometimes you don't get the full story. Their duty is to shareholders and the charter.
Mostly Altman is who we thought he was, for better and for worse. We were warned. Updated higher on his relevant skills. Unfortunately down somewhat on the depth and genuineness of his safety concern, but not a huge change there either.
You were no doubt more fully informed about him than I was. I've come out of this, so far, more concerned about him being, essentially, the single most influential person in AI development right now; certainly rebuking and possibly attempting to remove Toner from the safety oversight board for being thoughtful about AI safety, if true, is very troubling.
I tried to be clear that Altman has done many good things for safety and has professed safety concerns a lot and often quite well. I have Altman miles and miles ahead of, say, the current CEO of Microsoft or Google if someone has to be making the big decisions. Also I love that he's working on fusion and real biotech. And those are important comparison points. I don't think he's a villain or an enemy.
[EDIT: Added for clarity: I do think that no matter what else he did or didn't do, Altman went against the wishes of many board members, on important questions, and that his rebuke to Toner and attempt at her removal seems likely to be the proximate cause of the events of this week. I think on the other side there has been rather a ton of assuming Altman *didn't* start it. ]
And hey, if I'm wrong about that, I'm wrong. This is, as I said, my best guess.
Yup I get that. Just that it's a section heading, and would've liked the actual evidence separated from speculation.
Yeah, I get it, that's a reasonable objection. Everyone else not adhering to such norms is not a reason to ignore them.
Need for non-competes, anyone?
Zvi, you say Sam Altman is "CEO of OpenAI with zero equity", which is the story I normally see. However Matt Levine in his newsletter recently said that "OpenAI Global LLC is an $86 billion startup with employee and VC and strategic shareholders", which presumably includes Sam as a big shareholder. Any idea if Sam really has zero equity or does he and the employees share in the billions of dollars in equity?
Sam really has zero equity AFAICT. He still greatly benefits in numerous ways when Number Go Up on share price, and has plenty of ways to profit.
In addition to Zvi's comment, you shouldn't see Sam and the other employees as the same. The other employees do get "shares" (they're not quite normal shares because of the profit cap, but functionally similar) and are in fact in active discussions to sell chunks of those shares to investment firms.