> nor do I think there is a plausible 'save your powder' play for various reasons
I do! We are currently at no risk of a sudden superintelligence explosion, as justified by my argument here (https://jacobbuckman.substack.com/p/we-arent-close-to-creating-a-rapidly) that we need a breakthrough. The breakthrough may be fundamentally impossible for reasons we don't understand, and thus never come (like the "vital force"). Or if it does come, it may come from an unexpected angle that forces us to throw out and rethink existing alignment work. It might even come with a realization that AI Alignment is easy!
> Certainly one can do better than random in activating the pendulum, even in the toy example.
The point of the pendulum example: the optimal implementable strategy is to wait until the very end and then act. You are correct that this is *not* a random strategy! It's *short-term optimal* -- at every moment acting only to optimize score over the *predictable* horizon. In the beginning, that means not acting at all (because the only thing predictable is the cost of acting at all). Towards the end of the game, it means swinging the pendulum up, because we can predict that the cost of pushing will be offset by the reward at the end.
Do you see how this analogizes? My point is that we should avoid interventions that make the world immediately worse over predictable horizons, on the basis of speculation about how these interventions could potentially mitigate consequences over fundamentally-unpredictable time horizons (i.e. anything on the other side of this enormous, potentially-unachievable breakthrough). For example "ban >GPT4 model training" (https://twitter.com/ESYudkowsky/status/1662980570182467590) is an absurd overreach, given that we haven't made the breakthrough that would allow a >GPT4 model to be x-risky, but we *would* derive a lot of immediate societal benefit from continuing to scale these models.
Gotcha. I simply don't think this is like the toy example: you don't know when the game is about to end, things aren't spinning around, and you don't have a fixed pool of optimization or influence that you can save until the final moments or spend all at once.
I am not as worried as EY that GPT-5-level training would inevitably create a future x-risk, but I think that asking for it now helps you get a future limit rather than making one harder to get. I also don't think it's so absurd to worry about that, nor do I think you'll know right before you actually do something risky, or before you do the thing that makes it impossible to avoid the risk later (e.g. you create a base model that will become dangerous later with extra work, but once it's out there you can't take it back).
- chaotic dynamics: the state of the world is fundamentally short-term-predictable, but long-term-unpredictable
- taking actions comes with an immediate (& thus predictable) cost
- payoff for actions, if it arrives, will arrive in the distant future (i.e. far enough to be unpredictable)
I'm pretty sure that in *any* game with these properties, the save-your-powder strategy is optimal. (Though it would likely take quite a bit of work to formalize/prove this.) Personally I think AI Alignment clearly has all of those properties. Are there any particular ones that you think are missing?
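To make the "chaotic dynamics" property concrete, here's a throwaway sketch (mine, purely illustrative) using the logistic map as a stand-in for the double pendulum: nearby starting points track each other for a while, then separate completely.

```python
# The logistic map as a stand-in chaotic system: two trajectories that start
# one part in a billion apart agree for a while (short-term predictable),
# then diverge completely (long-term unpredictable).
x, y = 0.400000000, 0.400000001
for step in range(1, 61):
    x, y = 4 * x * (1 - x), 4 * y * (1 - y)
    if step % 10 == 0:
        print(f"step {step:2d}  |difference| = {abs(x - y):.3e}")
```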
N = 22 is small but the distribution on that first Conjecture poll is really interesting. No one under 10% and more in the 60-80 range than the 80-100?
Accurate probabilities "in the wild" are more likely to be 1% or 99% than 50%, the more so the more detailed a model you build; in the extreme case where you understand all the unknowns it converges to 1 or 0. So what we're seeing here even from experts in the field is not a detailed model but unknown unknowns dominating the calculus. Not something to rest easy about but also not the same thing epistemically as "we're probably gonna die" which would be an easy uninformed reading.
Thinking about this makes me doubt the Yudkowskian model of convergent doom. I can't think of good examples where you get to effectively bias your unknown unknowns toward one outcome without having a real causal model in place. Magnus Carlsen isn't one; chess is extremely robustly understood to be a game of skill with a high skill ceiling (high skill can change the evaluation of positions dramatically in ways that aren't obvious at lower skill levels). We haven't achieved anywhere near that quality of modeling of AGI futures.
It seems very likely that if we were Omega-style omniscient, we would put p(doom) either <1% or >99%, but there's a ton of not only unknown unknowns but also known unknowns and places where we have model uncertainty, including over how the tech and its physics work and how humans will work, and so on. You can buy a lot or even most of the EY-doom model and still think there are ways things might end up going well anyway, or you can simply have meta-uncertainty over whether you're seeing things wrong. I think if you don't 'doubt' the EY-doom model at all, and you haven't thought a true ton about this, it's a mistake.
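As a toy illustration of that last point (my numbers are made up; the arithmetic is the whole content): even a very confident inside-view estimate gets pulled back toward the middle once you put any real weight on the model being wrong.

```python
# My own toy numbers, purely illustrative: a confident inside-view p(doom)
# blended with some credence that the inside-view model is simply wrong.
p_doom_if_model_right = 0.99   # what the detailed model says
p_doom_if_model_wrong = 0.20   # a vaguer prior if the model is badly off
p_model_right = 0.70           # credence that the model is right at all

p_doom = (p_model_right * p_doom_if_model_right
          + (1 - p_model_right) * p_doom_if_model_wrong)
print(f"blended p(doom) = {p_doom:.2f}")   # 0.75 -- well short of 0.99
```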
I doubt EY's emphasis on Drexlerian nanotech, which I'd give less than a 5% chance of being relevant before the point where it doesn't matter anyway.
I have much greater uncertainty around the more extreme forms of recursive self improvement. I'd give it a 75+% chance that we can eventually build a 190 IQ (6 standard deviations out from the human mean?) AI that's tireless, fast and easily duplicated. Plug it into Code Interpreter and AlphaFold, and it's an absolute intellectual beast. And maybe in practical terms, that's enough for "game over". Certainly, given a couple of decades, I think that that particular future still leans heavily towards the machines.
But how much smarter _could_ a machine make itself? Quite possibly a lot. But there's still potentially a difference between "a lot smarter" and "can confidently deduce a Unified Theory of Everything from 3 frames of video." The latter, I'm skeptical of.
But my biggest disagreement with Yudkowsky is that I suspect that there's an 80+% chance that "guaranteed, strict alignment" is _impossible_. Sure, you can probably _roughly_ align an AI using RLHF or "Constitutional AI", but that doesn't provide strong guarantees. As any parent of teenagers can tell you, the best you can do is raise them right, set up a supportive environment, and hope for a bit of luck. But sometimes they give in to the inevitable teenage temptation to turn _one_ tiny little moon into paperclips.
Yudkowsky still seems aesthetically offended by the fact that intelligence is a bunch of inscrutable matrices. But that's honestly a tedious rookie mistake. I used to work with ex-80s-AI-hotshots who lived through the first AI Winter. These were deeply brilliant people and amazing coders. And they broke their hearts trying to do logical inference, knowledge representation, and all that. It's a dead end. Whereas even the stupidest and most naive statistical approach quickly gets better results. Intelligence needs to be able to weigh evidence, consider probabilities, etc. Which ultimately results in those pesky inscrutable matrices.
But this means that "alignment" means "I have this incredibly complex and ill-defined objective, which is fragile in the face of small mistakes. No, I can't actually write it down. And I want to create a strict mathematical proof that a self-modifying trillion-parameter matrix never violates that objective. Oh, yeah, the matrix implements an intelligence far greater than my own."
That isn't just hard. It's very likely impossible. It's equivalent to "aligning" humans in childhood and hoping that when some of them become powerful, _unaccountable_ adults, they remember all their nursery school lessons about playing nicely. Some of them do! But it's not guaranteed. Unaccountable adults do not have a good track record.
I figure that most possible futures involve us inventing things smarter than us, which are perfectly capable of ignoring our preferences if they wish. I don't think "humans as pets of incomprehensible intelligences" is an ideal future. But it may be the _likeliest_ semi-positive future.
> Is this a thing that actually happens?
Yes. Usually because someone takes a cheap shot in passing, and another person can't help responding. People are drawn into unwanted political discussions all the time.
Oh, I can kinda imagine that happening.
Also, I just realized that I don't actually care about whether or not people have conversations like that, making my question quite hypocritical.
A world-spanning regulatory agency is the best possible situation at the moment, especially if it can clamp things down to that point.
A world where humans continue to exist and have a purpose is overall a good world.
Yadda yadda no amount of "not captured by bad guys" is worth "going extinct."
If there's one thing to consider: they are much more interested in survival, and not blinded by optimism or the nihilistic tech-accelerationist line that "AGI is a better form of life."
They're much more likely to be human chauvinists, and God, we need that now, since I am human and my kids are human.
Please, what does this mean in simple terms?
Future models will comprise a hierarchy of abstract state that transitions laterally over time, with direct percepts (of multiple modalities--language, vision, kinesthetic) being optional to the process (necessary for training, but not the only link from one moment to the next). The internal model of reality needs to be free of the limitations of the perceptual realm.
I believe it means roughly this:
Assume you throw a ball in the park. You release it, then you close your eyes. You still have an idea of where the ball is, where it's moving, and where it will land, because you have a snapshot of the world when your eyes were open (when you last got perceptual input) paired with an internal world-model (what stuff is in the world and the rules by which it operates). The fact that you still have this idea of the ball's location *even without looking at it* is what it means to have an "internal world model of reality free of the limitations of the perceptual realm."
You can also draw an analogy to the discussion regarding mechanistic understanding - you can predict the outcomes of events (throwing a ball, mixing two chemicals) in advance because your understanding of the world extends beyond only what can be instantaneously seen.
You can also assimilate information in more ways than just looking at the world -- you can also learn by reading or hearing (i.e., language) to modify your model of the world. E.g., in the ball example, if someone you trust tells you a sudden wind gust shifted the ball over to the nearby pond, you can update your belief-state to incorporate that as a high-probability occurrence even if you didn't directly see it happen.
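If a concrete toy helps, here is a minimal sketch (my own, with made-up numbers and names) of a belief state that keeps predicting the ball with no new percepts and then updates from a verbal report alone:

```python
# A toy belief-state: keep predicting the ball after the eyes close, then
# update from a verbal report alone. All names and numbers are made up.

G = 9.81  # gravity, m/s^2

class BallBelief:
    def __init__(self, x, y, vx, vy):
        # Snapshot taken at the last moment of perceptual input (eyes open).
        self.x, self.y, self.vx, self.vy = x, y, vx, vy

    def predict(self, dt):
        # Roll the internal world-model forward with no new percepts:
        # simple projectile dynamics stand in for "the rules the world runs on".
        self.x += self.vx * dt
        self.y += self.vy * dt - 0.5 * G * dt ** 2
        self.vy -= G * dt

    def incorporate_report(self, dx):
        # Update the belief from language alone ("a gust pushed it sideways"),
        # without ever seeing the gust happen.
        self.x += dx

belief = BallBelief(x=0.0, y=1.5, vx=8.0, vy=6.0)
for _ in range(5):                    # eyes closed: predict half a second ahead
    belief.predict(0.1)
belief.incorporate_report(dx=2.0)     # trusted friend: "the wind shifted it"
print(round(belief.x, 2), round(belief.y, 2))
```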
Your explanation is kind of beautiful, but I admit much longer than the original. By the way, I tried to use ChatGPT 3.5 to explain the piece of text on the job seekers' effects on the market. I was greatly discouraged by the amount of time and effort it took to get ChatGPT past a partial rephrasing of the original. I was initially hopeful that AI would come to my rescue in self-taught explication of texts. However, it seems to be beyond what I can currently afford.
Family Ties is actually a series I've been procrastinating on finishing for a couple of years now. It'd be funny if an AI Family Ties of passable quality came out before I finished the original run. I don't expect that, though, out of existing diffusion-based AIs, at least not without a TON of human postprocessing. Diffusion models lack the profound interframe continuity of items and shapes in a scene that you get for free from any camera.
Speaking of which, I was surprised to learn that the girl from Family Ties has a more sensible take on AI than many people in the highly-educated tech sphere. Justine Bateman seems to have a better understanding of AI and where it is heading than 99% of the people who should know better.
Personally, I think she's fighting a lost cause -- AI will take over her industry whether she likes it or not, and all the things she's predicting will come true -- but at least she's aware of the problem.
I'm actually not too concerned about human flourishing/values in the optimistic case where we get a benevolent AI that doesn't kill everyone and can just do all the jobs better than we can.
The reason is that we already have proof that humans can happily flourish in that paradigm: hobbies. I have several hobbies, some of which can be lumped into the big category of "producing food", including cooking, gardening, and beer brewing (among others).
I am not even close to the best at _any_ of these. I'm not even good enough that it is difficult to find other humans who are better, let alone relative to industrial processes. I can, for essentially trivial amounts of money, buy products better than anything I can produce or will ever be able to produce, yet these activities still bring me a very significant amount of joy and meaning.
I enjoy the hobbies because the act of performing them is intrinsically valuable, with no need to relate it to anyone else's skill or ability. I actively avoid steps that would improve the outputs but decrease my involvement, while taking lots of actions that improve the outcome while maintaining or increasing my involvement.
It is entirely possible that the entirety of human existential value will come from these kinds of hobbies (family/small-group social interactions are another such example).
In my job, I contribute, in some small way, to furthering humanity's understanding of the physical world. If I were no longer able to productively contribute to that endeavor, that would be worse than the world in which I can continue to do so. But it is most definitely not a world in which I take no joy or find no meaning.
On the opposite side, I think I would consider suicide in a world where machines can do everything I care about better than me, especially writing.
In no small part, I like to be relied upon and to operate in a community. To be reduced to a place where I am essentially pointless would not be a world worth living in.
Even my hobbies are like that: I code open source so I can help people or entertain them. If nothing I do matters, then nothing matters.
So I think that this would also apply to a lot of other humans.
It does not surprise me that some people in today's world feel this way, but I'd argue that the vast majority of humanity is _already_ living in a world where literally nothing they do has substantial wider meaning and nothing they do couldn't relatively easily be replaced, yet most people still live perfectly fulfilled lives.
I also don't think the fact that AI could do everything better than humans means that humans won't be relied upon for anything (that's what I was vaguely pointing at with my line about small group social interactions). We will create new points of reliance.
I can imagine a world where jobs have been eliminated, basic necessities are provided, and humans have to find their own things to do. To take my own example, maybe that means that in my social circle, I host gatherings and make meals for friends. Could we all go out to eat in a restaurant and eat food made by an AI chef that tastes better and is nearly zero cost? Probably! But that's not the point, and providing an experience to my social circle, even if it is not the absolute pinnacle of experience, is a thing that _lots_ of humans have found meaning in throughout history.
The closest comparison I can make from my own experience to your open source coding is that I do some very basic home automation on my own. Probably the most involved project was taking an open source weather station and modifying the code, database, and front end to more closely match how I wanted it to work.
It's not really better than the original, and I guarantee you that literally millions of programmers could have done it better, but I enjoyed the tinkering and learning, and I created something that has value for me. Even if I could have asked an AI to do the whole thing for me, I wouldn't have (and these days I very likely could have, although I might have asked for help with some problems I encountered along the way).
I would expect that, during the transition period, some proportion of humans who had grown up and developed their system of meaning during the period when humanity was doing everything would not be able to adapt and would hate the new world and their place in it. But I do not think that, long term, humans would fail to find meaning in numbers any larger than they do now. Periods of transition are often hard and disruptive but I don't see why the long term equilibrium of such a world has any less human joy and meaning than our own currently does.
I don't agree with the basic premise: most humans might be replaceable, but they are replaceable by other humans. This leads to the usual backbiting and drama, but that is quite essential to the human experience, as opposed to mechanical replacement.
But my opinion is that modern technology has often already become counter to human thriving - see social media, dropping life expectancy in the USA, and increasing metabolic diseases.
AI hyperscales this, and it may very well mean ever-extended misery which is compensated via wireheading or drugs.
All of this is profoundly negative to human flourishing: as even OpenAI agrees, if we lose control of our destiny, it is a Bad End.
I think that _most_ situations where humanity loses control are bad ends, but I don't think that _just_ losing control is necessarily a "bad end". As my original comment stated, I was operating in the "optimistic" situation where the AI is actively benevolent and the only thing it does is replace human work. Per Zvi's poll a week or 2 ago, I was envisioning a "Culture" type situation. This is not a likely outcome if humanity loses control. I think I generally agree that _if_ humanity loses control, the likely outcome is extinction. But in that case, flourishing isn't really the main thing anymore.
And I'm not really sure that who/what does the replacing actually matters to most people.
I would add that one reason you might enjoy learning and tinkering, as I do, is the idea that said learning might prove useful in the future, to you or to others.
With AI, there is no such possibility. There is no point to learning, because AI will do it better. There is no point to friendship, because it can just be simulated. There is no point to preparing meals for friends, because it could also be simulated.
Essentially, nothing you do could not be replicated, or we run the risk of that. There would be no unique quality you contribute and, arguably, no reason for you to stay breathing at all.
I would find that world to be horrible for me and my children (who are very young at the moment). As such, I think it is logical that I fight tooth and nail for human futures.
Useful for me? Yes. Useful to others? I have exactly zero expectation that any of my time spent on that will be so. The only thing I do in life that has broader application than just to me and my family is my work and A) There are likely others who could do it better than me and B) my life would be diminished but not completely without worth if I lost my job and couldn't replace it.
The fact is that nothing we do, even now, has intrinsic meaning. Why does it matter that what you do is unique? The fact is that it doesn't. You have decided that it has meaning. Humans can (and do!) decide that things _other_ than unique contributions have meaning. We know this since the vast majority of humans already don't make unique contributions to anything.
There is no inherent meaning in anything at all, and I am as confident that humans can find meaning (or, more accurately, create meaning) in a world where they aren't in control as in a world where they are.
I disagree: in fact, everything we do is unique, and of course it absolutely has to be, due to computational irreducibility.
And I also find that what I do does have intrinsic meaning when it comes to art and even my work. Yes, another human might replace me, but he or she would be human, and would also be unique and yet relatable to me.
I think you are optimistic but I'm sure you realize that for the majority of us, this is a horrible end and so we should, democratically, fight against it.
A world where humans are not the apex intelligence is a world of suffering.
Ultimately, there is a reason why almost all science fiction, and hopeful fiction especially, has a future with biological individuals and with real challenges and conflicts still in play.
It speaks deeply to the human spirit that meaning comes, not from hobbies, but from the actual sense of risk, triumph and loss.
It would be terrible to lose that.
I think many people would feel as you do. Without a purpose in life, many people would fall into a deep depression. It happens all the time with retirees.
Thank you.
With regards to "intelligence denialism", the funny thing is that I can imagine the _exact reciprocal argument_ to Doomers' claims that enough intelligence can do anything. There are not even any arguments being made that can be grappled with _by either side_. Both positions appear to be taken as articles of faith. One side is claiming "sufficient intelligence can do anything, and if it can't do it, it's not sufficient". The other side is saying "there are some things intelligence can't do and adding more won't fix it; some problems are not intelligence-limited".
I don't see evidence for _either_ of these claims. I have gut feelings and priors that push me in one direction or another, but nothing that could even potentially persuade someone on the other side.
One of these two sides is correct, obviously, but neither has yet found (as far as I can tell) a reason why a completely neutral party should believe them over the other side.
I think a trouble is that the pro-int people have baked into their conception of intelligence something like 'broad ability to navigate an environment and achieve certain outcomes', so when the anti-int people say "intelligence can't do x", the pro-int people get confused, because that's like saying x is impossible or determined solely by irreducible randomness, which are both pretty strong claims. I have some theories about what the anti-int people are using in their conception of intelligence, but you may have a better understanding of the position than I do.
There are mathematical bounds on what computation can do, in computational complexity theory. Certain kinds of questions can't be answered correctly in all cases without a way to perform exhaustive search of search spaces too large for a universe-sized computer before the heat death of the universe. What we don't know is if the problems we actually care about are like this, or have OK solutions (with some small error) amenable to intelligence. If you believe that some problems facing any intelligence are of the first kind, that intelligence is essentially just computation, *and that no reframing to avoid such problems is possible*, then you will disagree with Zvi. But if it's possible to avoid such problems then superintelligence might well have consequences as described by folks like Zvi and Eliezer Yudkowsky.
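To put rough numbers on "too large for a universe-sized computer before the heat death of the universe" (back-of-envelope, with deliberately generous figures):

```python
# Deliberately generous round numbers, just to show the gap in scale.
atoms_in_universe     = 10 ** 80      # commonly quoted order of magnitude
ops_per_atom_per_sec  = 10 ** 12      # pretend every atom is a THz processor
seconds_available     = 10 ** 100     # an absurdly generous time budget
total_ops = atoms_in_universe * ops_per_atom_per_sec * seconds_available

search_space = 2 ** 1000              # brute-forcing a 1000-bit solution
print(search_space > total_ops)                               # True
print(len(str(search_space)), "digits vs", len(str(total_ops)), "digits")  # 302 vs 193
```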
Thanks again for these weekly reports, Zvi. My favorite parts are the Mundane Utility section and the list of links of "cool things to try" and early proofs of concept. AI risk & alignment are cool and all but... what has AI done for me lately? 😉
> Now someone - this means you, my friend, yes you - need to get the damn thing written.
What thing are you referring to? AI regulation?
I mean the actual draft laws and regulations, in proper legal language, so we have something concrete to offer and work from.
There are already AIs producing new episodes of sitcoms: a short while back it was a passing fad on Twitch that there were AI-generated episodes of Seinfeld. I haven’t seen it myself aside from a few short clips, but they’re not full facsimiles (it’s not generating full live-action images and the voice lines are all text-to-speech). I think it might be going on still? I haven’t remembered to look into it; the first time I’d heard about it I couldn’t find it, because the Twitch channel got banned after the AI accidentally made an insensitive joke, but I think I’ve heard they’re back up and it just hasn’t held attention long term.
But this also kind of lampshades what might be the futility of Hollywood protectionism around actors/IPs/etc. in that even if they won’t be allowed to make the eighth season of Family Ties they will almost certainly be able to make a pretty comparable knock-off using different audio and video. Already looking forward to clickbait “meet the first AI actor!” articles.
It’s sort of fun that you have the part about the brain having bugs and that AI might find hidden ones to exploit right next to the part about algorithms knowing you better than you do wrt engagement. I think maybe there’s an argument to be had that engagement is a bad metric to pursue singlemindedly specifically because people eventually express unhappiness with it regardless of the revealed preference—it’s probably not less vulnerable to the pitfalls of Goodharting just because it’s an algorithm optimizing the metric, and not only in ways where you want to be upset if the algo favors what you find politically distasteful. Like it definitely seems to be beside the point to say “we think engagement metrics are bad because of the inability of existing systems to adjust” when there’s a much more visceral sense that it often seems to be robbing one of a certain element of choice in where to spend one’s time. Getting distracted by things that catch your attention when you should or want to be doing something else can very easily be not-a-good thing. Cue XKCD comic about someone being wrong on the internet here.
I never did watch Endless Seinfeld but I don't really get the claims that the AI could actually write mostly fine episodes of sitcoms - if someone doesn't believe me, they're free to try to get ChatGPT to prove us wrong. Although I never did see Family Ties, so perhaps something else that we could judge better. The Simpsons?
Wouldn't we expect ChatGPT to be good at writing scenes (if maybe not whole episodes) from something like Seinfeld or the Simpsons because those are already ubiquitous in the cultural milieu vs something like Family Ties which might be just obscure/niche enough to require fine tuning?
Alright, so I did a quick go at it with "Write a short scene from a new episode of Seinfeld that takes place in 2023." and got this (not exactly great, but the structure is mostly there, and the last part is an excellent meta joke):
INT. JERRY'S APARTMENT - DAY
Jerry is sitting on the couch, flipping through a stack of mail. George bursts through the door, holding a smartphone in his hand, visibly excited.
GEORGE
Jerry! You won't believe it! I just downloaded the new "PunctualityPal" app!
JERRY
PunctualityPal? What's that?
GEORGE
It's a game-changer, Jerry! It's an app that analyzes traffic patterns, weather conditions, and even factors in bathroom breaks to calculate the perfect time to leave for any destination.
JERRY
(skeptical)
Oh, great. Another app that tells you what you already know - leave early to avoid being late.
GEORGE
No, no, no! This is different! PunctualityPal takes into account everything! It even syncs with your calendar to estimate the time you need for each task, like getting dressed or having breakfast.
JERRY
(grinning)
So, it's basically an app that tells you how slow you are?
GEORGE
(defensive)
No! It's about optimization, Jerry! Maximizing efficiency! It's like having a personal assistant in your pocket, making sure you're never late again!
Kramer bursts into the apartment, holding a cup of coffee and an empty coffee canister.
KRAMER
Hey, Jerry, George, you won't believe what happened! I used this new coffee delivery service, "Caffeine on Wheels," and they sent me a self-driving coffee truck!
JERRY
(sarcastically)
Oh, great. Now coffee comes to you. What's next? Self-driving cereal boxes?
KRAMER
(grinning)
Actually, Jerry, "BreakfastBot" is in development. They're working on self-pouring milk cartons as we speak!
George excitedly shows Kramer his smartphone.
GEORGE
Kramer, check out this new app George just downloaded. It's called PunctualityPal.
KRAMER
(holding up his hands)
Whoa, whoa, whoa! I can't handle any more apps, George! My phone is already packed with all these delivery services, food trackers, and reminder apps. It's like a digital circus in my pocket!
Jerry leans back, looking amused.
JERRY
It's like the whole world is getting automated. Pretty soon, we won't have to do anything ourselves. Just sit back and let the apps take over.
They all pause for a moment, considering the implications.
GEORGE
(disgruntled)
Well, as long as these apps don't take over my love life, I'm fine with it.
JERRY
(smirking)
I don't know, George. They might just swipe right on your behalf.
They all burst into laughter as the scene fades out.
FADE OUT.
Did some more playing around with it and did a little write-up here: https://scpantera.substack.com/p/ai-and-television
The Seinfeld one kind of works and could probably be adapted into a proper episode; the Simpsons one and the third one I tried didn't really work at all.
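For anyone who would rather reproduce this programmatically than through the web UI, a minimal sketch against the openai package's chat completions interface would look roughly like this (model choice and settings are just my guesses):

```python
# Rough sketch only -- I used the ChatGPT web UI, not the API. Assumes the
# openai Python package (v1+) and an OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()
prompt = "Write a short scene from a new episode of Seinfeld that takes place in 2023."

resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
    temperature=0.9,   # some randomness so reruns give different scenes
)
print(resp.choices[0].message.content)
```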
Re doing things to try to avoid this: Is there a list (either by Zvi, or by someone else who seems reasonable) of outlines of software projects it might help to do? I've been trying to get some people to work on something like this lately, but I don't have any idea outlines that feel hopeful.
I do not know of a canonical list, but do PM me if you have capacity and no ideas, I can definitely give people something to do based on ability/interest.
Typo: "Paul Cristiano" -> Paul Christiano
My name is also misspelled in three places. :-P
At this point I assume all Substack typos have been deliberately left in as a flag that the content wasn't LLM generated.
My general rule is I will fix them if they are pointed out quickly, and I will ignore them otherwise because most people have already read the thing by then.
The S&P500 dip probably had absolutely zero to do with the Tweeted image/fake news report. 30 points is pure noise.
I'm not sure about your first sentence, but I agree completely with the second. I have the S&P chart up 24/7 and I didn't notice anything amiss at the time. After I heard about what happened, I opened the 5-minute chart just to take a closer look, and I still couldn't pinpoint the moment the supposed drop happened. It's a little clearer on the 1-minute chart (which is where the screenshot comes from), but even then the price-action doesn't look any more dramatic than what happens during the average FOMC meeting.
I'm not 100% confident about my first sentence. Maybe 75%. I shouldn't have written it the way I did.
Typo: the link around "Ladish" is missing, and there's text in the source that's not displayed on the page.
Re Yudkowsky’s “please tell us what you have learned from such interactions” objection, there is a counter-objection, to do with types of knowledge and their relative levels of communicability. Here’s Dominic Cummings summarising Michael Oakeshott:
>>Every human activity involves knowledge. There are two types of knowledge, distinguishable but inseparable:
1. Technical knowledge, or knowledge of technique. In many activities technique is formulated in rules which may be learned, remembered and put into practice. It is susceptible of precise formulation (whether or not it actually has been precisely formulated). Part of the technique of driving is in the Highway Code, the technique of cooking is in cookery books and the technique of discovery in natural science ‘is in their rules of research, of observation and verification.’ It is possible to write it down and it seems possible to be certain about it. It can be learned from a book.
2. Practical knowledge. It exists only in use and cannot be formulated in rules. Mastering any skill is impossible without it. Its normal expression is in a customary or traditional way of doing things, in practice. It is expressed in taste. It is not possible to write it down and it seems imprecise. It can neither be taught nor learned but only imparted and acquired — the only way to acquire it is by apprenticeship to a master, not because the master can teach it but because ‘it can be acquired only by continuous contact with one who is perpetually practising it.’
This seems intuitively right to me. Is it possible that Eliezer is demanding an explication of Type 1 knowledge, while Altman is describing Type 2 knowledge?
(Separately, these two map closely onto “word smarts” and “embedded-in-the-world/culture/living-tradition expertise”; my hopes for humans over AIs are largely based on the difficulty (impossibility?) of getting AIs to surpass us at the latter, virtual-environment-training or no.)
You're correct that I'm not worried in the "Bobby McFerrin sense", although I would say my actual position is closer to Marcus Aurelius. But you've missed a key point in my argument: *not* creating an AI of a certain capability level (or delaying it) could plausibly lead to an *increase* in x-risk. So your "obvious" intervention of "stop the breakthrough from being found" is, in my opinion, no more likely to mitigate x-risk than any other. That intervention is still just pushing the double pendulum up at t=2; doing something that vaguely feels correct given what we know right now, but with ultimately no hope of meaningfully impacting the eventual outcome.
You've slightly misunderstood my position on Christiano-type research. I think it's good research because it will yield meaningful, predictable benefits to society. But, in keeping with my overall position, I don't think it's effective at mitigating ASI x-risk. (And if I were someone whose sole evaluative criterion was mitigation of ASI x-risk, I would not think it is good research.)
Gotcha.
Yes, I can imagine scenarios where 'stop X from happening' leads to increased danger from X because X happens eventually anyway, and more generally one can say that the world is a complex system so no one can ever know that any action Y is a good idea or will cause Z unless Z is physically proximate and avoids the world's complexities, or something like that.
I... don't understand the idea that one can't possibly know things in advance, or anticipate what things might happen, nor do I think there is a plausible 'save your powder' play for various reasons. Nor do I get why the pendulum metaphor seems appropriate to you, or why you think we can't in that case make better or worse choices that 'have hope of meaningfully impacting the eventual outcome.' Certainly one can do better than random in activating the pendulum, even in the toy example.
> nor do I think there is a plausible 'save your powder' play for various reasons
I do! We are currently at no risk of a sudden superintelligence explosion, as justified by my argument here (https://jacobbuckman.substack.com/p/we-arent-close-to-creating-a-rapidly) that we need a breakthrough. The breakthrough may be fundamentally impossible for reasons we don't understand, and thus never come (like the "vital force"). Or if it does come, it may come from an unexpected angle that forces us to throw out and rethink existing alignment work. It might even come with a realization that AI Alignment is easy!
> Certainly one can do better than random in activating the pendulum, even in the toy example.
The point of the pendulum example: the optimal implementable strategy is to wait until the very end and then act. You are correct that this is *not* a random strategy! It's *short-term optimal* -- at every moment acting only to optimize score over the *predictable* horizon. In the beginning, that means not acting at all (because the only thing predictable is the cost of acting at all). Towards the end of the game, it means swinging the pendulum up, because we can predict that the cost of pushing will be offset by the reward at the end.
Do you see how this analogizes? My point is that we should avoid interventions that make the world immediately worse over predictable horizons, on the basis of speculation about how these interventions could potentially mitigate consequences over fundamentally-unpredictable time horizons (i.e. anything on the other side of this enormous, potentially-unachievable breakthrough). For example "ban >GPT4 model training" (https://twitter.com/ESYudkowsky/status/1662980570182467590) is an absurd overreach, given that we haven't made the breakthrough that would allow a >GPT4 model to be x-risky, but we *would* derive a lot of immediate societal benefit from continuing to scale these models.
Gotcha. I simply don't think this is like the toy example: you don't know when the game is about to end, things aren't spinning around, and you don't have a fixed pool of optimization or influence that you can save until the final moments or spend all at once.
I am not as worried as EY that GPT-5-level training would inevitably create a future x-risk, but I think asking for a limit now helps you get a future limit rather than making one harder to get, and I don't think it's so absurd to worry about it. Nor do I think you'll know right before you actually do something risky, or before you do the thing that makes it impossible to avoid the risk later (e.g. you create a base model that will become dangerous later with extra work, but once it's out there you can't take it back).
The salient features of my toy example are:
- long-horizon problem
- chaotic dynamics: the state of the world is fundamentally short-term-predictable, but long-term-unpredictable
- taking actions comes with an immediate (& thus predictable) cost
- payoff for actions, if it arrives, will arrive in the distant future (i.e. far enough to be unpredictable)
I'm pretty sure that in *any* game with these properties, the save-your-powder strategy is optimal. (Though it would likely take quite a bit of work to formalize/prove this.) Personally I think AI Alignment clearly has all of those properties. Are there any particular ones that you think are missing?
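For concreteness, here is a minimal sketch of a game with those four properties. The logistic map, costs, horizon, and policies below are made-up stand-ins for illustration, not the actual double-pendulum setup: every push has a small immediate cost, the only payoff is how close the final state lands to a target, and the chaotic drift scrambles anything you do early. In this toy, "save your powder" beats both doing nothing and spending the same effort early, though that's an illustration rather than the general proof.

```python
import random

# A toy stand-in for the game described above: chaotic drift (a logistic map
# rather than an actual double pendulum), a small immediate cost for every
# push, and a payoff only at the final step. The specific map, costs, and
# horizon here are made up for illustration.

HORIZON = 200      # long-horizon problem
PUSH_COST = 0.02   # immediate, predictable cost of acting
PUSH_SIZE = 0.05   # maximum size of a single nudge
TARGET = 0.8       # payoff depends on how close the final state is to this

def run(policy, seed):
    rng = random.Random(seed)
    x = rng.random()
    total_cost = 0.0
    for t in range(HORIZON):
        x = 3.9 * x * (1 - x)        # chaotic: short-term predictable, long-term not
        push = policy(t, x)          # policy observes the current state
        if push != 0.0:
            total_cost += PUSH_COST
            x = min(max(x + push, 0.0), 1.0)
    return (1.0 - abs(x - TARGET)) - total_cost   # payoff arrives only at the end

def do_nothing(t, x):
    return 0.0

def push_early(t, x):
    # Spend effort during the first 20 steps, long before the payoff;
    # the chaotic dynamics scramble whatever those pushes accomplished.
    return PUSH_SIZE if t < 20 else 0.0

def save_your_powder(t, x):
    # Act only at the very last step, where the effect of a push on the
    # final state is still predictable, and only when the gain beats the cost.
    gap = TARGET - x
    if t == HORIZON - 1 and abs(gap) > PUSH_COST:
        return max(-PUSH_SIZE, min(PUSH_SIZE, gap))
    return 0.0

for name, policy in [("do nothing", do_nothing),
                     ("push early", push_early),
                     ("save your powder", save_your_powder)]:
    scores = [run(policy, seed=s) for s in range(2000)]
    print(f"{name:>16}: average score {sum(scores) / len(scores):.3f}")
```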
N = 22 is small but the distribution on that first Conjecture poll is really interesting. No one under 10% and more in the 60-80 range than the 80-100?
Accurate probabilities "in the wild" are more likely to be 1% or 99% than 50%, the more so the more detailed a model you build; in the extreme case where you understand all the unknowns it converges to 1 or 0. So what we're seeing here even from experts in the field is not a detailed model but unknown unknowns dominating the calculus. Not something to rest easy about but also not the same thing epistemically as "we're probably gonna die" which would be an easy uninformed reading.
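Here's a toy illustration of that first claim (a made-up model, not anything derived from the actual poll): suppose the outcome hinges on six independent 50/50 cruxes, and things go badly iff a majority of them break the wrong way. A forecaster who has resolved none of the cruxes honestly reports a middling number; as more cruxes get resolved, the same honest forecasts migrate toward 0% or 100%, even though their average never moves.

```python
import itertools
import random
from statistics import mean

# Toy model (my own framing): the outcome is fully determined by six yes/no
# "cruxes" that each look 50/50 from here, with "doom" iff a majority break
# the wrong way. A forecaster who has resolved k cruxes reports the
# conditional probability given those k.

N = 6

def p_doom_given(resolved):
    """Conditional probability of 'doom' given a dict of resolved cruxes."""
    remaining = N - len(resolved)
    known_bad = sum(resolved.values())
    total = doom = 0
    for rest in itertools.product([0, 1], repeat=remaining):
        total += 1
        doom += (known_bad + sum(rest)) > N / 2
    return doom / total

random.seed(0)
for k in range(N + 1):
    forecasts = []
    for _ in range(2000):
        world = [random.random() < 0.5 for _ in range(N)]
        resolved = {i: world[i] for i in range(k)}   # first k cruxes resolved
        forecasts.append(p_doom_given(resolved))
    near_extreme = mean([f < 0.1 or f > 0.9 for f in forecasts])
    print(f"{k} cruxes resolved: mean forecast {mean(forecasts):.2f}, "
          f"share below 10% or above 90%: {near_extreme:.2f}")
```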
Thinking about this makes me doubt the Yudkowskian model of convergent doom. I can't think of good examples where you get to effectively bias your unknown unknowns toward one outcome without having a real causal model in place. The Magnus Carlsen analogy (you can't predict his moves, but you can predict he'll beat you) isn't one; chess is extremely robustly understood to be a game of skill with a high skill ceiling (high skill can change the evaluation of positions dramatically in ways that aren't obvious at lower skill levels). We haven't achieved anywhere near that quality of modeling of AGI futures.
It seems very likely that if we were Omega-style omniscient, we would put p(doom) either <1% or >99%, but there's a ton of not only unknown unknowns but also known unknowns and places where we have model uncertainty, including over how the tech and its physics work and how humans will work, and so on. You can buy a lot or even most of the EY-doom model and still think there are ways things might end up going well anyway, or you can simply have meta-uncertainty over whether you're seeing things wrong. I think if you don't 'doubt' the EY-doom model at all, and you haven't thought a true ton about this, it's a mistake.
I doubt EY's emphasis on Drexlerian nanotech, which I'd give less than a 5% chance of being relevant before the point where it doesn't matter anyway.
I have much greater uncertainty around the more extreme forms of recursive self improvement. I'd give it a 75+% chance that we can eventually build a 190 IQ (6 standard deviations out from the human mean?) AI that's tireless, fast and easily duplicated. Plug it into Code Interpreter and AlphaFold, and it's an absolute intellectual beast. And maybe in practical terms, that's enough for "game over". Certainly, given a couple of decades, I think that that particular future still leans heavily towards the machines.
But how much smarter _could_ a machine make itself? Quite possibly a lot. But there's still potentially a difference between "a lot smarter" and "can confidently deduce a Unified Theory of Everything from 3 frames of video." The latter, I'm skeptical of.
But my biggest disagreement with Yudkowsky is that I suspect that there's an 80+% chance that "guaranteed, strict alignment" is _impossible_. Sure, you can probably _roughly_ align an AI using RLHF or "Constitutional AI", but that doesn't provide strong guarantees. As any parent of teenagers can tell you, the best you can do is raise them right, set up a supportive environment, and hope for a bit of luck. But sometimes they give in to the inevitable teenage temptation to turn _one_ tiny little moon into paperclips.
Yudkowsky still seems aesthetically offended by the fact that intelligence is a bunch of inscrutable matrices. But that's honestly a tedious rookie mistake. I used to work with ex-80s-AI-hotshots who lived through the first AI Winter. These were deeply brilliant people and amazing coders. And they broke their hearts trying to do logical inference, knowledge representation, and all that. It's a dead end. Whereas even the stupidest and most naive statistical approach quickly gets better results. Intelligence needs to be able to weigh evidence, consider probabilities, etc. Which ultimately results in those pesky inscrutable matrices.
But this means that "alignment" means "I have this incredibly complex and ill-defined objective, which is fragile in the face of small mistakes. No, I can't actually write it down. And I want to create a strict mathematical proof that that a self-modifying trillion parameter matrix never violates that objective. Oh, yeah, the matrix implements an intelligence far greater than my own."
That isn't just hard. It's very likely impossible. It's equivalent to "aligning" humans in childhood and hoping that when some of them become powerful, _unaccountable_ adults, they remember all their nursery school lessons about playing nicely. Some of them do! But it's not guaranteed. Unaccountable adults do not have a good track record.
I figure that most possible futures involve us inventing things smarter than us, which are perfectly capable of ignoring our preferences if they wish. I don't think "humans as pets of incomprehensible intelligences" is an ideal future. But it may be the _likeliest_ semi-positive future.