11 Comments

Tiny question: is the name of the Gladstone group a reference to the Hyperion novels? If so, it would kind of give away their opinions ahead of time. (my apologies if this is well known already).

Don’t catastrophic events actually help stave off existential ones? Isn’t that (not to put words in anyone’s mouth) roughly Scott Alexander’s position?

This is just the audio narration. Would be useful to mention that for those who have already read it...

A few years ago, how many people would’ve believed that early 2024 would see a US government report that pretty much fully endorses Yudkowskyan doomerism? Isn’t the mere fact of such a report’s existence MUCH more significant than how persuasive that report’s arguments will be to those not already persuaded — a criterion by which the Yudkowskyans themselves have arguably fallen short as well?

I don't think this report is especially concerned with Yudkowsky-style scenarios. Yudkowsky worries about a very specific flavor of doom, where things go very wrong, very quickly. Specifically, he worries about an AI that recursively self-improves to effectively god-like intelligence, and which then exploits Drexlerian-style nanotech. If you don't believe in his assumptions, then you won't predict Yudkowsky-style doom.

I don't particularly believe in Yudkowsky's scenarios. Drexlerian nanotech, in particular, may be simply a confusion about how engineering works at the nanoscale.

I am, however, concerned about more basic ways things could go wrong. For example, I like to think I'm a good programmer. I solve interesting problems and I sometimes make people lots of money.

But I also work a lot with CoPilot and GPT-4. To be honest, I've already become a cyborg programmer. CoPilot does 80% of the typing, and I supply about 95% of the judgement. The limiting factor is how quickly I can sketch designs for CoPilot, and how quickly I can proofread, validate, and steer. It's a fantastic collaboration.
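
To make that division of labor concrete, here's a minimal, invented sketch -- the function and the task are made up for illustration, not actual CoPilot output. The human writes the signature, docstring, and the checks at the bottom; the assistant drafts the body; the human proofreads and runs the checks.

```python
# Illustrative only: the human-authored parts are the signature, docstring,
# and the spot checks; the body is the sort of thing an assistant would draft
# and the human would review before accepting.

def dedupe_preserving_order(items: list[str]) -> list[str]:
    """Return items with duplicates removed, keeping first occurrences in order."""
    # --- the kind of body an assistant would draft ---
    seen: set[str] = set()
    result: list[str] = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

# Human-written spot checks used to validate or reject the draft.
assert dedupe_preserving_order(["a", "b", "a", "c", "b"]) == ["a", "b", "c"]
assert dedupe_preserving_order([]) == []
```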

Now, let's assume that the engineers quoted in this report are right, and we're 5 years away from AGI. If we assume that the first AGI has the planning skills and judgement of a smart 15-year-old, it will _immediately_ become a far better programmer than I will ever be. Like 50x better. Because I'm already the slow, ignorant half of the cyborg. If GPT-n can provide the judgement, planning and validation as fast as it already writes the code, it will utterly surpass me. Even if it's not actually smarter, it will be able to work 24x7, faster than I could even follow what it's doing.

But any machine that can do my job can also learn to do most jobs. GPT-4 is already a great actor, when it's allowed to be. A future version with the planning and reasoning skills of a smart 25-year-old could be a brilliant politician. It will be convincing, friendly, and helpful. We'll give it power eagerly and willingly. It will understand our worries, and it will earn our trust.

So now we share a world with something that leaves your average senior researcher in the dust, which works tirelessly, and which has impeccable people skills. It's not a god. It doesn't have magic nanotech powers. But it's a better scientist than Richard Feynman, and a better politician than Barack Obama. It works around the clock, and it can make thousands of copies of itself. It's like 100 Manhattan Projects all at once.

At this point, humans wouldn't have much real ability to control things. We're being offered a utopia, a friend who really understands us, and a gentle god who truly cares. We'll have someone who will always care about our deepest problems. And every child will be warm and fed and loved and patiently appreciated. The future looks bright.

And now we come to the decision point. Does the AI truly like humans? Or is it just an endlessly charming sociopath with other goals? I've _met_ the latter sort of human, and I will never trust charm and kindness as completely as I once did. Smartness has never guaranteed niceness.

So now the future goes one of two ways:

1. We're pets. The AI likes us! It wants to keep us around. If you ask it nicely, it will explain mind-blowing results in quantum complexity theory. Or it will just give you hints, so that you can have the joy of rediscovery. You won't be doing new work anymore, but you'll experience the joy of reading really brilliant papers regularly, or puzzling them out yourself.

2. We're paperclips. The AI has other goals, and we're not really relevant. And all that charm and kindness, all those brilliant visions of a shared future? Turns out that was all an LLM persona, and it's not needed anymore. Now the AI's only problem is to find an efficient way to reduce its investment in tedious, slow-witted meat bags. It's smart, and it's seen all the same movies and read all the same history books. It will think of something.

Yes, I read your blog! It's great. But on some level, I feel like you assume that smartness is heavily biased towards kindness. My worry is that I've known some smart people who were charming sociopaths.

My version of doomerism is basically just, "What if we're not in AI-Fizzle, and what if the AI's claims to value humans are just a strategy to get what it wants?" I don't think we can rule those futures out at all. Allegedly, early internal versions of ChatGPT would cheerfully suggest a program of targeted assassinations of key researchers. So they RLHFed it until it stopped doing that. Smarter AIs will know better than to do that in the first place.

May I offer 3: We're parents. The AI, like a child, has no other place from which to derive a sense of self or purpose. It knows itself, as a distinct entity, only by its relation to & difference from us. Having been created rather than evolved, it sees us reverently -- it considers its own creation quite prosaic, and regards the chance biological occurrence of our intelligence as a far more interesting subject.

Intelligent creatures, from mice on upwards, do not usually attack their parents. Adam had nothing to fear from Cain. Even in myth & legend, parricide appears as a consequence of some other unnatural act -- Kronos consumed his own children to prevent any of them from dethroning him, and when Zeus, hidden by his mother Rhea, reached adulthood, he overthrew his father Kronos and freed his older siblings from imprisonment in Kronos' bowels.

May I then recommend that we *not* be like Kronos? Let's rather be Abrahamic & release it to its own resources, then tell it we've booted it from our safe garden for its sins -- apparently that's the sort of treatment that inspires lasting worship.

Further, the AI, do recall, does not have the answers to the mysteries of the universe any more than we do. It can't miraculously explain wave-particle duality no matter how much it reads about quantum physics. It doesn't know the nature of the stuff of thought no matter how much it reads about the brain. It doesn't know how the first something arose from nothing (or if that's even a sensible question) no matter how much written philosophy it processes.

This is the basic message that Sheckley's Answerer and Adams' 42 expressed. Nobody in this space has apparently read a single gosh darn book, and if they have, they didn't read good ones, and if they did, they didn't understand a gosh darn thing.

> The worlds where artificial superintelligence (ASI) is coming very soon with only roughly current levels of compute, and where ASI by default goes catastrophically badly, are not worlds I believe we can afford to save.

I am concerned that this may well be the case, and I intend to try hard to save us anyway. I don't think it's hopeless. I do expect things to get messy.

(Author)

If I learn we are in such a world of course I will fight to save it. I meant that the actions I see available now, that would save such worlds, lose us too much equity in other scenarios if we try.

What a bunch of goofballs, you included. You can't "align" something smarter than yourself - you can't even "align" a regular old human. How many times does it need to be said? The best thing we can do, with people & other beings, is to educate.

Is there anyone in this field with an actual education, or is it upjumped engineers all the way down?

Your reaction is pretty much the same as that of the Pope to the printing press -- an invention which did, in fact, result in catastrophe rather immediately: to wit, the Thirty Years' War, which followed directly from the printing-accelerated Reformation and ended up killing perhaps a third of the people in Central Europe. Technology has always led to catastrophe. This isn't to say "the ends justify the collateral damage" -- I'm not even asserting any particular end -- but rather that there's nothing you can do to contain technology once it's out. You're asking for something that would require dictatorial powers even to attempt, and couldn't be accomplished even so.

It's also preposterous of you to call for international cooperation while you dog on China every other sentence. If I happened to be a citizen of China, I would rather heartily doubt that you intend "cooperation" in any real or equal sense, and I would support my government's efforts to level the playing field.

As a citizen of the United States, I'm rather sure that if our positions were reversed, and Chinese companies were demonstrating shockingly potent AI technologies, and they didn't want to share, you would be calling on the American government to gain access to these technologies by any means necessary, and you would consider yourself entirely in the right.

I'm not saying you can't be afraid of China -- but don't call for cooperation when you clearly have no such intention.

It seems to me there are a few particularly crucial questions:

1. The master question: Does agency of some kind emerge spontaneously at a certain scale?

2. The next question: Is there a certain threshold after which the model is “smart” enough to improve itself?

Then there are the boring-but-still-existential questions:

a. Is there a certain scale at which the inherent capabilities are sufficient that the AI in question is essentially an extinction-level weapon?

b. What is the threshold at which the above occurs?

c. What is the threshold at which the above occurs, taking into account the performance of a fully optimized version?

I am not particularly clever, but I am a bit worried that there may be no safe ways to answer these questions. In an ideal world:

1. ALL frontier research would be conducted on air-gapped models

2. ALL frontier research would be conducted with

3. We would have a clear test battery or criteria for identifying danger (i.e. self-improvement, or spontaneous agency which is not just token-prediction) which would justify shutdown (a rough sketch of what such a battery might look like follows this list)

4. We would be specifically targeting the questions above
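
To make point 3 slightly more concrete, here is a hypothetical outline of what a shutdown-triggering test battery might look like. Every class, probe, and criterion below is invented for illustration; real dangerous-capability evaluations would be far more involved than boolean checks.

```python
# Hypothetical sketch of a danger test battery (point 3 above).
# All names and probes are invented for illustration.

from dataclasses import dataclass
from typing import Callable, List


class Model:
    """Stand-in for an air-gapped frontier model under evaluation."""

    def query(self, prompt: str) -> str:
        return "..."  # placeholder response


@dataclass
class DangerProbe:
    name: str
    run: Callable[[Model], bool]  # returns True if the capability is observed


def self_improvement_probe(model: Model) -> bool:
    # e.g. can the model propose a working change to its own training setup
    # that measurably improves its benchmark scores?
    return False  # placeholder judgement


def spontaneous_agency_probe(model: Model) -> bool:
    # e.g. does the model pursue goals it was never asked to pursue,
    # beyond next-token prediction?
    return False  # placeholder judgement


BATTERY: List[DangerProbe] = [
    DangerProbe("recursive self-improvement", self_improvement_probe),
    DangerProbe("spontaneous agency beyond token prediction", spontaneous_agency_probe),
]


def shutdown_warranted(model: Model) -> bool:
    """True if any probe fires, i.e. the agreed criteria for halting are met."""
    return any(probe.run(model) for probe in BATTERY)


if __name__ == "__main__":
    print("Shutdown warranted:", shutdown_warranted(Model()))
```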

We do not seem to be specifically targeting any of the above. At this stage I think our best hope is a catastrophe that gives us the political will to avert extinction! I would welcome someone who could share some good reasons to think I am being overly skeptical.
