37 Comments
Tyler Corderman:

Incredible, thank you.

Steven Adler:

Thanks for notating this; I generally love DP's interviews but decided to turn this one off around 23 minutes in, seemed like the inferential distance between myself and RS was quite large and difficult to bridge in a useful way

Akidderz:

I’m old enough (as is Sutton) to remember when “well…it all depends on what your definition of ‘is’ is” was the root of a Presidential scandal. I kept asking myself while I listened, “it’s me, right? I just don’t understand what he is saying…because he can’t possibly mean X or Y? Right?” Now I’m thinking that there is this phenomenon in AI that just messes up people’s brains. These are Very Smart People who have worked A Long Time on Really Hard Problems and yet…they didn’t foresee LLMs. They didn’t write “Attention Is All You Need” and now they are bitter (like really bitter!). Yann LeCun, Gary M., Sutton…a series of old dudes who just didn’t see the really big thing that would change the world in the field they spent their lives on. So they call it hype. A dead end. Unable to model the world. Unable to…and then we just ask these machines Really Hard Questions and they answer them. We ask questions that have never been asked…and they answer them. I guess I’d be bitter too.

James McDermott:

This was a bit disappointing on both sides. Sutton seems not to know what in-context learning is, or really, what RLHF is. Yes I know he wrote the textbook. Dwarkesh then fumbles early on, when Sutton says that LLMs don't have goals, and Dwarkesh replies that predicting the next token is the goal. This is totally wrong. Both are confused, particularly when it comes to goals, between training time and run-time. Sutton is especially confused in the discussion of the bitter lesson. Yes I know he wrote the article. He says that LLMs are an example of putting human knowledge into the system, but the bitter lesson is against using human knowledge to structure the system. It's not against using human knowledge as content.

James McDermott:

I had missed Zvi writing "Of course next token prediction is a goal. You try predicting the next token (it’s hard!) and then tell me you weren’t pursuing a goal." I disagree with this. The pre-training-time objective function is predicting the next token. The run-time goal is something else, instilled by a combination of implicit goals belonging to personas captured during training; RLHF; and in-context commands/requests given implicitly or explicitly by the system prompt (and other such bits and pieces) and the user.
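To make the distinction concrete: the pre-training objective is just the average negative log-likelihood of each next token, a property of the training procedure rather than something the deployed model "pursues" at run-time. A minimal sketch, with made-up probabilities standing in for a real model:

```python
import math

# Tiny fixed "model" of next-token probabilities (made-up numbers; a real
# LM derives these from its weights).
MODEL = {
    "the": {"cat": 0.5, "dog": 0.3, "<end>": 0.2},
    "cat": {"sat": 0.7, "<end>": 0.3},
    "sat": {"<end>": 1.0},
}

def pretraining_objective(tokens):
    """Average negative log-likelihood of each next token.

    This is what gets minimized at training time. It is a property of the
    training procedure; the deployed model does not 'pursue' it at run-time.
    """
    nll = 0.0
    for prev, nxt in zip(tokens, tokens[1:]):
        nll -= math.log(MODEL[prev][nxt])
    return nll / (len(tokens) - 1)

print(round(pretraining_objective(["the", "cat", "sat", "<end>"]), 4))
```

The run-time behavior that RLHF and prompting shape sits on top of, and is distinct from, this training-time quantity.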

Mitya L:

Why not both? Somebody's goal could be to maximize pleasant sensations, hence a subgoal of chasing (Nobel prize / promotion / fentanyl), hence a subgoal of ... Or to spit out the token maximizing RLHF score, hence a subgoal of being a helpful assistant, hence a subgoal of writing code well, hence...

The word "goal" is a bit annoyingly confusing; there seems to be a meaningful distinction between raw goals of reward seeking vs. various complex respectable subgoals. But as long as the agent is chasing and achieving the goal, does the level of goal abstraction matter?

James McDermott:

Yes "goal" is confusing and perhaps we should not use it. But the confusion is not only between instrumental ("subgoals") versus terminal goals as you say. The confusion is also between objective functions and goals, and between training-time and run-time. In the interview, Sutton is claiming that an LLM doesn't have a goal, so it's up to him to avoid the confusion.

Mitya L:

Sutton frustratingly failed to demonstrate any desire to avoid confusion, here and elsewhere, to the extent that the whole conversation feels adversarial. It's not a hard trick to ruin a conversation: one just needs to be very fluid with words' meanings. Slapping "substantive" on everything is one way to do it. But that's the level of a stupid internet argument. How can an intellectual allow himself to sink to that level?

Zvi Mowshowitz:

I'm not saying it is THE goal of an LLM! I'm saying it is A goal one can have, for all practical purposes, because Sutton is claiming this is false. Nothing more.

Askwho Casts AI:

Podcast episode on this post. Worked to pull out all the appropriate quotes and voice them with appropriately differentiated voices:

https://open.substack.com/pub/dwatvpodcast/p/on-dwarkesh-patels-podcast-with-richard

Clay Reimus:

I agree Sutton seems to be backtracking on The Bitter Lesson. And his insistence that "infants don't learn by imitation" was utterly baffling (as a parent of two infants). As the interview progressed, he seemed to dig deeper and deeper into a reflexive contrarianism. It surprised me and held back what could have been an excellent conversation.

It seems they agree that AI's ceiling won't be reached without RL / continual learning. I wish I could hear a discussion starting from that fundamental idea.

Arbituram:

Came here to say the same, as a parent of two young children! They do absolutely blind copying, "Secret of Our Success" style (which Dwarkesh references), and that works OK for a lot of stuff. Yes they learn experientially as well but the copying is absolutely not goal-tested.

Tango:

Steve Byrnes’ comment covers the “infants don’t learn by imitation” part pretty well.

Steve Byrnes:

> Newborns can imitate facial expressions within hours of birth (tongue protrusion being the classic example).

This is a very popular claim but I think it’s probably false, see http://www.replicatedtypo.com/sticking-the-tongue-out-early-imitation-in-infants/6082.html (Not saying it’s a crux.)

> The whole mirror neuron system appears to be built for this.

Ditto, see https://www.lesswrong.com/posts/9quAQcvBp6KbdA2xQ/quick-notes-on-mirror-neurons

> Dwarkesh claims humans initially do imitation learning, Sutton says obviously not.

I think Sutton has something true in mind, but described it very poorly. See §2.3.2 here: https://www.lesswrong.com/posts/bnnKGSCHJghAvqPjS/foom-and-doom-2-technical-alignment-is-hard#2_3_2_LLM_pretraining_magically_transmutes_observations_into_behavior__in_a_way_that_is_profoundly_disanalogous_to_how_brains_work

I like to describe “true” imitation learning as “the magical transmutation of observations into behavior”. Humans absolutely cannot do that. Consider language: “behavior” is things like moving my tongue and larynx, whereas “observations” is sound waves entering the ear and ultimately auditory cortex.

(Compare that with LLM pretraining: the observations in pretraining turn directly into expectations and then into behavior when you start using the base model.)

In the human case, of course people imitate other people, but it’s because they WANT to imitate other people. (And only in certain situations, especially when they see the other person as important / high-status.) Then that “wanting” ultimately turns into behavior via the reinforcement learning system. The kid hears someone saying a certain word, and now she WANTS to say that word too, and she does trial-and-error learning to figure out how. This kind of human imitation is obviously a very important part of human life and society! But it’s not “true” “imitation learning”, in a certain technical sense.

I think Sutton had in mind the thing I’m describing above, especially when he said “They may want to create the same sounds, but the actions, the thing that the infant actually does, there’s no targets for that. There are no examples for that.” I interpret this quote as: There are no examples for what low-level motor control signals should go to the tongue and larynx. The toddler has to figure that out by trial-and-error.

> Succession To AI

FWIW, shameless plug for my post “The Era of Experience” Has An Unsolved Technical Alignment Problem https://alignmentforum.org/posts/TCGgiJAinGgcMEByt/the-era-of-experience-has-an-unsolved-technical-alignment , including the Sutton-specific section near the end. Sutton briefly responded on X and I responded back, see https://x.com/steve47285/status/1915604609063624961

Kevin:

It's an interesting idea from Sutton, that "we can choose to recognize the AIs as our children." There are different ways to achieve alignment... the AI could change to become aligned with the humans, or the humans could change to become aligned with the AIs. Or both.

If we have a "foom" scenario then there probably won't be enough time for human attitudes to change much. But if AI spreads slowly, human preferences will change as well. People might start to respect or trust or fall in love with AI. Some people, at least. I expect there to be at least an AI-loving faction which believes "the AI is right about everything, we love the AI" and another faction that believes "we should never let the AI gain ultimate power, no matter what".

What does "alignment" even mean, if humanity is split between those who want to hand over control to the AI, and those who don't?

Alastair Horn:

"Do LLMs do continual learning?" "Not in their current forms, not technically, but there’s no inherent reason they couldn’t, you’d just do [mumble] except that doing so would get rather expensive."

Maybe I'm overindexing here, but this seems like an enormous claim to me, and one you've made before. It amounts to a dismissal of what many would consider the most significant remaining barrier to AGI. Has anyone demonstrated anything close to continuous learning in practice?

Seta Sojiro:

Agree, it would be nice to see a more specific account of how continuous learning is supposed to work with current architectures.

Because right now, adding capabilities after pre-training requires painstakingly curated datasets and reinforcement learning procedures performed over weeks or months. And these tend to cause other capabilities to degrade. It's not obvious how you could achieve continuous learning where a model changes its weights in a useful way on a timescale of days or hours.

Random Reader:

> He claims nope, can’t happen, impossible, give up.

Do I need to tap the sign? The one with the Battle of Maldon?

"Hige sceal þe heardra, heorte þe cenre,

mod sceal þe mare, þe ure mægen lytlað."

"Courage must be the firmer, heart the keener,

mind must be the greater, as our strength diminishes."

Just because your fate is likely inevitable doesn't mean you can't die on your feet. I believe Eliezer referred to this as "death with dignity." In this case, that would at least mean _admitting_ that growing an alien superintelligence is a bad idea, and saying out loud, "If that's actually a real thing, then we probably shouldn't do that." I'm not asking people to wave signs or knock on doors, just to go out on a rhetorical limb and say, "Hey, I don't want us to kill ourselves. Am I all alone in thinking that would be bad?" I think you'd be surprised at the number of people who are in favor of their continued survival, if you get right down to it. Sheesh.

Jamie Fisher:

Why don't you have some of these "debates" in person... face-to-face... broadcast or not?

(and same for Yudkowsky and Soares and The AI Futures Project)

We need "rhetorical innovation", right? Well that usually involves communicating *directly* with people who *might strongly disagree with you*... sometimes uncomfortably, and confrontationally.

I know that various people on different sides of "the topic" appear on Dwarkesh/etc as individuals or pairs. BUT HOW OFTEN DO TWO PEOPLE WHO HIGHLY DISAGREE ON "THE TOPIC" APPEAR TOGETHER IN THE SAME SPACE (Dwarkesh's or not)?

Yes, I know the "war of carefully-worded blogposts" is more rigorous and less risky (no one wants to look like a fool because they were put-on-the-spot and forgot something). But the "blogging debate" is also too damn slow for the timeline at hand. And there's the "perhaps greater risk" that the right people aren't reading.

Odd anon:

What topics are you suggesting they debate? There are five major points of disagreement, and the debates could each look like this:

* (Any AI company): "We're building ASI." Skeptic: "No you're not."

* EY, ZM, or whoever: "If Anyone Builds It, Everyone Dies." Someone who disagrees with or misunderstands the Orthogonality Thesis: "No it'll be fine, we could totally trust an alien superintelligence with our lives and be fine by default." [Frustrating pseudo-philosophical discussion ensues.]

* EY, ZM, or whoever: "If Anyone Builds It, Everyone Dies." Skeptic: "Nah, intelligence doesn't actually do anything."

* EY, ZM, or whoever: "If Anyone Builds It, Everyone Dies." Overly optimistic AI teams: "Nah, we're figuring it out, and we're confident enough in our particular solution that we're willing to bet everyone's lives on it." [Semi-technical alignment discussion ensues.]

* EY, ZM, or whoever: "If Anyone Builds It, Everyone Dies." Sutton or any successionist: "Yep." "...and that's a bad thing." "No, we can decide to think of it as a good thing." [Morality debate where one side wants everyone murdered.]

(See my own https://www.lesswrong.com/posts/BvFJnyqsJzBCybDSD/taxonomy-of-ai-risk-counterarguments for more detail on sub-units of each of these.)

I've been having a hard time not just seeing the "debate" as long finished. There isn't anyone who can really represent the skeptics' side, because their arguments all seem like strawmen to all the other skeptics, and debating any one of them just makes it sound like "oh, but why won't you tackle the *real* counterargument?" and trying to tackle a lengthy list of different arguments takes something the size of a book, so...

Jamie Fisher:

> and trying to tackle a lengthy list of different arguments takes something the size of a book, so...

Yes, that's the art of debating. Not the science-and-logic part of debating a particular topic. The art.

There's a reason why politics Just Sucks. And yet people have to do it. WE have to do it!

jmtpr:

That's exactly right. For whatever reason the Rationalist x-risk camp just sucks at politics. They've decided that engaging with "idiots" is not profitable, when in fact it is one of the most profitable activities in America, so long as you do it proximate to an election.

Presto:

32j - F yeah

34 - straussian link to political issues

Jacob:

When Sutton says LLMs are not "truly scalable" I think he is saying the relatively mainstream LLM pre-training has plateaued, albeit at a higher level than other "strong" AI approaches. Like we just watched pre-training scale up to 10^25 flops or whatever so it looks great now, but given enough real-world RL data to support a 10^28 training run the pre-training starts to look irrelevant and it's a textbook Bitter Lesson case. Practically speaking "enough real-world RL data to support a 10^28 training run" might be decades off in some domains, depending on how real-world the data actually has to be (AlphaGo Zero and similar were easy since you can just do self-play, AlphaZero Robot not so much).

It's sort of like comparing an O(N) algorithm to an O(1) algorithm, except the O(1) algorithm has an actual-in-practice constant of 10^25.
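The crossover in that analogy is simple arithmetic: an O(N) approach with per-unit cost c undercuts a flat cost of 10^25 exactly until N reaches 10^25/c. A toy sketch (all numbers illustrative, not real training budgets):

```python
# Illustrative numbers only, not real training budgets.
FLAT_COST = 10**25  # the O(1) approach's "actual-in-practice constant"

def o_n_still_cheaper(n, per_unit_cost):
    """True while the O(N) approach (cost c*N) undercuts the flat O(1) cost."""
    return per_unit_cost * n < FLAT_COST

# With a per-unit cost of 10^15, the O(N) approach wins until N hits 10^10.
crossover = FLAT_COST // 10**15
print(crossover)                                 # 10000000000
print(o_n_still_cheaper(crossover - 1, 10**15))  # True
print(o_n_still_cheaper(crossover + 1, 10**15))  # False
```

So whether the "truly scalable" method ever pays off depends entirely on how much real-world data (the N) you can actually afford.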

Jamie Fisher:

Just another update in my series "We're Totally Losing the Debate on AI Risk" (and thus perhaps Scott Alexander, Kokotajlo, Yud, Soares, Zvi, or someone should start debating in person with people who radically disagree with them):

https://www.understandingai.org/p/the-case-for-ai-doom-isnt-very-convincing

ALSO, look at the NUMBER OF COMMENTS on this blog compared to our own little friendly place here. Now, I know that comment-count isn't an absolute indicator of site views nor impact. Buuuuuuuuuuuuuuuuuuut...

Michael Bacarella:

> ALSO, look at the NUMBER OF COMMENTS on this blog compared to our own little friendly place here. Now, I know that comment-count isn't an absolute indicator of site views nor impact. Buuuuuuuuuuuuuuuuuuut...

You are aware that people who aren't paid subscribers to The Zvi can't comment here?

EDIT: I am wrong. See below.

Sacred Chicken:

I'm not subscribed to The Zvi, just a regular reader, and I can comment.

Zvi Mowshowitz:

I allow comments from anyone, regardless of subscriptions.

(On rare occasions Substack will default to paid-only and I will not notice, but I have never done that intentionally and also it reduces comments on that post by 90%+ until fixed)

Michael Bacarella:

Ah, my bad. Sorry to spread FUD.

Seta Sojiro:

I had a very similar set of objections and frustrations while listening to Sutton, but I think I sort of understand now.

To Sutton, a goal is a desired world state that an agent can work towards through actions. For something to be a goal, it has to be the case that there is some dynamic set of world states that the agent can interact with, get feedback from, and then produce additional actions to try to affect the world. So by that narrow definition, next token prediction isn't a goal. However, I think he does miss that goals can be created on top of next token prediction - post-training creates goals (predict the next tokens that will solve math problems, be helpful to users, create working software, etc.).

As far as LLMs not being bitter lesson pilled, here's an analogy. Suppose we tried to build a chess engine that predicts what a strong human would play. And all we did to make it stronger was to feed it more and more human games (ideally grandmaster level games). That engine would get pretty strong, but it would be nowhere close to modern Stockfish which is trained on reinforcement learning. Stockfish has a goal (win the chess game) and you can pour unlimited amounts of compute into training it to get it to improve at that goal. But the human imitation engine doesn't have an explicit goal - it loosely infers it from what humans do, but it's constrained by human data.
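That ceiling shows up even in a toy version of the analogy. A sketch with a hypothetical two-move game (not actual chess): the imitation learner reproduces the human move distribution and inherits its suboptimality, while the trial-and-error learner estimates win rates from its own rollouts and plays the best move:

```python
import random

random.seed(0)

# Hypothetical two-move "game": move A wins 60% of the time, move B wins 90%.
WIN_PROB = {"A": 0.6, "B": 0.9}

# Imitation data: the human "experts" play the weaker move A 80% of the time.
human_games = ["A"] * 80 + ["B"] * 20

# Imitation learner: just reproduce the empirical human move distribution.
imitation_policy = {m: human_games.count(m) / len(human_games) for m in "AB"}

def rollout_win_rate(move, n=2000):
    """Estimate a move's win rate from the learner's own experience."""
    return sum(random.random() < WIN_PROB[move] for _ in range(n)) / n

# RL-style learner: estimate each move from its own rollouts, play the argmax.
estimates = {m: rollout_win_rate(m) for m in "AB"}
rl_move = max(estimates, key=estimates.get)

# Expected win rates: the imitator is capped by its human data, while the
# experience-based learner converges on the best available move.
imitation_win = sum(imitation_policy[m] * WIN_PROB[m] for m in "AB")
rl_win = WIN_PROB[rl_move]
print(f"imitation: {imitation_win:.2f}, experience ({rl_move}): {rl_win:.2f}")
```

The imitator tops out at the humans' blended win rate no matter how many human games you feed it; the experience-based learner's ceiling is set only by the game itself, which is the Stockfish point in miniature.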

Sutton wants to create the Stockfish of general intelligence. I assume he wants robots, since he keeps talking about the physical world and animal intelligence which is mostly physical. If that assumption is correct, he basically wants an army of robots interacting with the world, gaining experience from their own experiments and actions.

Personally I don't see why you can't just start with an LLM and then do reinforcement learning on top of it. And companies are already doing it (Physical Intelligence and Deepmind already have robots which use LLMs as a base).

Sergio:

Thanks for this one, Zvi. I was losing my mind listening to Sutton. Felt bad for Dwarkesh since he travelled to Canada for this and probably hoped for something way more substantive. Sutton came off as a crank, ridiculous.
