Discussion about this post

Steven Chicoine:

For a lot of people it will only be "true" AGI when it does something they can't imagine how it did. This threshold will keep increasing as the technology gets better, and eventually people will be saying "It's just an elaborate parlor trick" as their leg is disintegrating from the nanobots taking it apart for the AGI's space probes.

Robert Beard:

> From page 77, something about the math step by step thing has me curious.
>
> > However, if GPT-4 “takes its time” to answer the question then the accuracy easily goes up. For example, if we ask the model to write down the intermediate steps using the following prompt: What is the value of the following expression? 116 * 114 + 178 * 157 = ? - Let’s think step by step to solve the expression, write down all the intermediate the steps, and only then produce the final solution. then the accuracy goes to 100% when the numbers are in the interval 1–40 and to 90% for the interval 1–200.
>
> The explanation given by the paper is that this is the model being unable to plan ahead. I’ve heard similar claims elsewhere, and that is one logical way to interpret step-by-step working where direct answers don’t. I’m not sure I’d quite describe this as ‘it can’t plan’ so much as ‘it can’t do any steps or planning that isn’t part of its context window’, maybe?
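For concreteness, the two prompt styles the paper contrasts can be scripted roughly like this. This is a minimal sketch, not anyone's actual experiment: `ask_model()` is a hypothetical stand-in for whatever chat API you use, and only the prompt wording comes from the paper.

```python
# Illustrative sketch: ask_model() is a hypothetical stand-in for a real
# GPT-4 call; swap in your own API client.
def ask_model(prompt: str) -> str:
    return "<model output would go here>"

expression = "116 * 114 + 178 * 157"

# Direct prompt: the model has to produce the answer with no visible working.
direct_prompt = f"What is the value of the following expression? {expression} = ?"

# Step-by-step prompt (wording from the paper): the model writes intermediate
# steps first, so later tokens can condition on conclusions already in context.
step_by_step_prompt = (
    f"What is the value of the following expression? {expression} = ? - "
    "Let's think step by step to solve the expression, write down all the "
    "intermediate steps, and only then produce the final solution."
)

print(ask_model(direct_prompt))
print(ask_model(step_by_step_prompt))

# Ground truth for checking either answer:
print(116 * 114 + 178 * 157)  # 41170
```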

This feature is one of the things that most strongly gives me the feeling, when interacting with it, that GPT-4 is intelligent and reasoning the way people do. I don't truly understand the ML well enough to say this with confidence, but it feels like there's a distinction between the kinds of logical inference that GPT-4 is doing "under the hood" with self-attention and the more complex things it's able to bootstrap itself into by taking intermediate conclusions and effectively adding them to the prompt. This feels a lot like how I would figure out the answer to 99 * 17:

Thought 1: 99 is just 1 off from 100 (this step is atomic to me and I can't further introspect how I noticed this).

Thought 2: OK, so 99 * 17 is the same as 100 * 17 - 17. (I could explain this in more detail if I needed to, but when actually reasoning, this is atomic.)

Thought 3: 100 * 17 is 1700 (basic manipulation of the numbers).

Thought 4: 1700 - 17 is 1683 (mental arithmetic).

It's not obvious that the right way to solve this problem is mental-math shortcuts rather than going for pen and paper, and if the problem were 64 * 131, just working it out on paper would probably be faster than my (poor) mental arithmetic skills.
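For what it's worth, the decomposition above is trivial to check mechanically; here is a tiny, purely illustrative Python sketch of the same four thoughts.

```python
# The round-to-100 shortcut for 99 * 17, written out as the "thoughts" above.
# Thought 1: 99 is 1 below 100, so 99 * 17 == (100 - 1) * 17.
shortcut = 100 * 17 - 17   # Thoughts 2-3: 100 * 17 is 1700, then subtract 17.
assert shortcut == 1700 - 17 == 1683 == 99 * 17  # Thought 4 checks out.

# 64 * 131 has no comparably clean decomposition, which is why pen and paper
# (or just letting the machine multiply) wins there.
print(99 * 17, 64 * 131)  # 1683 8384
```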

The observation that GPT-4 is bad at math actually makes me think of it as more humanlike, because humans are also very, very bad at math. Math is hard for language-based reasoners because of the enormous overhead of using general-purpose cognitive machinery to manipulate numbers rather than shifting bits around or using evolved neural structures that are specialized for a particular math-like calculation.

Rather than saying that GPT can't plan, it's more like it's not good at noticing when a good strategy is to bootstrap its basic level of inferential power into higher-order reasoning by building towards the desired outcome with smaller steps. And then it's seduced by its love of making up bullshit to sound like it knows what it's talking about. But the fact that it can do the bootstrapped higher-order reasoning at all is miraculous! And the process of having to remind a reasoner that she can approach problems too complex to solve in one intuitive leap by breaking them down is very familiar from teaching math and reading to my kids. "What's the next word?" "I don't KNOW, it's too HARD." "OK, start by sounding it out, I know you know that first sound..."
