27 Comments

These updates are massive and difficult to get through without having the GPT machine produce a summary. But I did not see this included: https://youtu.be/R0wBMDoFkP0 (mention of the LLM, not Jean Picard the II).


Is it worthwhile encouraging each other to be strict/pedantic about the terms "validation set" and "test set", or has that ship sailed? That recent terrible (thankfully now retracted) MIT curriculum arXiv paper misused the terms badly: "GPT-4, with prompt engineering, achieves a perfect solve rate on a test set..." It worried me a little to see smart people take the time to critique the paper in great detail, when it should've been a kind of auto-fail (if only there were such a thing...) until the authors did a find-replace "test set" -> "validation set".

This is more a discussion prompt than a serious literal question about "test set". (*) Is it worthwhile encouraging each other to be pedantic/strict about _any_ use of language? In case you've no idea where this is coming from: you wrote "test set" several times in reference to a validation set error in your (as usual excellent) commentary inspired by the grokking convo. How obnoxious of me to suggest a correction in this obviously harmless context where no misunderstanding could arise, right? But here's a counterpoint: humans, even very smart humans, use language by imitation. It's not possible to plot a test set error over time with the strict meaning of "test set", because you're not allowed to repeatedly evaluate your model on a test set. I think it's a useful distinction. Many years ago, I remember Hinton introducing the terms in his first lecture, and he was quite passionate about the distinction.
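
(To make the distinction concrete, here's a minimal sketch of the discipline I mean, using scikit-learn; the dataset, model, and split ratios are arbitrary placeholders of mine. The validation set can be scored as often as you like, for tuning or for plotting curves; the test set gets scored exactly once, after every choice is frozen.)

```python
# Minimal sketch of the validation/test discipline (placeholder dataset/model).
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)

# Carve out the test set first and set it aside.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Split the remainder into train and validation.
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

# The validation set may be evaluated repeatedly, e.g. to pick a hyperparameter
# or to plot an error curve over training.
best_C, best_val_acc = None, -1.0
for C in [0.01, 0.1, 1.0, 10.0]:
    model = LogisticRegression(C=C, max_iter=5000).fit(X_train, y_train)
    val_acc = model.score(X_val, y_val)
    if val_acc > best_val_acc:
        best_C, best_val_acc = C, val_acc

# The test set is evaluated exactly once, after all choices are frozen.
final_model = LogisticRegression(C=best_C, max_iter=5000).fit(X_train, y_train)
print("test accuracy (reported once):", final_model.score(X_test, y_test))
```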

A follow-up question I've often wondered about, supposing the answer to (*) is yes: (**) is it necessary to be tolerant of, and responsive to, nitpicks like mine? I suspect the answer is yes. A possible generalization: there's no effective Schelling fence that doesn't sometimes demand obnoxious respect.

Would love to hear thoughts on (*) and (**). I hope it's clear that those are the questions I think are interesting, and that "test set" was just what prompted me to ask.


One issue I have with the Superalignment Taskforce is that, based on everything I've read, throwing more resources at AI alignment probably won't help much right now. What we need is a theoretical breakthrough, and that's not something you can buy. I would prefer they put those resources in reserve so that if someone discovers a path to alignment, we can quickly start work. You need an Einstein to lay the groundwork before you can start a Manhattan Project.


I hope I can still convey my gratitude and appreciation by just repeating "Thanks!" on every one of these posts!

(I was going to comment about what Dan already pointed out. These posts are so long that I always want to start writing my comment(s) almost immediately after I begin reading.)


The whole 'training data copyright' drama is a little confusing to me. Isn't it obvious that _human_ intelligences are doing something _remarkably similar_ to AI training when we see/read/listen to others' art?


Damn – the section on grokking could use a whole expansion into its own post.

Similar to you, I _breezed_ through high school chemistry (including the AP version), but struggled in my one and only college chemistry class, because whereas the former was 'all algebra', the latter was more like 'memorize this big list of possible chemical interactions' (AND the lecture portion was at like 7 a.m., in a big auditorium, and the lecturer turned the lights down _real_ low).

(I also was never able to reliably replicate any of the experimental results we were assigned to produce. I'm _sure_ there's all kinds of neat/interesting practical knowledge there that no one really bothered to explain sufficiently for me to _practice_ long enough to become proficient.)


Have to say that OpenAI is my hero at the moment. Much as I wish that AGI never happens, I find myself more chill with an OpenAI AGI than with anyone else's.


Oh my God, that signature line is so tempting.

In regards to teaching and homework: I'm a college writing instructor, and I'm switching every single task to in-class work. A lot more reading, editing, and revising work now, with almost zero from-scratch writing.

Students will be encouraged to use LLMs, but the standards will shift to producing genuinely useful, well-written content that the intended audience would actually read.

In other words, I'm removing a lot of the entry-level work in favor of focusing on what quality content and excellent support/presentation look like in adult, professional communication.

It's a short-term adjustment given where I expect capabilities to go, but it is a COMPLETE overhaul. And I was already pretty far along this path in comparison to my peers.

My colleagues are still trying to outlaw AI while they're all using it to create lesson plans and presentations (check out gamma.app if you're unfamiliar).

Should be an interesting semester.


On the issue of getting useful, nuanced summaries, the best results I've had, from both my students and AI, have come from asking some variation of: what are the 5 most surprising assertions in the text, given your knowledge of similar texts?

The "surprise" covers more ground than you'd think, including uncovering assumptions, cliches, genre expectations, stance prejudice, and so on.

Still not "good." But better than other prompts I've tried in a relatively simple format.
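
(For what it's worth, here's a minimal sketch of how that prompt can be wired up, assuming the 2023-era openai-python interface; the model name and helper function are placeholder choices of mine, not anything canonical.)

```python
# Minimal sketch of the "surprising assertions" prompt via the OpenAI chat API.
# Assumes the 2023-era openai-python (0.x) interface and OPENAI_API_KEY set in
# the environment; model choice is a placeholder.
import openai

def surprising_assertions(text: str) -> str:
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": (
                "What are the 5 most surprising assertions in the following "
                "text, given your knowledge of similar texts?\n\n" + text
            ),
        }],
    )
    return response["choices"][0]["message"]["content"]
```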


Here’s the passage quoted by Roon, from Alfred North Whitehead, in full:

It is a profoundly erroneous truism, repeated by all copy-books and by eminent people when they are making speeches, that we should cultivate the habit of thinking of what we are doing. The precise opposite is the case. Civilization advances by extending the number of important operations which we can perform without thinking about them. Operations of thought are like cavalry charges in a battle – they are strictly limited in number, they require fresh horses, and must only be made at decisive moments.

Isn’t this passage suggesting something very different from “abstracting away” cognitive operations to AI such that we no longer have to perform them ourselves? Rather than having to think in a computationally intensive way about what we’re doing the whole time, Whitehead says we should cultivate habits that make this unnecessary. A civilised (or “cultivated”) person has internalised a robust and flexible set of heuristics, biases, intuitions and ethics, such that they can normally act without conscious deliberation and have it turn out well. You might say that such a person is well aligned. Also, happy, at ease in the world, “bien dans sa peau.”

The idea that we can just get LLMs or their successors to figure this out for us seems like a category error; I don’t think this is what Roon is suggesting, to be fair, although the result of what Roon *is* suggesting seems likely to be the opposite of civilisation advancing in the sense Whitehead is talking about.

“Can I interest you in some virtue ethics?” 👌


I’m confused by all the “no moat” takes. GPT-4 plausibly cost $100M; who knows, GPT-5 could be $1B (they could afford it). If you can’t throw that much money at the problem, you can’t compete. (And money is necessary but not sufficient, as Google, Meta, Microsoft and so on prove.) Furthermore, the training set is not public, and the more they directly spend on hiring people to contribute expert material and RLHF, the more of a moat they will build. Microsoft, Meta, and Google also have private datasets that could be valuable for training too. Joe McStartup does not.

OpenAI is letting people use their API to distill other models from its outputs (in violation of its terms in many cases) because it currently benefits them to grow the pie of the AI economy, because these distilled models don’t actually achieve the same performance, and because Altman genuinely (it seems) believes that kicking off a slow takeoff now is better than leaving an overhang that could produce a fast takeoff later.
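
(For concreteness, here’s a rough sketch of what distilling through the API amounts to, again assuming the 2023-era openai-python interface; the prompts, model name, and file name are placeholder assumptions of mine: harvest teacher completions, then fine-tune a student model on the resulting pairs.)

```python
# Rough sketch of API-based distillation: collect teacher completions for a
# prompt list and save (prompt, completion) pairs as student training data.
# Prompts, model name, and output path are placeholders; assumes openai 0.x
# with OPENAI_API_KEY set in the environment.
import json
import openai

prompts = [
    "Explain overfitting in one paragraph.",
    "Summarize the argument for a carbon tax.",
]

with open("distill_data.jsonl", "w") as f:
    for prompt in prompts:
        response = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
        )
        completion = response["choices"][0]["message"]["content"]
        f.write(json.dumps({"prompt": prompt, "completion": completion}) + "\n")

# A student model would then be fine-tuned on distill_data.jsonl with whatever
# training stack it uses; that step is omitted here.
```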

At any time they can raise the drawbridge on their moat and start charging more for access, enforce their Ts&Cs to police other models distilling from them, etc.
