29 Comments
Comment deleted
author

Yes, I'm confident it would, including mine. It has a lot of influence on the costs and benefits, none of it good.

Curious if you have either heard anything or have any information on how well the Khanmigo tutor is actually working in the real world? The central claim is basically "give everyone a personalized AI tutor", with comparisons to research showing personalized human tutors improving learning by 2 standard deviations [1], which I think we can all agree is huge. The problem is obviously that personalized 1-on-1 human tutoring doesn't scale. AI tutors are potentially a solution that does.

While they do not explicitly claim that Khanmigo will produce this same 2-sigma improvement, they pretty heavily imply it.

For $100/year, if their claims are even sort of true, this actually seems pretty huge to me. It's obviously still new, but the original tutoring study that claimed the 2-sigma improvement only lasted 3 weeks, so Khanmigo has been out more than long enough to compare against the original study (although probably not nearly long enough to prepare and conduct such a study).

[1] https://journals.sagepub.com/doi/pdf/10.3102/0013189X013006004?casa_token=OhCqFjwMlFAAAAAA:6gVIVn71a9KKBYYV_zakMckRSGjv0cVl7ofdLtPV6O0n8D7WJRSMHhbDZhTuAt_6blc7y_-WAgsS1w

author

I don't have any idea here. Does anyone else?

There’s decent reason to think the original Bloom result overstated the effect size, but that a significant tutoring effect is real.

If there were a robust Khanmigo study, they’d publish it.

My hunch is that keeping kids “on task” is a surprising bottleneck for a decent chunk of students, and a big difference-maker between human and AI tutors right now. They might produce the same strings, but for now the physical presence, voice, etc. favor humans. Comparing effectiveness on a per-interaction basis, it seems likely that Khanmigo will outperform most human tutors, many of whom don’t tutor all that well (e.g., giving away the answer instead of Socratically guiding the student to it). Studies framed this way, or with a human supervisor to keep students on task, are likely in the works, but I have no knowledge of specific in-progress studies.

Beating Angry Birds is indeed a pretty funny target to optimize. AFAICT the move space of Angry Birds is basically just a few floating-point numbers; you choose how to launch the birds, and that's it. I estimate you would need on the order of 10,000 random tries to win an Angry Birds level just attempting random moves. With modern reinforcement learning techniques, I don't even know if they would be able to improve on random performance! It's hard to learn much about a specific level before you've beaten it once, and after you do beat it once, you don't need to do it anymore. So with 700 levels, I think a decent estimate is that you need to connect a game-playing algorithm to the game in such a way that you can play the game ten million times; if AI helps at all, it saves maybe one order of magnitude.
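
To make that concrete, here's a minimal sketch of the brute-force loop, assuming a hypothetical play_level hook that plays out the shots and reports a win (nothing like it actually ships with the game, which is the real obstacle):

```python
import random

def random_search(play_level, num_birds=3, max_tries=10_000):
    """Brute-force one level by sampling random launch parameters.

    play_level is a hypothetical hook into the game: it plays out the
    given shots and returns True if the level was cleared.
    """
    for attempt in range(1, max_tries + 1):
        shots = [
            (
                random.uniform(0.0, 90.0),  # launch angle, degrees
                random.uniform(0.0, 1.0),   # slingshot pull, normalized
                random.uniform(0.0, 5.0),   # seconds before the mid-air tap
            )
            for _ in range(num_birds)
        ]
        if play_level(shots):
            return attempt, shots  # tries needed, plus the winning sequence
    return None  # level unsolved within the budget
```

At ~10,000 tries per level across 700 levels, that loop has to execute on the order of ten million times, which is why the real question below is throughput.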

So how hard is that? A single real-time emulator isn't going to be enough. It's going to be impractically difficult to obtain a version of Angry Birds with special hooks for AI. Maybe some combination of speeding up emulators and running on multiple machines would make it plausible. Overall, it seems like more systems/infrastructure work than AI work. And why would anybody want to put systems/infrastructure work into Angry Birds simulations...

Anyway, TLDR I completely agree with your brief summary of "the reason nobody has beaten Angry Birds is because nobody is trying very hard", I just thought it was an interesting thought experiment to sketch out how one might do it.

Rob Miles: “‘AI will make everyone unemployed’ is a bit like ‘This bus driving off a cliff means nobody on it will get to work on time and they might get fired’, it requires following through on only a small subset of the relevant consequences.”

When normies encounter sentiments like this, don’t be surprised if they conclude that focusing on extinction risk distracts from near-term concerns.

As long as the rift exists between xriskers and people who emphasize problems of job loss, bias, misinformation, etc., both sides will look callous, irresponsible, and dishonest to one another.

Regarding game AI, for a long time I have thought it would be more interesting to have a single AI that you interact with deeply, vs. trying to solve the Skyrim-style immersive world with many agents. The agent-agent interactions rapidly go off the rails, plus you can’t constrain the character environment. An AI on a space station has such a constrained action space that you could make it pretty convincing with current-gen tech.

Basically, Sydney could do an entertaining GLaDOS.

The big risk is of course prompt hijacking; you need to solve that before letting an AI drive your game characters. (At least, big companies will face high reputational risk; maybe an indie doesn’t need to solve this problem.) Maybe this can be baked into the game narrative, though?

Taking this further, the 1:1 AI porn game obviously doesn’t need to solve this problem and will probably sell well.

In section 18 the following paragraph seems misplaced after Dustin Moskovitz's quote. "I instead consider this the Fully General Counterargument. [...] not The Current Thing?"

author

Yeah, I'm going to remove it; it probably got missed when I moved sections around, but everybody knows already.

Jun 29, 2023 · Liked by Zvi Mowshowitz

Honestly I thought the Post's "By the Numbers" article was the most useful AI article I've ever seen in the paper, and actually had it bookmarked. Why? Because it lists the senators tapped to lead legislative efforts, and better yet links to a transcript. (https://csis-website-prod.s3.amazonaws.com/s3fs-public/2023-06/230621_Schumer_SAFE_Innovation.pdf?VersionId=jApHm2QrP7nAZvl_B4GJ6s_YjSrfyYBK)

I agree the article is not *good* in the sense that it's a terrible summary of the speech, which I recommend just reading, if only so that you can complain about the reporter more accurately. But at the same time I found it very *useful* -- now I have some new one-way pen pals and a slightly better sense of where the Senate is starting.

author

Quite fair - often one thing like that (that I missed) is more important than everything else.

I’m utterly impressed by the sheer volume of interesting reads you are able to collect for every edition; thanks so much for sharing. Stimulating as usual.

> Predict hit songs with 97% accuracy using machine learning on brain reactions.

Yeah, I'm calling 100% bullshit on that one. When you dig into it,

- they're training against 6 hits and 6 flops (+/- 1)...

- ...augmented with synthetic data, and I have a sneaking (but evidence-free) suspicion that they accidentally built the synthetic dataset from the training AND test data (see the sketch after this list).

- what they're actually measuring AFAICT is...heart rate. Because heart rate data, they say, can be used "to infer neural states from activity of the cranial nerves using the downstream effects of dopamine and oxytocin".

- their test set, similarly, is ~6 hits and 6 flops.
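
To spell out the sketch promised in the second bullet: if the synthetic data is generated from the full dataset before the train/test split, near-copies of every real sample land on both sides of the split, and "test accuracy" mostly measures memorization. A toy version in Python (invented names; obviously not their actual pipeline):

```python
import random

# Toy stand-ins: ~12 recordings, each reduced to one feature, labeled hit/flop.
real_samples = [(random.random(), i % 2) for i in range(12)]

def augment(samples, copies=50):
    """Stand-in for a synthetic-data step: jittered copies of real samples."""
    return [(x + random.gauss(0, 0.01), y) for x, y in samples for _ in range(copies)]

# WRONG: augment first, split second. Jittered copies of each real sample
# end up in both train and test, so the test set is not actually held out.
leaky = augment(real_samples)
random.shuffle(leaky)
train_bad, test_bad = leaky[:300], leaky[300:]

# RIGHT: split the handful of real samples first, augment only the train half.
random.shuffle(real_samples)
train_raw, test_raw = real_samples[:6], real_samples[6:]
train_ok = augment(train_raw)
test_ok = test_raw  # stays tiny and honest: n = 6
```

And even if they did split first, with only ~a dozen real test songs the 97% headline number is within noise of many other numbers.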

There's more, but presumably that's more than enough.

“What is the force that would keep things at that level for a long period?”

Isn’t it something like: “It’s plausibly much easier for more-intelligent agents to improve the ability of less-intelligent ones to do the things they can already do themselves (humans designing increasingly near-peer AIs) than it is for less-intelligent agents to build things that can do things they can’t do (humans/human-level AGIs designing superintelligences)”?

I take your point about speed, replicability, etc., but I think you can still meaningfully call that sort of thing “roughly human-level”, and posit a significant obstacle to further qualitative leaps beyond that level.

Narrow AIs already refute this concept: real-time GPS; AlphaGo; some facial recognition, image recognition, and medical diagnostic systems; etc.

How many combined superhuman narrow AIs would it take to make an AGI, or at least an extinction-risk-level near-AGI?

Without being in any way an expert, I’m not sure they fully refute the concept. It doesn’t seem totally weird (in the way that Zvi is saying a pause around human level would be totally weird) to say that narrow AI is “doing things we can already do,” just way faster -- if Garry Kasparov had enough gigahertz, he could beat Stockfish, and the recent AlphaGo exploit demonstrates that in some ways it is still distinctly subhuman *at playing Go* -- and that a superintelligence capable of, say, inventing nanotechnology would require a qualitative shift that fits my criteria. That said, I’m not claiming that a human-level/near-peer AGI with the advantages of speed, replicability, etc. *doesn’t* kill us, so...

author

I would claim that in the sense that Garry Kasparov can beat Stockfish, I can invent nanotech.

Let’s assume that inventing nanotech involves a whole series of dazzling, Einstein-level paradigm shifts in our understanding of physics. Idk, maybe you were already assuming this and I’m being redundant. Would you still say that you could do it, in the Kasparov/Stockfish sense? Would you guess that a randomly selected 25-year-old physics PhD with, say, IQ 150 could do it? To what extent does quantity have a quality all of its own in such cases? My own intuition is that there are certain sorts of leaps that require more than an abundance of time to achieve.

Speaking of Stockfish, I thought this was an interesting experiment: https://www.lesswrong.com/posts/odtMt7zbMuuyavaZB/when-do-brains-beat-brawn-in-chess-an-experiment

author

I mean, Stockfish is like 800 points above the best human ever to play chess; it's pretty much impossible as a human aspiration without augmentation. I think a 'similar amount of help' in some combination of compute/data/speed/duplication/memory/etc. would indeed be enough for nanotech. Hell, I'd be unsurprised if normal humans could do it with $500 billion in funding.
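
For a sense of scale (my arithmetic, not anything from the thread): under the standard Elo expected-score formula, an 800-point gap means the weaker player scores about 1% of the available points.

```python
# Standard Elo model: expected score for a player rated `gap` points below.
def expected_score(gap):
    return 1 / (1 + 10 ** (gap / 400))

print(expected_score(800))  # ~0.0099, i.e. roughly one point per hundred games
```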

I have a response to Anton 🏴‍☠️'s argument against foom [copied from my twitter]:

This tweet is an interesting argument, but I think a central flaw is that it ignores how long the program has to run to produce the prediction.

For example, “AlphaZero-after-1-game-of-self-play” and “AlphaZero-after-10²⁰-games-of-self-play” have essentially the same Kolmogorov complexity. After all, they have the exact same source code, apart from like 4 characters that specify the number of games to play.

But there’s a real sense in which the latter is better at Go than the former. Specifically, it’s better in the sense of “I don’t want to sit around while it does 10²⁰ games of self-play, I want to play Go right now.”
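
A toy illustration of that distinction (my sketch, not from the tweets): the two programs below have nearly identical source code, hence nearly identical Kolmogorov complexity, yet only one of them hands you a trained policy before the heat death of the universe.

```python
import random

def self_play_update(policy):
    """Toy stand-in for one game of self-play plus a training update."""
    return policy + random.random()  # pretend the policy improves a bit

def train(num_games):
    policy = 0.0
    for _ in range(num_games):
        policy = self_play_update(policy)
    return policy

weak = train(1)          # returns instantly
# strong = train(10**20) # a few more source characters, same K-complexity,
                         # but it would never finish running
```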

Another way to think about this argument: Suppose AI_1 builds AI_2, and then AI_2 does X. Well, it’s also true that “AI_1 did X”—specifically, “AI_1 did X” *BY BUILDING AI_2*.

In a certain sense, this is true! But that’s a pretty weird way to think about things! AI_2 does in fact exist here!

K-complexity asks us to forget about AI_2, by focusing the discussion on what happens given infinite time, as opposed to how it happens and how long it takes.

[…Then after sleeping on it I tweeted again…]

I think the core true point in atroyn’s argument is: there is a-priori-unpredictable complexity in the world that can’t be deduced from an armchair, but rather has to be observed, and making a “more intelligent successor” does not substitute for that.

If you flip a coin and don’t tell me, then I don’t know whether it’s heads or tails. And I also can’t make a “more intelligent successor” that knows whether it’s heads or tails.

This is entirely true! But I claim people talking about recursive self-improvement are not making that mistake.

For example:

• There’s an “overhang” of possible logical inferences that an AI could make on the basis of its existing knowledge, but doesn’t (e.g. if I tell an AI the axioms of math, it doesn’t instantaneously prove every possible theorem).

• There’s an “overhang” of possible input data that an AI could download and scrutinize, but doesn’t (e.g. as of this writing, I believe no AI has watched all 100,000 years of YouTube).

• There’s an “overhang” of possible plans that an AI could execute but doesn’t (e.g. an early AGI is unlikely to be simultaneously doing every remote job on the planet, while also starting a zillion new ambitious projects in parallel).

So an AI could self-improve in a way that allows it to go farther and faster on those metrics.

An obvious example is tweaking the assembly code to make the same AI run faster.

I also want to put self-replication into this category: going from “one instance of an AI” to “a million instances of the same AI running in parallel and collaborating” (e.g. by buying or stealing additional compute). If you think about it, I claim that should totally count as “self-improvement”, because after all one AI system is creating a more powerful AI “system”. The latter “system” is composed of many instances of the original AI, but so what? It should still count, IMO.

Was there any indication of who’s contracting the clickwork? As in, is it the companies that develop the LLMs doing this, or companies fine-tuning for specific purposes themselves (or contracting that out to other firms)? Sort of personally curious, in case I accidentally end up in the position to pitch LLMs for something conditional on needing some titanic fine-tuning.

I was applying for a gig in my organization and had to write a qualifications statement. I know AI is an upper-executive buzzword right now, so I inserted a line about following trends in AI (translation: reading this blog), and when I couldn’t figure out how to end it I ran it through ChatGPT and got a way better corpo-speak end product.

I made the kind-of-joke-I’m-not-sure-is-a-joke the other day that Yudkowsky needs to hire a translator, because he seems sub-capable of communicating effectively with basically anyone. I definitely agree with “you don’t need to be as alarmed as this to endorse the central logic here”, but does he really think communicating in that precise way is going to accomplish his goals? I’m perplexed that the historic champion of “lol no, take the Newcomb box with the fat ass cash” thinks demanding specific policy that nobody will agree to, without actual persuasion, is the right strategy. Inferential distances, my guy. I get the sense there’s a lot of “nobody’s listening, so might as well swing for the fences” going on, but at some point you still have to hit the ball a few times and run the bases if you want to win the game.

Enjoyed your post, but less than usual because tweet links no longer work. Possibly this is a temporary glitch, but maybe twitter is moving to a closed model? In any case, if you want non-twitter users to be able to access your content, you will need to copy tweets into your post (as you often do, but not so much in this post), or link to a copy of the tweet stored elsewhere (e.g. the Internet Archive).

Whether this is a policy change or just something they screwed up, it seems like a vulnerability to have so much of your work depend on twitter.

It appears the change was intentional and is intended to make twitter inaccessible to LLM training (see https://www.reuters.com/technology/twitter-now-needs-users-sign-view-tweets-2023-06-30/). I guess this is an example of mundane non-utility (they didn't take our jobs, they took our data, so we will reduce the utility of our data in order to prevent AI from using it).

author

If you log in, the links will work, is my understanding.

Important context is that only ~25% of all users click even one link in the entire post. My goal is to include the full text if I think a lot of people should read it, and not do so if I think most people should skip it. I would update in favor of slightly more full pull quotes, but long post is long. If others strongly disagree, do speak up; I may run a poll in #19.

As someone who deleted their twitter account (and has since become much more productive), I'd vote for more pull quotes.

Just chiming in as another person without a Twitter account who has no intention of being coerced into getting one. I previously did click maybe a third of the Twitter links in your posts, and found value in the ones I did (basically, I click all the links for items of interest).

If there's a non-arduous workaround (somewhere that mirrors Twitter, be it the Internet Archive or somewhere else), I do agree that links are on net preferable to pull quoting literally everything, but currently twitter links might as well not be hyperlinks at all.

author

Getting a 'null' twitter account with 0 followers and 0 follows for this seems harmless to me, but some others do clearly disagree. I do not currently know of a workaround that would solve the issue in reasonable time. I will be changing procedure somewhat but I do not see any good options here...

The comments about Angry Birds seem strange. At the AIBirds 2021 competition (http://aibirds.org/past-competitions/2021-competition/results.html), the winner solved all 8 previously unseen levels within 30 minutes. I'd consider that solved.
