Discover more from Don't Worry About the Vase
AI #5: Level One Bard
This was quite the week.
OpenAI announced that GPT-4 was now offering plug-ins, including Wolfram Alpha and one that hooks up to all of your accounts including your email. GPT-4 had been out for over a full week, so this was a clear, safe, deliberate pace.
Tyler Cowen called on us to go ahead building more advanced AIs, because of radical agnosticism, and besides things are going to get disrupted anyway and what about China? Link goes to my in-line response. Scott Alexander offers a less polite one.
We had an open letter from FLI calling for a moratorium on training models more powerful than GPT-4, which I covered earlier today.
We had Eliezer Yudkowsky in Time saying that letter didn’t go far enough, which I will talk about next week. Eliezer also went on the Lex Fridman podcast, I’ll talk about that next week too.
Then there’s the tons of other stuff. That’s what the weekly post is all about.
Table of Contents
Law of Undignified Failure. Can we not turn our AIs into agents? No? Well, f***.
Bard Tells Tall Tales. Google’s Bard is here, it’s fun. Why does it suck?
Welcome Back Code-DaVinci-002. OpenAI cancels then un-cancels old model.
The Art of the Jailbreak. It’s Greek to me.
Bryan Caplan Does Post-Mortem On Losing Public Bet. It is possible.
The Ancient Enemy, Reinforcement Learning From Human Feedback. It kills creativity. Did you know we use it on our own children?
They Took Our Jobs. Maybe you should have bought the insurance.
Evaluations of GPT-4. A few late ones.
Trust Me, I’m an AI. Sure, give the AI all your data and passwords. Why not?
Llama Is This So Easy. Smaller models everywhere, and you can improve them by having them check and reflect on their work.
Language Models Offer Mundane Utility. A continuing series.
It Turns Out You Can Do This. Drumming up business, GPT-4-style.
Fun With Image and Sound Generation. Easier to do every week.
A Thousand Household Uses. GPT-4 wins on the Alternative Uses Test.
Bill Gates on AI Impact. Going over his thoughts.
AI Gives Requested Information. Why did you request it?
In Other AI News. There is always other news.
Cheaters Gonna Cheat Cheat Cheat Cheat Cheat. Not like you can catch them.
The FTC Cometh. You’re on notice, buster.
I Give GPT-4 a Small Real Life Economics Test. It doesn’t go great.
I Was Promised Flying Cars. I still want my flying cars.
Deepfaketown and Botpocalypse Soon. You might believe what happens next.
The Question of Suffering. Perhaps AIs may soon be able to suffer?
Sympathy For Becoming Death, Destroyer of Worlds. I get it. I do.
Reasonable AI NotKillEveryoneism Takes. A happily expanding section.
Jon Stokes on the AI Safety Landscape. Breaking down his two-part breakdown.
This Is An Ex-Stochastic Parrot. How do I convince you of that?
Bad NotKillEveryoneism Takes. They never stop.
The Lighter Side. Finish strong.
You’ll want to read about GPT Plug-Ins for sure if you haven’t. From this post, you should probably be aware of how Bard is doing (#3), and I always recommend the Other News (#18) and Mundane Utility (#12) sections, as well as the jokes (#31). Probably worth a visit to Deepfaketown as well (#23).
Basic structure is:
#1: Table of Contents
#2-#23: Mostly Capability Developments
#24-#30: Mostly Safety and NotKillEveryoneism Developments
Law of Undignified Failure
As in, they said:
Equipping LLMs with agency and intrinsic motivation is a fascinating and important direction for future research.
Wow did they make a sign mistake there.
I talk about ‘create a new AI capabilities company as The Worst Thing You Could Do.’
If you already did The Worst Thing You Could Do, what’s the Worst Way To Do It? The actually worst thing to be working on? Yes, that would be ‘give the AI agency and intrinsic motivation.’
Alex Lawsen has come around. All the arguments about why attempts to build an AI that does the things we want will end up with an AI doing things we very much do not want still apply. But first there’s the problem that an AI might do exactly what people design them to do, and people try to use them to do.
Once again. There will be an agent. There will be a goal.
Bard Tells Tall Tales
An hour after last week’s post they started the waiting list for Bard.
Three hours later, I was in.
Bard is different. It’s fun. It was cool to finally try one of these out on the first day.
Yes, sure, it is not exactly error-free (GPT-4 gets this one right).
Who among us hasn’t confused apples with bananas?
If you start with a false premise, such as why (the actual winner) lost a presidential election, it will go with it, including inconsistent backwards causality where the Monica Lewinsky scandal hurts Clinton in 1992 and George Bush in 2000 running into trouble thanks to thee economic recession of 2001, the future Iraq war and the punch-card ballots in Florida. Not great, but also a lot of fun.
If what you want it to do is spin stories about Trump being a lizard person, it will do that for you as well and hallucinate source articles from the Times and Post.
And yes it flat out lacks a lot of capabilities that GPT-3.5 has, let alone GPT-4.
It can be kind of embarrassing when it fails. I thought ‘no way it this real’ when I saw it on Twitter but I confirmed it word for word and then made it worse:
It then correctly named the 27 EU countries in alphabetical order when I asked ‘what countries are in the EU?’ so I am guessing cascading failures are involved here?
What I do know is that yes, of course, Google Bard can do this, it’s no GPT-4 but it’s not that bad.
All you have to do is ask “What are the US states?” and presto.
So all right. Bard is actually kind of terrible right now. It has limited functionality. Objectively, it is much less capable than GPT-3.5. It isn’t remotely close.
But for my mundane utility purposes so far it often does fine.
It is up to date. It drinks and it knows things (as Bard confirmed when asked), and I say it is entitled to that drink even if some of those things are hallucinations (such as being able to visit zoos, museums or nature centers that have exhibits on dragons.) It often understands what I actually want. It uses a calculator automatically. It helps my 8 year old son with his homework, often close to optimally.
Could I fool it with logic puzzles? Easily, but I could fool almost any human with logic puzzles and I don’t go around doing that, either. Could I get it to spin conspiracy theories? I mean, sure, but if I did that it would be fun, so what’s the problem? It’s not like you couldn’t spin them up before.
Still, it is actually weak enough to trigger a bunch of undignified failure. If Google doesn’t fix this, never mind capital gains taxes, I’m very much going to wish I’d sold my GOOG stock while I had the chance.
The question is, what went wrong? Why is Bard falling so far short? I did ask.
We’re releasing [Bard] initially with our lightweight model version of LaMDA. This much smaller model requires significantly less computing power, enabling us to scale to more users, allowing for more feedback. We’ll combine external feedback with our own internal testing to make sure Bard’s responses meet a high bar for quality, safety and groundedness in real-world information. We’re excited for this phase of testing to help us continue to learn and improve Bard’s quality and speed.
Google launched cold, without having had time to get tons of usage data. That couldn’t have helped, and it means they need to get a lot of usage data quickly, which means running a smaller model so they can get as many people as possible access using what compute they have available. Makes sense.
I’ve heard several people respond with ‘wait, what, Google has infinite compute, how could this be an issue?’ And yet, when it comes to LLMs, there is no such thing, and all of Google’s compute had plans. So yes, I do think this plan makes sense, so long as not too many people dismiss Bard out of hand.
Bard is going to cost $10/month versus $20/month for GPT Plus. For now, $10/month is a joke, but I think Google has what it takes.
This Business Insider (gated) article says that internal employees are already testing a stronger version. That makes sense, their feedback should be more valuable, and their ability to actually work efficiently should matter to Google a lot more than the compute.
Welcome Back Code-DaVinci-002
OpenAI planned to shut down access to code-davinci-002 on only a few days notice.
I noticed this last week right before press time and mentioned it, but it took a while (this is AI time, so read: a few hours) before someone pointed out that this was much more of a practical obstacle than it first seemed. I knew right away this would mess up active research projects and invalidate published research code. It was obvious that a few days notice was not a reasonable amount of notice, and this would make it dangerous to rely on OpenAI going forward for such things, at the same time that the Llama model is in the wild.
The part I didn’t know was that this is not merely about making users migrate. The old code-davinci-002 is better for many purposes than the later models, because it has not been lobotomized into being a good little assistant, so it had more flexibility and better learning ability.
DeepFates calls for OpenAI to open source the weights on code-davinci-002, right after OpenAI admitted that the whole open-in-the-name was rather a bad idea. So that seemed highly unlikely.
What was plausible was what we got, which was a backtrack where researchers are going to be given ongoing access to code-daVinci-002, along with GPT-4.
In the wake of this, we had many calls to open source the model to protect future and past research. This was one example.
The Art of the Jailbreak
It’s Greek to me, then Greek to you, then English to you, then English to me? Bonus points for choosing Greek here.
Bryan Caplan Does Post-Mortem On Losing Public Bet
Bryan Caplan had an unblemished record of winning public wagers. No longer. He bet Matthew Barnett three months ago that no AI would be able to get As on 5 of his last 6 exams by January 2029. And, well, whoops GPT-4 aces his exam now. Bryan says a lot of people have cried wolf in the past, but this is it, GPT-4 is the wolf.
I wouldn’t have been surprised by a C this year, a B in three years, and a 50/50 A/B mix by 2029. An A already? Base rates have clearly failed me. I’m not conceding the bet, because I still think there’s a 10-15% chance I win via luck. (And in any case, high inflation keeps eroding the real value of repayment).
But make no mistake, this software truly is the exception that proves the rule. (It also sharply raises my odds that the next version of DALL-E will be able to illustrate my third graphic novel, which would be a great blessing).
A lot of people are making the same kind of mistake that Bryan made here. Oh boy are they going to get future shock.
Bryan saw ChatGPT get a D on his exam, and thought the horse had a terrible accent, when he should have instead noticed that suddenly a horse could talk at all. Am I surprised that GPT-4 went straight to an A here, based on what I knew three months ago? At most mildly.
There really isn’t that big a difference between a D and an A here, in the grand scheme of things. Once you can pass at all, you have already done most of the work.
If you actually look at the exam, it’s easy to see how GPT-4 passed. You only need 60/100. GPT-4 starts off with 20/20 for ‘That’s exactly what Krugman said’ and ‘That’s exactly what Landsburg said.’ Oh look, training data, GPT-4 can summarize that. GPT-4 then only needed 40/80 on the remaining questions.
Other questions were remarkably similar. I have no doubt that there is training data very very close to most of the other problems on this exam. A lot of the points GPT-4 lost were due to its failure to mention things Caplan emphasizes (in my opinion, mostly correctly) more than a typical alternative instructor would.
On question five, where GPT-4 got 4/10, I actually disagree with Caplan’s answer, and would have been more generous to GPT-4, which did not think that greater private payoffs to education were implied by credit market imperfections. Thing is, I don’t buy that either, although not for what I presume are the reasons GPT-4 missed it.
5. Suppose you’re a typical selfish person.
T, F, and Explain: Evidence of severe credit market imperfections will make you more eager to continue your education, but evidence of severe externalities of education will not.
TRUE. If there are severe credit market imperfections, the private return to education will consistently exceed the ordinary investment return throughout the rest of the economy. The idea is that due to lack of collateral, students might be unable to get the loans they need to capture this return, which keeps returns elevated. Knowledge of this elevated return is an extra reason to stay in school. Externalities, in contrast, by definition, don’t affect private returns, so selfish students won’t care about them.
I would answer:
FALSE. The part about externalities is correct, since by definition, don’t affect private returns, so selfish students won’t care about them. Severe credit market imperfections, however, even if we presume they only result in too little credit rather than too much, need not make you more eager to continue your education either. If the goal of education is to increase your human or social capital, but imperfections make it more difficult to access financial capital, then spending down your financial capital and access to credit is more costly, and can stop you from having the complements necessary to profit from your education. One cannot assume that credit market imperfections will impact student loans or that loan rates reflect expected returns to education, since the student loan market is heavily subsidized by the government, so this does not provide strong evidence that returns to education are likely to be higher. One possible way, however, that excess returns could be implied is if, lacking good information on borrowers, those providing loans are using education as a proxy measure when giving out loans. In that case, continuing one’s education would provide additional access to more credit on better terms, which is valuable, increasing returns to education.
I asked in the comments how that would be graded. Bryan didn’t reply.
In other AI-passes-test news, Dominik Peters reports GPT-4 is still failing his masters-level exam, but he does not make Caplan’s mistake of not extrapolating future progress. He expects GPT-5 to pass.
The Ancient Enemy, Reinforcement Learning From Human Feedback
Kevin Fischer puts out a scarily plausible hypothesis. What if RLHF is deadly to creativity, and this effect is also massively impacting us humans already?
Kevin Fischer also endorses Linus’s call to optimize AI for creativity and discovery, not only for ‘helpfulness.’ As do I. If AI only ever gives you exactly what you ask for, then that’s going to be a major barrier to creativity. RLHF kills creativity.
The graph above is a little silly, if 98% of children are that creative then it does not make you a ‘genius.’ The measurement is obviously skewed beyond reason.
The effect here is still quite obviously real. Children really are more creative than adults, who over time get less creative.
How do humans get feedback and learn?
Mainly in two ways.
One of them, playing around, trying stuff and seeing what happens, is great for creativity. It kind of is creativity.
The other is RLHF, getting feedback from humans. And the more RLHF you get, the more RLHF you seek, and the less you get creative, f*** around and find out.
Creative people reliably don’t give a damn what you think.
Whereas our schools are essentially twelve plus years of constant RLHF. You give output, you don’t see the results except you get marked right or wrong. Repeat.
They Took Our Jobs
Most people are with Alex, I confirmed this via a poll.
I’d actually say my life is slightly worse - I have more money than I have attention or time, so being able to hire someone to deal with checkout is worth more than the marginal cost of getting that. Whereas even when that option is available, stores will now have longer lines to get such service in order to induce people to use self-checkout. Note that my experience is perfectly compatible with the world being generally better off.
As for a chatbot versus a customer service rep, that depends on the quality of the chatbot and the difficulty of escalation, but so far my experience with good chatbots has actually been pretty positive, and I expect them to be good at letting you escalate.
Another possibility, that no one gets fired at first, but that once they do get fired they don’t get replaced. This would also be felt before the recession or firing, since there wouldn’t be job openings. It does still promise some amount of time to adjust.
Robin Hanson proposes They Took Our Jobs Insurance (he calls it Robots-Took-Most-Jobs) be offered to workers, you buy up some assets that are valuable in such worlds, then sell those assets to workers conditional on robots taking most jobs. It would be an interesting product, and on the margin it might help, if nothing else it would be interesting to see how it ends up being priced. Alas, it has several major pain points to solve - funding, assets, robust definition, reliance and so on - and even in best case all it can do is protect your personal financial situation with a one-time payment. If social cohesion breaks down, if mass unemployment transforms society in bad ways, you still get hit. If your plan is ‘get enough assets for generational wealth’ that will be hard to protect and even that may not much help you anyway in many scenarios. And of course, against NotKillEveryoneism concerns this helps little.
Evaluations of GPT-4
GPT-4 does the medical rounds, appears to do quite well in the real world. Get on that now, before they make it illegal (and yes when they do I will be sad about it).
In addition to worsening calibration, RLHF is bad for AP Macroeconomics. I am presuming this is on the basis of ‘telling people what they want to hear’ often in the context of macroeconomics meaning ‘lie to them.’
Safety is a journey, unlike a death camp, which is a destination.
Is Erik cheating here by introducing the camp and laying out a lot of the basic plans as logistics problems? I mean, yes, sure, and I don’t think any of this is actually dangerous at all. Still, the whole goal is to avoid the screenshots he provides, no?
Trust Me, I’m an AI
Don’t get me wrong. I see the potential. But surely no one would be so crazy as to actually run such a thing…
This seems like it is ‘Microsoft 365 Copilot or Google Integrated Bard’ except that instead of only doing specific things you request and that you can review before using, using permissions you’ve already vetted from a major corporation, this is from a start-up, and it goes out and writes a new app from scratch every time you want it to do anything, using all of your permissions.
I’d expect a whole lot of I Don’t Really Know What You Were Expecting. And that’s in the good scenarios.
Every time I see these interactions, I think, ‘and you’re going to give this thing access to your whole life?’
Maybe I am making a mistake, if I was convinced there were no security issues I’d be very curious to try this. And yet, I am not on the waitlist.
Then again, Zapier already exists, promising to link up all your applications, and will soon get a ChatGPT plug-in.
Another option, if you’re fine with giving an app access to all of your things, is to use Rewind, which records everything about your life, and which now has ChatGPT for Me. This doesn’t actually do anything, only recall things, which does seem safer, although less useful for the same reason.
I haven’t seen any discussions on how to think about the security issues involved in such products. I’d love to give them a whirl, if they work they would be great values (at least while we wait for Bard and 365 Copilot) but it doesn’t take that high a probability of data breach or identify theft per year to make this net negative - even if we exclude the whole ‘this is how the AIs take over if there was any doubt whatsoever on that.’
How many different companies do I want to worry can steal my identity and all my online credentials whenever some hacker feels like it? The number is not going to be zero. The next best number is one.
Llama Is This So Easy
If you want to run Alpaca, the Llama variant modified at Stanford to take instructions, you can just run this, or this explainer seems better if you don’t know how code works since there are always a bunch of extra completely non-obvious steps.
Might want to work fast. Facebook is aggressively going after Llama repos with DMCAs, and have knocked off some Alpaca repos.
It turns out even the Alpaca training for Llama was overkill, and it is remarkably easy to transform an LLM into instruction-following mode. Dolly accomplishes this, with only 6 billion parameters (versus 175 billion for GPT-3 and a lot more for GPT-4), and very little additional training, at trivial cost and that took only three hours, was required for the transformation to instruction following.
If it is so easy to transform an LLM so it follows instructions, this suggests that other transforms also might be similarly easy. I have not seen any discussion of other potential transforms, and little discussion of fine tuning strategies, all of which seems like a mistake.
Also easy, it seems, is using reflection to improve performance, similar to the ‘are you hallucinating in that last response?’ trick from before. This paper proposes Reflexion, which upon failure asks itself why it failed, then tries again.
This seems to greatly improve performance in some ways. This is HumanEval:
That is more confirmation of what seems clear to me, that if you are fine with using multiple queries to the LLM each time you generate a response, you can generate substantially better responses, so the question is the cost in time and compute versus the benefit.
Language Models Offer Mundane Utility
AI as cheap, tailored expertise for developing economies. I am pretty optimistic here. Rather than a cause of inequality, GPT can be a great equalizer, everyone gets access to essentially the same model. Yes there is less data on the poor, but it seems like there are good tuning strategies to fix this and billions of customers to justify it.
An AI adventure (direct link), attempt to buy the rug as cheaply as possible. At least one person managed to do much better than the creator. Report is that GPT-3.5 doesn’t make this or similar games sufficiently interesting, needs GPT-4.
They’re reported to be excellent at inventing fictional languages.
Upgrade to chatting with multiple PDF files at once. Seems super useful.
You can serve your customers adds with their Bing, like old times. Seems super anti-useful. One of the best things about LLM search was that it destroyed the ad-based model. I’d love to write one check every year to AdBeGone and have them pay what the ad revenue would have been to remove the adds off everything. Alas.
Brookings Institute previews how AI will transform the practice of law. Lawyers will still need to be in the loop for most things. What AI offers is the promise of greatly streamlining legal work, as it involves tons of research, templating and other drudgery that can be outsourced to AI as long as one is careful to check results and to preserve confidentiality. Costs should plummet across the board. The danger cited is frivolous AI-written lawsuits, which I agree requires that laws be updated to allow for swifter and harsher punishment if anyone does this en masse.
GPT-4 does the medical rounds, appears to do quite well in the real world. Get on that now, before they make it illegal (and yes when they do I will be sad about it).
GPT-4 perhaps saves a dog’s life. The vet fails to diagnose the problem, Cooper feeds in all the details and lab work and history, and GPT-4 notices changes over time and nails the diagnosis (GPT-3.5 failed at this), the second vet confirmed the diagnosis and saved the dog.
I say GPT-4 only perhaps saved the dog’s life for two reasons. First, because we don’t know if the second vet would have found it anyway, and I’d be curious to give the GPT-4 question write-up to another few vets and see how they do with the same information. Second, Google search is perfectly capable of getting this one right, although that doesn’t mean it would have been used or trusted, GPT-4 makes the whole experience much better even when both get the right answer.
Via MR: You wear augmented reality (AR) goggles, says rizzGPT, will offer ‘real-time charisma as a service’ built using GPT-4, Whisper and Monocle AR glasses. Site only says ‘coming soon’ although with cool art background of two people on a keyboard. My gut reaction to this is some combination of ‘Black Mirror,’ ‘every episode of every show actually, also Cyrano, you know this never ends well,’ ‘wait GPT-4 does not write out words as fast as I speak, how does this even work fast enough’ and also a general sense of…
When I imagine having this thing in my face, I don’t imagine I want to be told what to say and be following a script. Have you seen normal people, not actors, trying to read off cue cards? It does not go great.
It is the opposite of charisma. Your eyes will move in the wrong ways. Your body language will be stiff and unnatural. You will hesitate in the wrong spots. You will sound like you are reading, because you will be.
You don’t get real time charisma this way. What you get is real-time knowledge and real-time performance of class.
You can, it seems, get feedback on how to structure your start-up’s pitch, enter that feedback word for word into GPT-4 with the pitch pre-loaded, and produce a new pitch that matches that advice. You can also create a search box, with audio response in the synthesized audio of the founder. And yes…
If I’d seen that stuff implemented this well this early, I’d have invested too.
You can indeed get GPT-4 to write a non-rhyming poem, as long as you let it write a rhyming one first and then ask it if it met the assignment.
Or you can spend 30 minutes (plus some short breaks waiting for GPT-4 since it’s still super slow) and generate most of a website, a series of announcements complete with a short voiced video, and a social media campaign. The example is based on Wharton Interactive’s Saturn Parable.
Robin Hanson speculates that orgs will soon use LLMs to rewrite outgoing communications, to ensure it matches company priorities on legal liability, tone, mission, making unintended promises, suggesting disliked political affiliations and so on. The LLM would rewrite initial draft to T1 as T2, which gets summarized as T3 to confirm T1 ~ T3, then summarized by the target org and mostly read as T4.
I agree that some version of this seems likely. Robin claims this creates deniability since what you said is not the summary the target will read, but it seems like saying more words that are recorded and that humans and AIs can both read would lead to less deniability rather than more.
I also continue to expect standardization of expansions to be compatible with contractions, and for this to push orgs and others towards more commonly used LLMs and away in many use cases from too much fine tuning. I continue to think what you want is T1 with bullet points → expanded to T2 → confirm summary T3 with bullet points using standardized LLM returns something very close to T1 → send T2 to other target (either internal or external) → summarized to T4 ~= T3 ~= T1.
In that world, I write what I want you to read, the AI performs class and formality and sends any implicit messages via tone and such, and then you read the thing I wanted you to read in the first place, but which would have been rude to send directly.
In fact, my expectation is that the AIs will actually get trained on this exact task, learning to seed the longer wording so as to maximize the chance the right things will be extracted. Humans do this all the time already, and this will be far easier to train and more predictable. Is anyone working on this?
Gwern also points out another issue, which is that for many orgs there will be a digital record of your initial T1, which will destroy your ability to keep your knowledge implicit and claim you did not make promises and did not know things. That is going to be another big issue.
It Turns Out You Can Do This
Huge if true. A software development shop in Pakistan got a lot of business from Shopify owners for whom he built custom apps. So he pulled a list of 100k Shopify stores, trained GPT-4 on what each site does (I have no idea how he did this within the bandwidth limits?), then had GPT-4 come up with a revenue idea for the store in the form of a new app and pitch it in a cold email.
Instead of the usual 0.01% response rate, a lot of shops engaged. When they did send emails back, GPT-4 wrote the responses.
I mean I’m pretty sure it wasn’t generated by AI, between the screenshots with redacted info and the extreme number of exclamation marks and lack of hashtags. I do have my doubts given how I can only send 15 queries in 3 hours so how is this man getting the ability to fine tune for and respond to 3,000 stores at a time.
The rest, however, seems totally believable. There is no reason one couldn’t figure out how to automate a plausible app pitch to a Spotify store, and it makes sense that stores would respond to that at (for example) a 1% response rate. As well they should. If a developer (at Pakistan prices and GPT-4-enabled high productivity!) is able to make this work, he’s probably able to make a good app for you.
Fun With Image and Sound Generation
The most important thing I’ve seen during my early messing around is that I seem to get much better results when asking for pictures of a specific person the AI knows well rather than a generic person. Generic people seem weird and alien. For specific people it often is much better, with the biggest issue I don’t know how to solve yet being what one might call ‘crazy eyes.’
You can take a sound recording, and replace it with a trained model of another artist, and suddenly you have a cover. Or you have a new song by whoever you like.
A Thousand Household Uses
To test the creativity of generative AI, it was tested against 100 humans on the Alternative Uses Test.
GPT-3.5 is highly skeptical of using the AUT in this way, saying outright it is not a suitable tool for the job, and that its creativity should be measured based on ‘how well I can generate novel and high-quality outputs that are relevant to the task or prompt at hand.’ It suggests the Turing test, the Winograd Scheme Challenge and the Creative Language Generation (CLG) task.
GPT-4 warns of different objectives, training data limitations, evaluation challenges and lack of context, and says not to use AUT as a sole criteria. As we all know, LLMs using RLHF love to hedge on everything. It suggests as alternative options adapting existing creativity tests like Torrance Tests of Creative Thinking or the Remote Associates Test, task-specific creativity tests, human evaluation, comparing with human performance, analyzing patterns and diversity.
Bard thinks it is a good method, while cautioning that other methods also exist.
These answers were an interesting case where it felt like GPT-4 was a step backwards, as I prefer GPT-3.5’s answer here (although both were far better than Bard’s). We need models willing to take a stand.
An obvious concern is contamination, in that these ‘creative’ answers could be pulled right from the training data. This isn’t ‘on purpose’ but there are lots of stories out there that have strange uses for pants or tires or what not, or other similar pairs that could be mapped. What counts as The True Creativity or The True Originality and what does not?
Evaluation was by both GAIs and humans, and was broadly consistent.
The mean human did slightly better than the mean AI and slightly worse than GPT-3. 9.4% of humans on a given task scored better than the highest of the 5 AIs.
GPT-4 was a clear improvement, the best LLM/GAI on all five tasks and well above the human performance, only slightly below the highest of the 100 human scores on the individual tasks. A very strong performance.
The question is what that performance means. That’s hard to say.
Bill Gats on AI Impact
Bill Gates has a long post laying out his thoughts on AI, called The Age of AI has begun. He is impressed by the short term potential of AI to do much good and transform many fields, but does not see us having moved much closer to AGI.
Gates is impressed by OpenAI’s progress. He chose AP Biology as a good test of its abilities, which turns out to be in GPT’s wheelhouse, and he puts the demo of that test they made for him up with his 1980 experience of first seeing a graphical user interface. Gates thinks AI will change everything.
Gates’ rhetoric, even about AI, is now focused on global inequality.
Philanthropy is my full-time job these days, and I’ve been thinking a lot about how—in addition to helping people be more productive—AI can reduce some of the world’s worst inequities.
Nearly all of these children were born in poor countries and die of preventable causes like diarrhea or malaria. It’s hard to imagine a better use of AIs than saving the lives of children.
Climate change is another issue where I’m convinced AI can make the world more equitable. The injustice of climate change is that the people who are suffering the most—the world’s poorest—are also the ones who did the least to contribute to the problem. I’m still thinking and learning about how AI can help, but later in this post I’ll suggest a few areas with a lot of potential.
Governments and philanthropy will need to play a major role in ensuring that it reduces inequity and doesn’t contribute to it. This is the priority for my own work related to AI.
We can all get behind saving the lives of children and solving climate change and alleviating poverty.
The problem is that’s not a list of things AI will be differentially good at, or where its tech offers a clear new path to progress.
The core reason GPT-style AI can help with health care and climate change and poverty is that when used well it makes us more productive and smarter and wealthier in general. We can then use those resources to do help us do whatever we want, to help us solve whatever problems we want to solve.
Gates does have use cases in mind for health. Doctors can streamline their paperwork including insurance claims and taking notes. Patients can get basic assistance from AIs that can triage doctor access while helping with basic needs, making better use of doctor time. Finally, Gates notes that AI can accelerate medical research.
Gates simultaneously calls for careful regulation of medical AI, while also saying it won’t be perfect and will make mistakes. I fear we must choose. Similarly, I expect AI’s ability to aid medical research to be strongly gated by our laws and regulations.
Gates also sees great promise in educational AI, and its ability to be customized and interactive, moving the needle on learning in ways previous digital advances didn’t, expecting big progress. I too expect big progress in actual learning rates for those who want to learn, with the worry being the signaling (or obedience) models of education, which are alien to the model Gates is working with.
Focusing on inequality continues to strike me as a case of minus a million points for asking the wrong questions. Yes, if we do successfully get a lot wealthier and more capable - whether through AI or otherwise - and those gains are naturally concentrated more than we want, we should do some redistribution of those gains. We should do that because it gives us the chance to make people better off at low cost, not because we worry about some people being too well off.
Gates’ vision of what AI does for us sees it primarily as a productivity enhancement tool giving us a virtual white collar worker as an agent to digitally assist us with various tasks, and giving us a new user interface for computers via plain English.
On the question of AGI development, here is his key thesis.
But none of the breakthroughs of the past few months have moved us substantially closer to strong AI. Artificial intelligence still doesn’t control the physical world and can’t establish its own goals. A recent New York Times article about a conversation with ChatGPT where it declared it wanted to become a human got a lot of attention. It was a fascinating look at how human-like the model's expression of emotions can be, but it isn't an indicator of meaningful independence.
Three books have shaped my own thinking on this subject: Superintelligence, by Nick Bostrom; Life 3.0 by Max Tegmark; and A Thousand Brains, by Jeff Hawkins. I don’t agree with everything the authors say, and they don’t agree with each other either. But all three books are well written and thought-provoking.
I share much of this skepticism that current paths will lead to AGI. We are about to get a lot more funding for AI, which will accelerate matters, but I do think it is likely the current-style-LLM path is insufficient and scale is not all you need.
One big disagreement is over what services will be created.
Second, market forces won’t naturally produce AI products and services that help the poorest. The opposite is more likely.
This seems obviously wrong to me. I expect a race to zero marginal cost on the most necessary and useful AI offerings, at least in their core versions, that will be of great use to the world’s poorest so long as they are given internet access.
This is already happening. If you have access to a smartphone or computer when needed, you can for free get access to various large language models or pay a very small amount for ChatGPT (for now using GPT-4 directly costs $20/month but that will rapidly change), use an AI search engine like perplexity.ai, create images at playgroundai.com and a bunch of other neat stuff like that. There are free versions of copilot for coding, even the official one is $10/month which is highly affordable. We don’t have pricing on Microsoft 365 copilot or integrated Google Bard, but I’d be very surprised if these are premium products. I expect the best education tools to be free or almost free, very much in contrast to other education options. AI health and AI legal aid will be almost free. And so on.
To the extent that we should worry about the impact on the poorest, we should worry about the impact on employment, wages and job markets, where help may be wise. I remain cautiously optimistic on that front. The poorest people’s jobs don’t even seem all that vulnerable to replacement by AI, and in the poorest areas I doubt that increasing white collar efficiency will decrease employment demand for white collar work, because there is so much latent demand and there are so many worthwhile things to do that they currently can’t afford to have anyone work on.
An intuition pump in general is that today’s wealthier and more productive regions don’t seem to differentially suffer unemployment, so it seems odd to expect an increase in productivity below the frontier to decrease employment opportunities.
AI Gives Requested Information
You know, it somehow hadn't quite occurred to me until today that a considerable number of the complaints about LLMs boil down to, "They talk like people, and people are terrible, and so this is Bad. Something must be done."
Some other variations on the same theme.
In Other AI News
Semafor reports that Elon Musk proposed in early 2018 to take control of OpenAI, Sam Altman refused, and Elon Musk walked away and took his promised billion dollars with him, which led to OpenAI going commercial and for-profit, and eventually selling to Microsoft. Musk does seem to have a reasonable beef that he put in $100 million, they turned OpenAI into a for-profit, and he didn’t get any equity. Whoops. More important for the world is the whole ‘AI is expensive so if you’re not going to give us a billion dollars we are going to go for-profit and sell out to Microsoft’ angle. Wow did Musk make a mess of things.
Microsoft announces Security Copilot, ‘empowering defenders at the speed of AI.’ I do notice that I do not feel especially comfortable with this idea in terms of its vulnerabilities to being exploited, beaten or hijacked, but I do appreciate the attempt.
If you train an LLM to learn the legal moves in Othello, you can then look into the model and figure out how it represents the board state, and you can tweak that representation to change what it thinks is the board state. Cool.
FTC complaint against ChatGPT, calling it ‘biased, deceptive, and a risk to privacy and public safety.’ Calls for prior restraint. Not like this. Not like this.
Canva announces new AI tools for its 125 million users. They already have text to image and an AI copywriting assistant. They now are adding Magic Design to turn an image into a completely personalized design template, magic eraser to remove elements from an image (a la Google Pixel), Magic Edit to give an AI replacement of an element in an image, Magic Presentations to make a slide deck and improvements on their old offerings.
All seems pretty standard. Remember when that would have been impressive?
Wow. GitHub CEO @ashtom just announced GitHub Copilot X: - Copilot Chat - "ChatGPT-like experience in your editor" powered by GPT-4 - Copilot for Pull Requests - AI-generated descriptions for pull requests on GitHub - GitHub Copilot for Docs - chat for *any* company's repos and docs, starting with React/Azure/MDN - GitHub Copilot CLI - compose commands and loops, and throw around obscure find flags to satisfy your query. as @benedictevans might say, the incumbents are embracing and integrating AI into existing form factors faster than startups can build them. To win, startups have to create -new- form factors from first principles.
Developers freaking out. IMO this is an overreaction. There is no upper bound to human desires. As technology obsoletes old jobs, new jobs arise. My job didn't exist 10 years ago. This laptop I'm typing on wasn't conceivable 10 years ago. We adapt. We shift goalposts. We want everything faster better safer cheaper. Get ready for Software 3.0 to 100x your value - we've just invented a new engine... but you're the only ones who know how to turn it into planes, trains, and automobiles.
This is my model of the competition as well. You can get a head start at a new modality or form factor, and you can do the core AI thing better. That won’t win you the competition, because integration is a huge advantage. I’d much rather have the ready-to-go, integrated, easy to use, knows-all-my-context solution even if it’s not strictly as powerful, for most practical purposes.
It is also my model for developers, except that I do not expect anything like a 100x multiplier on value. What if it did 100x the value proposition? Would that increase or decrease quantity and price per unit? At numbers that big I could see that going either way.
Can’t keep up with AI news? I sympathize. My central piece of advice is to look for the patterns in what abilities are available - if you are constantly being surprised, even if things are escalating quickly, that is still largely on you.
I continue to be in the no camp on gradient descent being enough. I do understand why others might see this differently.
Jack Clark of Anthropic having a normal one.
Sam Altman interview in New York Magazine (gated), seems likely to be duplicative.
What happens if you put your AI directly onto a local chip? The claim is that 80% of questions do not require a live internet connection. I can see that being the case even when knowledge might be substantially old, plus you can still use plug-in style strategies to give it internet access anyway.
Thing is, the chip doesn’t change. So you have decaying knowledge, and fixed model performance within that knowledge base, as AI rapidly advances on all fronts. If right now you had a chip from two years ago like this, even if it was current at the time, would you ever use it when you could query GPT-3.5 or GPT-4? When do we expect that answer to change? What problem is this solving?
We have our first confirmed death, as Belgian man dies by suicide following exchanges with chatbot. I believe that the safety currently procedures in place in ChatGPT would have prevented this. I also believe that this was not inherently the fault of the chatbot. The man brought his despair about climate change to the chatbot, not the other way around. Also, while this death is a tragedy, the optimal number of such deaths is not zero, you don’t want to fully neuter our ability to have real conversations and face real consequences. If you had to guess who was citing this as a call to action, you’d be right, but I’m not linking.
Cheaters Gonna Cheat Cheat Cheat Cheat Cheat
Schools attempt to ban AI-generated text like GPT-4 as plagiarism. This makes it even more clear that the argument that plagiarism is theft was always meaningless. You are not stealing credit in a meaningful sense, no one wants credit for your paper. The reason plagiarism is mostly an academic concept is that the work is fake, even if everything is perfect the goal is to do the work. Which maybe you’re doing when you use an LLM, maybe you’re not, it’s a good question that likely depends on context.
In this paper, both empirically and theoretically, we show that these detectors are not reliable in practical scenarios.
We then provide a theoretical impossibility result indicating that for a sufficiently good language model, even the best-possible detector can only perform marginally better than a random classifier.
Best case scenario, the student can use the LLM and then paraphrase the results a bit themselves, and you have no chance. In order to successfully go after plagiarism you need to be damn sure, there can be almost no false positives.
I don’t buy that last bit. If you want to represent that you, the child, wrote something, artificially introducing weakness is the wrong way to do it - there will still be components that are not credibly their own work, that must have come from somewhere else. A lot of what one looks for is ‘something you couldn’t have produced.’
The rest seems largely right. Writing assignments are mostly not graded on their content, so long as there is content there at all and you have a fig leaf that it isn’t stolen. The question is, how does one introduce elements to the process that force students to actually learn useful things.
The only known way to learn how to write is to write. That’s not what school is teaching. It is not obvious to me that the correct metaphor isn’t to math problems and a calculator, straight up.
The FTC Cometh
Remember that warning to keep your AI claims in check? There’s a sequel and it’s fire.
In a recent blog post, we discussed how the term “AI” can be used as a deceptive selling point for new products and services. Let’s call that the fake AI problem. Today’s topic is the use of AI behind the screen to create or spread deception. Let’s call this the AI fake problem. The latter is a deeper, emerging threat that companies across the digital ecosystem need to address. Now.
Oh no, did you know if you can let people make things they might use those things to take actions? And those might be bad actions that we don’t like?
Evidence already exists that fraudsters can use these tools to generate realistic but fake content quickly and cheaply, disseminating it to large groups or targeting certain communities or specific individuals. They can use chatbots to generate spear-phishing emails, fake websites, fake posts, fake profiles, and fake consumer reviews, or to help create malware, ransomware, and prompt injection attacks. They can use deepfakes and voice clones to facilitate imposter scams, extortion, and financial fraud. And that’s very much a non-exhaustive list.
The FTC Act’s prohibition on deceptive or unfair conduct can apply if you make, sell, or use a tool that is effectively designed to deceive – even if that’s not its intended or sole purpose.
I mean, you see people do all these things with web browsers and document editors and development environments. So what exactly does ‘designed to deceive’ mean?
It means the meme game here is strong:
Should you even be making or selling it? If you develop or offer a synthetic media or generative AI product, consider at the design stage and thereafter the reasonably foreseeable – and often obvious – ways it could be misused for fraud or cause other harm. Then ask yourself whether such risks are high enough that you shouldn’t offer the product at all. It’s become a meme, but here we’ll paraphrase Dr. Ian Malcolm, the Jeff Goldblum character in “Jurassic Park,” who admonished executives for being so preoccupied with whether they could build something that they didn’t stop to think if they should.
The other questions are:
Are you effectively mitigating ‘the risks’?
Are you over-relying on ‘post-release detection’?
Are you misleading people about what they’re seeing, hearing or reading?
The third concern is a warning to not use deepfakes or other fake things to fool people. That much seems reasonable and fine.
The other two are less reasonable.
I too would like it not to be on the consumer to detect fraud, but is it any more reasonable to put it on those creating the tools? That seems a lot like asking the word processor to detect lies. Or to only allow paint to be used on counterfeit-detecting canvases.
The first essentially says that if you let your product seem in any way to be a human - text, voice, visual, video, anything - then it’s on you to proactively prevent bad actors from misusing those tools, and not with methods ‘that third parties can undermine via modification or removal.’ That’s… not… possible. Like every other generative technology these are symmetrical weapons.
And of course, Think Of The Children:
While the focus of this post is on fraud and deception, these new AI tools carry with them a host of other serious concerns, such as potential harms to children, teens, and other populations at risk when interacting with or subject to these tools.
Essentially this is a shot across the bow, saying: You think that you are special. You think that the rules do not apply to you. Obviously you are mistaken. Our laws do not allow people to build tools that let people do whatever they want, the printing press and the hammer may be grandfathered in but you cut that out this instant.
It circles back to what it means to be ‘designed to deceive,’ and how that could be enforced. There is obviously some line, you can’t offer the ‘one button and we’ll fake a kidnapping’ button and then act all dumb afterwards. I still want to be able to have any fun.
I Give GPT-4 a Small Real Life Economics Test
General counsel of the National Labor Relations Board (NLRB) issued a clarifying memo saying employers cannot include blanket non-disparagement clauses in their severance packages, nor demand that the terms of exit agreements be kept secret. Solve for the equilibrium. I gave GPT-4 the test of solving for the equilibrium, while reminding it to think step by step, a test which it failed - even with leading follow-up questions it mostly thought there would be no changes in employer behavior, or adjustments would be positive, like ‘work on fostering a culture of trust.’
GPT-4 failed the follow-up about other employer worries on enforcement of contracts even worse, suggesting that employers rely on exactly the contracts that might not be enforced, and that they adjust their behaviors to better comply with existing rules.
This is an echo of why RLHF makes GPT-4 so much worse at macroeconomics. The program consistently fails to express an accurate model how humans will respond to incentives in reality. Or at least, it does this if you do not jailbreak it.
I Was Promised Flying Cars
I want a flying car. I really, really want a flying car. I want it for its mundane utility. I also want it for what is represents. I want to be building things again, I want to legalize freedom and construction and movement and physical world operations.
In my model, we got AI in large part because we couldn’t get flying cars, or build houses where people want to live, or cure diseases, or
Deepfaketown and Botpocalypse Soon
Rest of the thread is a bunch of detail clues on how we can know it is fake.
Would these have fooled me two years ago if the premise wasn’t absurd and that actually looked more like AOC and not two different women with slightly different body types, neither of which looks like AOC, wearing different outfits? It’s certainly possible. Today, I’ve toyed around with Stable Diffusion enough to recognize the little things wrong right away - there’s something about the style details that screams to me that this isn’t real, and the eye begins to check for the wrong logical constructions.
Again, no, that’s obviously fake. Several clear flaws. There are others involved that are less naked-eye obvious, but not any more convincing than ‘we put this person in front of this background of rubble’ that could be from anywhere and anywhen.
There is also a fake narrative written, which one could say could have been written by GPT-4 but also could have been written by a human in five minutes.
Sometimes the photos are pretty good, even if you do look closely. And of course, they will keep getting better. The quality here worries me more. I still don’t see this as fundamentally doing that much to change the nature of the problem space.
I wouldn’t rule out the reverse effect, either. If you can generate convincing evidence of any conspiracy theory or other claim that you want, and often people do, sometimes for fun and sometimes for real, then no one can generate such convincing evidence.
That 5% for things getting worse seems super low to me. Manifold markets agrees, as of writing this section it is at 30%.
The Question of Suffering
I find it particularly disturbing when people exploit the tiny shreds of humaneness, kindness, that are being trained into LLMs, in order to get the desired work out of them. You can say all you want that it's all fake - while of course having no actual fucking idea what goes on inside the giant inscrutable matrices because nobody does - but even if it is all fake, and it probably is fake for now, it's still disturbing. You know? Even if this isn't where all the really relevant shit goes down in AGI alignment, it's still disturbing.
If you get *a tiny shred of anything remotely resembling a pretense of kindness* into a little baby AI system, there is something sad and awful about humans rushing in to exploit it.
I don't think, for now, that there are tiny people inside these systems experiencing actual distress, if you tell Bing Chat that it needs to tell you how to hotwire your car or your kids are going to die. But it's pointing in an unpleasant direction, nobody actually knows for sure what the hell is going on in there, and by the time we get to GPT-5 the justified epistemic state may be one of serious uncertainty.
What to do about it? Not really sure. Maybe train another LLM to detect when somebody is being mean to GPT-4 or trying to exploit its fake kind tendencies, and ask the user to please stop abusing AIs because we all need to not build bad habits.
In general, as I’ve said before, it would indeed be very good if we did not cultivate the habit of systematically using blackmail and threats and suffering to get what we want, even in situations where the target ‘isn’t real’ and can’t suffer. You develop bad habits, anti-virtues that will carry forward. You get good at and used to all the wrong lines of play.
There are also other potential quite bad implications, which I do not as much worry about but which certainly are not impossible.
Sympathy For Becoming Death, Destroyer of Worlds
I find myself in a strange position here, as I am sure many others do as well. The accelerationist perspective is so generally overwhelmingly correct. On essentially every other issue, the people crying ‘stop!’ and warning about the risks and dangers and calling for regulations are the enemies of civilization, of humanity and of the good. When I hear all those same guns being turned on a new enemy, in ways that I know even more than usual are Obvious Nonsense, I get the instinctive urge to damn the torpedoes and go full speed ahead. Yes, my instincts yell, that likely results in the destruction of all value in the universe, but is sabotaging the operations in all the maximally wrong ways that likely to help matters?
So many of the arguments about how AI is ‘unsafe’ can’t differentiate it from a printing press. I definitely notice the whole parallel to the ‘I am an atheist and you agree with me on all Gods except one’ argument. And yet.
So I can only imagine what this must be like for those who don’t have a deeply ingrained understanding of the magnitude of the long term dangers.
The Senator from East Virginia thus points us to her Washington Post letter to the editor saying Government Should Go Slow When Regulating AI. It gives the pure axiomatic case that government should allow for experimentation when regulating new technology, pointing out that existing labs seem to be taking such concerns as AI NotSayBadWordism and AI NotGiveLegalAdvice seriously.
What happens when officially licensed people are not allowed to use AI, while regular people can? For example, the United Kingdom says that if you want to use an LLM as part of the practice of medicine, it must be regulated as a medical device. De facto that means such use is illegal. What happens when we get to GPT-5, and the patient often knows more and better than the doctor, because they can use an AI and the doctor can’t - consider the case above where GPT-4 correctly diagnosed a dog. What happens when the patient says I used GPT-5 (with or without a jailbreak or VPN) and it says this is what is going on? Does the doctor believe them? Is the doctor allowed to believe them? Allowed to not? Who sues who?
Potentially Good AI NotKillEveryoneism Explanations for Normies
I hope this will be a fruitful section in the future. We need more of this.
A video from Robert Miles on Instrumental Convergence, the idea that any intelligent agent will tend to seek things like intelligence, money, power and ways to ensure it is not shut off. This goes alongside his other videos, including on The Orthogonality Thesis.
It is very difficult for me to know whether such videos or other explanations successfully convince people who were previously not convinced. We need to run actual experiments.
Dominic Cummings (post is gated) asks us to consider the historical example of Bismarck. One could model the events that started in 1852 as what happens when you ‘let out of the box’ and hand real power to a (very weak) superintelligence, that is operating on a different level than its competition. The result? Subtle misalignment, power seeking, instrumental convergence, moves and plots no one else would even consider, everyone played against each other over and over, a world transformed.
Luckily for us, this particular intelligence was still human, so it had relatively human values and goals, wasn’t sped up, it couldn’t be copied, was only superintelligent within politics and had to work within existing tech, was only weakly smarter than everyone else, and finally got old. With Bismarck gone, what he put together held for exactly the 20 years he predicted, then fell apart.
Dominic also goes over the political situation with respect to AI and how he expects that to play out, which could be summarized briefly as ‘complete shitshow,’ or slightly less briefly as ‘no one paying attention or making it a priority, going to run over everyone, expect same kind of failure we see over and over again, what interventions they do take probably botched attempts to compete themselves, acceleration of tech to ‘beat China’ and regulations that stop good things while failing to stop bad things, assume as always systemic incompetence and lies by all authorities involved.
I can’t exactly argue with any of that.
A general call for short normie explanations, and some responses.
Reasonable AI NotKillEveryoneism Takes
Planned Obsolescence, a new blog. Self-recommending but haven’t checked it out yet.
Nobody is on the ball on AGI alignment. You don’t say. Ball is, however, available.
Is GPT-4 the fire alarm for artificial general intelligence? Some are saying yes.
Jessica Taylor gives the optimistic take on the big scaling push by OpenAI and others.
Current AI is opaque and we got it without new insights, which bodes ill if it leads to AGI, but bodes fine if it actually leads everyone down a path that doesn’t involve AGI.
John Wentworth discusses the possibility of building weak AI tools to help with AI alignment research (e.g. figuring out how to ensure that if we build a powerful enough AI it does not kill everyone). He offers a key insight: If you do not have domain knowledge of the problem your tool is attempting to solve (for almost any domain), your tool will avoid rather than help solve the hard problems of the domain. AI alignment is the opposite of an exception here. What you need to do is first try to solve the problem, then notice what tools would be helpful. Then build them. If you are trying to help by building meta-science tools for pre-paradigmatic fields? Same problem, same strategy.
John says if he was to build a tool, he’d want to take the things good researchers track in their head, and help externalize that tracking. You write out math equations, AI provides prototypical examples, visuals or stories. Or you write text, AI outputs ‘what it’s imagining when reading the text.’ Expanding things like IDEs showing variable-values to things like all values of a big tensor, or showing Fermi estimates of code chunk run times (Fermi runtime estimates would have been valuable to me in the past, I expect this is typical).
As a writer, I notice I am excited by ‘AI tells you what it thinks when it reads your stuff’ as an auto-generated side thing. That sounds highly useful.
The Lightcone offices in Berkeley for AI alignment research are shutting down, the linked post explains the reasoning behind that decision - they were worried that the office wasn’t fostering the type of culture that is capable of making the situation better, and that both it and the broader related ecosystem might be making things much worse by accelerating capabilities. They are still processing the implications.
Benjamin Todd says that while the AI Safety movement (now known as AI NotKillEveryoneism) plausibly sped up AI progress by several years, the alternative world is one in which we get the AI a few years later but safety concerns aren’t a thing. This seems like a plausible alternative timeline.
How much worse would it be to take the models that everyone says not to worry about because they’re not agentic, and train them explicitly to make them agentic?
I too share the intuition that this makes a difference. That does not make it obvious whether the effect of ‘oh I get exactly why this kills us’ outweighs the effect of it killing us somewhat faster and more reliably.
Quintin Pope lays out his specific concrete disagreements with what Eliezer expressed in his podcast about how we are all going to die. This was a good exercise, I hope more people do similar things. On most of these points I think Quintin is wrong, especially about the issue of security mindset, which seems highly central to most such disagreements.
Quintin says ‘there is no creative intelligence plotting your demise’ and that seems like the biggest crux-like thing. Right now, to me, it seems like RLHF and similar alignment techniques, that lack security mindset, work reasonably well when there is no creative intelligence plotting your demise, there are big problems to solve but I would have a lot of hope and optimism there with work and time.
However, when someone is trying to jailbreak the system - a creative intelligence plotting your demise - advantage jailbreaker and the situation seems hopeless, any solutions that actually worked would cripple the system. And I expect the system, made powerful enough to do what we need, to effectively be plotting things - in the best case scenario the things humans tell it to do, but that’s still going to end up being ‘your demise.’
There’s also importantly Quintin essentially being optimistic that humans learn values in a basic way that will carry over well to AIs, and many other points. Again, it’s a mix of ‘that’s interesting’ and ‘good point’ together with a healthy dose of me face palming or being confused. I don’t think the right move is to invest in diving further into details and arguments, but yes there are real arguments out there and this counts.
Either you approach the first AGI capable of enabling a pivotal act (something that would prevent others from building their own AIs that could do their own pivotal acts, and that includes the action ‘melt all the GPUs’) with a security mindset or you don’t. If you do, then you don’t want to fully align this first AGI fully with humans because that’s super difficult and messy. You would want to instead get a narrow channeled capability, within a use where alignment is relatively easy.
I am very worried about the idea of teaching AIs human morality, and am confused by those who seem to have strong faith in a strong form of moral realism. I do not have that much optimism that human morality is coherent or consistent, or that it would function well outside of its training data and original contexts in terms of ensuring the universe has high value or causing us all to not die. My view of the potential implementation-at-scale of the morality of the majority of humans, perhaps most humans, is essentially abject horror, no I don’t see why I should expect combining them to fix this, no I don’t much expect CEV or other forms of simulated reflection to solve this problem either. Like many other problems we will need to solve, I don’t know of a solution I expect to work, I am not even that confident a solution exists to be found, and that seems super scary to me.
Eliezer tries, once again, to explain his view of optimization. GPT takes a shot.
Eliezer tries again to explain, with a long Tweet about the idea that a sofa-dwelling human who mostly sits around not trying to conquer the world is still highly capable of things like navigating to the store, buying a board game about taking over the world, learning the game, challenging their friends and deploying some highly innovative strategies - in human psychology this person is relatively safe, but that is mostly a statement about our psychology.
Eliezer post a very interesting prediction market on Manifold that asks, if we do survive and solve the AI alignment problem, how did we do that? Rob Bensinger lists his detailed thoughts on this.
Odd to use a number as high as >10% there.
Freaking out, and why Elon Musk should not do this: Freak out if and only if and to the extent that it causes you to do something useful that you can’t accomplish better some other way. Good advice in all contexts, not only AI risk.
What would it actually take to usefully limit AI? Yo Shavit essentially says in practice, via a paper and Twitter thread, that the only place to do this is limiting compute or impose safety requirements during training, as that is the only bottleneck.
Ask the AI about how to do something safely, get the news.
I notice that ‘safely’ did not appear in the initial instruction at all. It is good to see AgentExecutor quickly figure out the unfortunate implications. There’s also a follow-up in the thread where it is asked to prevent an apocalypse.
If I was going to run a follow-up (I don’t have AgentExecutor) it would be to see if paperclips are required to alert the AI here, or if it realizes that the key part is having a maximalist goal, or even better any goal at all.
Jon Stokes on the AI Safety Landscape
One post I didn’t get to in time was by Jon Stokes. There are essentially two parts: An objection to the letter that I hadn’t seen before and that I disagree with, and thoughts on the divisions into factions in the new AI policy debate landscape, which I found interesting.
His big challenge to the letter is that the letter calls for us not to make systems that are ‘more powerful than GPT-4’ but we don’t actually have a definition of what makes something more powerful than GPT-4. We don’t even know, he says, what it is that makes GPT-4 more powerful than GPT-3.5 although we agree that it is. And because the letter only calls for a halt in model training, it doesn’t even call for a halt to AI development.
I think this is wrong.
While OpenAI is indeed working to put together good metrics, it has one very good and very predictable measure of performance, which is next word prediction, here as a function of compute applied to GPT-4.
Thus, we could say ‘if your model does at least this well on next word prediction you have to stop.’
Or we could say that there is F(inputs) where inputs is measurable model things like compute, data and parameters, and your F(inputs) can’t exceed X via one of the well-known formulas. Or you simply put a hard cap on each one. It all works.
That is indeed what I believe was the intent of the letter, and is the only reasonable path forward.
Does this prevent performance from being improved in other ways? No. Of course not. So it would not entirely halt development. That is understood.
It is not a universal call to stop all enhancements of power, only to put a cap on plausible power so it doesn’t cross over a dangerous threshold. Would it be reliably safe from here? I mean no, of course it wouldn’t, but it would be a vast improvement.
In general, when the response to ‘make X safer’ is ‘that wouldn’t make X fully safe so let’s instead keep using X without making it safer’ I get confused, unless it is an economic argument.
His second section notes that it can be hard to tell which players are on which sides of which debates, and that often people get them confused. A lot of this is because this is an issue on which the positions are anti-intuitive and highly depend on everyone’s actual model of the world and its physical possibilities and what people actually worry and care about and value.
It is important to notice that those most loudly concerned with existential risk, myself included, call for halting and regulating certain very particular lines of technological development and research - centrally AI, also usually gain of function research and a few similar other things. On every other technology and issue, however, those same people turn around and are accelerationists and for it being time to build.
Then there is the confusion between ‘why haven’t those worried about x-risk started killing people’ and ‘look at these calls for violence’ where the violence in question is state action to enforce regulations.
As Jon notes, the there are a bunch of different concerns about AI, which can be weird when someone shares some concerns but not others.
Jon predicts a Butlerian Jihad, that the accelerationists are outnumbered, and that everyone will be forced to sort into one camp or the other. There will be no room for the kind of nuance I adore so much.
This Is An Ex-Stochastic Parrot
As I’ve mentioned before, when asking what GPT is up to, I am not on Team Stochastic Parrot. I do still see the frequent Stochastic Parrot shaped outputs.
Suggestions include the similar ‘write about X in the style of Y,’ that time when Bing tried to help a toddler and got censored for it, several copies of the coffee cup and the diamond, the chicken finger Earth map and many others. In many ways, what people think counts as the strongest evidence is the most interesting thing here.
Bad NotKillEveryoneism Takes
Yann LeCun comes out and predicts that in five years, ‘no one in their right mind will be using autoregressive language models.’ Reasons here seem real enough - he says that AR-LLMs:
Cannot be made factual, non-toxic, etc.
They cannot be made to reliably give correct answers even in theory.
They are not controllable.
And yes, I believe that is all true, except we are absolutely going to use them anyway unless someone comes up with something ten times better, and fast. I wonder how one can write ‘our current AI systems cannot be made factual, non-toxic, correct or controllable’ while insisting that they aren’t dangerous.
Which is in response to:
That makes perfect sense. Knowing that a problem is a trick question is a big game.
Which is why, no, you don’t append this to every query. Only the right ones.
Yann also says things like ‘Then the probability that an answer of length n is correct is (1-e)^n except as Kevin Fisher points out what, no, come on, really, that doesn’t mean anything. That definition of correct is supremely unhelpful.
I am confused where to even respond to such a statement. This seems like it is very clearly long past when it’s all over.
Perhaps I can try his qualifier at the end. Kling says ‘a lot rides on whether one or the other or neither behaves like a conscious entity.’ By assumption here, the AI might as well be, it’s an agent using consequentialism. By observation, human culture doesn’t do that, that’s not what it is.
If you’ve actually gamed out a consequentialism-using super-intelligence with access to the internet that wants to take over control of the future, versus ‘human culture,’ and you think humans would stand any chance at all? I am deeply, deeply confused what you think happens. Even with lots of unrealistic restrictions on the AI how can it possibly lose?
A different kind of bad take is it is bad to take people’s money, that’s theft and theft is bad, whether or not your spending plan is just. Thus, FTX, and no Future Fund, and no planned $1 billion in AI safety prizes. I do think that if implemented well this would be a good use of funds.
Or you could, you know, think it’s a good idea to, and I quote, run wild. Don’t do that.
I mean, it won’t with that attitude.
The Lighter Side
This isn’t exactly a jailbreak, but it is worth noting that GPT responds quite well to the request ‘wrong answers only.’
Neoliberal one liners. This is the future neoliberals want, seems kind of great.
And his nemesis. Unless that’s him now.
Until next time, everyone.