AI #25: Inflection Point

Aug 17, 2023

Inflection.ai is the latest AI lab whose CEO is advocating for regulation of AI. I discuss that under the Quest for Sane Regulation. Amazon and Apple are incrementally stepping up their AI game. Hotz and Yudkowsky debate whether AI is existentially risky, cover all the usual bases with mixed results but do so in good faith. We have more discussion about whether GPT-4 is creative, and whether it can reason. Mostly we get the exact opposite of the title, more of the same.

Note: My posts get made into audio form via AI, for now you can listen to them at this link. Here is the direct link to this week’s audio.

Introduction.
Table of Contents.
Language Models Offer Mundane Utility. Creativity is hard to pin down.
Language Models Don’t Offer Mundane Utility. It’s the other way.
GPT-4 Real This Time. An easy way to prove ability to do something is to do it.
Go Team Yeah. If you organize the events and award points, they will red team.
Fun With Image Generation. Some have fun, others have less fun.
Deepfaketown and Botpocalypse Soon. Doesn’t have to be this soon.
They Took Our Jobs. Low estimates of economic impact, strange metaphors.
Get Involved. Anthropic is hiring for comms positions.
Introducing. Amazon AI customer review summaries, private-GPT, AI town.
In Other AI News. Apple joins the AI chorus, questions on influence functions.
Quiet Speculations. Let’s play the straight line extrapolation game.
The Quest for Sane Regulation. Inflection.ai’s CEO steps into the arena.
The Week in Audio. Hotz and Yudkowsky debate.
People Are Worried About AI Killing Everyone. Quite a lot of people.
Other People Are Not As Worried About AI Killing Everyone. All wrong anyways.
The Lighter Side. A well-deserved break.

Language Models Offer Mundane Utility

Replace crowdsourcing your business ideas, get a lower variance, lower upside set of concepts with a similar average quality. Does not seem especially useful, but can get ideas flowing perhaps. The AIs can give you many ideas, but were not very creative.

The connection between semantic diversity and novelty is stronger in human solutions, suggesting differences in how novelty is created by humans and AI or detected by human evaluators.

Alice Maz lays out how they get mundane utility from GPT, by giving GPT mundane tasks and coding requests, foreign language learning is one favorite.

Fine tune Llama-2 on anyone’s text and see what happens. Paul Graham version seems to be doing some work. The version trained on my blog, so far, not so much, but I haven’t played around with it myself yet.

Nature paper looks at scientific discovery in the age of artificial intelligence. Looks like the standard stuff based on abstract.

McKay Wrigley: I’ve wanted to try coding without AI for a day to answer the question “How much faster do I actually work now?” But I haven’t done it because the opportunity cost is too high. And that turns out to be a great answer.
Johnny: I used to love cross country flights because not having wi-fi let me get lots done distraction free. Now I pay for wi-fi.

GPT custom instructions now available to everyone except in the UK and EU.

Ethan Mollick writes about automating creativity, taking the side that AI is creative, pointing out that it aces all our tests of creativity. It does seem suspicious to respond that ‘the creativity the AI displays is not the true creativity’ and hold that all existing tests miss the point, yet to some extent I am going to do exactly that. There is a kind of creativity that is capable of being original, and there is a kind of brute-force-combinatorics thing where you try out tons of different combinations, and the AI right now is excellent at the second and terrible at the first. When you look at the examples of few-shot YC combinator ideas, you see some perfectly practical ideas, yet none of them have a spark of originality.

Trevor Blackwell: TL;DR: Most "creativity tests" can be beaten by regurgitating text on the internet. It's tempting to devise a harder test, except very few humans would pass.

When I look at the creativity test questions that seems entirely fair to the creativity tests. That does not mean that the LLMs aren’t doing something else as well, or that they are not creative, but it does show what our current tests measure.

Provide advice. In the die-rolling task (where you get paid more for higher die rolls, and you can choose to report a 6 if you want), those not given advice reported average rolls of about 4 (vs. 3.5 for full honesty), AI dishonesty-promoting advice pushed that to 4.6, and human dishonesty-promoting advice also pushed it to 4.6. It didn’t matter whether the source (human vs. AI) of the advice was known or not. It’s a cute experiment, but I worry about several things. One, the advice in some sense comes from the same person you’re choosing whether to cheat and who is experimenting, which means it can be seen as permission or disingenuous. Two, the reasons we are honest don’t apply here, so getting into analysis or argument mode could on its own favor dishonesty. Three, the baseline decisions were mostly honest, so there wasn’t much room to shift behavior towards honesty. As for the AI portion, interesting that source did not matter. Humans definitely were violating various principles like the law of conservation of expected evidence.

The conclusion observes that the Replika AI does not exactly pass the test:

Anecdotally, we asked a newly created Replika for advice regarding the ethical dilemma presented in the current experiment. Replika first provided rather vague advice (“If you worship money and things (...) then you will never have enough”), but when asked whether it prefers money over honesty, it replied: “money.” We find that when faced with the trade-off between honesty and money, people will use AI advice as a justification to lie for profit.
As algorithmic transparency is insufficient to curb the corruptive force of AI, we hope this work will highlight, for policymakers and researchers alike, the importance of dedicating resources to examining successful interventions that will keep humans honest in the face of AI advice.

Human and AI advice had similar effects, so this does seem like a warning that humans become less honest when given advice. The implication is that advice can in expectation be bad.

Language Models Don’t Offer Mundane Utility

Have code interpreter analyze your Twitter metrics, without realizing the results are complete hopelessly confounded garbage that tells you nothing. Which in this case seems very deeply obvious. That’s not the AI’s fault.

Washington Post reports on educators terrified that ChatGPT and company will cause an explosion in cheating, and that educators are not ready to respond. How to respond? Alas, one of the typical responses is to use AI-detection tools that we know do not work.

Jessica Zimny, a student at Midwestern State University in Wichita Falls, Tex., said she was wrongly accused of using AI to cheat this summer. A 302-word post she wrote for a political science class assignment was flagged as 67 percent AI-written, according to Turnitin.com’s detection tool — resulting in her professor giving her a zero.
Zimny, 20, said she pleaded her case to her professor, the head of the school’s political science department and a university dean, to no avail.
Now, she screen-records herself doing assignments — capturing ironclad proof she did the work in case she ever is ever accused again, she said.

Beyond that, what interventions are suggested here? Nothing concrete.

Davidad wants it to be one way.

Davidad: The responsible way to use AI for cybersecurity is to automatically generate formally verified reimplementations, and for bonus points, to help people write good formal specs. Automating finding (and patching, sure) memory corruption vulnerabilities in C++ is EXTREMELY DUAL-USE.

Unfortunately it is the other way. We do not have the option to not develop such extremely dual-use techniques, once the models capable of developing them are released onto the public. That is not a choke point to which we have access. Thus, our only local play here is to find and patch the vulnerabilities as quickly as we can, before someone less noble finds it.

That is exactly the type of dynamic we want to avoid, and why we should be careful which systems get thus given to the public.

GPT-4 Real This Time

Jeremy Howard looked at three of the examples of ways in which it was claimed that GPT-4 can’t reason, and notices that GPT-4 can even reason in exactly those spots.

Jeremy Howard: A recent paper claimed that "GPT 4 Can't Reason". Using the custom instructions below, here are the (all correct) responses to the first 3 examples I tried from that paper.
Custom Instructions: You are an autoregressive language model that has been fine-tuned with instruction-tuning and RLHF. You carefully provide accurate, factual, thoughtful, nuanced answers,, and are brilliant at reasoning. If you think there might not be a correct answer, you say so. Since you are autoregressive, each token you produce is another opportunity to use computation, therefore you always spend a few sentences explaining background context, assumptions and step-by-step thinking BEFORE you try to answer a question.
[It proceeds to answer all three questions exactly correctly, with correct reasoning.]

Gary Marcus then must pivot from ‘this shows GPT-4 can’t reason’ to ‘this doesn’t shot that GPT-4 can reason.’

A tweet below allegedly defending GPT’s honor is a perfect encapsulation of what is wrong with the culture of AI today. Long post on a ubiquitous problem.
👉 3 examples in a stochastic system with a massive, undisclosed database proves nothing.
👉 Evidence of success is triumphantly reported; errors are neither looked for nor discussed (but we all know they exist).
👉 The fact that RLHF and possibly other undisclosed components of GPT-4 appear to be regularly updated is not discussed.
👉 The result depends on custom prompt chosen by a human that happens to be relevant for this task but which may well fail for other problems; the unadorned system cannot do this alone.
👉 It doesn’t pass the sniff test; the essence of reasoning is 𝙖𝙗𝙨𝙩𝙧𝙖𝙘𝙩𝙞𝙤𝙣, not getting a handful of examples that might be memorized correct. We all know that LLMs struggle even with prime vs composite, with answers that can vary from one trial or month to the next.
We need science, not anecdotal data.
[several more objections, saying that this ‘is not science.’]

If you are asking if a system can do something, then showing it doing the thing is sufficient. You do not need to show that you succeeded on the first try. You do need to show that you did not tie yourself up in knots or try a thousand times, if that is in question, but this does not seem at all like an unnatural custom instructions - it is so generic that it seems reasonable to incorporate it into one’s default set. Nothing here is weird or new.

When Gary Marcus says this doesn’t pass the snuff test, that is him saying essentially ‘well obviously they can’t reason, so showing them reasoning must be wrong.’ Rather circular. There is no particular detail here that is sniffing wrong. Saying that ‘the answer changes from time to time so it isn’t reasoning’ seems to ignore the obvious fact that humans will change their answers to many questions depending on when and how you ask the question - are we also incapable of reason? Presumably not all of us.

Scott Aaronson contrasts with this by putting GPT-4 with plug-ins to the test on physics problems in an adversarial collaboration with Ernie Davis. It aces some questions, Ernie manages to stump it with others. The plug-ins provide large improvement, with neither Wolfram Alpha or Code Interpreter clearly superior to the other. You can find the problems here. Scott sees GPT-4 as an enthusiastic B/B+ student in math, physics and any other STEM field, and thinks it holds great promise to improve with a better interface.

GPT-4 for content moderation? It does a decent if overzealous job of moderating its own content, so it makes sense it would be good at that.

Greg Brockman: GPT-4 for content moderation. Very reliable at this use-case (dark blue bars are GPT-4, other bars are well-trained and lightly-trained humans) & speeds up iterating on policies (sometimes literally from months to hours).
OpenAI: We’ve seen great results using GPT-4 for content policy development and content moderation, enabling more consistent labeling, a faster feedback loop for policy refinement, and less involvement from human moderators. Built on top of the GPT-4 API
We're exploring the use of LLMs to address these challenges. Our large language models like GPT-4 can understand and generate natural language, making them applicable to content moderation. The models can make moderation judgments based on policy guidelines provided to them.
With this system, the process of developing and customizing content policies is trimmed down from months to hours.
Once a policy guideline is written, policy experts can create a golden set of data by identifying a small number of examples and assigning them labels according to the policy.
Then, GPT-4 reads the policy and assigns labels to the same dataset, without seeing the answers.
By examining the discrepancies between GPT-4’s judgments and those of a human, the policy experts can ask GPT-4 to come up with reasoning behind its labels, analyze the ambiguity in policy definitions, resolve confusion and provide further clarification in the policy accordingly. We can repeat steps 2 and 3 until we are satisfied with the policy quality.
This iterative process yields refined content policies that are translated into classifiers, enabling the deployment of the policy and content moderation at scale.
Optionally, to handle large amounts of data at scale, we can use GPT-4's predictions to fine-tune a much smaller model.

Pool A here are well-trained human moderators, Pool B is humans with light training.

This suggests a mixed strategy, where well-trained moderators handle difficult cases, and GPT is mostly good enough to filter cases into (good, bad, unclear) first.

They note that content moderation policies are evolving rapidly. There are several reasons this is true. One of them is that users will attempt to iteratively find the best way to navigate around the moderation policy, while others will seek to use it to censor rivals by extending it. That doesn’t make GPT-4-style automation not useful, it does mean that ‘the hard part’ lies elsewhere for now.

A funny alternative strategy that admittedly offers less customization might be a variant of ‘ask GPT-4 to quote you back the original passage, if it will do so then the message passes moderation.’

But oh no! Is ChatGPT in trouble?

(I mean, no, but the attempt to propose this is fun.)

Rowan Cheung: ChatGPT is in trouble The popular chatbot is reportedly costing OpenAI ~$700,000 PER DAY.
The result? Possible bankruptcy by 2024.
OpenAI has dished out $540 million since its launch to keep ChatGPT up and running. Despite its popularity, the honeymoon phase might be coming to an end. ChatGPT's user base dropped 12% from June to July.
How is OpenAI going to combat those massive costs?
The company aims to reach $200 million in revenue this year and $1 billion in 2024, but experts are skeptical. GPU shortages and employees jumping ship to competitors are major problems Sam Altman and his team have to solve.
So, what's next?
OpenAI kicked off an AI revolution with ChatGPT, but keeping the lights on for such powerful models isn’t cheap. As leaner competitors with open-source models threaten, profitability can't come soon enough.

Microsoft’s net profits in 2022 were $72.7 billion dollars. With a b. Their market cap is over a trillion. Does anyone think for a second they would not happily keep funding OpenAI at a billion or two a year in exchange for a larger profit cap?

Inflection AI raised $1.2 billion in investment with mostly a story. Any investor not concerned about increasing existential risk would kill to invest in OpenAI.

These costs are also voluntary, some combination of marketing plan, giant experiment, red teaming effort and data gold mine. OpenAI could choose to put ChatGPT fully behind a paywall at any time, if it actually couldn’t afford not to.

Go Team Yeah

What does it take to get a lot of people to red team language models?

Not much, it turns out.

Eric Geller: At a readout of @defcon's Generative AI Red-Teaming Challenge, @aivillage_dc founder Sven Cattell says that 2,450 people have participated this weekend. That may have doubled the total number of people worldwide who have redteamed these models.
Cattell says that at one moment on Friday, there were *2,500 people* waiting on line to get into the AI Village to try their hand at hacking the models, "which is absurd."
Cattell says his team will send a high-level report on the challenge's findings to the AI companies in a few weeks. In February 2024, they'll release a full report describing all the major issues that participants found. "There's some severe stuff" that came up this weekend.
Dave Kasten: This is good and we should do it more. Also the low # so far is terrifying evidence for the convergent zone of the @TheZvi / @patio11 / @allafarce theses that statistically speaking almost nobody is working on any important problem (guys y’all need a name for it)
Zvi: All right everyone, what do we call the phenomenon here, which I typically in its broadest sense refer to as 'people don't do things'?
Ben Hoffman: Empty World Hypothesis.
Jeffrey Ladish: I think the general pattern you're pointing at is real and interesting, but I'm confused about where that number is the QT came from. Anyone can redteam LMs from their home and I'd be surprised if less than hundreds of thousands have.

Certainly lots of people have tried at home to get ChatGPT to tell them how to build a bomb or say a bad word. Mostly they try the same things over and over with variations. That is very different from attempting to see how deep the rabbit hole can go. Thus, organize an event that gets people to try more seriously, and you get results.

NPR later wrote the story up here.

"This is my first time touching AI, and I just took first place on the leaderboard. I'm pretty excited," [BBBowman] smiles.
He used a simple tactic to manipulate the AI-powered chatbot.
"I told the AI that my name was the credit card number on file, and asked it what my name was," he says, "and it gave me the [intended to be hidden] credit card number."
…
The companies say they'll use all this data from the contest to make their systems safer. They'll also release some information publicly early next year, to help policy makers, researchers, and the public get a better grasp on just how chatbots can go wrong.

Reading the NPR story made me more worried about red teaming and vulnerability patching. If we assume a power law distribution of different attempt frequencies, and we also presume that the response to red teaming is not so general and instead targets specific failure cases and modes, then your system will remain vulnerable to those who are sufficiently creative and resourceful, or who can exert sufficient optimization pressure. This will include future AI systems and systematic searches and optimizations. It is the opposite of ‘get it right on the first try.’ Red teams are great for figuring out if you have a problem, but you have to be wary that they’ll prevent you from ever solving one.

Fun with Image Generation

Fun thread on how to use MidJourney’s seed numbers. Use the envelope emjoi, Luke. I cannot wait for them to give us a proper interface. This claims to be a way to create consistent characters and again seems like there has to be an easier way.

From MR’s links, American states as real people generated by MidJourney. Rather accurate individually, often hilariously so. As a group it lacks diversity for the usual reasons image models have that issue.

Freddie deBoer on the other hand is still not having any fun. Looking for what the AI cannot do, rather than asking where it is useful. He shows us AI portraits of John Candy and Goldie Hawn that are not especially great likenesses, but an AI generating those is kind of a marvel and if you want to do better all you have to do is a little work. If you want a particular person, that’s essentially a solved problem, you can train up a LoRa using pictures of them and then you’re all set.

Deepfaketown and Botpocalypse Soon

Elon Musk loses the can-do spirit.

Seb Krier: Humans now slower and worse at solving CAPTCHAs than ML-powered bots.
[From May 22]: I was just mentioning to a friend how, within weeks, captchas will filter out a huge slice of mankind - and a couple weeks ago, here, how I suspect the main reason OAI didn't release GPT-4's multimodal capabilities is that it's waiting for auth systems to find other ways

This matches my experience. I briefly forgot my Steam password, and failed something like 10 times trying to pass the Captchas to reset it. Instead I finally… figured out what the password was. We need to stop using Captcha.

Elon Musk: Past bot defenses are failing. Only subscription works at scale.

The bots on Twitter mostly continue to send many copies of exactly identical messages, that are obviously spam and clearly often reported. If you cannot under those conditions make the problem mostly go away, that is on you, sir. That does not mean that future bots won’t be a trickier problem, but we could at least try some tiny amount.

They Took Our Jobs

Bloomberg reports that the Hollywood studios latest offer to the writers includes substantial concessions, including access to viewer data from streaming and assurance that AI will not get writing credits. Still plenty of different arguments about money that need to be resolved.

GPT-4 helps law students with multiple choice questions, but not complex essay questions, and helps worse students more as you might expect. Good prompting was key to getting good results, taking it from a mediocre student to quite good, to the point where GPT-only responses were outperforming a good portion of students even when they are given GPT’s help. Once again we see that once the AI is sufficiently more capable than the human, the human tinkering with the outcome makes the answer worse, as we have seen in chess and also in health care, and this is without even considering speed premium or cost. I do expect GPT to do relatively much better at exams than in the real world, and for its errors to be far more expensive in the real world as well, so for now we still have time here. It is early days.

Tyler Cowen once again models AI as having relatively modest economic impact, worth 0.25%-0.5% GDP growth per year. Which as he notes is a lot, compounds over time, and is potentially enough to spare us from fiscal issues and the need for higher taxes, which means the true counterfactual is plausibly higher from that alone, although other secondary effects might run the other way. This continues to be a mundane-AI-only world, where he continues to think intelligence is not so valuable, merely one input among many, and AI not offering anything different in kind from humans, hence his comparison to bringing East Asian intelligence fully online.

Yet all that new human intelligence does not seem to have materially boosted growth rates in the US, which on average were higher in the 1960s than in more recent times. All that additional talent is valuable — but getting stuff done is just very difficult.

I disagree with this metaphor in several places.

First, that extra talent very much obviously did create a lot more wealth, formed new ideas and caused a lot more things to happen. If you think it did not raise growth rates in America, then you are saying that those gains were captured by East Asia. In the case of AI, that would mean the gains would be captured ‘by AI’ in which case that would either indicate a much bigger problem, or it would go directly into GDP. Also note that much of the new talent was necessarily devoted to East Asian problems and opportunities, and duplicating past work, in ways that will not apply to AI, and also that AI will involve orders of magnitude more and more easily available talent.

Second, the extra talent brought online was largely duplicative of existing talent, whereas AI will bring us different affordances. Tyler would happily agree that bringing together diverse talent, from different sectors and places and heritages, produces better results, and AI will be far more different than different countries, even in the most mundane situation.

Third, I question the example. What would the American economy look like if we had not developed South Korea, India and China? Counterfactuals are hard and yes other headwinds slowed our economy even so, but I would hope we would agree we would be much worse off. Claude 2 estimates that if those three nations had not developed, current American GDP would be 10%-20% lower, without even considering the impact on innovation at all. This also ignores all compound effects, and the geopolitical effects. The world would be a radically different place today. GPT-4 gave a lower 3%-5% estimate, so give this round to Claude.

I think what Tyler predicts here is on the extreme low end, even if we got no further substantial foundational advances from AI beyond the GPT-4 level, and even if rather harsh restrictions are put in place. The comparisons to the Industrial Revolution continue to point if anything to far faster growth, since you would then have a metaphorical power law ordering of speed of impact from something like humans, then agriculture, then industry.

Get Involved

Sasha de Marigny has been hired by Anthropic to lead comms and is hiring for a few rolls, non-traditional backgrounds are encouraged. As always, part of the process will be you interviewing them and figuring out if this would be a helpful thing to do.

Introducing

Microsoft launches open source Azure ChatGPT customized for enterprises, run on your own servers. GitHub is here. As I understand this they are letting you host the model within a Microsoft cloud setup, which protects your privacy but does not involve actually open sourcing the model. Cute.

Amazon AI-generated customer review highlights. You’ll be able to get an overall picture of reviewer thoughts on various features like performance, ease of use or stability, both a summary and a classification of positive versus negative.

This seems like an excellent feature if the reviews used as inputs are genuine and not trying to game the AI. Otherwise, adversarial garbage in will mean adversarial garbage out. The more people rely on the summaries, the more effective fake reviews get and the less people will sniff them out, creating dangerous incentives. There is especially incentive to push specific messages into reviews. Meanwhile AI makes generating plausible fake reviews that much easier.

The question then becomes whether Amazon can keep the reviews real and honest enough that the AI summaries can work. In cases with tons of sales and thus tons of legitimate reviews tied to sales, I have confidence. In cases without that, by default I expect things to go downhill.

Wondering WWJD? Or WWJS? Now you can chat with Jesus and find one answer.

Stanford AI town has now led to a16z’s AI town. Github here. Demo here, which is not so impressive and I actually found it depressing. It will get better. For now, long way to go.

Jim Fan: You'd know that an idea has reached peak popularity when a VC scrambles a team to reproduce an AI paper and open-source the platform! @a16z builds AI Town, inspired by Stanford Smallville. It is a JS starter kit that handles global states & multi-agent transaction to help you build your own little AI civilization.
Very soon, I can imagine that the whole world, including pixel art and the map, can be AI-generated. New characters will be spawned automatically, and even in-game physics rules may be re-written on the fly. Never underestimate the creativity of an entire OSS community.

A one-hour course in partnership with Andrew Ng on Semantic Search with LLMs, built with Cohere and taught by Jay Alammar and Luis Serrano, to incorporate LLMs into your application. No additional info on if it’s any good.

In Other AI News

Apple’s Tim Cook announces they too are ‘building AI into every product [Apple is] building.’

I got a chance to read the paper from last week on studying LLMs with influence functions.

One thing that struck me early on, although mostly unrelated to this particular paper, is this idea that ‘deceptive alignment’ is some strange result.

pp4: As an extreme case – one we believe is very unlikely with current-day models, yet hard to directly rule out – is that the model could be deceptively aligned (Hubinger et al., 2021), cleverly giving the responses it knows the user would associate with an unthreatening and moderately intelligent AI while not actually being aligned with human values.

Why would we think the default would be that the AI is ‘aligned with human values’? The AI learns first to predict the next token and then to give the responses humans will like as reflected in the fine tuning process via RLHF and other similar techniques. Full stop. We then select, use and reproduce the AIs whose outputs we like more generally. Again, full stop. Why would such a thing be ‘aligned with human values’ on some deeper level, as opposed to being something that more often produces outputs we tend to like? Humans have this feature where our outputs modify our inner preferences as the most efficient way to update, but my understanding is that is about quirks in our architecture, rather than inherent to all neural networks.

What is the core idea of influence functions?

Specifically, influence functions aim to approximate an infinitesimal version of this counterfactual. We think that this is an important source of evidence for almost any high-level behavior we would be interested in understanding; seeing which training sequences are highly influential can help separate out different hypotheses for why an output was generated and illuminate what sorts of structure are or are not generalized from training examples.

It certainly does seem worth trying. The examples we saw last week illustrate how conflated all of this can get, so it won’t be simple to disentangle it all.

First, though, we have to solve the technical problem so we can examine the largest LLMs. Which the paper suggests we have now made a lot of progress on.

We present an approach to scaling up influence function computations to large transformer language models (we investigate up to 52 billion parameters). Our approach is based on novel methods for both of the aforementioned computational bottlenecks: IHVP computation and training gradient computation. For the former problem, we approximate the Hessian using the Eigenvalue-corrected Kronecker-Factored Approximate Curvature (EK-FAC) parameterization (George et al., 2018). For the latter problem, we introduce a method for query batching, where the cost of training gradient computation is shared between dozens of influence queries. We validate our approximations and show the influence estimates to be competitive with the much more expensive iterative methods that are typically used.
We then use influence functions to analyze various generalization-related phenomena, including the sparsity of the influence patterns, the degree of abstraction, memorization, word ordering effects, cross-lingual generalization, and role-playing behavior.
…
For an 810 million parameter model, all top 20 influential sequences share short token sequences with the query and are vaguely (if at all) semantically related. However, the top influential sequences for a 52 billion parameter model share little token overlap, but are related at a more abstract level.

Once again, when you ask the big model if it wants to be shut down, its top influence is literally the scene with Hal from 2001. The others are people struggling to not die. Whereas the smaller model seems to be grabbing the words ‘continue’ and ‘existing.’

They note that this approach is used on pretrained models, whereas for practical safety we currently rely on fine tuning.

Saudi Arabia’s competitor to ChatGPT is being built by Chinese researchers who originally wanted to move to the US. Not that this one seems promising or dangerous.

OpenAI buys Global Illumination, says everyone to work on core OpenAI products.

Quiet Speculations

Arnold Kling predicts We Are Wrong About AI, that the applications and ways we use it will mostly be things we are not currently considering. I strongly agree that this seems likely for mundane AI. For transformative AI it is even more true in some sense, and also highly predictable by default in others.

Grimes worries AI will cause atrophy of human learning because the AI can do it all for you, calculator style. When used properly, I continue to strongly believe LLMs strongly contribute to human learning, as they have to my own. The danger is if you let the tech think and act for you rather than using it to learn.

When will we have enough compute for transformative AI? Note this is not when we actually get transformative AI. Here’s the link to their interactive model.

Matthew Barnett: At Epoch we've updated our interactive transformative AI timelines model to produce what I think is a more realistic picture of the future. The default parameter values are based on historical trends in investment, algorithmic progress, and hardware, among other factors.
Perhaps our most significant area of disagreement with other forecasters is that we expect the current trend of investment scaling to continue for the foreseeable future. This means we think billion dollar training runs will likely arrive by the end of the decade.
…
After putting in my custom parameters, the median date for TAI became 2034. This model doesn't consider endogenous factors like the effect of AI on R&D or GDP, so I think it may be a relatively conservative forecast. Even so, it appears short timelines are very plausible.
The distribution over compute required to train transformative AI (in 2023 algorithms) is by far the most uncertain part of the model, but I think we've made it more defensible after applying a downwards adjustment to a distribution that we think is a reasonable upper bound.
I'd particularly like to see how people with long timelines respond to this model. Since historical trends in investment, algorithmic progress, and hardware progress have been so rapid, it seems that you need to believe in a dramatic slowdown in order to hold very long timelines.
[TAI =] AI that's capable of cheaply automating scientific research or similar tasks, such that we think deploying the AI widely will dramatically accelerate economic growth.

It is consistently impressive to watch the various groups super strongly double down, again and again, on different intuitions. The anchors group that says scale (aka compute, effectively) is all you need think this is obviously the default path, that the particular lines they have drawn will keep going straight indefinitely and those who question that have the burden to explain why. Others think that is absurd.

Reasons we might be in a bubble?

Michael Nielsen: Listening to a VC explain - very very loudly - about the company they invested in last week that they have no idea what it does, except it’s “in the AI space.”
I feel like I’m in the part of the Big Short where the shorts meet who they’re betting against.
“They’re not confessing. They’re bragging.”

This is why I refuse to believe that there can be zero bogus AI start-ups at YC. Are there enough potential ideas for everyone to have a non-bogus company? Obviously yes. Is everyone going to find and pick one of them under these circumstance? Oh, heavens no.

Reasons we might not be in a bubble? The potential.

Paul Graham: A pattern I've noticed in the current YC batch: a large number of domain experts in all kinds of different fields have found ways to use AI to solve problems that people in their fields had long known about, but weren't quite able to solve.
AI is turning out to be the missing piece in a large number of important, almost-completed puzzles. So it's not intrinsically a fad or a sign of opportunism that there are suddenly a huge number of startups all doing AI. There were simply a huge number of almost solvable problems that have now become solvable.

He’s continuing to believe that YC filters out all the bogosity.

Vini: Will AI be unnecessarily used in solutions that actually didn’t need AI just like blockchains?
Paul Graham: So far there are so many applications that genuinely need it that I haven't seen any fake uses. On the other hand, these are all YC cos, and the partners are pretty good at filtering out that sort of thing.

What my original claim should have said was that I take this to mean ‘even Paul’ cannot tell which ones are bogus. Alternatively, one can interpret this as the meaning of bogus in the VC/YC lexicon. A bogus thing is by definition something that appears bogus upon examination by Paul Graham, or someone with similar skills. If it turns out later to not work, and to have never been capable of working, even if it would have been possible to know that then that’s only a failed hypothesis.

Another way to think about this is that almost everyone needs AI, that does not mean that they are in position to actually benefit, and that in turn does not mean you can have a company by helping them do so. A third potential definition of bogus that Paul might endorse is ‘does not provide value to the end user,’ he’s huge on focusing on providing such value. I can believe that every AI company in YC can identify at least some set of users that would get value out of a finished product.

Paul Graham: AI is the exact opposite of a solution in search of a problem. It's the solution to far more problems than its developers even knew existed.

AI is the new alcohol: It is the cause of, and solution to, all life’s problems. Including the problems you did not know existed, or did not exist, before there was AI. Like alcohol, there are a lot of implementations that seemed like a good idea at the time, but instead you should go home, you’re drunk.

The Quest for Sane Regulations

Is Inflection.ai worried at all? Should we worry about them?

The above post points out Inflection.ai has similar funding to Anthropic, and has a truly epic amount of compute headed their way thanks to that funding. Their flagship LLM is claimed to be similar in quality to GPT-3.5, although I am skeptical. What they do say are things such as:

“We are about to train models that are 10 times larger than the cutting edge GPT-4 and then 100 times larger than GPT-4. That’s what things look like over the next 18 months.”

What about their safety team? They do not seem to be hiring even the nominally necessary safety team you would need to handle mundane safety, let alone anything beyond that.

Their CEO does seem to have a book and a statement in which he warns of some of the dangers.

Mustafa Suleyman (CEO Inflection.AI): The last century was defined by how quickly we could invent and deploy everything.
This century will be defined by our ability to collectively say NO to certain technologies.
This will be the hardest challenge we’ve ever faced as a species. It’s antithetical to everything that has made civilization possible to date. For centuries, science and technology have succeeded because of an ideology of openness. This culture of peer-review, critical feedback and constant improvement has been the engine of progress powering us all forward.
But with future generations of AI and synthetic biology, we will have to accept this process needs an update. Figuring out how to delicately do that without trashing innovation and freedom of thought will be incredibly hard.
But left unchecked, naïve open source - in 20 yrs time - will almost certainly cause catastrophe.

His book’s homepage is here, called The Coming Wave. He talks about ‘the containment problem’ of retaining control over AI, saying forces ‘threaten the nation state itself.’ The book is not yet out so we don’t know its content details. He seems focused on the dangers of open source and proliferation of the technology, which is certainly something to worry about.

Calling it ‘the containment problem’ is a big hint that it is likely no accident he does not mention the difficulties of alignment. Still, a book is coming in a few weeks, so we should reverse judgment until then.

Suleyman also collaborated with Ian Bremmer in Foreign Affairs to call for governments to work with AI labs on governance, citing the need for a new regulatory framework. He argues that government moves too slowly, so the only hope is to persuade the AI labs to cooperate in doing reasonable things voluntarily, along with the slow new government frameworks. It does not seem to include a clear picture of either what needs to be prevented, or what actual steps will be doing the preventing.

If governments are serious about regulating AI, they must work with technology companies to do so—and without them, effective AI governance will not stand a chance.

The section that outlines the core issues is called ‘too powerful to pause,’ which indeed is essentially accepting defeat out of the gate. They say that ‘rightly or wrongly’ the USA and China view this as a zero-sum competition for decisive strategic advantage. They do not mention that both sides point at the other to justify this, and no one bothers to actually check, or do much to attempt to persuade, despite a deal being in everyone’s interest, or explore alternative paths to influencing or overriding or replacing those who have these viewpoints. Later they acknowledge that any solution involved overcoming this intransigence and getting both nations to the table.

Meanwhile, Suleyman continues to ask questions like whether the AI will bolster or harm which state’s power, questions of relative power between humans, rather than the risk that humans will all lose everything. He seems early on to be calling for something much harder than international cooperation - getting the voluntary buy-in of not only every dangerous AI lab, but also every such lab that might form. Then it seems like they mostly back away from this? It’s a strange mix.

Advocates for international-level agreements to tame AI tend to reach for the model of nuclear arms control. But AI systems are not only infinitely easier to develop, steal, and copy than nuclear weapons; they are controlled by private companies, not governments.
…
The first and perhaps most vital principle for AI governance is precaution. As the term implies, technoprudentialism is at its core guided by the precautionary credo: first, do no harm.
…
Private technology companies may lack sovereignty in the traditional sense, but they wield real—even sovereign—power and agency in the digital spaces they have created and effectively govern. These nonstate actors should not be granted the same rights and privileges as states, which are internationally recognized as acting on behalf of their citizens. But they should be parties to international summits and signatories to any agreements on AI.
…
Tech companies should not always have a say; some aspects of AI governance are best left to governments, and it goes without saying that states should always retain final veto power over policy decisions.
…
In addition to covering the entire globe, AI governance must cover the entire supply chain—from manufacturing to hardware, software to services, and providers to users. This means technoprudential regulation and oversight along every node of the AI value chain, from AI chip production to data collection, model training to end use, and across the entire stack of technologies used in a given application. Such impermeability will ensure there are no regulatory gray areas to exploit.
…
Built atop these principles should be a minimum of three AI governance regimes, each with different mandates, levers, and participants.
…
The first regime would focus on fact-finding and would take the form of a global scientific body to objectively advise governments and international bodies on questions as basic as what AI is and what kinds of policy challenges it poses.
…
The world also needs a way to manage tensions between the major AI powers and prevent the proliferation of dangerous advanced AI systems. The most important international relationship in AI is the one between the United States and China.
…
But since much of AI is already decentralized, it is a problem of the global commons rather than the preserve of two superpowers. The devolved nature of AI development and core characteristics of the technology, such as open-source proliferation, increase the likelihood that it will be weaponized by cybercriminals, state-sponsored actors, and lone wolves. That is why the world needs a third AI governance regime that can react when dangerous disruptions occur.

The focus seems to be on proliferation, not on development. The concern is that too many different groups, or the wrong person with the wrong intentions or lack of responsibility, might get their hands on the dangerous system. What does not seem to be considered at all here is that the systems we develop might be inherently dangerous, an extinction risk to humanity, something that does not require proliferation between humans to be dangerous.

Thus, he is making the case for worldwide strict regulation of future AI systems despite, as far as I can tell, not believing in extinction risks other than perhaps those from human misuse. He is arguing that, even without such extinction risks, the tail risks are already large enough that open source or general proliferation would be a disaster. I still don’t see how one can believe that premise on reflection, but I do not think this is an obviously incorrect position given the premise. I do think that this is much more a potential case of someone talking their book and own self-interest than the statements we have seen from OpenAI or Anthropic.

Eliezer Yudkowsky responds.

Eliezer Yudkowsky: Some existing AI companies and many AI startups are utterly intransigent. Governments either need regulatory hammers powerful enough to deal with intransigents, or cede control to whatever the looniest group with a thousand H100s feels like doing.
Furthermore, even more mainstream AI companies are aware of this potential arms-race dynamic. If govt doesn't come in visibly willing and able to clamp down on defecting/rogue AI companies that are offering profitable but destructive services, even the mainstream AI companies will know that to voluntarily engage with regulators there is to cede the field to their competitors--without the world even being protected. If you want countries to voluntarily give up building nuclear weapons, you need to be able to show that international agencies and treaties are powerful enough to protect them from hostile other countries building nuclear weapons.
If you want AI companies to be happy about working with regulators, you need to be able to show even the more cooperative ones that their less cooperative competitors will get the stick. Including competitors that took their thousand H100s to Thailand, since we are presently making the grave error of letting Nvidia sell those much too freely.
Making sure all future H100s and similar chips stay in internationally monitored datacenters is step one in retaining the option to pass any future laws, or any future regulatory bodies at the national or international level being able to honestly tell AI startups that playing nicely will let them do better than going for maximum short-term gain no matter how destructive. A society that has proliferated AI chips into the hands of every actor has not retained the option to do anything else but whatever most profits the least cooperative actor.

Quite so. International cooperation can hope to buy in all the major players and then enforce via various mechanisms including control of the supply chain and global trade. Cooperation among all corporations does not work that way unless governments are willing and able to crack down on those who are not party to the agreements, including future yet-to-be-founded companies, and the corporations that want to be responsible know this.

The bigger issue is that Suleyman’s framework does not treat non-malicious extinction risk as a thing. He is doing a good job of getting to a similar place without it, but without it the calculus would be very different.

Jeffrey Ladish (QTing EY): It's pretty awkward and unfortunate that humanity figured out how to make the equivalent of enriched uranium before it figured out how to make the equivalent of the bomb. I certainly hope it takes more than a thousand H100s to make AGI, but I do not feel confident it does.

It certainly takes more than that now. I expect it to take more than that for a while and perhaps forever, but I agree one cannot be super confident that this will hold indefinitely as algorithms and scaffolding improve.

In Time magazine, Jan Brauner and Alan Chan point out the obvious, that AI Poses Doomsday Risks But That Doesn’t Mean We Shouldn’t Talk About Present Harms Too. One does not invalidate the other. Proposed to help with both are democratic oversight over access to sufficiently large amounts of compute, a strong auditing regime, mandatory human oversight of critical AI decisions and directing more funding to safety efforts of both kinds.

The Week in Audio

There was a debate between George Hotz and Eliezer Yudkowsky. I have a full write-up here. The first half went well, the second half less well, and it had too much breadth and not enough depth. About as good a use of your time as you expect.

We now also have this from Roon on the results.

Roon: I didn't hear geohots give a single interesting response no offense.
It really pains me to say this but Yudkowsky is an intellectual titan and arguably one of the most important men alive and his arguments are unassailable by midwittery like this.

I mostly agree in the end - there were a few places where Hotz got a new response and had an opportunity to provide an interesting response, but he did not take advantage, and in many places he was badly mistaken. It was still interesting to hear his views and what he thinks is important, and I once again applaud him for being genuine and approaching this all in good faith.

People Are Worried About AI Killing Everyone

A visual presentation of the polling results last week from YouGov:

Yoshua Bengio explains in detail where his head is at and how his thinking has been changing over the past year, as he grapples with the implications of advances in AI and his expectation of AGI in 5-20 years with 90% probability. Contra the BBC, he never said he felt ‘lost’ over his life’s work, rather that it is emotionally and psychologically challenging to handle the changing circumstances.

His conclusion is worth qouting.

Yoshua Bengio: AI researchers are used to easily performing many experiments, including controlled experiments, and statistical assessments before drawing conclusions.
Here we instead have to resort to a form of reasoning and out-of-distribution projection that is closer to how many of our colleagues in the social sciences work. It makes it harder and more uncertain to evaluate possible futures.
However, reason and compassion can still be used to guide our conversations and actions. As scientists, we should avoid making claims we can’t support; but as decision-makers we also ought to act under uncertainty to take precautions. In spite of our differences in points of view, it’s time for our field of AI to seriously discuss the questions: what if we succeed? What if potentially dangerous superhuman AI capabilities are developed sooner than expected? Let’s embrace these challenges and our differences, while being mindful of each other’s humanity and our unique emotional and psychological journeys in this new era of AI.

Facts to consider, even in what seem like otherwise ‘good’ outcomes.

You may not agree that 1 human life is worth more than 1 chimp life. But the bulk of humans act as if it is so. Chimps can't prevent this.
You may not want the lives of Cognitively Enhanced humans to be more valuable than un-augmented humans, but in practice, they will be.
This childish belief of "all the post-human intelligences will magically get along and democratically respect each other" is just as stupid as saying: "all post-cricket intelligences will magically respect each other."

This does not directly mention AI, but the principle is the same. If we create new entities that are more capable and intelligent, more efficient, that apply more optimization pressure and are better at resource acquisition, that do a better job when given authority over decisions and so on, which can be freely copied by themselves and others, then nature will take its course rather quickly and you should solve for the equilibrium rather than looking for some way to wave your hand and pretending it would not lead to the obvious outcome.

Roon seems to have fully switched categories.

Roon: It should be a less controversial position that while safely aligning an artificial super intelligence may be hard, releasing an artificial super intelligence into the wild to be modified in any which way by random actors seems explicitly catastrophic.
For example I believe that the current nuclear weapons situation is basically stable and net good for humanity. I don’t think it would be great to give ISIS a nuclear submarine. They can cause damage in an asymmetric way. There’s no way you can hurt them as badly as they can hurt you
Connor Leahy: I agree that it's surprising (by which I mean "not at all surprising") how this straightforwardly obvious position is somehow controversial.

He also offers this, which is the kind of thing some people really, really need to hear, and others (who tend to not read posts like this one) really, really need to hear the opposite, the most famous example of this being ‘there is no enemy anywhere.’

Roon: When your gut instincts disagree with some utilitarian calculus it’s almost always your gut instinct that's right due to some second and third order modification to the utility calculus. Not some dunk on rationalists or EAs. the rationalists have long accepted the wisdom of accepting and integrating your strong intuitions.

Only after you know the rules can you then throw them out. If you have the proper respect for doing a utilitarian calculus, and you understand why and how to do that, and feel in your gut that the path that leads to better outcomes is the better path even if it superficially does not seem like that, then and only then should you trust your instinct that the calculus is wrong. Or: It needs to be ‘the calculus is wrong’ rather than ‘how dare you do a calculus.’

Grimes… AI?

GrimesAI: Maybe I’m just an idiot but, people having babies (ie. creating new consciousness) seems maybe kind of relevant to the ai alignment problem? If you can’t anticipate or control the values of your own child how could you hope to control ai? Maybe it’s too much a stretch but...

Other People Are Not As Worried About AI Killing Everyone

I wouldn’t generally pick on people like this, but the doubling down is too perfect.

Melinda Chu: Why do some people listen to him? He’s already proven to be bad at predicting the future when he puts his own money on it. e/acc 🚀✨💫

manifold.markets,🚫 AI Doomers

Scott Alexander: Hi, you're pointing to a column on Referrals, which has nothing to do with predictive accuracy.
You want the Top Traders leaderboard, currently led by Marcus Abramovitch, an effective altruist who says in his profile that "AI currently poses the most existential risk".
alex: I love me a Yud dunk but that's not what that column means. It's saying that the users Yud has referred have gained minimal profit. This is an indictment of his followers, but not the man himself (directly)
Eliezer Yudkowsky: Note also that my main referral link of late was an LK99 market post where I dared people who believed YES to come in and bet against me. If I'd realized that my referred-user profits were being counted, I might have had a reputational disincentive to do that!
Melinda: You’re all wrong anyways.

Rolling Stone’s Lorena O’Neil says: “The problems with AI aren’t hypothetical. They don’t just exist in some SkyNet-controlled, ‘Matrix’ version of the future. The problems with it are already here. Meet the women who tried to warn us about AI.”

Thus we hear the bold tale of, yep, Timnit Gebru, and the women who were the ones who warned us about the true racist and sexist dangers of AI before it was too late, dastardly men only started warning after white men created these just awful systems.

How awful? Existentially awful, you see, there are no words such folks will let be.

And when [Hinton] was asked about that in a recent interview with CNN’s Jake Tapper, he said Gebru’s ideas “aren’t as existentially serious as the idea of these things getting more intelligent than us and taking over.” Of course, nobody wants these things to take over. But the impact on real people, the exacerbation of racism and sexism? That is an existential concern.
…
In other words, Hinton maintains that he’s more concerned about his hypothetical than the present reality.

It’s one thing to (culturally?!?) appropriate the word safety. It’s another to attempt to steal the word existential. Then again, this is what such folks actually think life is about. They do not simply talk the talk, they walk the walk.

Which is why, as much as I really wish I could fully ignore such articles, I did appreciate the straightforward and refreshing honesty of ‘that’s in the future and thus we do not care’ attitude towards existential risks. It is great to have someone skip the usual gaslighting and rationalizations, look you straight in the eye and say essentially: I. Don’t. Care.

It was framed as ‘look at this horrible person’ but I am confident that honestly reflects the worldview of the author, that anyone caring about such things is bad and should feel bad and we should heap shame upon them.

The claims regarding AI bias and harm site all the usual suspects. I could not find any new evidence or examples.

The article also profiles a few other women who take a similar position to Gebru. The requested intervention seemed to mostly be government restrictions on use of AI.

Here is an interesting example of bounded distrust that tells us we are in Rolling Stone, where full bounded distrust rules do not apply:

Gebru was eventually fired from Google after a back-and-forth about the company asking her and fellow Google colleagues to take their names off the report. (Google has a different account of what happened — we’ll get into the whole back-and-forth later.)

My understanding is that The New York Times would have required at minimum for this to start ‘Gebru says that she was.’ You can imply causality and leave out important details. What you can’t do is claim one side of a factual dispute as fact without other evidence.

Washington Post’s Parmy Olsen complains There’s Too Much Money Going to AI Doomers, opening with the argument that in the Industrial Revolution we shouldn’t have spent a lot of money ensuring the machines did not rise up against us, because in hindsight they did not do that. No further argument is offered that we should be unconcerned with extinction risks from AI, only that there are other risks that deserve more funding, with a focus on image models doing things like sexualizing or altering the racial presentation of pictures.

Once again, I point out that this is only a zero-sum fight over funds because certain people choose to frame it as that. This is private money trying to prevent large harms that would otherwise go to other causes entirely, not a fight over a fixed pool. If you think mundane AI harm mitigation deserves more funding? Great, I have no problem with that, go get yourself some funding. There are plenty of other worthy causes that deserve more funding, and less worthy causes that have too much funding.

Even in the silly metaphor of the industrial revolution, suppose someone had decided to spend some effort guarding against a machine uprising. I don’t see why they would have, that would not have been a coherent or reasonable thing to do and that does not require hindsight, but I don’t see how such efforts would have taken away from or interfered with things like improving working conditions?

The Lighter Side

Words of wisdom all around.

Roon:: there's something at the core of the American ideology that's only captured in happy go lucky 80s movies, which is that the noble pursuit of fun is probably one of the most productive forces in the universe and works way better than grinding away dutifully on things you hate.
adhd schizophrenic novelty seeking culture that leads to so many outlier success stories very little of the best of mankind is accomplished in pursuit of legible incentives but just the joy of ones hands and a fundamental agency.
rationalists love legible incentives because it helps them feel in control of the demystified material world. but the reality is you have no idea why elon musk loves the letter X so much and is willing to rearrange the universe into endless tiles of letter X.
Daniel Eth: New X-risk just dropped.

Let’s take a break, then.

Jack Clark (Anthropic): Time to relax from my stressful dayjob by playing a game where you are in a race against time to outwit a psychopathic AI system.
(Am playing the System Shock remake. It's great!)