🩵 Thx Zvi. Have you thought about using ChatGPT voice instead of the built-in one? Might be better.
If you are looking for a high-quality audio conversion, I produce a podcast of Zvi's posts through ElevenLabs, with multiple voices to differentiate the different quoted sources:
https://open.substack.com/pub/dwatvpodcast/p/ai-88-thanks-for-the-memos
Very cool, thx!
Zvi, I have some involvement in the Texas bill. Can you message me? I may be able to help.
It's a laboratory-of-democracy case, right? I feel like even the most draconian AI-blocking bill doesn't matter much, because there are 49 other US states and Texas is not a center for AI startups. There are some startups in Texas, but they can all leave for the Bay Area or NYC, or just not deploy their products to Texas.
SB 1047 was important because it takes time for startups to leave California, so it delays the potential singularity or fizzle by the amount of time required to move.
For the laboratory-of-democracy model to work, we need control groups: states like Texas that make using AI illegal, so we can compare the consequences head to head. Will AI-positive states see extreme economic growth, leaving AI-banning states to decline faster and harder than Detroit? Or will AI-positive states see mass unemployment from using AI in everything, plus a steady stream of incidents of AI failing and killing workers and customers?
I don't know, but it seems valuable to try it out and find out.
An AI genocide of humanity knows no borders.
If that's your threat model, Texas is completely irrelevant. In that case you should probably be for some group - probably the government - pushing forward as fast as possible to develop AI strong enough to be dangerous in an isolated lab, and proving to everyone, especially China, that the threat exists. Texas is a waste of your time if this is what you genuinely believe.
California was a different case - SB 1047, or some other meaningful legislation that leads to a slowdown as AI labs leave the state or let China pull ahead, would buy us a few extra months to live by your threat model.
I'm not sure your model of AI risk matches Zvi's model of AI risk.
An anecdote prompted by your comment about how something that's so good for productivity can be so slow to be adopted. I'm an avid AI user and work in a very quantitative, tech-forward, and young department. Almost everyone who's adopted AI here has done it because I've walked them through using it for this or that problem. After, I dunno, a week of this kind of collaboration, something suddenly clicks and they realize they love it. At that point they're off to the races and are way more than 25% more productive.
I mention the demographics because my department's profile is basically optimal for AI adoption. But here at least, adoption still moves at the speed of one person convincing another. When I talk to other departments about this, I often have conversations like: "I will gladly and patiently teach you to use this tool, and I can very credibly say that it will save you hours per week." "Eh." People just be like that, I guess.
I was recently in hospital, and basically everyone I interacted with checked that I was the right patient, and asked me to confirm that the relevant parts of my medical record in the computer were correct. Like, I was asked to confirm what medications I am on at least five times. I entirely understand why that’s the protocol, and I agree with doing that. Best one was them asking me to confirm stuff just before the intravenous sedation goes into my arm (like, this is the last possible moment you could ask me anything at all).
The whole system probably has a certain amount of redundancy against AI generated garbage in the online medical record.
(And I can merely guess at the medical negligence lawsuits that led to current protocols).
Sure, that might catch some of the clerical errors it'll make in the paperwork, but it won't have those kinds of safeguards on the operating table when it gets there.
Ultimately what matters is not whether risks are possible but what the odds are. As Zvi notes, Whisper hallucinations may not actually be worse than a physician dictating later and getting two patients confused.
“If AI is not a huge deal over the next 20 years, I presume either we collectively got together and banned it somehow, or else civilization collapsed for other reasons.”
I feel like it’s common for tech to have a lot of hype and even a lot of investment and then to end up not being a huge deal over the short-to-medium term.
Crypto, nuclear power, everything involving space, nanotech, VR, cloning/genetic engineering (until COVID vaccines), AI (for the entire history of AI research up to now)
Other stuff comes out and is genuinely really big for a while but then the world changes and in the end it’s a historical footnote. Mosquitoes develop resistance to DDT. Google Search gets overrun with SEO garbage. People get bored with the core gameplay loop for MMORPGs.
I can imagine AI going like that over the next 20 years. Maybe it needs more to keep scaling than what we can provide. Or maybe AI systems get kinda popular for a bit, but then fade when we end up being better at subverting them than securing them.
Praxis does take time, for sure; VR's infancy was in the late 1970s, and it has repeatedly suffered waves of overhype (fairly so). AI is going to be different. Even if capabilities don't develop any further beyond where they are today, it will have an outsized impact on society beyond that of any previous technology. Luckily, implementation is still bound by human adoption, which does take time, but of course another big fear is that this rate-limiting step might soon be obviated.
I think you can break your list down and consider each category one by one. Ask yourself: how much money could I have made with this technology, using the version actually developed or a reasonably improved version that was immediately possible?
Crypto - doesn't make net money.
Nuclear power - a very expensive way to boil water compared to coal, or now cogeneration and solar.
Everything involving space - at $10,000 per kg to reach orbit, how were you going to make money?
Nanotech - without some way to mass-produce nanotechnology, this isn't making money. ICs are the kind of nanotech we did get, and they make lots.
VR - headsets hurt your eyes. They make humans marginally more productive co-working in VR? Not a lot of money there.
Cloning/genetic engineering (until COVID vaccines) - in the places where this is legally allowed, it makes money and is commonly used.
AI (for the entire history of AI research up to now) - "up to now" stops in 2022, and OpenAI is beginning to make money on GPT-4.
Please note that by "money" I mean a shorthand for workers and resources. A command economy that didn't use money, like the USSR, would also find unprofitable technologies to be a waste.
Great stuff as always, Zvi. That YouGov poll from the Center for Youth and AI does smell a bit like opt-in online polling to me, though. Many of your points still stand, however.
A 1% probability of your device being taken over has been a thing since the days of the earliest browsers. Sometimes this goes down a lot (a critical image library vulnerability is patched), sometimes it goes up (a malicious actor uses a zero-day while paying for ad coverage on a high-profile keyword, or successfully commits a backdoor to a low-level utility library and isn't discovered quickly). I don't think LLM prompt injection massively changes the attack surface over existing exploits by itself. Running a system in autonomous agent mode does change the risk, because that allows multiple shots at the target.
Unfortunately, most people who use the Internet seem to have accepted that clicking on the occasional random link will let evildoers take over their machine and steal money from their bank account. This makes it less likely that the more flexible, adaptable kind of malware enabled by agents with a world model will get special consideration, even as over time the additional capabilities and potential for persistence make these systems harder to detect and defend against. Antivirus software that relies on pattern matching to trigger an alarm isn't going to help when the malware can adapt to the detector, dynamically work around it, or redirect the alarm.
That box meme? That's where we are headed, unless friction is added to the iterative agent loop.
One way to make this specific use case work is reference sources and a whitelist of trusted websites. Ironically this is de facto a rollback to a prior era - basically, your containerized AI agent does its assigned work using nothing but positively peer-reviewed nonfiction from a fixed set of websites hosting reference books, journals, and major news sources. No random websites.
Not fundamentally different from going to the library and reading only the nonfiction books and major newspapers they have there. Just thousands of times faster.
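To make that concrete, here is a minimal sketch of the kind of gate I mean, in Python; the domain list and the `fetch_for_agent` helper are invented for illustration and aren't any particular agent framework's API:

```python
from urllib.parse import urlparse

import requests

# Hypothetical allowlist of reference/news domains the agent may read from.
ALLOWED_DOMAINS = {
    "en.wikipedia.org",
    "www.reuters.com",
    "archive.org",
}


def fetch_for_agent(url: str, timeout: float = 10.0) -> str:
    """Fetch a page for the agent only if its domain is on the allowlist."""
    host = urlparse(url).hostname or ""
    # Allow exact matches or subdomains of an allowed domain.
    if not any(host == d or host.endswith("." + d) for d in ALLOWED_DOMAINS):
        raise PermissionError(f"Refusing to fetch non-allowlisted domain: {host}")
    resp = requests.get(url, timeout=timeout)
    resp.raise_for_status()
    return resp.text
```

Everything the agent reads then comes from that fixed "library," which is exactly the rollback described above.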
The timing on the AI memo is super weird, and frankly extremely stupid. What the Biden administration should do is say loud and clear that they are interested in the complete and total acceleration of all AI research. That way if Trump wins, he will immediately declare himself the AI safety president. If Harris wins, they can just change course immediately.
The memo reads to me like "acceleration and fuck China but we are going to do this with so much bureaucracy that in practice nothing will be accomplished".
Basically "noop, cost hundreds of millions".
Podcast episode for this post
https://open.substack.com/pub/dwatvpodcast/p/ai-88-thanks-for-the-memos
Google's larger models are fantastic for one use case: video input! https://simonw.substack.com/p/video-scraping-using-google-gemini
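For anyone who wants to try this outside AI Studio, here's a rough sketch of the flow with the `google-generativeai` Python package; the API key, file name, model name, and prompt are my placeholders rather than anything from Simon's post:

```python
import time

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

# Upload the video via the File API, then wait for server-side processing.
video = genai.upload_file("screen_recording.mp4")  # placeholder file
while video.state.name == "PROCESSING":
    time.sleep(5)
    video = genai.get_file(video.name)
if video.state.name == "FAILED":
    raise RuntimeError("Video processing failed")

model = genai.GenerativeModel("gemini-1.5-pro")
response = model.generate_content(
    [video, "List every date, name, and dollar amount visible in this video as CSV."]
)
print(response.text)
```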
>People toss off plausible-sounding stories about how, if X happens later, markets ought to behave like Y later; and then X happens; and then Y just doesn't happen. This happens ALL THE TIME. It happens to professional traders.
Eliezer is doing the thing where he speaks hyperconfidently with little to no supporting evidence.
A quick Google surfaced this report from professional traders:
"The US stock market has experienced its share of crisis events—from wars to political upsets, to many unforeseen human tragedies. The table below highlights 25 international crises that have occurred since 1940. During these events, the Dow Jones Industrial Average (DJIA) dropped by an average of about 6%. In all but four cases, the market returned to positive territory within six months of the end of each decline."
https://www.amundi.com/usinvestors/Resources/Classic-Concepts/Crisis-Events-and-the-US-Stock-Market
Here's another report: https://www.lpl.com/research/blog/middle-east-conflict-how-stocks-react-to-geopolitical-shock.html
I suspect Eliezer is falling prey to reporting bias. "Man bites dog" is more likely to make the news. "Catastrophe looms; stock prices rise" is more likely to make the annals of financial history.
[Note: I also disagree with Cowen for various reasons. I'm writing this because I'm much more worried about people reading these comments placing excessive trust in Eliezer.]
The Cuban Missile Crisis was the example that I chose (one-shot!) when this came up previously; he's probably going with it because of that, and because it's the clearest parallel. If you have a political crisis of another kind and it goes badly, you're still around to deal with losing money, or to cash winning tickets, so it plays differently.
It's funny that the 'thing you're trusting' here when you trust Eliezer is... not to trust one's ability to predict reactions!
And the 6% number plus rebounds illustrates the point - if you know a 6% decline is coming at some point in the next 10 years, that is not very useful in terms of finding a good trade.
The Cuban Missile Crisis was the brink of nuclear war. There’s little point in selling assets to consume more if you’re worried about being annihilated tomorrow or the next day. Conceivably we could get a warning shot 6 months in advance for an AI catastrophe.
I agree that just knowing a 6% decline might happen at some point is not very useful.
My point was more of a local-validity one. I doubt that professional traders have all given up on predicting future events due to the difficulty of knowing how the market will respond.
My Mundane Utility of the day: we get health insurance through my wife's big company. They sent out their annual 35-page guide describing the different options. I uploaded the entire PDF into Claude and prompted it to ask me questions (which it did) until it had enough info to make a recommendation. I was able to ask follow-up questions, and it even ran simulations like "What if someone in my family has an expensive surgery next year?" or "What if I decide to invest the maximum into an HSA?"
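A scripted version of roughly the same workflow, as a sketch using the Anthropic Python SDK plus pypdf to pull the text out of the guide; the file name, model alias, and prompt wording are my placeholders (the actual session was just done in Claude directly):

```python
import anthropic
from pypdf import PdfReader

# Pull the benefits guide text out of the PDF (file name is a placeholder).
reader = PdfReader("benefits_guide.pdf")
guide_text = "\n".join(page.extract_text() or "" for page in reader.pages)

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
history = [{
    "role": "user",
    "content": (
        "Here is our annual health insurance options guide:\n\n" + guide_text +
        "\n\nInterview me one question at a time until you have enough information "
        "to recommend a plan, then make the recommendation. I may also ask what-if "
        "questions like an expensive surgery next year or maxing out an HSA."
    ),
}]

# Simple interview loop: print Claude's question, append my answer, repeat.
for _ in range(10):
    reply = client.messages.create(
        model="claude-3-5-sonnet-latest",  # placeholder model alias
        max_tokens=1024,
        messages=history,
    )
    text = reply.content[0].text
    print(text)
    history.append({"role": "assistant", "content": text})
    history.append({"role": "user", "content": input("> ")})
```

The follow-up questions and what-if simulations are just additional turns in the same loop.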
Missed opportunity to use Lloyd's of London will be loaded when we go.
The deepfake report makes me wonder the same thing Scott did recently: What's The Deal With Korea? Lots of crazy topline stats and reports, but I don't really understand what the underlying causes are on the ground... (Beyond the fertility stuff, of course.)
Regarding the section about software productivity and working code:
I'm a staff software engineer with 15 years of experience. I've been using Aider with Claude for the past ~6 weeks, and ChatGPT and Copilot for software since they came out. I also wrote some shell scripts to integrate the OpenAI API into my command line workflows and have been using them for ~1 year.
My experience is that LLMs often generate (almost) working code, but using them effectively requires the ability to read, evaluate, test, and fix code quickly. These are activities that even many software developers are bad at. It also requires identifying when the LLM is going down a rabbit hole or is confused, and doing something else instead.
You've covered a lot of research showing that LLM tooling is more effective for novices than experienced developers, and I tend to agree that finding is overblown. If the task is "solve a relatively straightforward programming task that might take an experienced programmer an hour", I'd expect LLMs to raise novices from "will never complete the task, or perhaps days or weeks of work" to "can complete the task in an hour some possibly high percent of the time". Huge improvement! The expert is starting from a baseline of extremely high productivity, so unless they get lucky and the LLM is able to basically do the task by itself, it can't improve productivity on that task much.
But the thing is that developers face those sorts of tasks constantly, like 30+ hours per week, so sometimes solving them in 5 minutes and often solving them in 45 minutes is an enormous productivity booster.
The problems as I see them are:
1. It's easy to get stuck debugging code the LLM wrote. Debugging code you didn't write yourself in the past hour is several times harder than debugging your own fresh code. LLM code can't be "trusted" to be even marginally correct, so the prior probability that any given line is causing a bug is more diffuse than with human-written code. This must be carefully managed.
2. LLMs are much better at writing code than they are at fixing code. This means that getting the LLM to fix issues is often not tractable, though it is worth attempting. Also, the more you try, the worse it gets (see context window management below).
3. Copilot suggestions are often subtly wrong, and it often shows you quite long code blocks. I think this is largely a UX thing, where it doesn't have full context on what you're trying to do since the prompt is the code, not your explanation of what you want. Its next-token predictions are often quite good though. I use the accept-next-token hotkey almost exclusively and never accept code suggestions longer than a few lines.
4. You need lots of checks to make sure the LLM didn't break something already implemented, in order to make cumulative progress. With Aider this happens quite frequently! This requires good software development practices like code linting, tests, static analysis, version control, etc. I disable Aider's auto-commit functionality in Git, and use `git add -p` and `git checkout -p` to add/remove individual changes. This also helps ensure I read all of the code that is generated.
5. Performance of Aider/Claude seems to degrade rapidly for longer contexts, even far below the context limit of the model. I think this is true in web chats as well, but it's less noticeable there, since with Aider, command output and multiple files can make the context orders of magnitude larger than your specific prompt. This is one reason I like Aider's UX model of adding only particular files to the chat. I often exit Aider and restart when it stops working well. This also means it works far better for short, self-contained files and can be a challenge to use well on larger, poorly factored codebases (i.e., most codebases).
Strategies I've found useful:
* Read every line of code that is generated.
* Use aider adversarially when writing test cases (see the sketch after this list). E.g., I might write out a method signature and a docstring, then have aider write tests for it, then have aider write the function, then edit the implementation to remove obvious bugs, run the test cases, manually fix them, then have aider write more tests against the new implementation (with a fresh context). Or I might write test cases I don't let aider see, and tell it about the failures. Etc.
* Lock in progress with frequent git commits. Be prepared to backtrack to an earlier state rather than trying to debug (`git reset ... --hard`)
* Don't use Aider in cases where it will do a bad job. This is subjective and I'm still working on my intuition here. Basically cases where it needs a lot of files in context to do the right thing, or where what I want is unclear or very complicated.
* Keep files small and atomic where possible.
* When it does something wrong, revert the change and start the context over again. I wish Aider had better UX for this (something like manual tree search, backtracking to earlier parts of a chat history). Cursor or Copilot Chat might have something. In practice, I quit the program and restart the chat.
* Switch UX modes when needed. I'll go to o1 to help plan out a project, ask GPT-4o or Claude for library recommendations or self-contained one-off methods, use aider or copilot depending on how much input I think the LLM needs from me or vice versa, etc.
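To make the adversarial-testing bullet concrete, here is an invented example of the starting point: a signature plus docstring I'd write myself, followed by the kind of pytest cases I'd have aider generate before it ever sees an implementation (the function and tests are hypothetical, not from a real session):

```python
from datetime import date

import pytest


# The stub I'd write myself before letting aider see anything.
def business_days_between(start: date, end: date) -> int:
    """Count weekdays (Mon-Fri) strictly between start and end.

    Raises ValueError if end is earlier than start.
    """
    raise NotImplementedError  # aider writes this after the tests exist


# The kind of pytest cases I'd have aider generate against the stub.
# (They fail until the implementation lands, which is the point.)
def test_same_day_is_zero():
    assert business_days_between(date(2024, 10, 28), date(2024, 10, 28)) == 0


def test_weekend_is_skipped():
    # Friday to Monday: only Sat/Sun in between, so zero business days.
    assert business_days_between(date(2024, 10, 25), date(2024, 10, 28)) == 0


def test_end_before_start_raises():
    with pytest.raises(ValueError):
        business_days_between(date(2024, 10, 25), date(2024, 10, 24))
```

The failing tests then act as the spec: aider writes the implementation against tests it didn't get to grade itself on.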
Also noteworthy: I find that using LLMs often means the quality of my output is much better, even if I don't directly "get more done" in that session. The code ends up better documented, better tested, with a full command line interface or visualizations, etc. I think that probably has indirect effects on productivity, but most code has a short half life, so it might not matter all that much.
Apologies for a trivial comment, but "adapt" is not the word you're looking for, it's "adopt". Adoption is when you start using something. Adaptation is when you change something.