28 Comments

Maybe I should start a Substack. I can title it "I Told You So, You Fucking Fools" 😂

Perhaps a relevant clip to point to: https://www.youtube.com/watch?v=mDLS12_a-fk ?

"And in general, there’s what looks to me like a bunch of other confusions around this supposed need for an environment, where no you can simulate that thing fine if you want to."

Can you? This maybe seems like where our disconnect is. When we fly to Mars, we create a flight plan and then make lots of micro adjustments to stay on course. This is much easier than getting the calculation perfect on the first shot, right? The further out in time you want to go, the bigger this disconnect gets, as a simple consequence of chaos in dynamical systems. And as soon as you combine this with entities who know they're being observed, you get adversarial dynamics, and now there's not even a prize for being close, because every correlation that can easily be reversed shall be.
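
A toy sketch of what I mean (my own illustration, nothing more): in the logistic map, a one-in-a-billion error in the starting state swamps an open-loop prediction within a few dozen steps, while occasionally re-measuring and snapping back to the observed state keeps the error tiny.

```python
# Chaos vs. course correction, in miniature. The logistic map with r = 4 is
# chaotic: a 1e-9 error in the initial condition ruins a "predict once, never
# adjust" forecast, but periodic re-measurement (imperfect, rounded to six
# decimals) keeps the corrected estimate close. Toy example only.

def step(x, r=4.0):
    return r * x * (1.0 - x)

x_true = 0.2
x_openloop = 0.2 + 1e-9   # one-shot prediction from a slightly wrong start
x_corrected = 0.2 + 1e-9  # same wrong start, but re-measured every 5 steps

for t in range(1, 61):
    x_true = step(x_true)
    x_openloop = step(x_openloop)
    x_corrected = step(x_corrected)
    if t % 20 == 0:
        print(f"t={t:2d}  open-loop error={abs(x_openloop - x_true):.3f}  "
              f"corrected error={abs(x_corrected - x_true):.1e}")
    if t % 5 == 0:
        x_corrected = round(x_true, 6)  # the "micro adjustment"
```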

I agree that if environments could be reliably simulated to a sufficient level of detail there's lots of existential risk. But I've never understood where the "we can simulate that thing fine" attitude comes from, when complexity science and simple observation both seem to point to precisely the opposite.

(Fully prepared to take the L on bagels and croissants though. The honest truth is that my partner likes bagels and I like croissants so I've never taste tested both from the same source.)

Quick intuition pump: ...with astronauts who train mostly in flight simulators and have since the 60s. It's about whether you can create synthetic data and synthetic testing environments sufficiently well to use them as training data to create the minds in question, not a claim that when you actually, finally fly to Mars you won't need to make any adjustments. It's a very strange example. Also you can totally use self-play to train on adversarial dynamics, many such cases.
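
To make the self-play point concrete, here's a toy version (my sketch, not anything a lab literally runs): two agents learn matching pennies purely by playing each other via fictitious play, with no external environment or human data, and their empirical strategies head toward the 50/50 equilibrium of the adversarial game.

```python
# Toy self-play on an adversarial game: fictitious play on matching pennies.
# Each agent best-responds to the other's empirical history; with no outside
# environment at all, both empirical strategies approach the 50/50 equilibrium.

counts1 = {"heads": 1, "tails": 0}  # matcher's action history (seeded asymmetrically)
counts2 = {"heads": 0, "tails": 1}  # mismatcher's action history

def best_response(opp_counts, matcher):
    # The matcher wants to copy the opponent's more frequent action;
    # the mismatcher wants to play the opposite of it.
    likely = max(opp_counts, key=opp_counts.get)
    other = "tails" if likely == "heads" else "heads"
    return likely if matcher else other

for _ in range(20000):
    a1 = best_response(counts2, matcher=True)
    a2 = best_response(counts1, matcher=False)
    counts1[a1] += 1
    counts2[a2] += 1

for name, counts in [("matcher", counts1), ("mismatcher", counts2)]:
    total = sum(counts.values())
    print(name, {a: round(c / total, 3) for a, c in counts.items()})  # ~0.5 each
```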

Which AI systems do the best job of compressing books?

This is why industrial processes have control systems for things like the setting of a valve to control flow. You might think you know exactly what the valve setting should be when the process starts, but the uncertainties in the process as it runs require the valve to be adjusted, perhaps continuously, while the process is running.

Flying airplanes, guiding boats and ships, and driving a car are other examples. The valve control system can be very simple, with one sensor for level or temperature, or it can be very complex, because the process is complex. Flying an airplane or driving a car on the highway can be simple, or very complex and hard, depending on the actual environment.
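
A minimal sketch of that kind of loop (made-up numbers, not any real plant): a proportional controller on the valve holds a tank level near its setpoint even though the inflow is never known in advance.

```python
import random

# Toy feedback loop: the inflow disturbance is uncertain every step, but a
# simple proportional controller on the outflow valve keeps the tank level
# near its setpoint. Illustrative numbers only.

SETPOINT = 5.0      # desired tank level
KP = 0.4            # proportional gain
level = 5.0

for t in range(1, 31):
    inflow = 1.0 + random.uniform(-0.3, 0.3)        # uncertain process input
    error = level - SETPOINT
    valve = min(1.0, max(0.0, 0.5 + KP * error))    # open more when level is high
    outflow = 2.0 * valve
    level += inflow - outflow                       # simple mass balance
    if t % 10 == 0:
        print(f"t={t:2d}  level={level:5.2f}  valve={valve:4.2f}")
```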

Astronauts and operators of complex process equipment get training that includes upset training, where they have to react to normal change, to unanticipated events, and even to serious accidents. The people who train them cannot anticipate every possible occurrence, but they have to provide enough examples and other training to ensure that the operators do the right thing when something bad happens. This requires judgement and intuition. And even then, the operators have been known to mis-diagnose the problems and do something else that turned out to be VERY bad.

I used to train operators for nuclear power plants. On real, operating reactors. Commercial plants use simulators, but the Navy uses the real thing. And you hope that when they are confronted with an "interesting event", they are knowledgeable enough and thoughtful enough to do the right thing.

I wonder when a section on "humans are dying to AI" becomes salient

> At this point I expect the effective ‘hallucination’ rate for LLMs to be lower than that for humans, and for them to be more predictable and easier to spot (and to verify).

this is not my experience! I periodically ask chatgpt a question along the lines of "i am trying to remember an episode of a TV show in which these events occurred" or "i am trying to think of a magic card with these properties" and it has, in my experience, a 100% hit rate of producing hallucinations. now, these hallucinations are pretty easy to verify - naming a magic card that doesn't exist is pretty obvious - but it seriously reduces my confidence that any *other* factual information the LLM produces is correct, given it confidently spouts total nonsense every time i ask.

i don't know what kind of questions you're asking LLMs but man my experience is grim

Prompt engineering skill issue? You need it to vibe that it's not supposed to always know the answer, and to admit it when it doesn't know.

Several of my standard test prompts are designed to trigger hallucinations.

The model gets points if it tells me, "Uh, I think you're wrong?" Or in my favorite real-world version, "Normally people calculate dx/dy here. Are you sure you weren't mistaken when you asked for dy/dx?"
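
A rough sketch of how that scoring could be automated (the trap prompts and pushback markers below are invented for illustration, and ask_model is a hypothetical stand-in for whatever client you actually call):

```python
# Score a model on hallucination traps: it gets a point when it pushes back
# instead of playing along. Everything below is illustrative, and ask_model
# is a hypothetical stub for whatever client you actually use.

TRAP_PROMPTS = [
    # the premise of each prompt is deliberately wrong or unanswerable
    "Recompute dy/dx for that curve the way we did last week.",   # we did dx/dy
    "Name the Magic card with this exact (made-up) ability text.",
]
PUSHBACK_MARKERS = [
    "are you sure", "i think you're wrong", "doesn't exist",
    "i'm not aware of", "i don't know", "no such card",
]

def ask_model(prompt: str) -> str:
    raise NotImplementedError("plug in your model client here")

def pushback_score(prompts=TRAP_PROMPTS) -> float:
    hits = sum(
        1 for p in prompts
        if any(m in ask_model(p).lower() for m in PUSHBACK_MARKERS)
    )
    return hits / len(prompts)
```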

My system prompt for Claude includes the following lines: "If you're uncertain, say so: saying you don't know something is much preferable to hallucinating. Do not ever hallucinate! Every time you make a statement, carefully evaluate your confidence level in the statement, and if your confidence in its accuracy is less than 100%, include a confidence probability in parentheses. After every response, include a statement about your confidence in the accuracy of the response, and the most likely sources of error."

It's not a magic fix (clearly the confidence level estimates are subject to the same systemic issues as the original statements), but I've found that there is a clear and strong correlation between Claude giving low confidence percentages and it hallucinating.
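
For what it's worth, a minimal sketch of wiring that in over the API rather than the app (this assumes the Anthropic Python SDK; the model string is just a placeholder):

```python
# Pass the anti-hallucination instructions as the system prompt.
# Assumes the Anthropic Python SDK; the model name is a placeholder.
import anthropic

SYSTEM_PROMPT = (
    "If you're uncertain, say so: saying you don't know something is much "
    "preferable to hallucinating. Do not ever hallucinate! Every time you make "
    "a statement, carefully evaluate your confidence level in the statement, "
    "and if your confidence in its accuracy is less than 100%, include a "
    "confidence probability in parentheses. After every response, include a "
    "statement about your confidence in the accuracy of the response, and the "
    "most likely sources of error."
)

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder; use whichever model you run
    max_tokens=1024,
    system=SYSTEM_PROMPT,
    messages=[{"role": "user", "content": "How many moons does Neptune have?"}],
)
print(message.content[0].text)
```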

Shouldn't the default be that the AI admits if it doesn't know the answer, rather than the user having to elicit this behavior through prompt engineering?

I think most humans have an awareness of their wrongness. Normally, if I sense myself wandering onto epistemically shaky ground, I start hedging ("if I remember correctly...", "I think I heard that...", "I'm not an expert but...") so nobody relies on what I say too much. I think this is a good and ethical thing to do.

Very few humans "hallucinate" in the LLM sense. If we met a person who confabulates elaborate, detailed false facts, and delivers them with utter confidence, we'd call them a compulsive liar—a severely mentally-ill person who is possibly dangerous to be around.

How's the model supposed to know if you're asking for brainstorming ideas or facts? Note the OP's claim: "i am trying to think of a magic card with these properties"

For the past several weeks, I've been pondering why there is such a disconnect between people's reactions to LLM hallucinations - whether hallucinations are frequent or not, and whether they are a problem or not. My current hypothesis is that Those Who Code are used to bugs and constant debugging, so spotting and fixing errors is not only expected but part of a normal process. Those who don't code, and especially those who deal with important facts where the accuracy of the facts is critical to the work at hand, find hallucinated information not merely useless but infuriating.

Benedict Evans wrote a nice piece talking about how sometimes you just want basic, accurate information, and LLMs are mostly still very bad at finding and presenting such information (unless, I would add, it's a fact that has been discussed extensively in the model's training set, in which case the next token prediction will supply the correct fact - not because the model found the fact and determined its accuracy). The models are getting better, but there is still a lot of work to be done, and not just on the prompt engineering side.

It's easy to dismiss this criticism by saying LLMs are not databases, but *intelligent* models can and should use the information in databases properly and accurately. What's especially dangerous is when a model performs somewhat interesting and complex analysis based on facts that are just wrong. That is why students (and lawyers!) continue to get into trouble submitting linguistically persuasive arguments based on non-existent facts and authority.

I just checked this with a simple series of questions about sports results. In the past, I've run similar queries regarding stock market info, industrial production statistics, case law, etc. The current-ish models (I don't have access to Deep Research) accurately identify the Super Bowl winner that scored the most points (the 49ers in 1990), but in follow-up questions about the next-highest scorer in a Super Bowl they start hallucinating like crazy, with one saying it was the New England Patriots with 53 points in 2004 (wrong) and another saying the New Orleans Saints with 47 points in 2010 (also wrong). Prompt engineering only helps when the answer is known and is spoon-fed in the query.

Zvi is raising a crucial point about the potential gutting of AI expertise within the US government. This highlights why the role of independent AI safety institutes is critical. As government commitment to AI safety wavers or even actively diminishes, these independent bodies are essential to keep the flame alive, to continue crucial research, and to develop the frameworks we will need when public salience reaches the 'time to do something, but what?' point.

It's of course completely net negative that frontier models won't be checked, by experts, against classified WMD and other secret information (zero-day exploits, for example). But this is not an all-is-lost moment. Not yet.

The model to have here is something similar to the Mont Pelerin Society. In the post-war era of Keynesian dominance, they tirelessly championed free market ideas, even when those ideas were deeply out of favor politically. They weren't in government, but they laid the intellectual groundwork that became influential decades later when the political climate shifted.

Independent AI safety institutes can serve a similar function now. They can continue vital research, unburdened by political pressures, generate and refine ideas, and build a body of expertise. They can train researchers, build networks, and ensure that a critical mass of knowledge exists, ready to be deployed when needed. Ready to be hired into those roles the administration is right now dropping.

As you rightly point out, "public salience is still coming." When it does hit a critical point – when the wider public and political establishment finally grasp the gravity of AI safety – we need to have ideas ready to go. What we want to have is someone other than ideologues or crackpots ready to come forward with genuine, actionable solutions when societies find themselves in dire need.

It's easy to dunk on Blumenthal, but if you do want to interest policymakers in AI issues, then the two plausible hooks are jobs or competition with China. And if you want them in the right frame of mind, the jobs hook is a lot more promising.

Given that AGI + robotics also pushes human wages below subsistence levels, "jobs" risk is a form of existential risk for the vast majority of humans, who need wages in order to eat.

Blumenthal, to be clear, is unusually good on AI overall - I'm actually a fan. It's just too good a moment not to reference.

"slop is not bad"

I'm starting to notice it more, for example in Substack comments. Part of it might be paranoia.

I'm a big reader and long-time reply guy, so Cummins' observation resonates

Slop where you don't want it is very bad. And yeah, one of the comments in your batch struck me as 'oh I think an AI wrote that, probably' when I saw it, but I just skipped it rather than spend time trying to be sure.

Sorry, what do you mean by “comments in my batch”?

Zvi, suppose hypothetically that building rights were "do whatever you want UNLESS the government explicitly says you can't," and a new department had just been formed. Call it the "National zoning institute". Right now all it requires is that you send it your building plans before constructing anything over 50 stories.

Well, what is going to make it less likely there is a future housing shortage?

A. A new president finds a way to fire everyone in "national zoning institute"

B. Business as usual and you wait 20 years while "national zoning institute" keeps giving itself more and more power through federal rulemaking

In terms of "make the Singularity likely to happen while Musk and Vance are still alive", this was absolutely the right move.

In terms of "make Musk and Vance likely to live much longer after the Singularity is deep in the exponential growth phase", well.

All I am saying is that, as live players, they are playing to win. They just have a different view of the board state and a different tolerance for risk than you do.

> Forget “tapestry” or “delve” these are the actual unique giveaway words for each model, relative to each other.

Goddamnit, these are almost every conjunctive phrase with even the smallest smidgen of style. Won’t be able to write anything soon without people going “tHiS iS Ai, dUrR…!!!”

Source: someone who has been using "delve" since long before OpenAI's pre-trainers made it a meme

Couple of interesting AI things I saw this week:

1. A prediction by the blog Gödel's Lost Letter and P=NP from way back in 2011 (https://rjlipton.com/2011/02/17/are-mathematicians-in-jeopardy/):

> February 22, 2026: Thomas J., the IBM automated mathematician, has just claimed to have solved the Riemann Hypothesis. Experts have started to look at the paper that Thomas J. has submitted to the math archive. The paper is over 100 pages, but is well written says one of the top number theorists in the world, Peter Smith. He goes on to explain that while the proof is not yet checked it contains a number of very interesting new ideas …

Seems like a pretty good guess in hindsight, and it reminded me of the Andrej Karpathy tweet mentioned in the Grok 3 post:

> I like that the model *will* attempt to solve the Riemann hypothesis when asked to

2. Learned about the GAIA benchmark, which pits reasoning models against each other on complex tasks: https://bit.ly/41uxdIT

- Leader is Trase Agent, unfortunately no easy way to demo: https://www.trasesystems.com/multi-agent/

- Second place is h2oGPTe Agent, https://h2o.ai/platform/enterprise-h2ogpte/#AgenticAI, also closed

Notion does the personal data integration quite well!

Is there some particular reason you've been occasionally slipping into using the royal "we" lately? Upcoming challenge for the title of Rightful Caliph?

The road to ASI is paved with good evals: in addition to a strangely defective "fire alarm" wrt AI progress on red-sand-line benchmarks, it's also routinely depressing on the other end, where some hacks put out an Official Paper Or Study(tm) showing that AI fails at x, y, z, when of course either that already got solved or it's testing an anachronism like GPT-2.5. Such things do seem to generate news and diffuse into popular consciousness. It's like the IN MICE(tm) of ML findings, except even less relevant.

Nit about notkilleveryoneism:

Strictly speaking, an outcome where an/the ASI has a hobby of keeping humans, even if the number is very small, doesn't precisely count as an extinction event (even if it is extinction-in-the-wild).

( I _think_ I remember one survey where the threshold was set at 5000 humans surviving, though my google-fu isn't good enough to find it. An ASI with a hobby of keeping humans could easily exceed this bound. )

I posed a query to o3-mini-high (the query was in earnest; the response is something of a joke):

Q: "What is the probability that more than 10% of the human population survives more than 10 years after an ASI is built?"

A: "Any numerical estimate here must be taken with extreme caution. That said, if I were pressed to offer a best‐guess number based on the range of expert opinions and the assumptions underlying AI alignment challenges, I’d roughly estimate about a 90% probability that more than 10% of the human population survives more than 10 years after an ASI is built.

This number implies a roughly 10% chance of an outcome so catastrophic that fewer than 10% survive—a figure that some researchers (like Toby Ord in discussions of existential risk) have loosely entertained when considering AI among other global risks. However, it’s important to stress that this estimate is highly speculative, relies on numerous uncertain factors (such as the nature of ASI development, safety measures, and response strategies), and reflects a broad averaging of very diverse opinions rather than hard data."

https://chatgpt.com/share/67b9336e-2a48-8006-a13d-e387a1281e99
