> This raises the question of why we are unable to, in serious AI chatbot uses, close or edit conversations the way you can with AI companion apps.
Can you clarify the difference vs. current behavior? Gemini, Claude, and ChatGPT all allow you to edit any previous prompt, which "restarts" the conversation from that prompt, and ignores anything that came after it.
I agree cloning would be nice, so you could explore multiple branches without losing state.
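Mechanically, both behaviors come down to prefix management over the message list: editing truncates the history at that turn, and cloning would just copy the prefix so the old branch survives. A minimal sketch of the idea (plain Python; the message format and function names are illustrative, not any vendor's actual API):

```python
# Toy illustration of edit-vs-clone on a chat history.
# A conversation is an ordered list of messages; "editing" message i means
# discarding everything after it, while "cloning" copies the prefix so the
# original branch is preserved. (Illustrative only, not a real client API.)

from copy import deepcopy

def edit_message(history, index, new_content):
    """Replace message `index` and drop everything after it (current UI behavior)."""
    branch = deepcopy(history[: index + 1])
    branch[index]["content"] = new_content
    return branch  # the old tail is gone

def clone_and_edit(history, index, new_content):
    """Keep the original intact and return a second branch edited at `index`."""
    original = history                       # untouched
    branch = edit_message(history, index, new_content)
    return original, branch                  # explore both without losing state

history = [
    {"role": "user", "content": "Draft a tile culling shader."},
    {"role": "assistant", "content": "Here is a first pass..."},
    {"role": "user", "content": "Now add read-back debugging."},
]
orig, alt = clone_and_edit(history, 2, "Actually, optimize for mobile instead.")
```

Since the model itself is stateless between calls, supporting cloning would mostly be a UI decision about keeping the old tail around as a sibling branch rather than discarding it.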
In Gemini (gemini.google.com) I can only edit the last one, it appears.
One of my favorite things with Claude Code is when I get a "paying rent" moment where it clearly does something that returns more than my $200 a month in value, instantly. Yesterday, it one-shot a tile culling algorithm in a compute shader, integrating it into my render graph lighting system and providing read-back debugging tools. It's fast as hell and worked the first time. I was floored.
These kinds of moments make up for the Penelope Problem, where the AI gets a decent way into some good work, suddenly asserts "I'm thinking about this all wrong!" and proceeds to delete its work; or the Hydra Problem, where in attempting to slay one bug it gives rise to two more; or Code Amnesia, when the reasons the agent made a good decision in the first place leave the context window... Or...
I’m just a small time hobbyist indie game dev but having AI write my shaders has been such a game changer.
Podcast episode for this post: https://open.substack.com/pub/dwatvpodcast/p/ai-127-continued-claude-code-complications?r=67y1h&utm_campaign=post&utm_medium=web&showWelcomeOnShare=true
Not sure I truly agree there has been little progress in 2025. The big US players have each put out one major release, with another likely in August or September for each of them. But fair enough, if those do not move the yardstick it will be a strong sign.
The more interesting part is really that the Chinese players are firing on all cylinders. They may not have entirely caught up with the frontier, but the gap is for sure small now (Qwen, GLM, and DeepSeek 0528 are all within 4 points of o3 and Gemini on Artificial Analysis).
Possibly yet more interesting are the medium-sized models - GLM 4.5 Air is seriously impressive and, as a 109B-total/12B-active MoE, can be run on $2,000 off-the-shelf hardware (either a MacBook or an AMD 395), whereas Qwen 30B-A3B can run on any halfway current 32GB RAM laptop at acceptable to very decent speed.
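For the curious, here is the back-of-the-envelope arithmetic behind those hardware claims, assuming roughly 4-bit quantization plus some runtime overhead (both numbers are my assumptions, not measurements):

```python
# Rough memory estimate for running a quantized model locally.
# Assumes 4-bit weights plus ~10% overhead for KV cache and runtime buffers;
# both figures are assumptions for illustration, not benchmarks.

def weight_memory_gb(total_params_billions, bits_per_weight=4, overhead=1.10):
    return total_params_billions * bits_per_weight / 8 * overhead

print(weight_memory_gb(109))  # GLM 4.5 Air: ~60 GB -> wants 64-128 GB of unified memory
print(weight_memory_gb(30))   # Qwen 30B-A3B: ~16.5 GB -> fits a 32 GB RAM laptop
```

Decode speed is then governed mostly by the active parameters (12B and 3B respectively), which is why these MoE models can feel fast despite their total size.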
One way in which the "liking owls" result is bad is that it implies two instances of the same model can communicate with each other steganographically.
Suppose you have a conversation with a model and publish some of the result to the Internet. Even if you check that what you’re releasing is something you’re happy to release, you don’t know what else from the conversation might be leaked by steganography.
I had always assumed this might be possible, but we now have confirmation the steganographic channel exists and is usable.
It turns out that almost no-one runs their AI in a sandbox.
But suppose you were doing that. A theoretical attack on the AI containment problem is that a superintelligent AI might use steganography to leak information out of the sandbox past the warden.
And so, the question arises: how realistic an attack is that?
The bad news: the way that LLMs work means you get steganography for free, and even models that aren’t yet superintelligent can carry out the attack.
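To make the "for free" part concrete, here is a toy of the simplest version of the channel: because both copies share the exact same next-token distribution, the sender can hide a bit per token in its choice among equally plausible continuations, and the receiver recovers the bits by re-running the same model. (The stand-in model and names below are mine, and this is the classic explicit-encoding scheme rather than the subtler mechanism in the owls result.)

```python
# Toy covert channel between two copies of the "same model".
# The shared model here is a stand-in: a deterministic function that, given a
# prefix, returns two equally plausible next words. A real LLM's shared
# next-token distribution plays the same role.

def shared_model(prefix):
    """Both parties run this identical 'model': two plausible continuations per step."""
    options = {
        0: ["The", "A"],
        1: ["results", "findings"],
        2: ["look", "seem"],
        3: ["promising", "encouraging"],
    }
    return options[len(prefix)]

def encode(bits):
    """Sender: pick continuation 0 or 1 at each step according to the hidden bit."""
    text = []
    for bit in bits:
        text.append(shared_model(text)[bit])
    return text

def decode(text):
    """Receiver: re-run the same model and read off which option was chosen."""
    bits, prefix = [], []
    for word in text:
        bits.append(shared_model(prefix).index(word))
        prefix.append(word)
    return bits

hidden = [1, 0, 1, 1]
cover_text = encode(hidden)          # e.g. ['A', 'results', 'seem', 'encouraging']
assert decode(cover_text) == hidden  # innocuous-looking text, bits recovered exactly
```

The cover text stays innocuous-looking because every choice is one the model would plausibly have made anyway, so a warden who does not hold the same weights has nothing to compare against.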
copyediting: "humans can of writing insecure terrible code on their own" should be "humans can of course write insecure terrible code on their own"
Regarding the poll, AI has made me very suspicious of anything not written by someone I was already reading pre-AI. This makes it hard to get into new authors who weren't writing pre-AI, whose style I can't tie back to anything from before.
Zvi, given that you think we're careening towards disaster, I'm always surprised when you don't support arbitrary market frictions such as breaking up Nvidia or other tech giants. Of course you don't agree with that as economic policy, but it would nevertheless slow the timeline which seems like the much more important thing? The same argument applies to energy production, shouldn't we be glad that we're not building enough of it to go full speed ahead? We'll get there eventually, at a slower pace, and that's a good thing right?
Instead of AI labs monetizing via advertising, maybe they should go the other way around. Require free tier users to generate some sort of new human content or review some RLHF results or something, CAPTCHA-style, to provide value and power their use cases.
Ads are the only way to give free users your best stuff
The wise person sees us handing over our decision-making to AIs when the AIs are visibly incompetent... and realises this is good, because the way that humans learn is by having bad experiences.
Small bad experiences now are likely to prevent, or reduce the number of, large bad experiences later.
One thing I've noticed re: AIStack virality is that...going outside my usual four ratsphere subscription blags, where a Really Good Post might net 300+ likes, one regularly encounters these fairly-random nobody writers with these long-ass tendentious yawn-posts with like 1000+ likes or whatever. And I'm like, wtf, how, why? These aren't even necessarily orange checkmark authors who actually have sizeable subscriber bases, just writers who happen to "go viral" with zeitgeist-pilled posts. Except there's no there there, and I just can't believe a majority of people clicking the heart button actually read the whole thing. (And/or they have shitty taste.) It's kind of disturbing, honestly. Undermines the entire business case for Substack, which was to get away from low-denominator writing banking on appeals to authority that dominates so much of the MSM. But I guess that's what a lot of people actually want, when the revealed preference chips are down? Victims of our own success...
(And likewise, "bro ChatGPT wrote this" is quickly turning into the new writerly insult du jour, which is kind of amusing when Real Human Writing is juxtaposed with the gigantic like counts on slop. RLHF sycophancy strikes again!)
I don't think sycophancy is necessary here. It has become clear how little effort most people put into developing writing skills, because now the comparison point is Llama 3.1 at a minimum, rather than a horde of monkeys with typewriters. Slop might be overwrought and predictable and yawn-inducing but it is now reliably better written than the mean Substack post, so it's likely to take over for those uses.
I asked Gemini and Opus: "What is an unspoken rule that everyone in AI safety knows?"
Gemini:
The unspoken rule is: Assume you only get one chance.
Opus:
The big one: everyone knows AGI is probably coming much sooner than their public timelines suggest, but calibrated doom-saying became professionally costly after the field went mainstream.
> My guess is that over a 5 year time horizon, in the worlds in which we do not see AGI or other dramatic AI progress over that time, this is mostly accurate. Those using AI now will mostly net benefit, those refusing to use AI now will mostly be harmed
I don’t think that’s necessarily the causal story. E.g., maybe people who write about literature are better than AIs for now, and AI is not yet helpful to them; but they might expect that in five years, AI will be able to do all of what they’re doing.
What is AGI? How would you know when it arrived?
As far as I can tell, there was one good theoretic test for AGI - The Turing Test.
AIs are long past the Turing Test.
This means we have AGIs right now.
Proposed red line: LLMs asking the questions and humans answering them.