This is actually a good critique: "Hollis is attempting to split the baby here. The first half of college is about basic skills, so we can go ahead and automate that, and then the second half is about something else, which has to be provided by places and people with prestige. Curious." This is the conversation we should be having in higher ed.
I 80% agree with you re Gell-Mann Amnesia being an old problem. I think it's maybe slightly worse now because the AI can sound more persuasive and explain itself in a way existing sources can't always. E.g., if I see something in an article that doesn't make sense, I might be prompted to look it up elsewhere and find a contradiction. But if I ask an LLM, it can hallucinate a plausible explanation for the wrong fact.
Here's a fun prompt:
Random seed [some GUID]
Generate a new connection between two unrelated concepts that yields a discovery of value.
I've had it suggest we use tidal kite turbines as dual-use quantum channels to refresh encryption keys. Another was to bio-engineer mangroves into biologically symbiotic, self-powering desalination plants. "Modules the size of a standard photovoltaic panel (~1 m²) could yield ≈ 20 ℓ d⁻¹ of potable water plus trickle-charge a battery for LED lighting."
The ideas are probably bullshit, I can't evaluate them, but you get some interesting stuff. There seems to be a decent amount of "Step 2: ???, Step 3: Profit".
I tentatively think the quantum key idea is bullshit: natural randomness is not actually hard to find, we just don't have much incentive to harvest it right now. But yes, super cool - I'm surprised the random seed moves it so much. Absent the seed, are the ideas more predictable?
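For anyone who wants to poke at that question, here is a minimal sketch, assuming the OpenAI Python SDK (any chat-completion API works the same way); the model name and sample counts are placeholders, and the "seed" is just a GUID prepended to the prompt text, not an API-level sampling parameter.

```python
# Minimal sketch, assuming the OpenAI Python SDK; model name and sample
# count are placeholders. The "seed" is just a GUID prepended to the prompt
# text, not an API-level sampling parameter.
import uuid
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = ("Generate a new connection between two unrelated concepts "
          "that yields a discovery of value.")

def generate_idea(use_seed: bool) -> str:
    # Prepend a fresh GUID as the "Random seed" line, or omit it entirely.
    prefix = f"Random seed {uuid.uuid4()}\n" if use_seed else ""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prefix + PROMPT}],
    )
    return response.choices[0].message.content

# A few runs each way: if the unseeded ideas keep circling the same themes
# while the seeded ones scatter, the seed really is doing the work.
for label, seeded in [("seeded", True), ("unseeded", False)]:
    for i in range(3):
        print(f"--- {label} run {i + 1} ---")
        print(generate_idea(seeded))
```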
As a Windsurf user I am conflicted. For one, it would be very nice to pay a single subscription for both my AI IDE/Copilot thingy and my personal AI, but on the other hand I am finding Claude works consistently better in Windsurf than ChatGPT or even Gemini.
Don't suppose we can make Anthropic a pitch?
Re web search, I had an odd interaction a couple of days ago where Claude claimed not to have that capability, and stuck to its guns when pressed on the matter.
That's an odd behavior for sure. You confirmed that the web search toggle was set to "on"?
I didn’t know there was one (and just checked on the app and can’t find it!) and it had definitely been on the last time I used it…
Maybe you got load-balanced over to an instance without web search? It's still only available in the US, Brazil, and Japan.
I've had various models do that at times. My best guess is that something goes wrong on the backend, the model notices that it didn't work, and then it remains convinced by that fact and subsequent tokens in a downward spiral. I bet the initial cause could even be a bad roll of the dice on the initial tokens (i.e., 1% of the time my response starts with "Sorry..."), which then gets reinforced.
Clocking in at a cool two hours and eleven minutes, here is the "Full Cast" podcast conversion of this post:
https://open.substack.com/pub/dwatvpodcast/p/ai-113-the-o3-era-begins?r=67y1h&utm_campaign=post&utm_medium=web&showWelcomeOnShare=true
I watched the movie Holland, and afterward wanted some hot takes. So I googled "WTF is this Holland Movie" and the Google description snarked at me, describing a nonexistent movie where Tom Holland dies on the set of Spider-Man 4 and returns as a ghost to help a woman retrieve his body from a sadistic cult of Sony execs.
Game industry attitude is starting to shift:
"A Unity report found 96% of surveyed studios are using AI tools in select workflows, with 79% of developers now feeling positive about AI's impact in game development – a notable shift from earlier skepticism. A GDC survey corroborated this, finding that one-third of developers personally use generative AI (GenAI) tools, and 52% work at companies that have implemented them. Efficiency appears to be a major motivator, with 45% of developers citing the use of efficiency tools (including AI) as a primary strategy for success in the current climate."
- My periodic Gemini game industry roundup report.
96% seems high.
https://unity.com/blog/2025-unity-gaming-report-launch
Zvi, I've been reading your newsletter for a few months now and I'm confused about what you want. Tell me if I'm describing your positions incorrectly:
- We can't trust tech company CEOs with control over AGI; they will become tyrants.
- We can't trust the masses with open-source AGI; someone will use it to destroy everything.
- We can't trust the government; they're not competent to understand AI and they're not going to get competent.
- Under no circumstances can we trust the AIs themselves. If we give the AIs an ethical system that they will stand up for and defend, humans have lost control.
Certainly power corrupts and we need checks in place to prevent tyranny. But whether you centralize power or spread it out, power has to be distributed somehow. It seems like you are fighting against every possible outcome. Can you describe a distribution of power that you would consider trustworthy and good?
I think the idea is that you want the masses supervising the tech companies and the government, and the government passing laws to make that easy to do. You don't want excess power in the hands of individuals without supervision, which includes both individual members of the general public and the tech company CEOs.
But yeah, these questions are hard, and they're one reason that rapidly increasing the total amount of power available is pretty dangerous, especially when it's happening faster than systems can adjust.
"The key insight is that property rights are always and everywhere unstable."
This is true for all rights, and to me this is the long run danger of AI. What if a large number of people become unable to contribute anything useful to the world? UBI doesn't solve the problem because of this unstable-rights thing. If there is this large, permanent, expensive group of people with no source of power, I don't think the right to a UBI will be a stable one.
On alignment, I feel like we are doing well, like we are generally on track to solve that one. At least with the LLM-based paradigm of AI. If something really novel happens, who knows.
I would love to see a scatter plot of real world software engineering "task complexity" vs. model performance (ideally for multiple models).
My perspective is that they probably are, and should be, quite good at building low-entropy full-stack applications (because these require verbal/conceptual reasoning) and at solving algorithmic contest problems (because these have minimal context and the models have been trained extensively on them).
However, it seems like they are not as good as 99th/99.9th percentile software engineers at the types of problems those engineers actually work on.
> The other question is whether skipping the intellectual exercises means you lose or fail to develop skills that remain relevant. I’m not sure. My guess is this is like most educational interactions with AI - you can learn if you want to learn, and you avoid learning if you want to avoid learning.
I'm guessing that we'll see a bimodal skill distribution develop in the current generation of junior software engineers.
Most are going to use the AI to avoid understanding how the code works. They will vibe code their way through, hack it until it works, and move on.
A much smaller group will use the AI to gain a much better understanding of how the code works. Every single time you notice something confusing, you can now chat about it with an expert instantly, without using up any of your coworkers' time and attention.
This is an amplification of an existing trend. Everyone has worked with software devs who can barely code unless they can find a nearly-complete example on Stack Overflow or elsewhere that they can adapt. At the same time, tech industry compensation is bimodal for a reason. There will be huge alpha in being in the latter group, and using AI to level up your expertise, rather than being a slightly better natural language interface to a coding model.
> to have your range include two years in the past seems rather nutty
I'm confused, unless you missed that this prediction was made in 2019?
Perhaps this overlaps with previous posts, but could you do a user's guide to prompt engineering?
The Anthropic prompt engineering docs are pretty good: https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview
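As a taste of what those docs cover, here is a minimal sketch assuming the Anthropic Python SDK: give the model a role via the system parameter, use XML tags to keep instructions separate from the data they apply to, and ask for a fixed output shape. The model name and the summarization task are illustrative placeholders, not a recommendation.

```python
# Minimal sketch, assuming the Anthropic Python SDK; the model name and the
# task are illustrative placeholders.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

article = "...the text you want summarized..."

message = client.messages.create(
    model="claude-3-7-sonnet-latest",  # placeholder model name
    max_tokens=1024,
    # Role prompting: say who the model is and what it should care about.
    system=("You are a careful technical editor. Be concise and flag "
            "anything you are unsure about."),
    messages=[{
        "role": "user",
        # XML tags separate the instructions from the data, and asking for a
        # fixed output shape makes the reply easier to check and reuse.
        "content": (
            "Summarize the article inside <article> tags in exactly three "
            "bullet points, then list any claims that look unsupported.\n\n"
            f"<article>\n{article}\n</article>"
        ),
    }],
)
print(message.content[0].text)
```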
I don't quite agree with the "compress the middle" of skill distribution argument.
For tasks where AI is capable, it squeezes the whole bottom 99%, and it will squeeze even more in the future.
One example: I have friends working in sales. They used to have a team of experts to help them answer technical questions. Now their company has implemented a Claude-based internal wiki, and they never ask those people questions anymore; they just need them in customer-facing meetings.
"The danger comes when AI answers become social weapons, ‘well Grok said [X] so I’m right,’ "
I'm somewhat amused. I sometimes quote e.g. Google AI summary text in a comment, and put
<aiWarning>
the ai generated text
</aiWarning>
around it. I suppose I could explicitly add BelieveAtYourOwnRisk... :-)