Responding to the last comments on class discrimination: it finally made me understand why talking about IQ existing is such a verboten thing, somehow filtered out of mainstream society almost everywhere, and why class discrimination is also the reason companies don't filter more strongly for IQ.
Podcast episode for this post:
https://open.substack.com/pub/dwatvpodcast/p/ai-144-thanks-for-the-models
Every time I have seen the AI 2027 team discuss their timelines after releasing their scenario, they have added a delay (2027 -> 2028 -> 2029 -> 2030). I respect them for being honest and candid, and it's logical to update as evidence comes along, but it doesn't really give me confidence that they are actually able to effectively predict the future if their prediction is completely non-robust to new events.
They seem pretty consistent; an adjustment of a few years to the mode doesn't change the overall message of AI 2027 at all. Don't get stuck in the drapes.
"At the time we began writing AI 2027 in mid-2024, 2027 was my median but other authors had longer timelines (late 20's early 30's). All of our timelines shifted around a bit over the course of writing; by the time we published in early 2025 my median was 2028 and Eli's was 2031, for example. After publishing, our timelines continued to lengthen a bit as new evidence rolled in and we improved our model (partly in response to feedback!)."
The following tweet also includes the actual probability distribution.
https://x.com/DKokotajlo/status/1991739681831285044
Interesting that Nano Banana draws Dilbert with a visible mouth when he is speaking. Possible overgeneralization?
NVIDIA's behavior needs to be understood not just as corporate greed or lust for gain, but as loss avoidance. The export restrictions massively cut into their revenue, and that stings. It's not just the loss of future revenue, it's the loss of revenue they believed they would get.
That's not justifying it, just an interesting observation: they're going all out to make back a loss, not just to secure a gain.
Does the Claude app still have prohibitively low limits for Plus users, especially for Opus? I tend to access Claude via OpenRouter, but would like to try the app again.
I think I was one of the few people actually using Gemini 2.5 intensively (I like it for reading papers, writing, and one-on-one tutoring; I've probably sent it 1-3k messages). I've continued to use Gemini for the past week.
I don't think I've had any moment where Gemini 3 felt distinctly smarter than 2.5, although I acknowledge that chatbots are getting to the point where it is hard for a user to discern differences in intelligence. In at least one case, Gemini still failed to catch something that GPT-5.1-thinking noticed.
I really thought we already had the poetry jailbreak a couple of years back.
"The value of an always-available-on-demand performance of $54 in professional labor is vastly in excess of $54 per use. A huge percentage of the cost of hiring a human is finding them, agreeing on terms, handling logistics and so on. "
Agreed! To add two other factors:
Claude is available 24/7, which for humans requires at least overtime pay or the equivalent, and
working at odd hours usually degrades quality for humans, since even a cooperative human is likely to be half asleep.
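To put rough, purely hypothetical numbers on the original point: if sourcing, negotiating with, and scheduling a human for a one-off $54 task takes even two hours of combined effort valued at $50/hour, the effective cost is $54 + $100 = $154, roughly triple the sticker price, before counting any delay in getting started.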
I came to the comments to observe that this sentence is more helpful than many newsletter series at capturing one way in which LLMs are valuable tools.
Many Thanks!
nit, re:
"I think it’s fine to say ‘p(doom) is confused, here’s my number for p(extinction)’ but then people like Vie turn around and think full extinction is some sort of extraordinary outcome when creating minds universally more competitive and capable than ours that can be freely copied seems to be at best quite dense?"
I think it is reasonable to expect p(machines completely take over) > p(complete human extinction)
since there is some probability that ASIs wind up keeping us as pets (potentially in large numbers), or that at least one ASI keeps a small human breeding pool as a hobby, much as humans sometimes keep ant farms. These are probability contributors to p(machines completely take over) that are disjoint from p(complete human extinction).
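To make the decomposition explicit (a minimal sketch, assuming complete extinction only happens as part of a machine takeover):

\[
\begin{aligned}
p(\text{takeover}) &= p(\text{takeover} \wedge \text{extinction}) + p(\text{takeover} \wedge \text{humans kept}) \\
&= p(\text{extinction}) + p(\text{takeover} \wedge \text{humans kept}) \\
&> p(\text{extinction}),
\end{aligned}
\]

with the strict inequality holding whenever the pets/hobby scenarios have nonzero probability.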
I'm unconvinced by Yudkowsky & Soares's argument that ASIs would never keep unmodified humans as pets, since the same reasoning would predict that humans would never keep unmodified cats as pets. It looks like overreach.
Re "The central short term idea is to limit AI access to critical systems",
https://www.dailynk.com/english/from-drones-to-nukes-north-korea-pushes-ai-military-modernization-plan/ says, amongst other military AI applications:
"The Strategic Force was tasked with developing a four-stage “leap” strategy to integrate AI-based unified management systems for storing, operating, and commanding nuclear weapons, as well as launching nuclear counterattacks."
I asked Google
"If Skynet were North Korean, what would be a Koreanization of its name (as Skynetski would be for a Russian Skynet)"
and it/they suggested:
"A Koreanized name for Skynet would likely be
Sul-in-net (셜인넷) or Seol-in-net (설인넷), a transliteration of "Skynet" into Korean, or Cheon-ryuk-wang (천륙왕), which translates to "Heavenly Land King," a more propagandistic and powerful name fitting a North Korean-style AI. The first option is a direct phonetic transliteration, while the second is a more creative, ideologically driven name, similar to how North Korea renames things to fit their political narrative."
"Once again those involved in AI 2027 have displayed a far higher level of epistemic responsibility than we typically observe, especially from those not from the rationalist ethos, either in debates on AI or elsewhere."
Agreed! FWIW, at the moment
https://www.metaculus.com/questions/5121/when-will-the-first-general-ai-system-be-devised-tested-and-publicly-announced/
has July 2033 as the median (they used to also show the 25% and 75% estimates, but they changed the UI and I can't find them now).
It will be interesting to see coverage in one of your future updates of AI safety advocates who are connecting with the Pope's message - so far all I've seen is the bizarre reactions from Mark & Peter et al., which have been extensively covered.
I tried to answer the question Jack Clark raised 6 weeks ago: what does "turning the light on" look like when the system isn't a language model but something used at home, in a community, or as part of civic infrastructure? https://rajeshachanta.substack.com/p/honey-we-hit-the-jackpot
I haven't read the strip in years, but had to do a double-take at Dilbert having a mouth, before reading the caption that it's AI-generated. That'd be like drawing him with his tie flat (although iirc even Scott Adams eventually admitted that fans didn't seem to care about breaking that artistic taboo the one time he tried it). PHB should be stupider too, but horse's accent, etc.
Remain confused whether it's Nano or Nana Banana, since both terms get used in your writing and no one's called it out yet.
According to https://deepmind.google/models/gemini-image/ it's Nano Banana. I assume Zvi is referencing the 2019 Netta song.
" Rolling your own LLM from scratch does not get you alignment or trust or meaningful ownership and it rarely will make sense to ‘roll your own’ even for vital functions."
This is such a bad take.
Do you think it is safe for China to rely on US LLMs?
Obviously the answer is a big fat no, and it follows that it is the same for everyone else.
Of course most countries can't realistically consider developing their own AI, but from first principles it definitely makes sense.
“Essentially any stylistic shift or anything else that preserves the content while taking you out of the assistant basin is going to promote jailbreak success rate, since the defenses were focused in the assistant basin.” — crucial insight into how LLMs work.