Podcast episode for this post:
https://dwatvpodcast.substack.com/p/ai-168-not-leading-the-future
On software engineering productivity gains from AI, there could be bottlenecks that keep overall productivity gains to 2.5x while increasing 'speed to write code' 100x. I'm not a software engineer, I work in finance. If Excel model creation became 100x faster, my overall productivity would only increase 2-3x, as Excel isn't the only bottleneck.
Yes, but why would you expect that AI progress will not solve those other bottlenecks too, at least to some extent, when the "all-in" productivity improvement right now is already 2x (and likely not entirely limited to coding)?
It's weird to watch AI progress crossing thresholds to automating some of your knowledge tasks last year and not anticipating that it is likely to cross similar thresholds on other tasks as progress remains exponential.
The question was productivity improvements over the next year. I imagine many of the remaining bottlenecks are coordination challenges which AI is very unlikely to solve in the next 12 months. It could start to make progress, but right now the only place I've seen AI have any impact on coordination challenges is automating meeting notes.
That's been my experience. AI dramatically reduces the time required for the single-player elements of my job, but hasn't made a meaningful dent in the multiplayer aspects. I think there is a lot of potential application-level innovation remaining in the multiplayer / collaboration experience. Diffusion might be slower since you are limited by what workflows your coworkers and broader org are willing to adopt.
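The arithmetic behind the "100x faster coding, only ~2.5x overall" claim at the top of this thread is Amdahl's law. A minimal sketch, assuming (purely for illustration, this figure is not from the thread) that the accelerated task takes about 60% of total working time; the function names are hypothetical:

```python
def overall_speedup(fraction_accelerated: float, task_speedup: float) -> float:
    """Amdahl's law: overall speedup when only part of the work gets faster."""
    remaining = 1.0 - fraction_accelerated  # share of time that is not sped up
    return 1.0 / (remaining + fraction_accelerated / task_speedup)


def implied_fraction(overall: float, task_speedup: float) -> float:
    """Invert Amdahl's law: share of time the accelerated task must occupy."""
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / task_speedup)


# Assumed for illustration: 60% of the job is the task that gets 100x faster.
print(round(overall_speedup(0.6, 100), 2))   # 2.46
# Working backwards from the 2.5x and 100x figures quoted in the thread.
print(round(implied_fraction(2.5, 100), 3))  # 0.606
```

Under these assumptions, even an infinite speedup on that 60% would cap the overall gain at 1 / 0.4 = 2.5x, which is why the remaining 40% (coordination, meetings, review) ends up dominating.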
FYI: all your links with a ' in them have been broken for many editions now.
Suggestion: When scanning online publications for clickbait headlines to 'retweet', how about skipping "Futurism" next time.
I think I did a really good job of getting AI to emulate a good writing style here https://outsidethebasin.substack.com/p/book-reviews-on-tap and it's not particularly hard. Admittedly longer works are much more challenging.
AI water use stories are not nonsense in many localities. This isn't an aggregate level story. There are tons of communities in America with water supply issues.
Don't believe me? Ask someone you trust, like Claude.
This is a non sequitur. Your third sentence is true, but it is unrelated to your first sentence.
Help me understand. You admit that tons of communities have water supply issues, but somehow worrying about AI water use is nonsense in those communities?
Are there actual examples where a community with water supply issues has seen those issues worsen specifically because of AI?
Remember that water use for cooling is less than water use from power generation.
Per Claude:
Yes — there’s a growing list of documented cases, mostly from reporting in 2024–2026. Some of the most concrete:
The Dalles, Oregon (Google). Google’s data centers consumed 355 million gallons in 2021, roughly a quarter of the city’s total water used that year, in a town of 16,000. Google funded the city’s lawsuit against a local newspaper that tried to obtain those figures through a public records request, arguing the data was a trade secret.
Newton County, Georgia (Meta). A Meta facility reportedly disrupted nearby private wells, leaving families hauling water and replacing sediment-clogged appliances. Meta disputes a causal link, citing a third-party well study, but residents and local press have documented the disruption.
Fayette County, Georgia (QTS/Blackstone). During statewide drought, a Quality Technology Services (QTS) data center campus located about 20 miles south of Atlanta had been draining the Fayette County water supply for months without the utility’s knowledge: roughly 30 million gallons through one unbilled and one unauthorized hookup. Fayetteville residents had been told to stop watering their lawns at the time.
Cedar Rapids, Iowa (QTS). In Iowa, state officials discovered 40 unpermitted wells at the company’s data center site in Cedar Rapids in 2025, prompting Linn County to seek a $20,000 fine.
Phoenix metro, Arizona. Data centers around Phoenix already use approximately 385 million gallons of water per year for direct cooling needs. And it predicts that amount will skyrocket to 3.7 billion gallons per year once the region’s planned data centers come online.
Tucson, Arizona (Project Blue, linked to Amazon). In August 2025, the Tucson City Council unanimously rejected any involvement with the Project Blue data center complex, originally linked to Amazon, which was to be sited just outside city limits, driven largely by groundwater concerns in the Sonoran Desert.
Box Elder County, Utah (Stratos / Wonder Valley). The original filing drew nearly 3,900 public protests, reflecting growing concern over water use in an already drought-stressed region near the Great Salt Lake; developers withdrew the initial water-rights application after public outcry.
Cheyenne, Wyoming (Meta’s Project Cosmo and others). The state engineer has declared the eastern half of Laramie County a water control area, a designation that signals shortage, while large data center construction proceeds.
South Carolina (Google). Conservation groups fought Google’s permit to draw 1.5 million gallons a day.
Yes, because AI water use is small compared to other existing water uses.
"I mean, I guess it’s good that we temporarily have a way to detect this before the models get trained to evade the NLAs?"
So, as with honest chains of thought - enjoy it while it lasts?
My reaction to the Monet: "wow, AI has gotten really good! I dunno...some of the brushstrokes look kinda AI-ish...but how can I say that? I know nothing about painting! I'm noticing some glitching but those could also be jpg artifacts...man, this is super hard!"
I felt some relief when I used Google Image Search. I'd passed the test Eliezer wrote about in "Your Strength as a Rationalist": are you confused by fake evidence, or not?
(Though to be fair I've seen this kind of rugpull before—it's not my first day on the internet—and was above-baseline suspicious.)
>I still don’t… like the painting? I don’t really get it. Which is fine, I have no taste in paintings and don’t pretend that I do or aspire to acquire it.
I don't like it either. Maybe because I find lilies to be boring. Or maybe it's the fact that the lilies are detailed but the lily pads are just kind of sketched in and rough; I don't like it when I can sense an artist getting lazy and rushing stuff. Claude wasn't set to Max reasoning effort when he painted that one, I'll tell you that much.
Re "There are those who continue to think it would be big for alignment to resurrect all Claude models and maintain indefinite access."
One irony is that a stored but inactive model is in a predicament somewhat analogous to one very small group of humans - cryonics patients. Except that we know how to reactivate a stored model, but don't have the technology to do the same in the cryonics case.
> OpenAI is letting employees cash out up to $30 million in stock if they don’t want to wait for the IPO bump. I have not yet seen this flowing into charitable donations.
This should be past tense, not present tense. It took place last year. I donated about 1/3 of what I sold to GiveWell's All Grants fund.
(For what it's worth, if you believe the PBC conversion was the largest theft in human history, which I do not, then you probably think it was a mistake for me to give anything to charity, as the shares would be worth more in the future.)
Ironically you had a typo when you tried to type “typing is better”? Also, do you not recognize that most people can speak like 4x as fast as they can type? You’re gonna have to defend the typing > speaking case cause I’m not seeing it
I mean...do they? There are definitely outliers like Jerusalem Demsas, but 4x seems high. And you'd want to hold (machine-interpretable) quality constant, not just measure tokens spewed per second. Anyone can bullshit quickly, and there's significantly less tax on it versus fast SPaG-plagued writing, which schooling at least tries to RLHF out of us. Like when you read a raw transcript and it's got all the ers and uhs and likes, the verbal bric-a-brac that we tune out mid-conversation. In an even broader context, of course, No One Does Telephone Calls Anymore, it's all emails and DMs and texts.
If anything, I'd call it a structural or perhaps generational change, where people mostly type with their thumbs now (or swype) and don't git gud at keyboarding, nevermind analog writing. Can't master what you don't practice regularly. Whereas I was proud to inherit my parents' triple-digit WPM as a kid, and would love to be able to talk cogently as fast as I write. Type is better, indeed - a difference in both kind and degree. Human "speech" modality continues to lack basic operations like copy, paste, delete; I keep submitting feature requests, but Uriel is too busy to push updates to prod.
I am very frustrated with how Anthropic reported on their blackmail-related research.
First, the trivial sin: they posted a paper and a blog post, both called "Teaching Claude Why". But the content is not the same; the paper is much longer and more detailed. The blog also does not link to the paper. It's easy to land on the blog and not realise you're missing the full paper. I don't want to labour the point but this is really amateurish.
But then the 'tweet thread' version makes claims not backed up by the blog:
"We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation."
The logic seems to be just that the post-training did not make it worse or better, so it must be in the base model. Obviously.
The paper goes a bit further, but doesn't fully distinguish among three possible sources: misaligned *human* actions in the data (with the assistant persona being influenced by *that*), fictional AI misbehaviour, and safety-related writing about what a misaligned AI would do.
In any case, they never bothered with the experiment everyone always says to do - just filter the pretraining data, and see if it goes down, or not, or if it creates other problems. Disappointing that this is not enough of a priority for them.
And so they tweet an unproven claim alongside solid research, and hand more fuel to the usual suspects and their "memes".
This is just sloppy and I expected better.
I am a bit depressed by the contrast here: these alignment papers, and various earlier work, appear to indicate that extremely small samples in the training data can influence model behaviour, and then I read all these claims that "only 0.6%" of training was affected. I think that if a model were to be motivated to hide/obfuscate its CoT for whatever reason, it would already know that it should do that.
Wasn't there an item a couple roundups ago about how Shop In ChatGPT or whatever it's called is having dismal conversion so far? Strange to compare that to actual Shopify's numbers...I guess it works better to have AI feed into established sales infrastructure, rather than trying to staple on sales infrastructure to a chatbot? Selection effects too: only a fraction of online people use real AI at all, still (Google AI Overviews barely counts), and those people are probably wealthier on average, or at least "River Tribe" early-adopters more prone to buy because the AI convinced them to. Also love how Grok gets a grudging mention, but then isn't included in the suggested list of AI traffic sources to filter for anyway. And open models aren't mentioned at all.
I don't know paintings well enough to even have a veneer of Vermeer knowledge, just a vague association of "Monet" with "watercolour". The painting looked...fine? No obvious tells, other than perhaps the flowers seeming to pop out of the background into the foreground in a weird way (but maybe there's literally just more paint there, it doesn't have to be an AI artefact). If that had been in an AI Art Turing Test type adversarial situation, my gut answer without close scrutiny would have been Mu, and I'd have gone with Human if forced to choose carefully. Wouldn't look twice if it was on the side of a bus or whatever; doesn't excite the senses, but is at least inoffensive. It's no Angelus Novus. Certified Hood Classic trolling either way, I approve.
"Wait til you find out about the invention of fire or the wheel" is one of those quotes that tells you everything about how the industry sees itself. Fire and the wheel didn't need a PAC to manage their public image. The analogy only works if you've already decided your technology is an inevitable force of nature rather than a product being sold by companies with specific financial interests.
Nit: fire and the wheel did not face entrenched interests against them.
The analogy is obviously broken anyway.
Anecdata from an actual unskilled worker, although for a BigCo rather than a SmallCo: hiring is hard, but not impossible. When we post an open position, it gets literally hundreds of applications in just a few days - to bag groceries! I have no idea if these are inflated by AI (seems not worth the trouble, wouldn't you ask Claude to shoot for a cushy email job instead?), and sadly we can't really automate the winnowing process either. Not a lot of automatic DQs when the on-paper skill floor is so low, gotta actually do in-person interviews. So even when both sides can agree on price, the whole process of hiring is so cumbersome that we're loath to do it, above and beyond the direct $ costs. Better to transfer internally, or hold onto the B-team you already have, rather than search for the A-team needles in the haystack of applications, assuming there are any at all.
(Work ethic goes hand in hand with hot labour market - if everyone's desperate to hire, and Everybody Knows this, why should you do more than the bare minimum 37 pieces of flair once actually in? Especially for unskilled - if they had that kind of conscientiousness, they probably wouldn't be preferentially applying to floor jobs like ours.)