**Disclaimer: I am an engineer (a network engineer), but not the type that writes code, nor are any of my coworkers. I am taking courses to get a Master's in CS, though.**
What are people measuring when they say they are "10x more productive"? Are they measuring anything at all, or is it all impressions and intuitions?
I use ChatGPT every day, but I'm very hesitant to say that it's had a big impact on my productivity. It certainly hasn't 10x'd anything for me.
In particular, I've found that while it has been extremely good at helping me with graduate-level homework assignments, it's not great at helping me with my job. I suspect this is because there is almost no available prior work when it comes to "debug bizarre network problem on this unpopular platform that has little in the way of public documentation", but there are reams and reams of discussion available for popular homework problems. Plus, the homework problems are all designed to be solvable, often simply so.
In my everyday life, LLMs have mostly replaced search engines for me, which has probably made me slightly more productive. Very hard to say, though. Is replacing a trip to Wirecutter with an o3 query a gain in productivity? Or does it just feel like one?
Something I continue to notice with LLMs is that the more I use a model, the more aware I am of its limitations. When I first try one I feel like there's endless utility, but the more I get to know it, the more aware I am of what it cannot do. I don't think this happens with other very useful tools that I use. Time spent using a great text editor, for instance, only gives me more and more appreciation for the abilities of the tool.
In any case, I eagerly await the economic data that demonstrates the productivity gains in terms of verifiable measurements.
There are a few existing actual studies. Here are a couple:
The Impact of AI on Developer Productivity: Evidence from GitHub Copilot: https://arxiv.org/pdf/2302.06590
How much does AI impact development speed? An enterprise-based randomized controlled trial: https://arxiv.org/html/2410.12944v2
Looks like a moderate (20-50%) boost, but in contrived situations and with much older tools and models.
Thank you for linking those!
It appears that the first study found a 56% increase in "production speed" and the second a 21% increase. Is that your reading of the studies?
Which is great, but that's really far from the 900% increase that "10x" implies. I would expect gains of that level to be glaringly obvious in economic data *somewhere*. Is there a company out there that has laid off 90% of its workforce and seen no drop in output? Do we predict that sort of thing will happen?
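Since "x" multipliers and percentage increases are easy to mix up, here is a quick arithmetic sketch of the conversion (illustration only; the study figures are from the links above, nothing new):

```python
def pct_increase(speedup: float) -> float:
    """Convert a speedup multiplier into a percentage increase."""
    return (speedup - 1.0) * 100.0

print(round(pct_increase(1.56)))  # 56  -- the first study's finding
print(round(pct_increase(1.21)))  # 21  -- the second study's finding
print(round(pct_increase(10.0)))  # 900 -- what a "10x" claim amounts to
```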
I don't think it's a 10x boost for anyone right now, not overall at least. I've had individual tasks where Claude Code has accelerated me 10x or more, but those were generally tasks where I didn't really know what I was doing to start with (writing PowerShell scripts is a great example -- hate that syntax).
I think Amdahl's Law will kick in. The tools will get better and better, and will be able to do more and more, but the remaining process bottlenecks (including the humans) will limit overall throughput until they in turn can be automated away.
I have no idea when to expect that.
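To make the Amdahl's Law point concrete, here is a minimal sketch; the 50% automatable share and the 10x tool speedup are illustrative assumptions, not measurements:

```python
def amdahl_speedup(automatable_fraction: float, tool_speedup: float) -> float:
    """Overall speedup when only part of the work is accelerated.

    automatable_fraction: share of the total work the tool can speed up.
    tool_speedup: how much faster that share becomes (e.g. 10 for 10x).
    """
    return 1.0 / ((1.0 - automatable_fraction) + automatable_fraction / tool_speedup)

# Illustrative assumption: a 10x tool applied to half the job.
print(amdahl_speedup(0.5, 10))    # ~1.82x overall
# Even an infinitely fast tool is capped at 1 / (1 - fraction).
print(amdahl_speedup(0.5, 1e12))  # ~2.0x overall
```

The cap is set entirely by the unaccelerated fraction, which is why the remaining bottlenecks matter more than further tool improvements.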
https://x.com/METR_Evals/status/1943360399220388093
This study that I just saw today actually purports to show that people get slower when using AI tools to code!
I don't know what reality is; my personal opinion is that AI tools probably provide a marginal benefit. It's absolutely not a 10x improvement though! That's not found anywhere in the data!
That's because you aren't properly accounting for unverifiable self-reports on Twitter as data.
All too true.
Yeah, weird that that one just came out today! It definitely updates me in the direction of damping down anecdotal reports (including mine!) and looking harder at measurable results.
There are studies that show productivity losses. If you want more details, I recommend following Gary Marcus (https://garymarcus.substack.com/).
OpenAI sponsored a car last weekend in IndyCar. Full OpenAI paint job.
The OpenAI car almost won the race. But, right at the end, the driver didn’t decelerate quite enough going into a turn, and lost control of the car, so they didn’t win.
Is this a metaphor?
Can something be a metaphor if it actually happened?
If you mention it at the right time, sure. I didn't know this actually happened.
https://youtu.be/TT4sie-KY6k?t=1330
Here it is!
Car analogies have long been used in computing. It may not be a bad thing to return to tradition here. After all, the elements that let you control the car through turns (tires, brakes, aerodynamics) matter as much as or more than engine power.
Alignment is also a central concern.
Yes, some of the best metaphors really happened, eg the Titanic
Maybe more of a parable, then, or an allegory?
Being ahead in a race but then failing to decelerate at the end and losing everything as a result doesn't sound like a thing that could possibly happen to OpenAI.
Failing to accelerate, crashing into the wall, setting fire to the entire stadium…
Well, they didn't lose everything; it's just that, due to this error, in this particular race they came in second to their orange competitor.
Did the car unexpectedly make a sharp left turn?
Podcast episode for this post:
https://open.substack.com/pub/dwatvpodcast/p/ai-124-grokless-interlude
> I disagree that there is little room for future progress even if we confine ourselves to the narrow question of individual practical user queries of the types currently asked. I do not think that even on current queries, LLM answers are anywhere close to optimal, including in terms of taking into account context and customizing to a given user and their situation.
I suspect TC in particular can just append "for Tyler Cowen" to his queries and get extremely great answers for his taste and style. He might have a 1-2 year edge on customization already.
Would you mind sharing your "do not be sycophantic" prompt?
I don't know what Zvi uses, but I've had good luck with this prompt and o3:

> Give me clear and direct answers without sycophancy. If I propose an idea or thought, respond critically and try to find holes in my logic. Don't worry about simplifying things for me, I'm capable of understanding anything you throw my way.

> If I ask a question that requires you to do research, please give me a summary of ideas you pursued and abandoned. Also give me summaries you thought to provide but did not for one reason or another. I want to see how you think and trace your thoughts.
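If you'd rather bake this in than paste it into every chat, one option is to send it as a system message through the OpenAI Python SDK. A minimal sketch, assuming the SDK is installed and OPENAI_API_KEY is set; the prompt text is abbreviated from the comment above, and the user message is just a placeholder:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Abbreviated version of the anti-sycophancy prompt quoted above,
# applied to every turn by sending it as the system message.
ANTI_SYCOPHANCY = (
    "Give me clear and direct answers without sycophancy. If I propose an "
    "idea or thought, respond critically and try to find holes in my logic. "
    "Don't worry about simplifying things for me."
)

response = client.chat.completions.create(
    model="o3",  # the model the parent comment reports good luck with
    messages=[
        {"role": "system", "content": ANTI_SYCOPHANCY},
        {"role": "user", "content": "I think we should rewrite our stack in Rust."},
    ],
)
print(response.choices[0].message.content)
```

In the ChatGPT UI itself, the equivalent is pasting the prompt into the custom instructions settings.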
I don't know if this would have been worth including, but there was a lot of big news in video games recently about Microsoft Xbox shutting down a bunch of studios and firing some ridiculous number of employees. The reactionary take is that they're starting to buckle under the unsustainability of Game Pass, but given that it's known this was a directive from Microsoft writ large rather than from the Xbox division, it's significantly more likely they're trying to free up capital for AI.
Please write this article - "I am also not yet covering Anthropic’s latest alignment faking paper."
This link is broken --> "this could be another hint. Moving on seems wise."
Consider this a gentle nudge to use fewer external links (especially to twitter) that need to be clicked to actually know what you're saying. Some of us don't have a twitter account.
In this case it might have been a deliberate choice not to repeat the content. Still, he could have mentioned that.
"I strongly agree that it would be very good if the main chat services like ChatGPT, Claude and Gemini offered branching (or cloning) and undoing within chats, so you can experiment with different continuations. I remain confused why this is not offered."
In AI Studio, you can edit previous prompts and responses, and rerun any turn. The little pencil icon is for editing, and the Gemini icon reruns that turn.
Very useful for putting together more complex flows.
"I rated the [community] note as helpful."
⚰️ That's the kind of deadpan humor I come here for.
Bizarrely OpenAI seems to have responded to this grok stuff by making ChatGPT...sound like grok by default now??
I bought The Butlerian Jihad but am now worried I won't get around to reading it before AGI. :(
>There were still some quite bad posts, such as the ‘noticing’ that still seem entirely unprovoked.
Wasn’t it provoked by a QT of a tweet saying something along the lines of “people find it harder to enjoy Hollywood movies once they notice”, and then the poster asking grok “notice what?” (I know this might seem like pretty thin “provocation”, but the word “notice” has a long association with antisemitism, e.g. when Martin Amis asked his father, Kingsley, what it was like to be an antisemite, Kingsley replied “You just… notice things.”)
I’m a little confused about your note about wanting the ability to branch within chats - both ChatGPT and Claude have had edit and retry buttons since basically the beginning, and there are small arrows at the bottom of the prompt that allow you to go back to previous versions. I use that to branch exactly the way you’re talking about all the time. The only issue is that you can’t easily go back to a specific branch if you have a lot of them, especially if you’re trying to go back hours or days later. But as far as I can tell this is exactly what you’re asking for, so I notice that I am confused.
I interpreted Zvi as asking for branching in Claude Code and Cursor. Currently you have to use revision-control scaffolding. It would be nice if the agent were trained to use various git workflows by default instead of trying to reverse out changes via imperfect interpretation of its record of what was done. Something like https://github.com/anthropics/claude-code/issues/150, but as a robust default with a deterministic UI rather than something prompted imperfectly.
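For a sense of what that could look like, here is a minimal sketch of deterministic checkpointing; the helper names are hypothetical, and it just shells out to ordinary git commands:

```python
import subprocess

def git(*args: str) -> str:
    """Run a git command and return its output."""
    return subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    ).stdout.strip()

def checkpoint(label: str) -> str:
    """Commit the agent's edits on a throwaway branch so they can be
    reviewed, merged, or discarded deterministically."""
    branch = f"agent/{label}"
    git("checkout", "-b", branch)  # carries the uncommitted edits along
    git("add", "-A")
    git("commit", "-m", f"agent checkpoint: {label}", "--allow-empty")
    return branch

def rollback(branch: str) -> None:
    """Discard the agent's edits by returning to the previous branch."""
    git("checkout", "-")
    git("branch", "-D", branch)

# Hypothetical flow:
# ... agent edits files ...
# b = checkpoint("try-refactor")  # edits now live on agent/try-refactor
# rollback(b)                     # or keep the branch and merge it instead
```

The point is that snapshotting and undoing become exact git operations rather than the agent's best guess at reversing its own edits.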