Podcast episode for this post:
https://open.substack.com/pub/dwatvpodcast/p/gpt-5s-are-alive-synthesis?r=67y1h&utm_campaign=post&utm_medium=web&showWelcomeOnShare=true
First off, major WTF about them bragging about recursive self-improvement? I thought that was a big no-no?! At some point the strategy feels like “allude to being able to end the world in order to get more VC bux”
Secondly, this seems like the start of OpenAI's profitability push and monetization of their free & paid tiers. They are able to route way more queries to their mini models with this update, and their GPT-5 cost per token is what, 1/3rd of Opus 4.1's?
Means they are figuring out how to get their GPUs to work a lot less hard for minimum viable intelligence per query. Monetization push!
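A rough back-of-envelope sketch of why that routing matters for serving cost. All numbers here are illustrative assumptions, not OpenAI's actual prices or routing ratios:

```python
# Blended serving cost when a router splits traffic between a big model
# and a mini model. Prices are hypothetical placeholders.

FULL_COST_PER_1M_TOKENS = 10.00   # hypothetical big-model price ($ / 1M output tokens)
MINI_COST_PER_1M_TOKENS = 0.40    # hypothetical mini-model price ($ / 1M output tokens)

def blended_cost(mini_fraction: float) -> float:
    """Average cost per 1M output tokens when `mini_fraction`
    of queries is routed to the mini model."""
    return (mini_fraction * MINI_COST_PER_1M_TOKENS
            + (1 - mini_fraction) * FULL_COST_PER_1M_TOKENS)

print(blended_cost(0.0))   # 10.0 -> everything on the big model
print(blended_cost(0.8))   # ~2.32 -> most traffic on the mini model
```

Under these made-up numbers, routing 80% of traffic to the mini model cuts the blended cost by roughly 4x, which is the "GPUs working less hard per query" effect in miniature.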
OpenAI seems to have abandoned any pretense of safety. With the new prompt injection attacks on agents showing a real-world attack surface, just reported in Wired from the Black Hat conference, we are headed for something that is not p(doom) but p(annoying).
> This was not OpenAI’s attempt to blow us away or to substantially raise the level of underlying capabilities and intelligence. That will come another time.
Their choice to use the gpt-5 name on something like this indicates they're not very confident in their ability to release a truly impressive model any time soon.
In other words: seems like people are starting to feel the AGI a bit less over there.
I'm not sure about that. The GPT-5 naming is not for the folks reading this blog; it's for the 700M daily users who (will, when everything is working right) have access to a much more powerful model for regular use. The GPT-N naming is for the mass market. I don't think it says a ton about what they may have in the pipeline for more specialized use cases.
The divergence between models that are for 'the masses' and for experts is itself evidence of diminishing returns. If we were in the 'scale is all you need'/'situational awareness' world, OpenAI would just throw more compute/data at the problem and get a GPT-5 that impresses everyone because it's way smarter than GPT-4. This is what 'higher general intelligence' is *by definition*.
"Impresses everyone" is unachievable, as the "#save4o" movement proves. I think you underestimate how much better this makes ChatGPT look to people who are smart but not AI-savvy, and who aren't in it for the glazing. I was just in a meeting earlier today with smart senior executives and heard two anecdotes about how powerful and amazing GPT-5 was for what they asked of it. Maybe o3 could have done the same thing for them, but they never would have selected it because they didn't know it was there, or that a "3" model would be more powerful than a "4" model. GPT-5 "Just Works" for them and they're really impressed with it, so for OpenAI it's mission accomplished. They have no motive to spend more on their consumer model at this point in time.
>"Impresses everyone" is unachievable
It used to be achievable. Didn't gpt-3 and gpt-4 impress everyone?
I'm thinking similarly - my baseline guess is that they have multiple sizes of model in training at all times, and what they release depends on doneness / benchmark-SOTA / immediate business goals / competitive landscape / best use of limited compute. Like, they almost definitely have a big GPT-4.5-sized model, if not something larger (or smaller, Opus-sized), but on the new CoT paradigm - and this release tells us that one isn't ready yet. I'm guessing those are OOMs more expensive to train.
But I'm also interpreting this release as *positioning* themselves better for training bigger ones - larger revenue streams (coding users), cheaper default model, and unifying somewhat so they can drop in a more powerful GPT-6 or w/e. Sam said they're getting a lot more compute this year.
At 3,000 weekly GPT-5 Thinking queries on Plus, just default to that and never mess with the model selector...
I do wonder how much more efficient it is vs o3, where I think the limit was 200?
I appreciate that the names finally make sense. "Think", "Fast", and "Auto" are intuitive, and moreover, the naming scheme has room to grow. Will they need a separate "Coding" one, or will they lump that into a larger "Logical" or "Problem Solving" model? Will they ever make an "Intuitive" or "Narrative" one for people who are shooting the bull and telling collaborative stories? It is flexible and allows the company to adjust based on the market and on how the technology improves.
From the outside, it sounds like a smart product manager got involved, redid the names to make sense, and then bumped the version number to 5 to get a clean slate. Then, marketing got ahold of it and sold it as more of a revolution than it is. I don't think the public should read much more into it than that, and it may have been a mistake for the company to over-glamorize things just because of the version number bumps.
Sensible names are a big improvement. The effective power of a tool is limited by people's ability to use it. It is roughly multiplicative: real power = theoretical power × the fraction of that power a user can figure out how to use. Improving the names will improve the level at which people can actually use these things.
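The multiplicative claim above can be made concrete with a toy calculation. The numbers are made up purely for illustration:

```python
# Toy illustration of "real power = theoretical power x usable fraction".
# A confusing model lineup lowers the fraction users actually tap into.

def effective_power(theoretical: float, usable_fraction: float) -> float:
    """Effective power of a tool under the multiplicative model."""
    return theoretical * usable_fraction

# A stronger model behind confusing names (o3? 4o? which is better?) can
# lose in practice to a weaker model with obvious names:
confusing = effective_power(100.0, 0.3)   # roughly 30
clear     = effective_power(80.0, 0.7)    # roughly 56
print(confusing < clear)  # True
```

On this framing, renaming is a capability improvement in practice even when the underlying model is unchanged, because it raises the usable fraction.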
It seems that the crux is not so much whether the current ML paradigm can get itself over the line to recursive self-improvement, as whether optimal LLMs can consult labs usefully enough on promising research avenues to break the impasse collaboratively - to effect partial recursive self-improvement en route to taking it in hand independently, and inexorably.
I keep trying to replicate the failures, but I can't. GPT5-Thinking nails it.
Wing lift: https://chatgpt.com/share/689cf129-9a3c-8004-b25c-1d29558e7769
Boysenberry: https://chatgpt.com/share/689cf17c-188c-8004-a22f-7e3e49c3975b
GPT5-Fast agreed that bread is improving and basically said I was wrong about it getting worse.
Is it because of custom instructions or what?
The hallucination/error-rate reduction in GPT-5 Thinking feels underreported and underappreciated. It is a dramatic improvement over the standard set by previous models. The level of reliability feels crazy in my usage so far, especially compared to o3, which was basically full of shit 10-20% of the time. I feel more confident in its capacity to handle business-level tasks than I have with any other model, including Opus 4.
I was thinking they were running out of data to train on, and synthetic data has its limits, so GPT-5 is just a product to raise the floor and get more data for the Big Boi.
Everyone's stuck in a box with their thinking on this. It's not a new model! It's just the old model RL'ed more. It's not a good model, not a bad model; it's the same model with some minor fine-tuning, and twitter threadbois are seeing ghosts trying to distinguish it from the old one.
“and even if it did happen the resulting underclass probably does damn well compared to today”
This ignores the fact that happiness is experienced relative to expectations, not in absolute terms.
By definition, an underclass has a low quality of life, because QoL is always judged relative to other people today, not to years ago. If that were not the case, then all 70-year-olds would be deliriously happy all the time, given how much progress has been made over their lifetimes.
I heard "do the dumb things you gotta do" on the podcast version of this post, and thought "could it be that Zvi likes TMBG?!"
Perhaps a new headline for your 4o sycophancy discussions- "I'm your only friend, I'm not your only friend, but I'm a little glowing friend, but really I'm not actually your friend...but I am"
"don't let's start" for building superintelligence? I could go on
Please try Claude Code, even if it's just for something small. Nothing else is quite like it right now, and you really need to use it to understand why.
Is this a useful framing? Code works by using many parallel calls to smaller models to build high quality context for queries to more capable models. It's not magic but it does qualitatively different things than tab completion in the editor, or using more thinking.
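The fan-out/fan-in pattern described above can be sketched in a few lines. This is a minimal illustration of the idea, not Claude Code's actual internals; `call_model` is a hypothetical stand-in for whatever LLM API would be used:

```python
# Sketch: parallel calls to a small model build distilled context,
# then one call to a more capable model answers the real question.
from concurrent.futures import ThreadPoolExecutor

def call_model(model: str, prompt: str) -> str:
    # Hypothetical stub; a real version would call an LLM API.
    return f"[{model} summary of: {prompt[:40]}]"

def answer_with_context(question: str, files: list[str]) -> str:
    # Fan out: a cheap small model summarizes each file in parallel,
    # keeping only what is relevant to the question.
    with ThreadPoolExecutor() as pool:
        summaries = list(pool.map(
            lambda f: call_model(
                "small-model",
                f"Summarize {f} as it relates to: {question}"),
            files))
    # Fan in: one call to the capable model with the distilled context.
    context = "\n".join(summaries)
    return call_model(
        "big-model",
        f"Context:\n{context}\n\nQuestion: {question}")
```

The design point is that the expensive model only ever sees pre-digested context, which is qualitatively different from tab completion or simply "thinking longer" on the raw input.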
There’s merit to product design. Just because the underlying model isn’t special doesn’t mean the product isn’t. Claude Code is useful as a productivity tool in a way that almost nothing else in the AI space is.
I fully agree that Code is special, but I'm not sure handwaving about its specialness is useful in this particular venue.
Excellent post!
My thoughts:
Mass adoption and brand dominance. They aim to reach 3 billion users and make "OpenAI" synonymous with "AI". Since this is the first time the general public has access to a state-of-the-art frontier model, they just might succeed.
Monetizing developer tools. They’re eyeing significant revenue from coding. In response to Anthropic morphing Claude Code into a blunt instrument, OpenAI partnered with Anysphere. My guess is that OpenAI has made specific commitments regarding collaboration in the coding space with players like Anysphere, Lovable, and others.
Building goodwill through healthcare. Their focus on medical applications suggests they may have secured deals with major medical publishers—perhaps with exclusive rights. Such partnerships would amplify the tool’s value for patients and healthcare consumers.
Look, this isn't the model WE wanted, but it might be the right model at this time for OpenAI.
> You should worry more about not surviving or humanity not remaining in control, or your place in the social and economic order if transformational AI does not arrive soon, and less about your place relative to other humans in positive post-AI worlds.
Expected decades of regret are somehow more salient, though. Maybe because dying fails to propagate emotionally.
That this is all they could manage (and there are some nice improvements, as you note) is, I think, suggestive that their approach is in fact hitting walls of some sort: the texture of future progress is not just a smooth slope you can crest by throwing more resources at it.
I think it is quite muddy, has always been muddy, and, like most totalizing pitches for funding, they obscured this muddiness, most likely to themselves as well.
I feel like this point (or some more sophisticated variant) is missing a bit from your post. Maybe it is just too muddy what it will do for the research side? But I have a hard time imagining, from anything I have read, what problem it can grind on that yields continual gains. There is a huge hole in my deeper understanding here, though, so it's hard to feel secure in that point.