"That’s in between the two lines, looking closer to linear progress."
The lower line is exponential, not linear.
Oof, good catch. It’s a misleading plot, as Edward Tufte might say.
I think calling it misleading is unfair. It's using log scale and it's very explicit about it. Not just the axis labels but also the legend that says what the lines are calls one "Exponential" and the other "Superexponential". I don't support the idea that plots aren't allowed to use log scale.
To be super clear, I in no way think it was intentionally misleading. But if you don’t look at the axes carefully, it really does look like it’s comparing linear versus exponential. Hell, even Zvi made that mistake. And importantly, if your goal is clear communication of complex ideas with a large audience, then I believe it is the author’s job to make graphics that don’t require close examination to get the point.
I see where you're coming from. What do you think they could have done better?
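For anyone else who misread the chart: on a log-scale y-axis, a series that grows by a constant factor per step climbs by a constant amount per step. An exponential trend therefore plots as a straight line, which is easy to mistake for linear growth, while a superexponential trend bends upward. Below is a minimal Python sketch of that point. It uses made-up series, not METR's or AI 2027's data: a hypothetical exponential that doubles each step and a hypothetical superexponential whose growth factor itself doubles each step.

```python
import math

def log_steps(values):
    """Vertical gaps between consecutive points on a log10-scaled y-axis."""
    logs = [math.log10(v) for v in values]
    return [later - earlier for earlier, later in zip(logs, logs[1:])]

# Hypothetical exponential series: doubles once per time step.
exponential = [2.0 ** t for t in range(6)]

# Hypothetical superexponential series: the growth factor itself grows,
# x2 at step 1, x4 at step 2, x8 at step 3, and so on.
superexponential = [2.0 ** (t * (t + 1) / 2) for t in range(6)]

exp_steps = log_steps(exponential)
super_steps = log_steps(superexponential)

# Equal gaps: a straight line on a log axis (it only *looks* linear).
print([round(s, 3) for s in exp_steps])
# Growing gaps: a curve bending upward on the same axis.
print([round(s, 3) for s in super_steps])
```

The choice of "growth factor doubles each step" for the superexponential is only one illustrative option. Any series whose per-step ratio keeps increasing will show the same upward bend on a log axis.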
Podcast episode for this post:
https://open.substack.com/pub/dwatvpodcast/p/chatgpt-51-codex-max
What am I missing in those first two graphs? Assuming that Gemini 3 and Opus 4.5 are better than 5.1 Codex Max this looks quite a lot like fast take-off??!!
Big assumption
True, but early noises are compatible with that
https://manifold.markets/BenAybar/opus-45s-metr-time-horizon-beats-ge?utm_source=substack&utm_medium=email
https://manifold.markets/BenAybar/will-gpt51-have-a-longer-metr-task
Hmm interesting, exciting if true
If the line we seem to be on bent logarithmic, and we got diminishing returns in lieu of a continuous-learning breakthrough (which the creator of the Transformer algorithm claims to have nailed and published a recent paper about), we'd see the white-knuckle capex warfare in the industry collapse on itself. Marginal, non-bitter-lesson-pilled improvements in efficiency and marketing spend would go on extracting corporate revenue, displacing some jobs, but perhaps not 20% of jobs immediately. It'd all be within the realm of typical human travails.
Nice to fantasize about.
What is with the vertical axis of the first graph? Mostly around 4x increases, then a 3x increase, then back to around 4x. I get that a consistent 4x is awkward if you want the ticks to line up with how we measure time, but you either use awkward numbers or change your lines, right?
You mean that random 21x increase where it goes from 8 hours to 1 week? Wow, weird.
Oh, maybe since this is talking about task time for humans, the assumption is that "I worked on it for a week" actually means "I worked on it for ~32 hours".
Yeah, look, I don't want to admit this, but my memory was that it went from 8 hours to a day, and then something (I don't know what I was thinking here) got it to a week. That's clearly not what's happening. The 21x increase is wild, and I guess, through luck, my initial point ("wtaf is happening with the scale?") still stands. I guess reading the week as a 32ish-hour work week solves it, because then everything above that is a work month.
OK, I looked it up and AI 2027 says: "All time amounts in the timelines forecast are in 'work time', i.e. the amount of time that a human would work during that time period: for example, a work week is 40 hours and a work year is 2,000 hours."
So you're basically right about human task time. And it's a 5x rather than a 4x. I guess we start seeing the effect of this around the middle of next year, depending on the trajectory.
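To make the arithmetic in this subthread concrete, here is a small Python sketch comparing the jump from the 8-hour tick to the 1-week tick under calendar time versus AI 2027's work-time convention. The only inputs are the two tick labels mentioned above, the 168 hours in a calendar week, and the 40-hour work week from the quote. The variable names are hypothetical and this is not how the chart itself was generated.

```python
# Hours in one week under the two readings discussed in the thread.
CALENDAR_WEEK_HOURS = 24 * 7  # wall-clock week: 168 hours
WORK_WEEK_HOURS = 40          # AI 2027's "work time" week

def ratio(from_hours, to_hours):
    """How many times longer the second duration is than the first."""
    return to_hours / from_hours

# The jump in question: an 8-hour tick followed by a 1-week tick.
EIGHT_HOURS = 8
calendar_jump = ratio(EIGHT_HOURS, CALENDAR_WEEK_HOURS)  # 168 / 8
work_jump = ratio(EIGHT_HOURS, WORK_WEEK_HOURS)          # 40 / 8

print(f"8 h -> 1 calendar week: {calendar_jump:g}x")
print(f"8 h -> 1 work week:     {work_jump:g}x")
```

Read as calendar time, the step is the "random 21x" noted above. Read as work time, it shrinks to 5x, which sits much closer to the roughly 4x spacing of the other ticks.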
Re METR's thread:
I'm morbidly amused by:
"If our projections hold, we expect further OpenAI development in the next 6 months is unlikely to pose catastrophic risk via automated AI R&D or rogue autonomy."
I contrast this with a line from Tom Lehrer's 1965 song "Who's Next":
"China got the bomb but have no fears; they can't wipe us out for at least five years"
Events move faster these days!
Zvi, or anyone else reading: do you know, or have contacts with, search engineers at Google who’ve caused major server-side issues? Are you able either to talk to them and ask them to talk to me, or to raise the issues on your blog? This would be helpful.
Great write-up, ty! Note that the Claude 4.5 Opus graph doesn't include the just behind 5.1 Codex Max. The numbers are still confusing.