A nitpick: The Sam Bowman you quote is not the Sam Bowman who works at Anthropic!
Beat me to it 😅
> roon: whatever level of abstraction you are handing off to your agents you should probably be doing one level above that
Then the important particulars of every decision of every build of every piece of infrastructure are handed off to the growing cabal (thanks Steinberger) of unaccountable eldritch ghosts-who-are-not-ghosts.
I'd say you get what you deserve but this happens to have global consequences and I'm quirky about wanting my children and descendants to live free lives.
This isn't true. In my experience this is very false.
The agents do a great job of understanding your own architectural decisions, the history of decisions, etc. Having a large, clean, extensible code base (and a clean changelist/commit history) lets you leverage higher-level thinking. This is one way more experienced developers (in the very short term) have an advantage over the less experienced. They are more likely to have already built up a library of code that provides the necessary context for higher modes of thinking to be productive.
Podcast episode for this post:
https://dwatvpodcast.substack.com/p/claude-code-claude-cowork-and-codex
Agent Teams are the real deal. I greatly prefer them for a couple of prosaic reasons:
1. It is easier to see what they're doing, because you can just navigate to them and watch their particular interaction stream, and
2. You can steer them midflight if necessary.
I do agree that Claude Code seems to work better with Agent Teams than subagents, at least in my experience. You can certainly go overboard with them, but defined roles and the ability to cross-communicate go a surprisingly long way.
I agree. The things I’m able to do with agent teams are astonishing.
"And this is the best country in the world, with the best system of government, because private citizens can voice their disagreement with such actions, including by refusal to participate."
Commence periodic complaint about your centrism fetish. What in the ever-living fuck, I just still can't believe this is the attitude, after all you've apparently been through and all we've seen revealed in the last two years. Even if this is true, which is debatable (how do I refuse to participate again?), clearly this nation is at best deeply, deeply inadequate with respect to the major issues of the day; at worst it's actually the great Satan, who for some reason still walks around wearing an "I'm not the great Satan" shirt and says "My shirt has people asking a lot of questions already answered by my shirt".
Oh give me a break. If you think this country is so f-ed feel free to go elsewhere. You don’t have to spend much time abroad to realize how good we have it here. Is the US perfect? No. Far from it. It’s still comparatively the greatest country in the world. Full stop. Progressives can be incredibly miserable people for whom nothing is ever good enough.
It's not though, by many measures both objective and subjective.
Of course it depends on the metric chosen, but given the context here is "best system of government", here's a democracy-related one, for example:
https://en.wikipedia.org/wiki/The_Economist_Democracy_Index
US sitting at 28th, as a "flawed democracy". And that's from two years ago...
Ultimately it's also a matter of personal values. But there are a lot of countries on this planet, and saying the US is the best and greatest without any qualifiers comes across as ignorant, frankly.
(I'm European, I lived in four different countries including the US, and I'd personally pick many countries over the US to live in -- especially right now, but also before. Of course and unfortunately, a lot of things are going downhill right now in a lot of places...)
All of the "code bragging" examples studded throughout this seem like some kind of willful blindness.
I can imagine the thought process of the people leaving parties to make sure their agents are always running. Like "Ok, if I don't build out my app asap then soon enough claude code will spread and everyone will be able to do it"
But then none of the examples have taken it a step further and realised that *everyone* will be able to do it, and then agents will do it themselves, and then and then.
You're spending 36 straight hours frantically adding features that are going to be one shot in 6 months time ffs
Yeah, I think some very large fraction of this is the automation of time wasting.
This might not be irrational. Although there are some in the vibe coding community who don't seem to understand, others do get that time is short; they explicitly talk about trying to build a nest egg so they don't starve when automation starts really disrupting the job market. Whether or not a future where this ends up having been the right move is likely, it's likely enough to have a contingency plan for, especially when most of the other scenarios don't really admit obvious strategies for survival/success.
So your conclusion is that we should all, what…just give up? And die? Collect disability and binge watch Netflix into oblivion? I’d rather be one of the folks desperately riding the curve than that miserable defeatist alternative.
Where did I state such a thing? I'm not making a doomer's point here.
Look, all the examples here are people who are obviously AI-pilled. They don't need to believe in imminent ASI or anything, but I'm sure they all think that AI capabilities are going to continue to improve.
So you've got people spending 36 hours straight or desperately trying to extract every single token they paid for to build software, apparently not realising that another year of agent progress will drive the value of software features down to 0.
I have no idea why they wouldn't chill out
>Chayenne Zhao tells Codex 5.3 ‘make it faster’ over and over, and it ends up committing API identity theft against him in order to make calls to Gemini Flash
Chayenne Zhao is the author of the linked tweet, but the actual tweet says "yam’s codex 5.3 is literally committing api identity theft to "optimize" latency." This is referring to Yam Peleg (who Zhao QT'd)
Anybody know - can you kick off a remote control session from the desktop app?
Also does it work in the android app?
Fun and productive update, happy for the lighter tone and man so many useful tips out there.
I’m switching to GPT-5.4 from Opus 4.6 because:
1. GPT seems to be better at writing text than Opus now, output is a lot less “AI-like”
2. Opus still wins at raw reasoning IMO, but ChatGPT has narrowed the gap
3. Unlike Opus, ChatGPT doesn’t seem to respect/care about anti-scraping rules in robots.txt and is also better at scraping pages in general, possibly because it can screenshot pages. So much of the internet is now blocked off to robotic access that Opus often ends up relying on search snippets instead of full pages because it’s unable to access the underlying content.
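For anyone unsure what "respecting robots.txt" means mechanically, here is a minimal sketch using Python's standard `urllib.robotparser`. The bot name `ExampleBot`, the rules, and the URLs are all made up for illustration; this is not a claim about how either model's fetcher actually works, just what a polite crawler's check might look like.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one named bot entirely,
# and blocks everyone else only from /private/.
EXAMPLE_ROBOTS = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleBot", "OtherBot"):
        for url in ("https://example.com/article", "https://example.com/private/x"):
            print(agent, url, allowed(EXAMPLE_ROBOTS, agent, url))
```

A crawler that honors these rules skips the page when `allowed` returns False, which is how an agent can end up with only search snippets to work from. Nothing enforces the check; a client that ignores it simply fetches the page anyway.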
Honestly the "AI can't see search results" issue is a huge disaster, massively cripples many otherwise useful use-cases. Also even as a human, almost every site I go to now has a ~2-3 sec load delay while it runs its robo-checker.
Yeah something will have to give because this new state is untenable for everyone. Human and robot alike.
I use both 5.4 and Opus 4.6 because I find the best results occur when they check each other’s plans and work.
If token anxiety is a similar feeling to grinding "dailies" in subscription-type games like MMOs...yeah, already wasted the best years of my life and all my student loans doing that, no thanks. You simply cannot out-optimize the game, it's a loop much stronger and more disciplined than you. (I mean, yes you can automate farming with bots and also multi-account, but at that point What Are We Even Doing?) Hopefully those engineers manage to quit before burning too much midnight oil; ripping out such neural grooves is frightfully difficult to do on one's own. Like that's the actual reason I dropped all videogames probably-forever as my main hobby, there are just too many with those sort of shapes which end up consuming all the rest of life too.
And of course the fact that AI advancements just a few months (weeks?!) down the line will obsolete much of today's work, well...cue the collective rueful chuckle of every MMO veteran who came back after an xpac to find their formerly BiS account now at chump tier with everyone else. Even that highly-visible telegraphed-in-advance outer loop is hard to escape. The only winning move is not to play...
The tool search bit in GPT 5.4 is genuinely significant. 47% token reduction on MCP Atlas is the kind of gain that changes what is practical. I have been testing this firsthand with MiniMax MCP server wired into Codex CLI. Eight tools covering voice synthesis, video gen, image gen, music, the works. GPT 5.4 reads the schemas, picks the right parameters, chains calls together without you touching the API docs. Wrote up the full walkthrough https://reading.sh/what-happens-when-gpt-5-4-meets-a-powerful-mcp-server-d022c9dbf960?sk=9e66a0dceb83becb1b1398e02dcfe041 because seeing a coding agent generate a video of a cat in a neon Tokyo alley then read a poem aloud in a warm British voice, all from the terminal, felt like one of those moments where the floor moved. The section on the Grep Tax is spot on too. Fighting model preferences is a losing game.
If you want to mix and match models, there is always opencode, which likely has a better TUI than either Claude Code or Codex (which remains fairly basic).
Personally I am currently clearly in camp codex. But even then, I will occasionally use opus and the default can change fairly quickly.
“Always make backups first”: an unusual way to do this is to create a new Git repository and add/commit every file, binary included, before allowing any changes. Manual commit/push gives you checkpoints. The only thing you need to protect is the repository.
That's close to my approach: the sandbox has most keys except write access to GitHub, and I just push manually from outside the sandbox.
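The git-checkpoint idea above can be sketched roughly as follows. This is one possible shape, assuming `git` is on the PATH; the helper names `git` and `checkpoint` and the inline commit identity are hypothetical, not anything the commenters described using.

```python
import subprocess
from pathlib import Path


def git(repo: Path, *args: str) -> str:
    """Run a git command inside `repo` and return its stdout."""
    result = subprocess.run(
        ["git", "-C", str(repo), *args],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def checkpoint(repo: Path, message: str) -> str:
    """Commit every file in `repo` (binaries included) and return the commit hash."""
    if not (repo / ".git").exists():
        git(repo, "init")
    # --all stages new, modified, and deleted files alike.
    git(repo, "add", "--all")
    # Identity is passed inline (placeholder values) so the sketch
    # does not depend on global git configuration.
    git(
        repo,
        "-c", "user.name=checkpoint",
        "-c", "user.email=checkpoint@example.invalid",
        "commit", "--allow-empty", "-m", message,
    )
    return git(repo, "rev-parse", "HEAD")
```

Calling `checkpoint(Path("my_project"), "before agent run")` before handing the directory to an agent gives you a hash you can return to with `git checkout <hash> -- .`. Pushing is deliberately left out, matching the suggestion to do that step by hand from outside the sandbox.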
ChatGPT 5.4 Thinking and Pro are great, a real improvement over 5.2. I use ChatGPT daily, and OpenAI has made both Thinking and Pro a lot more focused in their answers; 5.2 used to be all over the place, it loved creating a ton of Markdown headings for all sorts of things. 5.4 has better judgment about what a good, to-the-point, professional answer looks like -- it's less in love with bullet points.
Most people don't realize how much they need ChatGPT-Pro in their life. Claude chatbot and Claude Code/Cowork are great, but for the hard analysis questions, ChatGPT-Pro is an order of magnitude better. And if you upload a bunch of context (markdown/text files are best), it's world-class.
I'm happy to notice ChatGPT-Pro-5.4 taking longer to think ... questions it would have answered in 7 to 10 minutes now take up 20+ minutes of reasoning. My guess is that OpenAI felt it was in the lead in late 2025 and was trying to save on compute costs, and now they realize they need Pro to stand out, so they let it cook more.
I also noticed that ChatGPT loves calling annoying things "gremlins" or "goblins," and I'm not alone. I asked ChatGPT, and here's its theory:
ChatGPT-Pro-5.4:
I found a cluster of recent anecdotal reports from January to March 2026 complaining that ChatGPT has become oddly attached to goblin, gremlin, and chaos gremlin. That lines up interestingly with OpenAI’s own help docs, updated 2 months ago, which use “fuzzy chaos gremlin” and “trash panda” as example Friendly-style phrasing.
Really valuable to see a system like "Claude Blattman" documented in detail. I've been building something similar, but under GDPR constraints. I work at a European university, and third-party AI tools have no data processing agreement with my institution--so connecting Claude to my inbox or calendar isn't an option and I can't use Granola.
My workaround is a very simple system: everything lives in plain markdown files in Obsidian vaults, both of which Claude accesses via Filesystem MCP. There's no automated capture--I provide notes and information from meetings, events, and email (but no personal data). Claude reads the notes and writes information into different docs, maintains a running task list, and identifies priorities or summarises progress when I ask. Really simple but gets you farther than you'd think!