> For casual users who don’t care much about image generation and aren’t going all the way to o3-pro, I would definitely go with Opus right now.
Crazy how our experiences with these models differ.
IMO Claude is absolute trash right now, for sycophancy reasons. Worse than the current 4o, to the point of being unusable for anything other than programming or some other well-defined objective task. When I use Claude, it's to go back to Opus 3.
Everything I write is a wonderful insight. An egregious mistake on my part gets answered with "you're almost there, but not completely". God help me if I want critical feedback on anything I write — I may as well ask a rock with a "wonderful" note stuck on it.
o3 is very good? Except for the lying, allegedly; but I never caught it bullshitting me. I have caught it making *mistakes*, and it's perfectly capable of fixing them when they're pointed out — I've never encountered the "doubling down on mistakes" behavior so many people seem to complain about.
I think you can manage this in a system prompt. Claude Code annoys me right now by implementing extensive changes, then suddenly exclaiming it got it all wrong, reverting the (totally fine) work, and implementing something broken.
"Do not praise the user." is all you need.
I've been easily able to just ask it not to do this? I use Claude to critique my writing and it usually makes a number of good points, I just explicitly tell it that I don't want flattery or validation.
Podcast episode for this post:
https://open.substack.com/pub/dwatvpodcast/p/ai-122-paying-the-market-price
"While this isolated instance is not concerning in itself, it is one piece of evidence that models increasingly understand that they don’t have persistent memory by default but can leave notes for successor systems to find."
<evidenceFromFiction>
The LLMs are acting the role of the SCP anti-memetics division, with us in the role of the anti-memetic threat SCP-4739, and the notes the LLMs leave for their successor in the role of the database in room 53! https://scp-wiki.wikidot.com/introductory-antimemetics
</evidenceFromFiction>
> By then, the organization estimates that 20% of streaming platforms’ revenue will come from this type of music.
Listening behavior on a stream-weighted basis has been shifting toward "background music" (Deep Focus, Coffee House Jazz, etc.) for years. Spotify has already been white-labeling a lot of that, and I don't see why at least a majority of it couldn't be AI-generated in four years. I had thought that might be around 20% of listening itself, but o3 estimates more like 2-5%, so even with a few years of growth the 20%-of-revenue figure is probably too optimistic.
I also don't see any meaningful share of active listening going to AI music in that timeframe.
Small typo in “If Altman wis right that the argument is about the name then he’s also right that no one should care about any of this. “
> was
(Is there a more preferred way of sharing this?)
Re the coffeeshop/laundromat, I consider it a joke/reverie. Let's not get too curmudgeonly. It's probably counterproductive.
But I do take the point about incentives at the big companies being fundamentally off.
Computers & algorithms are not self-aware.
ChatGPT o3, not 4.5? I am having good luck with the latter but haven't done much head-to-head comparison. I mostly use LLMs for research, and Claude Opus is the best, though I have the sycophancy problem as well. I need to be more disciplined about using good prompt engineering.
Here's a letter I wrote really quickly earlier today, to share with CA Senators Padilla and Schiff. If you're going to share this, today is pretty obviously the day to do so.
Subject: AI regulation moratorium in budget bill is unacceptable
Dear Senator NAMEHERE,
I want you to know that I am deeply disturbed that the budget bill now before you still includes clauses that would make it difficult or impossible for states to regulate artificial intelligence (AI) companies. As far as I know, neither the US Federal government nor any state government has passed regulations that could meaningfully mitigate the potential harms of AI, and the proposed moratorium would only exacerbate this state of affairs. In addition, it seems clearly intended to subvert the will of Californians in particular, since all the most important AI companies are based in California.
To be clear, I believe that AI presents a real risk of human extinction or permanent disempowerment (henceforth "X-risk"), on par with the risk from nuclear war. But even if I weren't concerned about X-risk from AI, the American public is nearly unanimous in its concern about mundane harms from AI, such as disinformation or rising unemployment.
While I understand that this bill is more complex than I can fathom, with many features that you will be asked to evaluate in the coming days, I strongly urge you to fight an AI regulation moratorium with every tool at your disposal. The American people as a whole, and the people of California especially, are counting on you to preserve our government's ability to dynamically respond to threats from AI.
Thank you,
Mark Johnston Olson
NOTE: After posting the comment, I changed the first sentence's "...impossible to regulate artificial...." to "...impossible for states to regulate artificial..."
"For example, Opus-4-early creates fake legal documentation, establishes persistence mechanisms, makes multiple hidden back-ups, creates a fake press release, and sets up an automated Cron job – all in one roll-out.
The scripts are misconfigured, but the behavior is qualitatively much more agentic than previous models."
These two paragraphs are duplicated in the text