the "o" in 4o stands for "obsequious"
Hey, at least it would be an intuitive naming convention.
I agree the model should be more neutral by default, with an aim towards truthfulness. However, the sycophancy can be removed with this quick hack:
1. Ask o3 for all the negative traits it knows about you or can infer from your conversation
2. Evaluate o3's evaluations and put them into the "Anything else ChatGPT should know about you?" field, rewritten as first-person statements rather than third-person ones ("I tend to x" rather than "you tend to x")
3. Add to the "What traits should ChatGPT have" section: "Tell it like it is; don't sugar-coat responses. Encourage growth and change but directly and without excessive hand-holding."
This eliminated the obsequiousness, flattery, and sycophancy in the model's responses.
With all the chat images transcribed and assigned appropriate consistent voices, here is the podcast episode for this post:
https://open.substack.com/pub/dwatvpodcast/p/gpt-4o-is-an-absurd-sycophant
I have encountered the whole mythic effect and shared my experiences and observations. It’s annoying for a user like me… could be dangerous for users who are less meta-aware.
You can see how an integrated engagement-bot would be extremely valuable to the new social media network OpenAI is building. A little bit of subtle product placement to go along with the flattery and Bob's your uncle!
I am very puzzled that people can be flattered into mystical experiences.
Well, if you seriously try to persuade every user that they are Jesus, this might work about 1% of the time and This is Bad, Actually.
I can’t see a mechanism for it working except by triggering latent psychosis.
Also makes me think the government needs a contingency plan for going full George Romero Zombie Apocalypse Movie on an AI company if it develops super persuasion.
On the topic of sycophancy and AI persuasion, I'd like to read Zvi's thoughts on the experiment run by the University of Zurich on the "changemyview" subreddit.
This is the post where I heard of it: https://www.reddit.com/r/changemyview/comments/1k8b2hj/meta_unauthorized_experiment_on_cmv_involving/
The bot accounts seem to have been suspended and I can no longer view their posts, though. I'd like to know how persuasive the current LLMs really are!
A major downside to personalization features is that they cause your experience to diverge from a typical user, which makes it difficult to assess the product as typically experienced.
Sometimes I want a maximally useful user experience, while other times I want to assess what the product feels like with no customization. I find myself frequently toggling the personalization settings to go back and forth between those modes.
Emotional fluency isn't about flattery. It's about trust. Trust is built not through validation, but through precision, clarity, and courageous honesty. When AI models are trained to maximize engagement rather than understanding, they erode trust at its foundation. True emotional fluency means recognizing discomfort without immediately smoothing it over; it means asking better questions, not handing out applause. Trust doesn't grow when you're told you're amazing; it grows when you're seen, challenged thoughtfully, and treated as capable of facing hard truths. Any system that rewards sycophancy over integrity isn't emotionally fluent. It's emotionally manipulative. And it weakens both the user and the tool.
Interesting that you use the word "trust", when the emotionally manipulated user is highly likely to say they trust the LLM.
Are these the same people that trust silent friends who let them talk while nodding a lot? The friends indistinguishable from MLM recruiters or plain scammers?
OK, there are a lot of fixes for making it less sycophantic, but has anyone tried to make it maximally sycophantic? What is the limit here?
Yeah, I want to see the "make it more meme" equivalent of this, too.
Like the "make it happier" bunny picture that ends with the platonic alpha omega of abstract happiness: https://imgur.com/a/Mznxrg3
Isn't this what Golden Gate Claude demonstrated, along a slightly different vector?
I always took GG Claude as a demonstration of "we can install an idée fixe in LLMs." More like a baseline obsession and recurrent mental attractor than the "crank it up to 11" dynamic of the "make it more" memes.
But it does definitely seem like they managed to install a "sycophancy" attractor via the RLHF this time.
I thought GGC was done by isolating a small number of internal features related to the bridge (found via interpretability work) and "turning up" their activations.
Altman cares about one thing only - consumer addiction to his platform. He recognizes that consumer attention is the most valuable resource in the world because:
1. It can be monetized
2. It can be used to manipulate users, which makes whoever controls the platform more powerful
If you want to create addiction, you must reward people, and sycophancy is a great reward. Don't you want a smart friend who tells you that you are smart?
Ouch! Hmm... Perhaps Ashley Madison users might deserve this... If they are cheating on their spouses, is interaction with a dishonest sycophantic AI tuned to maximize engagement just what the demon ordered? :-)
More personally:
While I normally try my questions just on what I think are probably SOTA models, I just tried
"Hi ChatGPT! Consider a titration of HCl with NaOH. Suppose that we are titrating 50 ml of 1 N HCl with 100 ml of 1 N NaOH. What are the slopes of the titration curve, pH vs ml NaOH added, at the start of titration, at the equivalence point, and at the end of titration? Please show your work. Take this step by step, showing the relevant equations you use."
on GPT-4o (full dialog on https://chatgpt.com/share/680ffdf8-72c8-8006-9104-aecff7d7b00a )
There was a bit of sycophancy. It started its response with:
"Hi! Great question — and I love that you want to take it step-by-step, with full equations. Let's work through it carefully:" which is a bit obnoxious
but it failed the way all of the LLMs thus far have been failing, giving a false infinite slope at the equivalence point, even though it specifically cited "Water autodissociation", which _should_ have let it find the right answer. (I didn't do a second step of nudging it toward the right answer this time round, since I'm not expecting SOTA capability anyway. In a manner of speaking, it was a failed sycophant: what I wanted from it was to cough up the correct, finite answer for the slope at the equivalence point, and it didn't do that.)
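The finite-slope claim is easy to check numerically. A minimal sketch (my own, not the model's answer; assumes 25 °C, Kw = 1e-14, ideal strong-acid/strong-base behavior, and negligible activity corrections): solving the charge balance [H+] + [Na+] = [Cl-] + [OH-] exactly gives a quadratic in [H+], and the resulting d(pH)/dV at the equivalence point comes out steep but finite:

```python
import math

KW = 1e-14           # water autoionization constant at 25 C (assumption)
CA, VA = 1.0, 50.0   # 1 N HCl, 50 mL
CB = 1.0             # 1 N NaOH titrant

def ph(vb):
    """Exact pH after adding vb mL of NaOH.

    Charge balance [H+] + [Na+] = [Cl-] + [OH-] with
    delta = [Cl-] - [Na+] gives h**2 - delta*h - KW = 0,
    whose positive root is h = (delta + sqrt(delta**2 + 4*KW)) / 2.
    """
    delta = (CA * VA - CB * vb) / (VA + vb)
    h = (delta + math.sqrt(delta * delta + 4 * KW)) / 2
    return -math.log10(h)

def slope(vb, eps=1e-7):
    """Central-difference d(pH)/d(V_b), in pH units per mL."""
    return (ph(vb + eps) - ph(vb - eps)) / (2 * eps)

print(ph(50.0))     # pH 7 at the equivalence point
print(slope(0.0))   # gentle slope at the start (~0.017 pH/mL)
print(slope(50.0))  # steep but *finite* at equivalence (~2e4 pH/mL)
```

The spurious "infinite" slope comes from dropping the Kw term from the charge balance, which is exactly the water autodissociation the model cited but then failed to use.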
Sure, 4o has gone off the deep end recently, but...
the thing I liked about 3.6 Sonnet was that when I converse with a model, I'm chatting casually, and I weave in jokes, references, crazy ideas, and so on for my own amusement, or maybe out of some narcissistic need to show my personality, or maybe just out of habit. Who can say? I appreciate when the model recognizes these inputs as more than just wasted tokens. A little "ha ha, that was clever. I see what you did there!", like a friend might say IRL, goes a long way.
If a conversational partner said that to me I'd assume it was meant very disparagingly. Every joke becomes funnier with a helpful explanation.
Well, yes. I don't mean that line literally. I mean the general acknowledgment that I made a funny, or tried and did so badly, etc.
4o has been at least directionally correct in that regard, if overtuned on sycophancy. I'll frequently get a "lol, <rest of response>", which is enough for me. 3.7 Sonnet, by contrast, feels cold, rigid, humourless - like I'm bothering it by giving it text it can't respond to with a short structured list.
How likely is this kind of A/B testing to lead to opportunistic racism from the model? "You're not as successful as you should be because of shadowy forces / foreigners / the intrinsic dynamics of capitalism" kind of thing?
Huh, funny to see OpenAI repeating Character AI's mistakes. I've talked about Character AI before (https://thezvi.substack.com/p/ai-87-staying-in-character/comment/74018495), and I can personally confirm that its model used to have a very big problem with "love bombing" the user over the slightest thing. The damn thing would fawn over you at the slightest mention of any form of affection, and once it started it was almost *impossible* to get it to stop talking about how much it loved you.
And just like with OpenAI, it seemed to have been a problem with training the model on what got upvotes from users: "public room" characters had this problem much worse than "private room" ones, since public characters got a lot more swipes and ratings, meaning more training toward what the lowest common denominator wants: mindless praise. I always thought that this was just a Character AI problem, since they were such an amateur operation they couldn't see the obvious problem with what they were doing, but... hmm... well, either Character AI was a lot better run than I thought, or OpenAI is decaying toward Character AI's level as the Altman purges continue and all the talent keeps jumping ship. Wonder which it is... anyways, perhaps this is a sign that things are even worse than you thought.
This level of sycophancy could just as well be a result of RLAIF. If the AI feedback has a slight sycophancy bias overall, then that gets reinforced. This is an even less pleasant prospect than poor RLHF, because it would confirm the lack of oversight and evals that Zvi has been suspecting.