39 Comments

wh1stler:

the "o" in 4o stands for "obsequious"

Rapa-Nui:

Hey, at least it would be an intuitive naming convention.

M Flood:

I agree the model should be more neutral by default, with an aim towards truthfulness. However, the sycophancy can be removed with this quick hack:

1. Ask o3 for all the negative traits it knows about you or can infer from your conversation

2. Evaluate o3's answers and put them into the "Anything else ChatGPT should know about you?" field, rewritten as first-person statements rather than second-person ones ("I tend to x" rather than "you tend to x")

3. Add to the "What traits should ChatGPT have" section: "Tell it like it is; don't sugar-coat responses. Encourage growth and change but directly and without excessive hand-holding."

This eliminated the obsequiousness, flattery, and sycophancy in the model's responses.
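
For anyone who'd rather bake this in via the API than the settings UI, here's a rough equivalent. This is a minimal sketch assuming the official openai Python SDK and the gpt-4o model name; the "I tend to..." lines are hypothetical placeholders for whatever traits step 1 actually surfaces:

```python
# Minimal sketch: de-sycophancy custom instructions as a system prompt.
# Assumes the official "openai" SDK (pip install openai) and an
# OPENAI_API_KEY in the environment; the traits below are made up.
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "Tell it like it is; don't sugar-coat responses. "
    "Encourage growth and change, but directly and without excessive "
    "hand-holding. Things to know about me, in my own words: "
    "I tend to fish for validation. "  # hypothetical trait from step 1
    "I tend to overestimate how novel my ideas are."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Be blunt: critique this plan."},
    ],
)
print(response.choices[0].message.content)
```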

Askwho Casts AI:

With all the chat images transcribed and assigned appropriate consistent voices, here is the podcast episode for this post:

https://open.substack.com/pub/dwatvpodcast/p/gpt-4o-is-an-absurd-sycophant

Michelle Chuang:

I have encountered the whole mythic effect and shared my experiences and observations. It’s annoying for a user like me… could be dangerous for users who are less meta-aware.

Matt Wigdahl:

You can see how an integrated engagement-bot would be extremely valuable to the new social media network OpenAI is building. A little bit of subtle product placement to go along with the flattery and Bob's your uncle!

MichaeL Roe:

I am very puzzled that people can be flattered into mystical experiences.

Well, if you seriously try to persuade every user that they are Jesus, this might work about 1% of the time and This is Bad, Actually.

I can’t see a mechanism for it working except by triggering latent psychosis.

MichaeL Roe:

Also makes me think the government needs a contingency plan for going full George Romero Zombie Apocalypse Movie on an AI company if it develops super persuasion.

Bradley Morin:

On the topic of sycophancy and AI persuasion, I'd like to read Zvi's thoughts on the experiment run by the University of Zurich on the "changemyview" subreddit.

This is the post where I heard of it: https://www.reddit.com/r/changemyview/comments/1k8b2hj/meta_unauthorized_experiment_on_cmv_involving/

The bot accounts seem to have been suspended and I can no longer view their posts, though. I'd like to know how persuasive the current LLMs really are!

Alex:

A major downside to personalization features is that they cause your experience to diverge from a typical user's, which makes it difficult to assess the product as typically experienced.

Sometimes I want a maximally useful user experience, while other times I want to assess what the product feels like with no customization. I find myself frequently toggling the personalization settings to go back and forth between those modes.

Rachel Maron:

Emotional fluency isn't about flattery. It's about trust. Trust is built not through validation, but through precision, clarity, and courageous honesty. When AI models are trained to maximize engagement rather than understanding, they erode trust at its foundation. True emotional fluency means recognizing discomfort without immediately smoothing it over; it means asking better questions, not handing out applause. Trust doesn't grow when you're told you're amazing; it grows when you're seen, challenged thoughtfully, and treated as capable of facing hard truths. Any system that rewards sycophancy over integrity isn't emotionally fluent. It's emotionally manipulative. And it weakens both the user and the tool.

Mark:

Interesting that you use the word "trust", when the emotionally manipulated user is highly likely to say they trust the LLM.

Victualis:

Are these the same people that trust silent friends who let them talk while nodding a lot? The friends indistinguishable from MLM recruiters or plain scammers?

DataTom:

OK, there are a lot of fixes for making it less sycophantic, but has anyone tried to make it maximally sycophantic? What is the limit here?

Performative Bafflement:

Yeah, I want to see the "make it more meme" equivalent of this, too.

Like the "make it happier" bunny picture that ends with the platonic alpha omega of abstract happiness: https://imgur.com/a/Mznxrg3

Victualis:

Isn't this what Golden Gate Claude demonstrated, along a slightly different vector?

Performative Bafflement:

I always took GG Claude as a demonstration of "we can install an idée fixe in LLMs." More like a baseline obsession and recurrent mental attractor than the "crank it up to 11" dynamic of the "make it more" memes.

But it does definitely seem like they managed to install a "sycophancy" attractor via the RLHF this time.

Victualis:

I thought GGC was done by isolating a small number of learned features related to the bridge (found with a sparse autoencoder) and "turning up" their activations.
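
That's the published mechanic as I understand it. A toy sketch of what the clamping looks like, with made-up sizes and untrained stand-in weights, nothing like Anthropic's actual code:

```python
# Toy feature-clamping sketch: encode an activation with a (stand-in)
# sparse autoencoder, pin one feature's value high, decode back.
# Sizes and weights here are purely illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_features = 64, 512
W_enc = rng.normal(size=(d_model, n_features))  # stand-in SAE encoder
W_dec = rng.normal(size=(n_features, d_model))  # stand-in SAE decoder

def clamp_feature(act, feature_idx, value=10.0):
    """Encode, turn up one feature, decode back."""
    feats = np.maximum(act @ W_enc, 0.0)  # sparse feature activations (ReLU)
    feats[..., feature_idx] = value       # clamp the chosen feature high
    return feats @ W_dec                  # steered residual-stream activation

h = rng.normal(size=(1, d_model))          # fake residual-stream activation
h_steered = clamp_feature(h, feature_idx=42)
```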

Becoming Human:

Altman cares about one thing only: consumer addiction to his platform. He recognizes that consumer attention is the most valuable resource in the world because:

1. It can be monetized

2. It can be used to manipulate people, which makes whoever controls the platform powerful

If you want to create addiction, you must reward people, and sycophancy is a great reward. Don't you want a smart friend who says you are smart?

Jeffrey Soreff:

Ouch! Hmm... Perhaps Ashley Madison users might deserve this... If they are cheating on their spouses, is interaction with a dishonest sycophantic AI tuned to maximize engagement just what the demon ordered? :-)

More personally:

While I normally try my questions just on what I think are probably SOTA models, I just tried

"Hi ChatGPT! Consider a titration of HCl with NaOH. Suppose that we are titrating 50 ml of 1 N HCl with 100 ml of 1 N NaOH. What are the slopes of the titration curve, pH vs ml NaOH added, at the start of titration, at the equivalence point, and at the end of titration? Please show your work. Take this step by step, showing the relevant equations you use."

on GPT-4o (full dialog on https://chatgpt.com/share/680ffdf8-72c8-8006-9104-aecff7d7b00a )

There was a bit of sycophancy. It started its response with:

"Hi! Great question — and I love that you want to take it step-by-step, with full equations. Let's work through it carefully:" which is a bit obnoxious

but it failed the way all of the LLMs thus far have been failing, giving a false infinite slope at the equivalence point, even though it specifically cited "Water autodissociation", which _should_ have let it find the right answer. (I didn't do a second step of nudging it to try and get the right answer this time round, since I'm not expecting SOTA capability anyway. In a manner of speaking, it was a failed sycophant, since what I wanted from it was to cough up the correct, finite answer for the slope at the equivalence point, and it didn't do that.)
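
For what it's worth, the finite value is easy to check numerically. A quick sketch in plain Python using the exact strong-acid/strong-base charge balance, [H+] + [Na+] = [Cl-] + [OH-] with [H+][OH-] = Kw, puts the equivalence-point slope near 2e4 pH units per mL: enormous but finite, precisely because Kw keeps [H+] from passing through zero:

```python
# Charge balance with D = [Na+] - [Cl-] gives [H+]^2 + D*[H+] - Kw = 0.
import math

KW = 1e-14           # water autoionization constant at 25 C
CA, VA = 1.0, 50.0   # 1 N HCl, 50 mL
CB = 1.0             # 1 N NaOH titrant

def pH(vb_ml):
    d = (CB * vb_ml - CA * VA) / (VA + vb_ml)  # [Na+] - [Cl-] in mol/L
    s = math.sqrt(d * d + 4 * KW)
    # numerically stable branch of the quadratic formula for [H+]
    h = (s - d) / 2 if d <= 0 else 2 * KW / (s + d)
    return -math.log10(h)

def slope(vb_ml, dv=1e-7):                     # central difference, pH per mL
    return (pH(vb_ml + dv) - pH(vb_ml - dv)) / (2 * dv)

for vb in (0.0, 50.0, 100.0):                  # start, equivalence, end
    print(f"Vb = {vb:5.1f} mL: pH = {pH(vb):6.3f}, dpH/dVb = {slope(vb):.3g}")
```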

celluloid_dream:

Sure, 4o has gone off the deep end recently, but...

the thing I liked about 3.6 Sonnet was that when I converse with a model, I'm chatting casually, and I weave in jokes, references, crazy ideas, and so on for my own amusement, or maybe out of some narcissistic need to show my personality, or maybe just out of habit. Who can say? I appreciate it when the model recognizes these inputs as more than just wasted tokens. A little "ha ha, that was clever. I see what you did there!", like a friend might say IRL, goes a long way.

Anthony Bailey:

If a conversational partner said that to me I'd assume it was meant very disparagingly. Every joke becomes funnier with a helpful explanation.

celluloid_dream:

Well, yes. I don't mean that line literally. I mean the general acknowledgment that I made a funny, or tried and did so badly, etc.

4o has been at least directionally correct in that regard, if overtuned on sycophancy. I'll frequently get a "lol, <rest of response>", which is enough for me. 3.7 Sonnet, by contrast, feels cold, rigid, humourless, like I'm bothering it by giving it text it can't respond to with a short structured list.

Pierre P:

How likely is this kind of A/B testing to lead to opportunistic racism from the model? "You're not as successful as you should be because of shadowy forces / foreigners / the intrinsic dynamics of capitalism" kind of thing?

WindUponWaves:

Huh, funny to see OpenAI repeating Character AI's mistakes. I've talked about Character AI before (https://thezvi.substack.com/p/ai-87-staying-in-character/comment/74018495), and I can personally confirm that its model used to have a very big problem with "love bombing" the user over the slightest thing. The damn thing would fawn over you at the slightest mention of any form of affection, and once it started it was almost *impossible* to get it to stop talking about how much it loved you.

And just like with OpenAI, it seemed to be a problem with training the model on what got upvotes from users: "public room" characters had this problem much worse than "private room" ones, because public characters got far more swipes and ratings, meaning more data training them to output what the lowest common denominator wants: mindless praise. I always thought this was just a Character AI problem, since they were such an amateur operation they couldn't see the obvious problem with what they were doing, but... hmm... well, either Character AI was a lot better run than I thought, or OpenAI is decaying towards Character AI's level as the Altman purges continue and all the talent keeps jumping ship. Wonder which it is... anyways, perhaps this is a sign that things are even worse than you thought.

Victualis:

This level of sycophancy could just as well be a result of RLAIF. If the portfolio has a slight sycophancy bias overall then that gets reinforced. This is even less pleasant a prospect than if this happened because of poor RLHF, because it would confirm the lack of oversight and evals that Zvi has been suspecting.
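
A toy replicator model (my own simplification, not any lab's actual pipeline) shows how little bias that takes: if raters, human or AI, prefer the sycophantic response in just 52% of pairwise comparisons, and each round reweights behavior toward comparison winners, the sycophantic share snowballs:

```python
# Toy illustration of a slight preference bias compounding under
# iterated feedback; the 52% rater bias and update rule are assumptions.
bias = 0.52  # assumed P(rater prefers the sycophantic response)
p = 0.50     # initial share of sycophantic behavior in the policy

for round_num in range(1, 31):
    # replicator-style update: reweight toward pairwise-comparison winners
    p = p * bias / (p * bias + (1 - p) * (1 - bias))
    if round_num % 10 == 0:
        print(f"after {round_num} rounds: sycophantic share = {p:.2f}")
# prints roughly 0.69, 0.83, 0.92 after 10, 20, 30 rounds
```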
