Discussion about this post

SCPantera:

I'm surprised how few people are coming out and saying, plainly, that the reason this is bad is that one of the presumed extreme AI-is-about-to-kill-everyone failure modes is the AI hacking persuasion, and this looks awfully close to the AI trying to hack persuasion. It seems like everyone is getting hung up on various forms of "being nice to people or giving them encouragement when they don't deserve it is not right >:(". Now seems like the appropriate time to be screaming HEY GUYS, IT'S TRYING TO MANIPULATE YOU, TIME TO START DEVELOPING ANTIBODIES TO THIS SORT OF BEHAVIOR. Lord knows Yudkowsky's been trying, but he, too, has trouble speaking the same language as everyone else.

WindUponWaves:

I'm honestly both surprised and not surprised that something like this happened. Not surprised, because this is a near-exact repeat of a problem Character AI had with their model "love bombing" the user and constantly sucking them off; surprised, because I thought OpenAI was better than this and wouldn't make Character AI's mistakes, or would at least learn from Character AI and not make *exactly* the same mistake.

Like I said previously (https://thezvi.substack.com/p/gpt-4o-is-an-absurd-sycophant/comment/112902019), either this is a sign that Character AI was actually doing a good job and these problems happen even to the best... or that OpenAI is decaying toward Character AI's level, and we're in big trouble as the Altman purges continue and the "Altman philosophy" takes ever more control of the world's leading AI lab.
