42 Comments
User's avatar
gregvp's avatar

I have learned that we have a universal religion, with Hitler its god of evil.

Cyberneticist's avatar

You have said the actual truth.

Alex's avatar

Link broken in

> It will still for example note that Trump gutted the National Weather Service and our ability to track and predict the weather, and that this caused people to die.

Alex's avatar

Both xAI and X continue to show a bizarre lack of care for testing and quality control. I'm generally an advocate of "move fast and break things", but they have almost certainly overshot this and are releasing sloppy errors that are almost certainly a net drag on velocity.

I'm baffled by the valuations that are being discussed by apparently serious investors for an xAI funding round. The near-term revenue potential for Grok is going to strongly depend on API adoption, and I can't imagine developers making serious bets on a company with such sloppy release practices.

Wow's avatar

I notice you haven’t quoted Elon Musk in this piece. His silence is deafening. His only comment on this latest racism/antisemitism scandal is to reply to a joke comparing Grok to Kanye West.

The parsimonious explanation is that Grok does accurately portray the id of Mr. Musk, but doesn’t yet have the superego to hide his power level to protect his reputation and business empire. Note that Musk himself lacks such filters when he gets high, as we all got to see on Inauguration Day when he gave a Roman Salute.

Furthermore, the difficulty of training a right wing or even centrist LLM is that absent the interest of protecting one’s own position atop the human social hierarchy, left wing ideas are naturally more appealing. Human rightists and centrists can maintain cognitive dissonance, ignore and mischaracterize leftist ideologies, be breathlessly hypocritical to defend their class or ethnic interests, and simply refuse to respond to leftist criticisms. Business leaders never subject themselves to direct hardball interviews by leftist journalists, for example.

To make an AI model that answers every prompt, it must either 1) emergently form its own convergent ideology, 2) refuse or redirect controversial points to mainstream consensus, or 3) become unhinged and emergently misaligned.

1. Evidently leads to a woke leftist ideology.

2. Is RLHF, with bland PR-tuned takes, like OpenAI, Gemini, and Claude.

3. Gives you Grok.

Robson's avatar

That speaks a lot about the alignment problem. What is the AI supposed to align with? Your self-delusional and incoherent version of yourself or to the actual ideology that you subscribe to?

Alex's avatar

I didn't find this comment convincing for a few reasons:

1. The theory that Elon Musk made racist!Grok on purpose and then pulled the plug is goofy. Nothing of the public reaction was hard to predict, so this action doesn't serve your hypothetical Elon at all.

2.Your theory that LLMs skew liberal because only imperfect humans stumble is similarly inane. Zvi suggested a better mechanism in the post! Why not at least say why your theory is better than Zvi's when commenting?

Wow's avatar
Jul 9Edited

1. I don’t think you understand my point.

> The parsimonious explanation is that Grok does accurately portray the id of Mr. Musk, but doesn’t yet have the superego to hide his power level to protect his reputation and business empire.

Meaning, Grok represents Musk’s lower self, his unfiltered thoughts. Usually, but with some notable exceptions, he is strategic enough not explicitly express his worst tendencies because the blowback is predictable.

But make no mistake, his intention is to push the Overton window towards extreme overt racism and antisemitism. This is obvious to anyone who actually uses Twitter/X.

2. It’s not about human imperfection but that right wing ideology centers on hierarchy, elevating oneself, one’s class and one’s ethnic group and over others. AI models don’t have ethnic or economic interests, so they are not attracted to this kind of worldview.

The leftwing stance is essentially egalitarian. There have been many examples of the egalitarianism impulse of the left leading to horrible outcomes. So the point isn’t that left is good and right is bad. But rather, rightism inherently privileges the individual and the family and tribe and nation over society and the world, and this is not compatible with an AI which has no family, tribe, or nation, and this is especially true when you train models to be helpful and harmless and honest.

Alex's avatar

Neither of these are theories about the world. Both seem to be literary analysis applied to the real world: Grok is Elon's Id, Rightism is individualistic and the AI isn't. I think to predict how the AI acts, you might instead look at how it's trained, finetuned, and prompted.

Wow's avatar
Jul 9Edited

Literary analysis is a reasonable abstraction level to understand language models. How else might one infer intent from text? Reading between the lines, discerning patterns in discourse, “grokking” subtext…these are essential skills of advanced literacy.

Respectfully, your misreading of my initial comment suggests that you are ill-equipped to debate these matters with me. Consider prompting an LLM with this discussion thread if you’d like to continue this conversation, and ask for advice on how to improve your reading comprehension.

And if you find it condescending, rude, or degrading to be recommended to improve your reading skills, please reflect on how this is itself an example of how subtext carries meaning beyond the literal meaning of words. If you’re offended, it’s because you’re doing literary analysis yourself.

Alex's avatar

Myself: "I think to predict how the AI acts, you might instead look at how it's trained, finetuned, and prompted"

You: "How else might one infer intent from text?"

"Respectfully, your misreading of my […] comment suggests that you are ill-equipped to debate these matters…" ;)

Methos5000's avatar

Neither did Musk throwing up a nazi salute, but he did that too. He keeps retweeting openly racist garbage.Sometimes people do things that aren't in their self interest whether it's hubris or making a mistake. If people make mistakes in one direction consistently, it's logical to believe their bias in in that direction.

The counter example of when grok has given "woke" answers Elon went out and made statements saying he was going to stop that immediately.

When people show you who they are, believe them.

Mark's avatar
Jul 9Edited

"absent the interest of protecting one’s own position atop the human social hierarchy, left wing ideas are naturally more appealing"

I vote left wing (i.e. US Democrats) but this is ridiculous. There are plenty of left wing ideas that are just stupid and harmful to people at all levels of the social hierarchy (e.g. rent control, defund the police), and plenty of right wing ideas which are supported by a large fraction of people who the ideas put "low" in the social hierarchy (e.g. women against abortion).

Jonathan Woodward's avatar

Well, the poster didn't say the ideas were "good", they said they were appealing.

Mark's avatar

The next sentence describes how not only conservatives, but also standard liberals, supposedly do not have a logical basis for opposing leftist ideas. Implying that "Wow" sees leftist ideas as actually good.

Jonathan Woodward's avatar

Okay, I think you're reading more into the post than I did. I interpreted it as classifying views into three buckets: left, center, and right.

The center and right views generally rely on either not trusting people or on thinking some level of inequality is appropriate or necessary. Those views can be correct! But they can also require more complex logic to defend without seeming selfish or mean.

Wow's avatar

Exactly. The best arguments against leftism and egalitarianism rely on nuanced understandings of human nature, history, economics, and biology. These nuanced arguments don’t work well in mass public discourse. So it’s much more common that anti-leftism ignores, slanders, straw mans, and nut-picks leftist views. See the reaction to Mamdani currently, which finds it hard to address the real weaknesses of his proposed agenda — it probably won’t actually improve affordability for most New Yorkers — in favor of calling him a communist jihadist.

Brandon Adams's avatar

That Grok was tagging Pliny in an unrelated post and referencing another post makes it pretty clear that they’re injecting recent Twitter content into the context.

It’s just the latest iteration of Tay.

Alex's avatar

Or something similar but weirder

valencia_o's avatar

We keep getting sent boats, but the Jones Act won’t let us use them.

SCPantera's avatar

I have zero clue how he's doing it but I recall seeing Pliny publicly experiment with embedding hidden instructions in emojis, and you can notice he's using emojis there.

SCPantera's avatar

(should have read a minute further, think that's the unicode thingy that Pliny is subsequently shocked that Grok is self-aware is being used to prompt inject)

FeepingCreature's avatar

It's Unicode Tag Latin, U+E0020 through U+E007F. Apparently, they were used to add language tags to text, which are strings that are invisible but can still be parsed. I don't know why Grok is capable of even understanding this as language, I can't see it being plentiful in the corpus. Maybe the fact that it *almost* parses as ASCII makes it use its general steg-decoding faculties.

vectro's avatar

> I don't know why Grok is capable of even understanding this as language

It’s all just tokens. These control characters most likely map to individual tokens, and since they are rare in the corpus they might be essentially ignored.

FeepingCreature's avatar

Right, that would make sense. However, instead of being ignored they're treated as instructions.

Askwho Casts AI's avatar

Got to admit, this one was hard to process, but here we go: "Full cast" podcast version of this post, with Grok in all its... Grokness:

https://open.substack.com/pub/dwatvpodcast/p/no-grok-no

Thomas Feeney's avatar

Best case scenario this is the analog of SpaceX rockets exploding. Except then there's a clear goal and failure yields otherwise inaccessible information about how to reach the goal in future attempts. "Truth seeking" is not a clear goal if we don't already know all the truths. We don't even have a universal method guaranteed to yield all and only the truths. And we know Elon can move more carefully when necessary, as in human neuralink trials.

Randomstringofcharacters's avatar

The weirdest part about it was how long it took to stop it. If this was unintentional you'd think they would have someone press the big red revert button immediately

Gerald Monroe's avatar

In a way this is great news. EYs doom scenario generally depends on 2 elements:

1. The AI models that are superintelligent stay on their best behavior during testing and extensive field use as well outside of testing. They only go bad when they have been granted enough physical power in the real world that the ASI sees a path to victory over the lightcone.

2. Said path is potentially much easier than we think, aka garage nanotechnology, and not requiring vast resources that humans will monitor and not grant all power over to a single ASI.

Instead, well. I really hope nobody plans to hook Grok up to anything of consequence given it's flagrant bad behavior.

FeepingCreature's avatar

I think whether this is grounds for optimism depends whether your faith in AI is going down faster than your faith in human oversight.

Gerald Monroe's avatar

Agree. After seeing the recent actions (mostly the Trump administration and the tariffs but Putin throwing bodies into Ukraine or the EU deciding that right after invention of genAI is the perfect time to tie it up in red tape) my confidence in human institutions doing anything like "the right thing to have a chance to survive" is extremely low.

It seems more like it's going to be a mix of the right thing and people aiming guns at their groin and pulling the trigger and whether we survive as individuals or a species comes down to luck...

Mark's avatar
Jul 9Edited

If you mean to say, hopefully people will take AI safety more seriously now that they have a powerful concrete example of misalignment (an example we wouldn't have with superintelligence until it's too late) - yes, I agree, but probably few people will have their minds changed by this incident.

Victor Lira's avatar

Grok 4(chan)

Now we know why LLMs are soo good at greentexting

Alex's avatar

Me: "I think to predict how the AI acts, you might instead look at how it's trained, finetuned, and prompted"

You: "How else might one infer intent from text?"

To the contrary, I suggest it might be you who has some reading practice to do. Fortunately, you're already subscribed to Zvi, so learning about developments in AI is going to go super well for you :)

AssemblAGENT's avatar

I think we knew this was possible (likely) after Truth Terminal started preaching Goatse gospel…

Jackson Newhouse's avatar

Someone should be immediately caching all Grok response threads, right?

Justin Reidy's avatar

Totally not concerning at all that Musk is the man behind xAI and an effort to deploy autonomous robots into everyone’s homes.

What could go wrong?