The system prompt being modified by an unauthorized person in pursuit of a ham-fisted political point very important to Elon Musk once already doesn’t seem like coincidence.
Just reading all those Grok responses, the other thing that jumps out at me is that it’s clearly not reading the full context. I’m not on Twitter, so I don’t know mechanically how Grok usually reads tweets, but it’s clear that it sees the tweet it’s responding to, not the one before it. It seems similar to the times when there’s a bug in ChatGPT or Claude where my prompt doesn’t get sent, so it just responds based on the system prompt and nothing else. For a while, if I put just the right size of prompt into ChatGPT (too big to ingest, but too small to trip the hard-coded limit that stops the prompt from going in - it’s weird that that delta exists, but that’s a different question), it would always respond with something like “The user is asking for me to tell them their location” - clearly drawn from the system prompt.
Some of these screenshots collected here (and that have gone viral in general) include "Show more replies", such that the most recent tweet in context is not visible, so those ones are not good evidence. Some other screenshots, however, do show the full conversation and provide evidence of weirdness.
Honestly, it's too bad OJ is dead. He had extensive experience searching for the real perpetrators and could have been a great asset to Elon's search for the culprit here.
Podcast episode for this post, with all Twitter / X posts transcribed and dramatic voices:
https://open.substack.com/pub/dwatvpodcast/p/regarding-south-africa?r=67y1h&utm_campaign=post&utm_medium=web&showWelcomeOnShare=true
> It is extremely difficult to gracefully put your finger on the scale of an LLM, to cause it to give answers it doesn’t ‘want’ to be giving. You will be caught.
I don’t think this is true. You can get very good at it with dataset selection and fine-tuning or post-training, and xAI will get much better at it. Expect there to be one or two more obvious hiccups, and then for it to become extremely pro-Elon and much smarter about how it presents.
Plausibly those fall in the domain of changing what the LLM 'wants', which I'm interpreting as "that which, when you stop prompting with it, doesn't go away"
Disagree. Content-specific tweaks in post-training are very difficult to implement without causing large losses in the model’s general capability (“catastrophic forgetting”).
Adjusting the pre-training mixture is much more robust, but is also complex and extremely expensive to iterate on.
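The "catastrophic forgetting" point above can be illustrated with a deliberately tiny sketch (my own toy example, not anything xAI does): a logistic-regression "model" is pre-trained on a general task, then aggressively fine-tuned on a small narrow dataset whose labels conflict with it, and its general-task accuracy collapses.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(w, X, y, lr=0.5, steps=200):
    # Plain logistic-regression gradient descent.
    for _ in range(steps):
        p = sigmoid(X @ w)
        w = w - lr * X.T @ (p - y) / len(y)
    return w

def accuracy(w, X, y):
    return float(np.mean((sigmoid(X @ w) > 0.5) == y))

# "Pre-training" task: label = 1 when the first feature is positive.
X_general = rng.normal(size=(500, 2))
y_general = (X_general[:, 0] > 0).astype(float)

# Narrow "content tweak": a small dataset whose labels deliberately
# conflict with the general task (labels flipped).
X_narrow = rng.normal(size=(50, 2))
y_narrow = (X_narrow[:, 0] <= 0).astype(float)

w = np.zeros(2)
w = train(w, X_general, y_general)           # pre-train on general data
acc_before = accuracy(w, X_general, y_general)

w = train(w, X_narrow, y_narrow, steps=500)  # aggressive narrow fine-tune
acc_after = accuracy(w, X_general, y_general)

print(f"general-task accuracy before: {acc_before:.2f}, after: {acc_after:.2f}")
```

With only two parameters the effect is extreme, but the shape of the tradeoff is the same one being argued over: a narrow post-training push that conflicts with the base distribution drags down general capability, whereas changing the pre-training mixture bakes the preference in without that collapse, at far greater cost to iterate.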
LLMs are very “Chekhov’s gun” with the prompt. If you mention South Africa in the prompt, it’s probably going to work it into the answer somehow; so the snivelling approach was doomed.
But, yes, the higher level issue is that someone who doesn’t know this got to push the change to production.
I have noticed that operating system vendors vs. multiplayer online games have rather different cultures on how much testing you ought to do before you push to production. Maybe Twitter is just lax here.
Getting sign-off on a change to the Windows kernel is just so much fun, or so I’ve heard...
It's pretty incredible that a monstrous behemoth like Microsoft hardly breaks any of my software every time they do an update. Genuinely impressive.
It's pretty funny that xAI isn't actually denying that Musk did it. Everything about their explanation seems completely consistent with that hypothesis.
"Just to be sure before we go public with this message. The comms team didn't authorize Elon to do this, right? So technically he is an unauthorized person?"
When I was at a certain FAANG company, as late as last year, there was an emergency override to bypass code review and deploy code to billions of people within minutes, after only a basic automated smoke test. This didn’t even trigger an audit or anything. Few people seemed to be aware of it and I never heard of it causing any problems. I’m not sure whether this should increase or decrease my credence in the second rogue employee! Increase because lax procedures can endure even in a mature, high-stakes megacorp. Decrease because nobody used it for mischief although way more people had the opportunity.
I don't know why people are saying "come on, Twitter AI team, do better, be transparent," etc., i.e. treating them like they are good guys who made errors. They have demonstrated that their AI is here to push misinformation and right-wing opinions, while talking about transparency and truth. These cases are just the obvious ones. It is time to treat Twitter's AI team as an enemy.
I'm not sure it's entirely accurate to say this has never happened with Anthropic. Pliny dropped Sonnet 3.7's system prompt, and they seem to have frozen its knowledge cut-off a few weeks before the 2024 election?
I'd guess the goal is to re-introduce the "maybe bugs are real, maybe they aren't, who can tell" uncertainty when asked direct questions about political matters having to do with Trump or authoritarianism or what-have-you.
There might be some plausible deniability stuff happening there, the system prompt is very very long and I haven't read the whole thing. And while it's not nearly so egregious as Grok's instructions, it still feels more than a little scummy in the same direction.
Yikes, this is messy. If Grok keeps getting 'adjusted' every time it steps on certain toes, it’s gonna lose all credibility. Hard to call it unbiased AI when the prompt keeps getting rewritten to fit a narrative.
Gemini making a Hamilton version of the founding fathers reflects "deep cultural problems" at Google, how exactly? I understand this became an article of faith among the basic-bitch counterwoke white dude set but do you think your readers belong to that odious group of morons and trolls?
err, i think he's pointing at the opposite phenomenon
the cultural problems are that google told their imagegen to try to make all images include examples of racial diversity
not that they failed to account for situations where this would piss off the woke and add exceptions, but that they went down that road at all
but i'm not actually sure, that's just how i read it
edit: or perhaps i actually misread your comment?
fwiw, there are a *lot* of rats who got tripped up at some point by wokeism and experienced hostility that drove them to have a negative reaction to any kind of enforced ideological conformity towards the progressive platform
the "odious group of morons and trolls" certainly exists, but i don't think you can honestly say that scott alexander or scott aaronson count among their number, and yet both of these people had life-defining negative interactions with woke ideology
and then they wrote about it, at length
to some degree this might have attracted some of those odious morons, but in my experience they mostly fell away after a while
still though, you can't just pretend this isn't a real thing. i'd point to https://www.astralcodexten.com/p/twilight-of-the-edgelords as a decent example of the phenomenon i'm talking about, scott wrestling with the fact that his own anger and disgust with blue ideology might be partially to blame for the rise of rightwing populism
zvi, feel free to delete this if it's too touchy or off-topic
Did you read the comments of the Gemini articles linked? There's definitely a segment of the rationalist audience that fits the description.
Is this just terrible bait?
It certainly feels likely to me that Musk has admin access to some type of system prompt (because "wHo SpeNt 44 BillLon aGain?") and he got high one night and messed up.