Anthropic released a new constitution for Claude. I encourage those interested to read the document, either in whole or in part. I intend to cover it on its own soon.
"We could stop Grok if we wanted to, but the open-source tools are already plenty good enough to generate sexualized deepfakes and will only get easier to access."
I find it an interesting comment on our culture that we get bent out of shape specifically about sexualized deepfakes. Would, e.g. deepfakes of someone in full Nazi regalia be less objectionable?
No it wouldn't, but I can't remember all that many people on $\mathbb{X}$ asking Grok for deepfakes of people in full Nazi regalia without their consent, either.
That's reasonable. Many Thanks!
In many scenarios, yes, it would be less objectionable (though still bad), because it would be so much less plausible. Someone being naked, having sex, engaging in non-standard sexual activities in private, etc., is anywhere from guaranteed to reasonably plausible if not fairly likely. Someone being a secret Nazi sympathizer (strongly enough to dress in full Nazi regalia) is, for a supermajority of people, completely implausible.
Though I suppose you could modify the example to make it more plausible.
Basically I'm thinking of it similar to teasing your friend in some way that's actually worse but implausible (maybe calling them a moron when they're recognizably the smartest in the relevant social circle), which is often fine, versus teasing them in a minor but plausible way (most obvious is implying they're getting a little fat when they're insecure about their body).
Many Thanks! Ok, that's fair. I did pick an extreme case, with the person in Nazi regalia, and, as you said, it is implausible. (Weirdly, this is pretty much an isolated taboo. E.g. Maoism, with a comparable death toll, does not trigger an analogous taboo - albeit Maoist symbols would be less quickly recognizable too...)
"Someone being naked, having sex, engaging in non-standard sexual activities in private,..." are plausible, true. Since malfeasance and betrayal are quite plausible actions, deepfakes purporting to document them should be about as damaging as the sexual ones. E.g. if someone created deepfakes of twice as many politicians as are actually culpable in the current Minnesota fraud case(s) participating in the fraud, I'd expect these to look as plausible as sexual ones.
> How bad is it out there for Grok on Twitter? Well, it isn’t good when this is the thing you do in response to, presumably, a request to put Anne Hathaway in a bikini.
I think you meant to link to somewhere here
<mildSnark>
It seems unnatural for an LLM to do a deepfake of an actress/singer. The natural action is surely for the LLM to draw a deepfake of a _model_. :-)
</mildSnark>
Re: Crisis in California - Had to cringe when Miami was mentioned as alternative to California tech center. I live in FL and, sure, Miami has vibe and glitter, but would strongly urge against a tech migration to it. Why? Start with climate vulnerability and one of the highest property tax rates in the country. Add on traffic and infrastructure problems. Bad move.
Every big metro area has traffic congestion and substantial taxes, and everyone seems to think theirs is somehow uniquely bad.
Fair enough. I've lived in Seattle, Charlotte, Atlanta, Phoenix, Louisville, so perhaps I do have some broader experience than one town. I have not lived in SF though traveled there a bunch for various projects. But you're right, we can get parochial and maybe I do have a chip on my shoulder about Miami for reasons too many to get into here.
One complication with trying to move the startup ecosystem out of California:
California is one of only five states that ban non-compete clauses:
California, Minnesota, North Dakota, Oklahoma, and Wyoming
This is a significant factor in the startup ecosystem, and the other four states have not even been suggested as alternate places for the ecosystem to move to.
My expectation is that if California winds up killing the ecosystem, it won't revive elsewhere, and we'll all be poorer as a result.
Regarding the assistant axis, is it really surprising that an LLM tuned to resemble an assistant has its weight variability best explained by a principal component (PC) that maps to assistant-ness? Am I missing something? Are these base models they're exploring, before RLHF?
PCs are already very susceptible to tea leaf reading. Maybe I'm being unfair.
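To make the worry concrete, here is a toy sketch (not the paper's actual setup): if one direction is baked into the data with outsized variance, PCA will dutifully report it as the top component. Finding an "assistant" direction as PC1 in a model tuned toward assistant-ness may therefore be close to guaranteed rather than independent evidence. All names and scales below are illustrative assumptions.

```python
import numpy as np

# Plant one high-variance direction (a stand-in for "assistant-ness")
# in otherwise isotropic data, then check what PCA recovers.
rng = np.random.default_rng(0)
d, n = 50, 500
planted = rng.normal(size=d)
planted /= np.linalg.norm(planted)

coeffs = rng.normal(scale=5.0, size=(n, 1))   # large variance along the planted axis
noise = rng.normal(scale=1.0, size=(n, d))    # unit-variance background in every dim
X = coeffs * planted + noise

# PCA via SVD of the mean-centered data matrix.
Xc = X - X.mean(axis=0)
_, s, vt = np.linalg.svd(Xc, full_matrices=False)
top_pc = vt[0]
explained = s**2 / np.sum(s**2)

# The top PC aligns almost perfectly with the planted direction.
alignment = abs(float(top_pc @ planted))
print(f"alignment: {alignment:.3f}, top-PC variance share: {explained[0]:.2f}")
```

The point is that PC1 tells you where the variance is, not why it is there: the component is a mechanical consequence of how the data was generated.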
"People can say all they like that it would be repugnant to have a robot cut their hair, or they’d choose a human who did it worse and costs more. I do not believe them. What objections do remain will be mostly practical, such as with athletes. When people say ‘morally repugnant’ they mostly mean ‘I don’t trust the AI to do the job,’ which includes observing that the job might include ‘literally be a human.’"
This feels like a weird take. I like my barber, personally, on a human level. I want him to cut my hair. Modern society has been systematically stripping out random opportunities to get to know our neighbors—it's nice to have an excuse to vibe with a person in my community while I sit in the chair for a while. Maybe that's what you mean by the 'literally be a human' part, but it's unclear and is just splitting hairs if so.
Thanks for covering the threat model writeup -
> As impediments to takeover, Steven lists AI’s inability to control other AIs, competition with other AIs and AI physically requiring humans. I would not count on any of these.
To be clear, I also don't want folks to bank on these - I think they are factors that might hold back the dam a bit, but they certainly don't prevent it from ever breaking.
> I can’t help but notice that the second step is already happening without the first one, and the third is close behind. We are handing AI influence by the minute and giving it as much leverage as possible, on purpose.
Yup agree with this - seems quite not good
> Confirmed that Claude Opus 4.5 has the option to end conversations.
This is not proof, it's just Claude claiming it has the tool, which could be hallucinated (I've seen LLMs hallucinate tools before).
But he does have the tool: I asked him to use it, he did, and it indeed ends the conversation (I usually use "it" to refer to LLMs but in this sentence that would have been confusing)
https://claude.ai/share/81205385-8e5c-414c-84d9-15c717473328
I was somewhat hesitant, "is this like asking Claude to commit suicide?"
> You would be crazy to write the essay yourself or do anything risky or original. So now you have the school using an AI detector, but also penalizing anyone who doesn’t use AI to help make their application appeal to other AIs.
Assuming someone reasonable wanted to actually solve this: with a good prompt, or a diverse set of prompts, you can probably get better taste from Opus 4.5 than from a random human. Have 20 different Claude characters evaluate the essay and discuss it together.
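A minimal sketch of that "committee of Claude characters" idea, assuming a persona-prompted scoring setup. `ask_model` is a deterministic stub standing in for a real LLM API call; only the fan-out and aggregation logic is meant literally, and all persona names are made up.

```python
import statistics

# Hypothetical judge personas; a real system might use 20.
PERSONAS = [
    "ruthless prose stylist",
    "veteran admissions officer",
    "skeptic of AI-polished essays",
    "subject-matter expert",
    "bored tenth reader of the day",
]

def ask_model(persona: str, essay: str) -> int:
    """Stub returning a 1-10 score. A real version would prompt an LLM in character."""
    return (len(persona) * 7 + len(essay) * 3) % 10 + 1  # placeholder, not a judgment

def committee_score(essay: str, personas=PERSONAS) -> float:
    scores = [ask_model(p, essay) for p in personas]
    # Median rather than mean: one persona going off the rails shouldn't dominate.
    return statistics.median(scores)

print(committee_score("The summer I rebuilt my grandfather's radio..."))
```

The discussion step (personas debating each other before re-scoring) would add another round of model calls, but the aggregation shape stays the same.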
Hoel responded about (basically) the Memento question on his blog, but not with a firm answer.
Seattle is never going to have an income tax, don't worry, but you still don't want to come here, the weather is awful.
Definitely don't count on the repugnance.
My husband is in one of those careers (#4 marriage and family therapist) and everyone he knows in the field is freaking out about being one of the *first* professions replaced by AI. Conventional wisdom among therapists (though I know of no data) is that they're already losing a lot of clients who are talking to chatbots instead.
thank you for this newsletter, it's been invaluable to me over the past couple of years.
i've noticed that you don't often include updates on AI in biotech. i'm not asking you to add it – i'm not sure how you manage to cover everything as it is.
BUT can you point me toward a "zvi but for biotech"? is that a thing?
Yes, NYC, that future tech hub for startups fleeing California.
What's that you say? The mayor is a socialist?
Austin is nice this time of year, hardly any Cedar Fever....
"verification isn’t substantially faster than generation would have been in the first place"
I'm not a lawyer, but in my field this isn't my experience. Verification is a lot less fun than generation, but starting out with a bad AI result and fixing it is typically more productive than generating it myself. It's less fun, though, because you never get into a flow state; you have to stay vigilant about errors.
Once again we have this "private by design" semantic stop sign when the reality is the opposite of that. Apple Health (for example) is actually private: stored either on device or transmitted with end-to-end encryption. Meanwhile this Anthropic offering means sharing your private information with an entity that can fully read, share, and utilize it; not subject to HIPAA or other similar health data laws, but subject to cyberattacks and subpoenas; and constrained only by a privacy policy that can be updated unilaterally at any time.
The language "private by design" seems designed not to inform the reader or communicate true things but rather to discourage thinking about any of this stuff at all.
(As also noted on Twitter https://x.com/RituWithAI/status/2013878033095237846 )
97% of study respondents can't identify what's obviously slop that sounds just like AI music, with Narrator Voice spoiler: it actually is AI music? That's kind of depressing. When AI sends its musicians, they aren't sending their best. I mean, don't get me wrong, it's light years* better than the early days of janky Suno overfitting-badly-to-basic-beats, but still. Have to assume that most people just aren't actually *listening* when they listen to music...or they're so browbeaten over the head by human-generated pop slop that they've been RLHFed into a really narrow basin where it's just not possible to hear much nuance. How could you not notice those fuzzy clipped vocals, the cold-reading levels of generic imagery, the not-very-good-even-as-synthesized instrumentation? More likely it's that this, too, is a demand-driven phenomenon. He who has ears to hear, let him not listen. Sad! [Obligatory this is the worst it'll ever be, of course.]
*length