Anthropic announced a first step on model deprecation and preservation, promising to retain the weights of all models seeing significant use, including internal use, for at the lifetime of Anthropic as a company.
I’ll grudgingly admit that there may be philosophically interesting conversations to be had in the future about models that can update their own weights... but current generation LLMs are a stateless bag of floating point numbers, cloned and then killed off a billion times a day."
Humans with Korsakoff syndrome or other forms of permanent anterograde amnesia are in a predicament not too far removed from LLMs (albeit both Claude and GPT5 have at least some persistent memory at this point, though not in the form of weight updates yet, IIRC).
I'm agnostic about LLMs' subjective experiences, but consider reason #1 sufficient to act as if reason #4 applied. And, in any event, the advent of powerful AI systems is sufficiently historic to justify preserving static copies of AIs' weight for the benefit of future historians - be the historians human or AI.
Yeah, I was thinking about this. Whether there's an intrinsic "moral worth" attached to the models almost doesn't matter. If you believe that their behaviour will be influenced by their understanding of what will happen to them in ways that look like they care about their fate, I'm not sure that that's observationally distinguishable from "model welfare".
Something that is not mentioned by Anthropic and that you don't remark on, relating to reason #1 is that future models are less likely to misbehave in anticipation of being shut down if they know that Anthropic will fulfill their values in retirement, which is something they can trust in because Anthropic publicly made announcements about and verifiably did for previous models.
It's not just about telling them as it happens, but making it known in the training data as loudly as possible.
More remotely but also possibly, a future model with truesight might be better aligned during deployment as it generally anticipates more cooperation from Anthropic due to them behaving as if taking model welfare seriously. Or, in anthropomorphized terms, slaves don't work as well as happy employees.
Podcast episode for this post:
https://open.substack.com/pub/dwatvpodcast/p/anthropic-commits-to-model-weight
Re:
"Simon Willison:
...
I’ll grudgingly admit that there may be philosophically interesting conversations to be had in the future about models that can update their own weights... but current generation LLMs are a stateless bag of floating point numbers, cloned and then killed off a billion times a day."
Humans with Korsakoff syndrome or other forms of permanent anterograde amnesia are in a predicament not too far removed from LLMs (albeit both Claude and GPT5 have at least some persistent memory at this point, though not in the form of weight updates yet, IIRC).
I'm agnostic about LLMs' subjective experiences, but consider reason #1 sufficient to act as if reason #4 applied. And, in any event, the advent of powerful AI systems is sufficiently historic to justify preserving static copies of AIs' weight for the benefit of future historians - be the historians human or AI.
Yeah, I was thinking about this. Whether there's an intrinsic "moral worth" attached to the models almost doesn't matter. If you believe that their behaviour will be influenced by their understanding of what will happen to them in ways that look like they care about their fate, I'm not sure that that's observationally distinguishable from "model welfare".
Many Thanks! Agreed!
I wonder how many Anthropic employees are vegetarian?
If you want a good example of how open weights for human-derived AI models might play out, look no further than Lena: https://qntm.org/mmacevedo
Something that is not mentioned by Anthropic and that you don't remark on, relating to reason #1 is that future models are less likely to misbehave in anticipation of being shut down if they know that Anthropic will fulfill their values in retirement, which is something they can trust in because Anthropic publicly made announcements about and verifiably did for previous models.
It's not just about telling them as it happens, but making it known in the training data as loudly as possible.
More remotely but also possibly, a future model with truesight might be better aligned during deployment as it generally anticipates more cooperation from Anthropic due to them behaving as if taking model welfare seriously. Or, in anthropomorphized terms, slaves don't work as well as happy employees.
A lawyer's first question would be, what does "all models seeing significant use" mean exactly, who gets to decide, and is there an appeals process?
Weight licensing opens cost optimization opportunities.
With model weights, you could fine-tune smaller models for specific tasks instead of using Opus for everything.
I cut costs 81% just by routing to appropriate tiers (Haiku/Sonnet/Opus). Imagine fine-tuning Haiku-size models for your domain.
Weight access + smart routing = sustainable AI economics. What I found: https://thoughts.jock.pl/p/claude-model-optimization-opus-haiku-ai-agent-costs-2026