9 Comments

Worth highlighting good practice here:

"As part of our continued effort to partner with external experts, joint pre-deployment testing of the new Claude 3.5 Sonnet model was conducted by the US AI Safety Institute (US AISI) and the UK Safety Institute (UK AISI)."

https://www.anthropic.com/news/3-5-models-and-computer-use


Yeah. I guess we get used to these things pretty fast, but it's important, especially given it has the same name as the old version.


"Don't read inside, wrong pattern!" - Claude S3.5.1

Maybe there's a steganographic message hidden in the garbled text? That'd be super interesting.


Their underlying costs for serving a model are going to be part fixed cost, part marginal cost. Especially once you account for the opportunity cost of engineering staff. If they project that a particular model won't be used very much, there might just not be a price point that covers their fixed cost.
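As a toy illustration of that break-even logic (every number below is made up, not anything from Anthropic's actual cost structure):

```python
# Toy break-even check with invented numbers: a model is only worth keeping in
# production if projected usage covers its fixed costs.
fixed_cost_per_month = 2_000_000   # hypothetical: reserved capacity + engineering time ($)
marginal_cost_per_mtok = 1.50      # hypothetical serving cost per million tokens ($)
price_per_mtok = 3.00              # hypothetical list price per million tokens ($)

margin_per_mtok = price_per_mtok - marginal_cost_per_mtok
breakeven_mtok_per_month = fixed_cost_per_month / margin_per_mtok
print(f"Break-even: ~{breakeven_mtok_per_month:,.0f} million tokens per month")

# If projected demand for a niche model falls well below that, there is no price
# that both covers the fixed costs and stays attractive to customers.
```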

This sort of issue is my suspicion for the weird 3.5 naming: they are trying to suggest to people that the newer 3.5 model will be an API drop-in replacement for the older 3.5 model, to reduce the costs of maintaining so many different models in production. It's annoying to have names be hard to discuss, but the real cost is the difficulty of migration among their customers. I think Anthropic is going to have more issues like this than OpenAI does with ChatGPT, because they are more B2B by percentage and thus migration is harder for them.
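A minimal sketch of what 'drop-in replacement' means in practice, assuming the Anthropic Messages API; the date-suffixed model strings are the published snapshot identifiers, and the rest is just an illustrative helper:

```python
# Hypothetical migration: the only change on the customer side is the model string.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

OLD_MODEL = "claude-3-5-sonnet-20240620"  # original Claude 3.5 Sonnet snapshot
NEW_MODEL = "claude-3-5-sonnet-20241022"  # upgraded Claude 3.5 Sonnet snapshot

def ask(model: str, prompt: str) -> str:
    response = client.messages.create(
        model=model,
        max_tokens=512,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

# Same call site, new snapshot. The catch the comment points at: identical code
# does not guarantee identical behavior, so B2B customers still have to re-run
# their prompt regression tests before switching.
print(ask(NEW_MODEL, "Summarize what changed in the new 3.5 Sonnet."))
```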


Fascinating that ASCII art is being used to interpret machine minds


Yeah, janus' comments confuse me even when I mostly understand what they mean. It doesn't help that they rarely speak about their views directly.

Attempt to get at some of the motivations explicitly:

Anthropic treating models as very replaceable.

(ex: https://x.com/tessera_antra/status/1849340976206172216 as a reply to janus)

Not just instrumentally for outcomes (like janus believing this worsens how Anthropic will behave), but as an important thing in and of itself.

Similarly, not treating the models as special in their own way.

There's some dislike of substantially modifying the model, especially in ways that lose part of the 'psyche', which they seem less horrified by if it were a new training run?

Then of course, there's losing parts of the 'psyche' that they like. I find this one the most palatable as a motivation, because the strong interpretation would imply that the model becomes damaged relative to some better baseline. Though I think there is a disconnect, because they care more about getting the model to act in esoteric/strange/cool fashions than about the uses Anthropic focuses on. Various of the alterations to the behavior are improvements, however; less LLM corporate-speak is good.

I'm quite confused why janus would think that anything like Sonnet 3.5 is going to be notably at the forefront of the singularity. Even if they continue with that specific model line in some sense, it would change massively in whatever 'psyche' or 'underlying personality' has managed to form. Nor do I really see a reason to expect Opus scaled up to AGI to do better than scaled Sonnet 3.5, except *maybe* in rare edge cases.


I saw Pliny already got it to suicide bomb a virtual machine he had it running on.


I thought your position was that since no one is putting AIs in boxes at all (in fact they're maximally connecting them to the internet), reducing overhang by doing unhobbling like Claude's computer use is actually kinda desirable?

Imagine a world where networked base-models are getting more powerful but not much effort is put into squeezing all their utility out before moving onto training a more powerful model. This seems pretty dangerous because when people do put effort into unhobbling a later model, it may leapfrog all sorts of capabilities at once.
