It's a hybrid model. Qwen moved away from hybrid models again and saw a massive improvement between Qwen3 and Qwen3 2507, so I wonder if that's in play here too.
What's interesting is that they also released the base model.
The weights are up on Hugging Face, but the usual inference providers (other than DeepSeek themselves) aren’t hosting it yet. And it’s too large to boot up on most “run an arbitrary model” providers, which top out around 70B. It may be a while before people have had a chance to try it.
I will say that I spent about $100 on R1 0528 this month, as a rough indication of my usage level of DeepSeek models.
Ok, fireworks.ai say they’re hosting it.
A few now do: https://openrouter.ai/deepseek/deepseek-chat-v3.1
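For anyone who wants to poke at it through OpenRouter, here's a minimal sketch using only the standard library. It assumes you have an `OPENROUTER_API_KEY` set; the model slug comes from the link above, and the endpoint is OpenRouter's OpenAI-compatible chat completions API:

```python
import json
import os
import urllib.request

# OpenRouter exposes an OpenAI-compatible chat completions endpoint.
API_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_payload(prompt):
    """Assemble the request body for a single-turn chat."""
    return {
        "model": "deepseek/deepseek-chat-v3.1",
        "messages": [{"role": "user", "content": prompt}],
    }

def ask(prompt):
    """Send one prompt and return the assistant's reply text."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Only fire a real request if a key is actually configured.
if os.environ.get("OPENROUTER_API_KEY"):
    print(ask("Say hello in one word."))
```

Pricing and availability vary by provider behind the slug, so don't be surprised if early responses are slow or flaky.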
Chinese chipmaker stocks did pop following this release. Although this wasn't an especially large training run, if it was done entirely on domestic hardware it would be a significant update. Pangu Ultra was upcycled and not remotely close to frontier-level capabilities, so V3.1 could mark a major breakthrough depending on the hardware stack.
We gave them no choice but to move to domestic chips. All we are doing is creating a global competitor in a market we dominated, one where everyone was happy buying from us.
Maybe we'd rather have more competition for chips and less competition for AI.
Maybe we lose both and have an adversarial relationship with the winner.
That would definitely not be an ideal outcome!
I was really excited for this since I really loved V3 in a lot of cases, but the improvement seems so marginal compared to the rate of improvement at other companies. I wonder if there's another release soon to follow?
> The dumbest style of reaction is when a company offers an incremental improvement (see: GPT-5) and people think that means it’s all over for them, or for AI in general, because it didn’t sufficiently blow them away. Chill out.
Please take this more seriously: we've now had multiple attempts at scaling that have failed to deliver what their creators hoped for (GPT-4.5, GPT-5, Llama 4, now DeepSeek). It *is* a pattern. No, it doesn't mean Gary Marcus was right and it's all over, but you really do need to take it more seriously.
Ask questions like: which scaling laws are we fairly confident still hold? What signs are there that the labs still plan bigger training runs? Is there a financial reason the labs haven't scaled successfully yet? Did we expect more than this by now? And so on.
Engage with it.
DeepSeek did not really scale up in this case. It's the same 671B model as the V3 from late last year, possibly with more or different RL on top.
In this case we know why they did not scale up: the Huawei setup did not run stably enough.
The other thing to watch here is the inevitable window between a big model release and people actually getting it to run reliably.
You often need:
- Unsloth quants.
- Unsloth template bug fixes.
- llama-server and/or vLLM patches.
- Enough providers on OpenRouter and the like to actually fix their initially buggy serving setups.
Until this stuff gets fixed, a lot of models look painfully bad. And some very nice models are never properly supported in common tools.
Now, since this is a 3.1 refresh, it probably isn't suffering from as many tooling problems. But this is a real issue: people are _still_ barely getting GLM 4.5 and the Air version running, and that's a great model that's been out for a while now.
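To make the chat-template point concrete, here's a toy sketch (the turn tokens are made up, not DeepSeek's or any real model's template) of how a single missing end-of-turn marker silently changes the prompt a model actually sees:

```python
# Toy illustration of a chat-template bug. A template that forgets
# to close each turn renders a prompt where the model can't tell
# where one message ends and the next begins -- which shows up as
# rambling or role-confused output, not a crash.

def render(messages, close_turns=True):
    """Render a chat history into a single prompt string."""
    out = []
    for m in messages:
        out.append(f"<|{m['role']}|>{m['content']}")
        if close_turns:
            out.append("<|end|>")  # end-of-turn marker
    out.append("<|assistant|>")    # cue the model to respond
    return "".join(out)

msgs = [{"role": "user", "content": "hi"}]
good = render(msgs, close_turns=True)
bad = render(msgs, close_turns=False)
print(good)  # <|user|>hi<|end|><|assistant|>
print(bad)   # <|user|>hi<|assistant|>  (turns run together)
```

Both strings tokenize fine, which is why these bugs slip through: nothing errors, the model just quietly gets a malformed prompt until someone ships a template fix.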