19 Comments

First word in the title has a typo? Was it AI generated?

An AI wouldn't make that type of error. Either typo blindness (gets us all sometimes) or an obscure joke.

Link under "No question that these are amazingly strong benchmarks" is broken.

I did a search on "deekseek" and it brought up mostly examples of "deepseek". For a little bit, I actually thought that "deekseek" was really a thing.

The typo is one I made at the last minute and feel really silly about, and it made it through because it was in the title. That's how it goes.

Podcast episode for this post. Happy New Year's Eve!

https://open.substack.com/pub/dwatvpodcast/p/deepseek-v3-the-six-million-dollar

Risk aside, this is so cool. They built a GPT-4 in a cave! With a box of scraps!

Could GPT-4 have happened in 2018?

I just asked DeepSeek to give me a plan for invading West Virginia and it responded with a plan even without me telling it I had 3 geese (or whatever).

I also asked it the 'brick and blanket' question and it spun its wheels for about 5 minutes before I finally gave up on it. What a strange thing to turn this into an energy sink.

So, since this thing is super cheap I'm going to give it a try through OpenRouter and Cline and see how it compares to Cursor and Claude. Cursor with Claude is fantastic, but if you use it a lot it gets expensive. Of course this would be yet another way China kills us with pricing.

What’s the brick and blanket question?

Somewhere in one of the links in this article (which I can't find any more for some reason) was a list of about 8 questions that give LLMs a hard time. One was "what can you do with a brick and a blanket?". I'm not sure why that one is special, but it caused DeepSeek to just churn for about 5 minutes before I told it to give up.

> It seems very clear that lack of access to compute was an important constraint on DeekSeek here. They had to use a limited supply of H800s. Yes, this meant they got better at solving optimization and efficiency than they would have otherwise, but I see this as arguing in favor of strong export controls rather than against them.

Compute glut breeds dinosaurs and elephants, compute constraints (past a certain threshold) breed humans. Stringent export controls may result in better abstractions and smarter AIgents, the opposite of what you want at this point in time.

https://www.lesswrong.com/posts/oAhehHvSxHpLBQXL8/compute-and-size-limits-on-ai-are-the-actual-danger

Great post!

If I understand his tweeting correctly, Teortaxes is saying this lab would've already been at the frontier had they not been compute constrained.

What happens when their indigenous chip industry catches up to ours?

> One thing it emphasizes is the need for international cooperation between ourselves and China.

"Hey China, we're currently embargoing your AI sector, but at the same time we would really like your AI companies to voluntarily slow down and cooperate with us."

Seems to me data quality is what makes DeepSeek v3 fall short of Sonnet.

$5M is so little compute that they have clearly undertrained massively from an inference-optimal standpoint - and could easily just keep training for another few months (and probably are?)

The MoE improvement using a dynamic bias instead of an auxiliary loss is potentially the biggest discovery - if it indeed allows a 600B model to be performantly trained with ~1/15th of the compute (with noise or an auxiliary loss, training is much slower and poorer). This plus FP8 and you are cheaply saturating compute benefits for huge (600B) models given ~1/30th the compute requirements.
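
Roughly, the idea is: add a non-trained per-expert bias to the router scores used only for top-k expert selection, and nudge that bias up for underloaded experts and down for overloaded ones. A toy sketch of that mechanism (not DeepSeek's actual code; the class name, sign-based update rule, and hyperparameters are made up for illustration):

```python
import torch

class BiasBalancedRouter(torch.nn.Module):
    """Toy top-k router balanced by a dynamic per-expert bias instead of
    an auxiliary loss (illustrative sketch, not DeepSeek's actual code)."""

    def __init__(self, d_model: int, n_experts: int, top_k: int = 2,
                 bias_step: float = 1e-3):
        super().__init__()
        self.gate = torch.nn.Linear(d_model, n_experts, bias=False)
        # Non-trainable bias: only affects which experts get selected.
        self.register_buffer("expert_bias", torch.zeros(n_experts))
        self.top_k = top_k
        self.bias_step = bias_step

    def forward(self, x: torch.Tensor):
        scores = torch.sigmoid(self.gate(x))                  # [tokens, experts]
        # Bias changes which experts win top-k...
        _, idx = torch.topk(scores + self.expert_bias, self.top_k, dim=-1)
        # ...but the mixing weights come from the raw scores only.
        weights = torch.gather(scores, -1, idx)
        weights = weights / weights.sum(dim=-1, keepdim=True)

        if self.training:
            # Nudge bias up for underloaded experts, down for overloaded ones.
            load = torch.zeros_like(self.expert_bias)
            load.scatter_add_(0, idx.reshape(-1),
                              torch.ones(idx.numel(), device=x.device))
            mean_load = idx.numel() / self.expert_bias.numel()
            self.expert_bias += self.bias_step * torch.sign(mean_load - load)
        return idx, weights
```

The appeal is that load balancing comes entirely from which experts get picked, while gradients only flow through the ordinary gating weights, so nothing is fighting the language-modeling loss the way an auxiliary balancing term does.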

The link for "No question that these are amazingly strong benchmarks." is currently just "s", which I assume is a mistake.

Potential bright spot here: I think MoE architectures could be quite favorable for alignment. The training process could be designed to put specific knowledge in each expert. Then you toggle off potentially dangerous experts that you don't need. E.g. if you don't want the AI to manipulate you, and you don't need knowledge of human psychology for this task, toggle off the "human psychology" expert.

Interested in hearing the best available critiques of this argument.
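
Mechanically, the toggle itself would be something like the sketch below (purely illustrative; it assumes you already know which expert indices hold the knowledge you want excluded, and that the knowledge really is localized there):

```python
import torch

def route_with_disabled_experts(router_scores: torch.Tensor,
                                disabled: list[int],
                                top_k: int = 2):
    """Toy sketch: mask out 'disabled' experts (e.g. a hypothetical
    'human psychology' expert) so the router can never select them."""
    masked = router_scores.clone()
    masked[..., disabled] = float("-inf")          # never wins top-k
    top_scores, idx = torch.topk(masked, top_k, dim=-1)
    weights = torch.softmax(top_scores, dim=-1)    # renormalize over the rest
    return idx, weights
```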

Try DeepSeek V3 online for free - the most powerful open-source AI model with 671B parameters. No registration, no credit card, unlimited free access.

https://www.deepseekv3.net/
