First word in the title has a typo? Was it AI-generated?
An AI wouldn't make that type of error. Either typo blindness (gets us all sometimes) or an obscure joke.
Link under "No question that these are amazingly strong benchmarks" is broken.
I did a search on "deekseek" and it brought up mostly examples of "deepseek". For a little bit, I actually thought that "deekseek" was really a thing.
That's the typo I made at the last minute and feel really silly about; it made it through because it was in the title. That's how it goes.
Podcast episode for this post. Happy New Year's Eve!
https://open.substack.com/pub/dwatvpodcast/p/deepseek-v3-the-six-million-dollar
Risk aside, this is so cool. They built a GPT-4 in a cave! With a box of scraps!
Could GPT-4 have happened in 2018?
I just asked deepseek to give me a plan for invading West Virginia and it responded with a plan even without me telling it I had 3 geese (or whatever).
I also asked it the 'brick and blanket' question and it spun its wheels for about 5 minutes before I finally gave up on it. What a strange thing to turn this into an energy sink.
So, since this thing is super cheap I'm going to give it a try through OpenRouter and Cline and see how it compares to Cursor and Claude. Cursor w/Claude is fantastic, but if you use it a lot it gets expensive. Of course this would be yet another way China kills us with pricing.
What’s the brick and blanket question?
Somewhere in one of the links in this article (which I can't find any more for some reason) was a list of about 8 questions that gave LLMs a hard time. One was "what can you do with a brick and a blanket?". I'm not sure why that one is special, but it caused deepseek to just churn for about 5 minutes before I told it to give up.
> It seems very clear that lack of access to compute was an important constraint on DeekSeek here. They had to use a limited supply of H800s. Yes, this meant they got better at solving optimization and efficiency than they would have otherwise, but I see this as arguing in favor of strong export controls rather than against them.
Compute glut breeds dinosaurs and elephants, compute constraints (past a certain threshold) breed humans. Stringent export controls may result in better abstractions and smarter AIgents, the opposite of what you want at this point in time.
https://www.lesswrong.com/posts/oAhehHvSxHpLBQXL8/compute-and-size-limits-on-ai-are-the-actual-danger
Broken link at "No question that these are amazingly strong benchmarks." in https://thezvi.substack.com/p/deekseek-v3-the-six-million-dollar?open=false#%C2%A7the-amazing-incredible-benchmarks
Great post!
If I understand his tweeting correctly, Teortaxes is saying this lab would've already been at the frontier had they not been compute constrained.
What happens when their indigenous chip industry catches up to ours?
> One thing it emphasizes is the need for international cooperation between ourselves and China.
"Hey China, we're currently embargoing your AI sector, but at the same time we would really like your AI companies to voluntarily slow down and cooperate with us."
Seems to me data quality is what makes DeepSeek v3 fall short of Sonnet.
$5M is so little compute that they have clearly massively undertrained from an inference-optimal standpoint - and could easily just keep training for another few months (and probably are?).
The MoE improvement of using a dynamic bias instead of an auxiliary loss is potentially the biggest discovery - if it indeed allows a 600B model to be trained performantly with ~1/15th of the compute (with noise or an auxiliary loss, training is much slower and poorer). This plus FP8 and you are cheaply saturating the compute benefits for huge (600B) models at ~1/30th the compute requirements.
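For anyone unfamiliar with the dynamic-bias trick: as I understand the DeepSeek-V3 report, a per-expert bias term affects only which experts get selected, not the gating weights, and is nudged after each step based on observed load instead of adding an auxiliary balancing loss. Here is a minimal sketch of that idea; the function name, the update speed, and the plain softmax over selected scores are my own placeholders and simplifications, not the paper's exact formulation.

```python
import torch

def route_with_dynamic_bias(scores, expert_bias, top_k=8, update_speed=1e-3):
    """Sketch of auxiliary-loss-free MoE routing with a per-expert dynamic bias.

    scores:      [num_tokens, num_experts] router affinities from the gating net
    expert_bias: [num_experts] buffer, not trained by gradients, only nudged here
    """
    # The bias is added only when choosing which experts a token goes to...
    biased = scores + expert_bias
    _, topk_idx = biased.topk(top_k, dim=-1)

    # ...while the gating weights still come from the original, unbiased scores.
    gate = torch.gather(scores, -1, topk_idx).softmax(dim=-1)

    # Count how many tokens each expert received in this batch.
    num_experts = scores.shape[-1]
    load = torch.zeros(num_experts, device=scores.device)
    load.scatter_add_(
        0,
        topk_idx.flatten(),
        torch.ones_like(topk_idx.flatten(), dtype=load.dtype),
    )

    # Overloaded experts become less attractive next step, underloaded ones more so.
    expert_bias += update_speed * torch.sign(load.mean() - load)

    return topk_idx, gate
```

The appeal is that load balancing comes from this cheap feedback rule rather than from a loss term that fights the main training objective.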
The link for "No question that these are amazingly strong benchmarks." is currently just "s", which I assume is a mistake.
Potential bright spot here: I think MoE architectures could be quite favorable for alignment. The training process could be designed to put specific knowledge in each expert. Then you toggle off potentially dangerous experts that you don't need. E.g. if you don't want the AI to manipulate you, and you don't need knowledge of human psychology for this task, toggle off the "human psychology" expert.
Interested in hearing the best available critiques of this argument.
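To make the proposal concrete, the toggling step itself is trivial at inference time: mask the routing scores of the experts you want disabled so the router can never select them. A hypothetical sketch (the function and the notion of a cleanly separated "human psychology" expert are my assumptions, not anything DeepSeek does; the load-bearing assumption is that training can actually localize knowledge into experts like that):

```python
import torch

def route_with_disabled_experts(scores, disabled_experts, top_k=8):
    """Hypothetical: prevent the router from ever selecting certain experts.

    scores:           [num_tokens, num_experts] router affinities
    disabled_experts: list of expert indices to toggle off (e.g. a hypothetical
                      "human psychology" expert)
    """
    masked = scores.clone()
    masked[:, disabled_experts] = float("-inf")  # these can never win the top-k
    topk_scores, topk_idx = masked.topk(top_k, dim=-1)
    gate = topk_scores.softmax(dim=-1)
    return topk_idx, gate
```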