"Panic At the AppStore" <- excellent work, sir
You might have seen already but Nvidia is now down 17%. Largest one-day single-stock market cap loss in history?
overreaction?
Is that true? I guess I was in crypto for too long; that number sounded pretty small.
Market cap loss is percent drop * original market cap, and the latter is exceptionally large here (rough calculation sketched below).
Oh man I can't read. Yes, I'm pretty sure it is.
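A quick back-of-the-envelope on that formula, as a minimal sketch (the ~$3.5 trillion pre-drop market cap is an approximate figure assumed for illustration, not taken from the thread):

```python
# Back-of-the-envelope: market cap loss = percent drop * pre-drop market cap.
# The ~$3.5 trillion pre-drop figure for Nvidia is an assumed, approximate number.
pre_drop_market_cap = 3.5e12  # dollars (approximate)
percent_drop = 0.17           # the ~17% single-day drop discussed above

loss = pre_drop_market_cap * percent_drop
print(f"Approximate single-day market cap loss: ${loss / 1e9:.0f}B")  # ~$595B
```

Even a modest percentage drop on a multi-trillion-dollar base dwarfs most single-day moves elsewhere, which is why the absolute number sounds huge even though the percentage is unremarkable.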
Ya, as soon as I heard it was released by a hedge fund I thought "Strategy: Heavily short NVDA. Release a good model free, claiming (falsely) that it was trained without a lot of compute. Profit."
OTOH, if they really found a trick to train a good model with relatively tiny quantities of compute, we should all be shorting NVDA.
What about the business model of "short Nvidia and Microsoft, release model, cash in, repeat"?
What has struck me as odd is how many people seem to know how to properly prompt a reasoning model.
Like there's no way all these normies 1) know that you prompt a reasoning model differently than a chatbot and 2) can prompt a reasoning model well.
That's not to say that DeepSeek's innovations aren't real, just that I think a lot of ppl on Twitter are faking being impressed to look sophisticated.
IMO the main thing is that it's easy to get to #1 in the App Store because the iPhone user base isn't growing anymore. Most people in the market for chatbots downloaded the ChatGPT app months ago.
Meta should be panicking. But they do "panic" well. They have a habit of being shocked by some startup, spending 10x the money to replicate the startup's features a year late, feeling like losers, and then slowly using their infrastructure, power of cross-promotion, and advertising relationships to grind out a winning position in the market.
Podcast episode for this post:
https://open.substack.com/pub/dwatvpodcast/p/deepseek-panic-at-the-app-store
"I traded accordingly" might mean "I bought more shares at a discount" or "I pared down my position to minimize further losses." The ambiguity looks intentional to me...
He bought more: https://x.com/thezvi/status/1883892612622581924?s=46
Confirmed. I was trying to avoid accidental 'investment advice', not trying to be ambiguous.
I didn't see this mentioned, but DeepSeek seems to think it was made by OpenAI. I've seen several screenshots now where DeepSeek refused queries on the basis of "OpenAI policy." What does this mean exactly?
One of the first things I noticed when using DeepSeek for the first time a few days ago was how similar its personality was to ChatGPT. It seemed suspicious to me at the time, and now I'm even more suspicious.
It could mean that DeepSeek was trained on internet data, and there are lots of ChatGPT prompts/outputs available on the internet. And/or it could mean that DeepSeek was trained on ChatGPT outputs generated for the purpose by the DeepSeek team.
This kind of mistake happens all of the time with all of the models, now that their training data sets are contaminated with the output of other LLMs.
It is trivial to get Opus to make the same mistake re: OpenAI policy, for instance.
Altho tbf, it's mostly about 'non-OpenAI LLMs contaminated by OpenAI output' and less about the reverse.
"If I am Meta or Microsoft or Amazon or OpenAI or Google or xAI and so on, I want as many GPUs as I can get my hands on, even more than before. I want to be scaling. Even if I don’t need to scale for pretraining, I’ll still want to scale for inference. If the best models are somehow going to be this cheap to serve, uses and demand will be off the charts. And getting there first, via having more compute to do the research, will be one of the few things that matters."
I kind of believe it, but... it's a little counterintuitive. If top-tier open models exist, does it really make sense to "get there first"? If you could bank on top-tier open-source reasoning models always being 2 months behind the closed-source ones, then maybe you could save a billion dollars.
Counterarguments to that:
- You can't rely on it, so you still need to do the work just in case.
- Megacorporations want to control their own destiny (possibly for good reason, possibly not)
- At some point in the exponential curve, 2 months will make a huge difference. (Shareholders are thinking about that scenario though and may start to push back harder.)
- For Microsoft/Amazon/Google specifically, it would still make sense to buy GPUs to serve up that inference (but not build the model yourself).
I've been in EA spaces for a long time, and I still don't understand how they expect to solve coordination problems around Picking Up The Phone.
Imagine you're FDR in 1939 and you have the order to start the Manhattan Project on your desk, waiting for signature.
You could get Churchill, Hitler, Stalin, and Hirohito on the phone and say "look guys, it would be super bad if we developed nuclear weapons because the x-risk is, like, really high," and they all agree that nobody will develop nuclear weapons.
Do we think it ends there? Do you trust any of these people? We couldn't even confirm that Iraq didn't have WMDs, and hallucinated that they did, even though weapons inspectors had full access to the country.
This whole enterprise seems very naive about the prisoner's dilemma.
Agreed. I think it's either that or despair, though.
The cases are very much not parallel.
If one country develops nukes, and the others don't, that's really good (at least on the scale of a few years) for that country. It's bad, but probably not existentially bad, for everyone else. It's bad in the long run for everyone, probably, but there's no realistic hope of delaying any technology indefinitely, so at worst you accelerate timescales by a few years.
If one country develops ASI, everybody dies. That's terrible for everyone, including the country responsible for it. True, the same arguments about inevitability apply, but even delaying the apocalypse a few years is pretty good, and the more it's delayed, the better the chance of figuring out how to get the good version.
You're letting your own viewpoint cloud your reasoning.
Imagine you are sitting around the table. Each player is thinking different variations of "wow ASI sounds really powerful and dangerous! But I know narrow ASI can't cause much harm or we would have died to AlphaGo. So how do I narrow an ASI so I can beat all the rest of these n00bs"...
Or "what if I boxed it in a way that it's theoretically impossible to escape".
Or "I am not an idiot, I will just make AGI and then boost its performance just a little bit. Should be fine".
Or "what if I approximated an ASIs performance by using swarms of dumber AGI level agents. Since swarm members are in an adversarial relationship they can't team up.."
I am sure, from your view, you are going to claim:
1. You "know" none of the above will work
2. It's too risky to try
But try to see it from others' perspectives. It's completely reasonable to believe that it can be made to work, and that the risks are within your risk tolerance.
AGI isn't Prisoner's Dilemma; it's Stag Hunt. The best outcome even selfishly is Cooperate/Cooperate, not Defect/Cooperate, because if you don't build AGI and I do, then I probably die to misalignment, which is worse for me than "nobody builds it". (See the illustrative payoff sketch below.)
You can at least guarantee the AI inspectors full access with "or we nuke you", since "if you do that we'll nuke you back" is void in the face of X-risk and as such the "or we nuke you" can be 100% serious. The ideal is that a bunch of nations do this, so everyone watches everyone else.
(Also, nuclear weapons aren't an X-risk. You can argue that they were before we knew that they wouldn't ignite the atmosphere, but not after Trinity.)
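To make the Prisoner's Dilemma vs. Stag Hunt distinction above concrete, here is a minimal sketch with illustrative payoff numbers (the specific values are assumptions chosen only to exhibit the two structures, not anyone's actual estimates): in a Prisoner's Dilemma, defecting is your best response whatever the other side does; in a Stag Hunt, cooperating is your best response if the other side cooperates.

```python
# Row player's payoff (illustrative, assumed numbers; higher is better).
# C = coordinate / hold off on racing to AGI, D = race ahead unilaterally.
prisoners_dilemma = {
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,  # defecting dominates: D is best whatever the other side does
}
stag_hunt = {
    ("C", "C"): 5, ("C", "D"): 0,
    ("D", "C"): 3, ("D", "D"): 1,  # racing ahead alone is worse than mutual restraint
}

def best_response(payoffs, their_move):
    """Return the row player's best move against a fixed opponent move."""
    return max(("C", "D"), key=lambda my_move: payoffs[(my_move, their_move)])

for name, game in (("Prisoner's Dilemma", prisoners_dilemma), ("Stag Hunt", stag_hunt)):
    print(name, {their_move: best_response(game, their_move) for their_move in ("C", "D")})
# Prisoner's Dilemma -> {'C': 'D', 'D': 'D'}: defect no matter what.
# Stag Hunt          -> {'C': 'C', 'D': 'D'}: cooperate if the other side does.
```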
> AGI and I do, then I probably die to misalignment
[citation needed]
P(doom) is not so high, even among AI researchers, that nations are willing to do whatever is necessary to enforce a global arms embargo against AGI research (or else we nuke you).
Regarding the Chatbot Arena Leaderboard: you should just use the "Hard Prompts" category with the "Style Control" filter applied.
This has the ranking as:
1) o1
1) DeepSeek R1
1) Gemini 2.0 Pro Experimental
1) Gemini 2.0 Flash Thinking
2) o1-preview
3) Claude 3.5 Sonnet (New)
If you still think the Gemini models are ranked too high, I would point you to e.g. https://livebench.ai/, which also has them near the top.
I think they're both a bit underrated at this point, since 1.0 and 1.5 were pretty bad and people assume they're still bad (also, 2.0 Flash Thinking is only available via API / AI Studio).
Note that if you specifically care about coding, Sonnet moves up on both of these leaderboards.
Just a somewhat trivial remark: I've noticed you consistently mix up "break" and "brake".
>GFodor: I shudder at the thought I’ve ever posted anything as stupid as these theories, given the logical consequence it would demand of the reader
when someone who can entertain mole people existing thinks your AI skepticism is too far-fetched, maybe it's time to recalibrate
(I write this with respect to GFodor; he's absolutely right here, but mole people don't exist)
There are reports that although DeepSeek did their training on NVIDIA chips, they are doing inference on Huawei hardware (the 910C). Possibly bad news for Nvidia, if true…
I mean they literally can't buy Nvidia chips, so I don't see how this is news?
On their business model: I'm somewhat more convinced now that DeepSeek is a psyop from a (the?) future ASI to drive its creation and accelerate FOOM
Is anyone worried about sleeper agents in R1? Relatedly, does anyone remember sleeper agents?