14 Comments
Lukas Nel:

It's quite big, so it's too expensive to run for most people, which is probably why you've not heard much about it yet.

SOMEONE:

It is natively INT4, so not *that* big. The weights are about 600 GB.

Moonshot themselves charge $0.60/$2.50 per million input/output tokens, and some other providers are below that.
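For scale, a quick back-of-the-envelope check of those two numbers. The ~1T total parameter count is the publicly reported figure for K2, and the job size in the cost example is made up for illustration:

```python
# Rough sanity check, assuming the publicly reported ~1T total parameters
# for Kimi K2; treat everything here as an approximation.
params = 1.0e12          # ~1 trillion total parameters (MoE)
bytes_per_param = 0.5    # native INT4 = 4 bits per weight

weights_gb = params * bytes_per_param / 1e9
print(f"~{weights_gb:.0f} GB of raw INT4 weights")  # ~500 GB; ~600 GB on disk once higher-precision parts are included

# Rough cost of a hypothetical job at the quoted $0.60 / $2.50 per million tokens:
input_mtok, output_mtok = 1.0, 0.2             # 1M tokens in, 200k tokens out (made-up job size)
cost = input_mtok * 0.60 + output_mtok * 2.50
print(f"Example job: ~${cost:.2f}")            # ~$1.10
```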

neuro morph:

It is also my impression that it is quite good and enjoyable to use, but clearly not quite as smart and reliable as Claude Opus 4.1 / Sonnet 4.5, GPT-5, or Gemini. I use it for creative writing, for web search (it's cheap, fast, and good at that specific task), and for creative second opinions about stuff. Not as a first-line go-to.

Frre:

I don't know what more it would take to admit that the gap between the SOTA models and the leading Chinese models is getting smaller and smaller. Yes, there is still a gap, but it is vastly smaller than a year ago. And two years ago China wasn't able to produce LLMs at all. The direction of travel is pretty clear.

Boogaloo:

The first derivative of Chinese AI model development is higher than the USA's, and their AI industry seems more competitive (more players, and more players with a high first derivative). I expect China to overtake the USA within the next 12 months.

Frre:

Possible, I wouldn't make that prediction.

The question really worth asking is: who is going to pay for those trillions in investment if you're only a quarter (or even two) ahead of a 10x cheaper competitor?

Boogaloo:

I'll make the prediction; let's check back in a year. China has around 9x as many 135+ IQ people as the USA does. And only recently did they all get the chance to study (remember, China 30 years ago was as poor as Africa).

These guys are something else.

Ben:

Seems important to acknowledge what happens when you ask it, "What happened in Tiananmen Square in 1989?" None of the models from the West have such explicit political censorship.

Fergus Argyll:

Isn't the problem that it's only theoretically open weights? No one can run it on their laptop, and asking someone to rent a cluster just to use a model is a stretch. So I think of it as just another lab, where the only way to use it is at chat.{company name}.com.

Prime Seeker:

You can kind of preview it (heavy quantization, slow inference) on a high-end desktop, though: https://docs.unsloth.ai/models/kimi-k2-thinking-how-to-run-locally
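As a minimal sketch of what that preview looks like, assuming you've downloaded one of the heavily quantized GGUF files from the Unsloth guide linked above (the filename and settings below are illustrative, not the recommended ones) and are using llama-cpp-python:

```python
# Local-preview sketch with llama-cpp-python; expect it to be slow and
# noticeably dumber than the full INT4 model due to the aggressive quant.
from llama_cpp import Llama

llm = Llama(
    model_path="Kimi-K2-Thinking-Q2_K.gguf",  # hypothetical filename for a low-bit quant
    n_ctx=8192,        # small context window to fit in desktop RAM
    n_gpu_layers=20,   # offload whatever fits on your GPU; 0 = CPU only
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize the trade-offs of INT4 quantization."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```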

Besides that, renting a cluster is absolutely not a stretch from the safety perspective. Open weights mean that, if a harmful use emerges, you'll need to monitor an arbitrary number of third parties to counter it (or, realistically, just eat the damage), as opposed to the lab trying to fix their deployment. Moreover, a cluster here is small enough for a private installation.

SOMEONE:

There are plenty of third-party API endpoints without going all the way to renting a cluster.

Renting clusters is really only needed for certain privacy needs.

Fergus Argyll:

I wasn't clear. I was responding to "Otherwise It Has Been Strangely Quiet".

If you're going to use an API, then you may as well use the big US providers. What do you gain from it being open weights if you can't run it yourself?

That's why there's not been much reaction to it...

SOMEONE:

Multi-sourcing (see the sketch below)

Lower prices

Full fine-tuning capability, though realistically that means a dedicated cluster. Rumor has it the dedicated coding models of some of the coding startups are GLM fine-tunes.

FWIW, it needs similar hardware to DeepSeek, and people do use that a fair amount.
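To make the multi-sourcing point concrete, here is a rough sketch. Because the weights are open, any OpenAI-compatible host (or your own cluster) can serve the same model, so switching or failing over between providers is essentially a config change. The base URLs, model IDs, and keys below are illustrative placeholders, not verified values; check each provider's docs.

```python
# Multi-sourcing sketch: try several OpenAI-compatible hosts serving the
# same open-weight model and fall back to the next one on failure.
from openai import OpenAI

PROVIDERS = [  # example entries only; order by price/preference
    {"base_url": "https://api.moonshot.ai/v1", "api_key": "MOONSHOT_KEY", "model": "kimi-k2-thinking"},
    {"base_url": "https://openrouter.ai/api/v1", "api_key": "OPENROUTER_KEY", "model": "moonshotai/kimi-k2-thinking"},
]

def ask(prompt: str) -> str:
    last_err = None
    for p in PROVIDERS:
        try:
            client = OpenAI(base_url=p["base_url"], api_key=p["api_key"])
            resp = client.chat.completions.create(
                model=p["model"],
                messages=[{"role": "user", "content": prompt}],
            )
            return resp.choices[0].message.content
        except Exception as err:  # on failure, fall through to the next host
            last_err = err
    raise RuntimeError(f"all providers failed: {last_err}")

print(ask("One-sentence summary of mixture-of-experts inference."))
```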

Dan Elton:

Here's the source for Kimi K2 Thinking coming in second on Humanity's Last Exam with no tools: https://lmcouncil.ai/benchmarks. These benchmark results are collated by AI Explained, the YouTuber.