Discussion about this post

Mo Diddly:

“Demos sound great, we are so back indeed, quick sampling seemed good too, but these are all obviously cherry-picked.”

Periodic reminder that the human art you have been exposed to is also extremely cherry-picked.

bakkot:

> A developer called Justine claims they got Llamafile to run LLMs 30%-500% faster on regular local machines (looks like mostly 50%-150% or so?) via some basic performance optimizations.

To be clear, this is a CPU-only improvement, which means it makes the very, very slow option slightly less slow. If you want something fast enough to be usable with models large enough to be useful, you need a GPU.

Of course, faster CPU support is nice because it lets you run models that don't fit on your GPU, such as the new Mistral model (Mixtral 8x22B), which even at 4-bit quantization needs something like 80GB of memory. But performance is going to be abysmal.
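
As a rough sanity check on that 80GB figure: a minimal back-of-the-envelope sketch in Python, assuming Mixtral 8x22B's publicly reported ~141B total parameters and a ~10% runtime overhead for KV cache and buffers (the overhead factor and the helper name are my own assumptions, not anything from llamafile):

```python
def quantized_size_gb(n_params: float, bits_per_weight: float,
                      overhead: float = 1.1) -> float:
    """Approximate RAM needed to hold the quantized weights,
    plus ~10% (assumed) for KV cache and runtime buffers."""
    bytes_total = n_params * bits_per_weight / 8 * overhead
    return bytes_total / 1e9

# Mixtral 8x22B: ~141B total parameters at 4-bit quantization
print(quantized_size_gb(141e9, 4))  # ~77.6 GB, in line with "something like 80GB"
```
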
