Don't Worry About the Vase
2026-03-04

Dustin Venini
4h

I'm a big Google fan in general, but the Gemini models so far have been a perfect example of how benchmarks don't reflect real-world usage. I've yet to have one of those "wow, it just gets it" moments with Gemini.

Compa
23m

LiveBench added this disclaimer to Gemini 3.1 Pro, which I found very surprising:

> 5th rank in unseen questions across all categories

They also introduced a "Show High Unseen Question Bias Models" filter; when it's unticked Gemini 3.1 Pro disappears.

This suggests some overt benchmaxing to me, seemingly supported by on-the-ground anecdotes about 3.1 Pro's underwhelming performance in practice.

