15 Comments

When you say the video demo wasn't faked, what do you mean? I thought the video demo was a dishonest, dramatic reenactment of the actual, much more boring behavior. They did not have a video feed as input with Gemini Ultra responding live with low latency; they fed in still photos and text prompts together offline and then created the video after the fact in post.

Edit: OK, rereading your words, I think I misunderstood you. You're saying you did indeed see the claims that it was faked, and their blog post describing the multimodal prompts, but you are doubtful that the video really is fake because why would Google do that? Is my interpretation correct?


Ah, that would be... more fake than I realized, if that's right, in a 'wow that's really bad' kind of way.


Gotcha.

Here's the blog post that people think is the actual demo: https://developers.googleblog.com/2023/12/how-its-made-gemini-multimodal-prompting.html


I was confused by the video where the tester shows Gemini pictures of the sun, Saturn, and Earth, and asks "Is this the right order?".

This puzzle is poorly worded and has no correct answer. It's the wrong order if you want the planets arranged by proximity to the sun. It's the right order if you want them arranged by mass (heavier --> lighter). Gemini doesn't even know that the left image represents OUR sun. It could be a star outside the solar system.

But then we see the text prompt, which fills in the blanks ("Consider the distance from the sun and explain your reasoning"). But I'm sure GPT-4 can solve puzzles like this, so what new ability is being demonstrated?


"I love that this is saying that OpenAI isn’t valuable both because Gemini is so good and also because Gemini is not good enough."

Why do you continue quoting Gary Marcus? I feel like he passed the "not a serious person" threshold a while ago, and while it's occasionally fun to dunk on him like this, there are uncountably many posters with dunkable takes. It's not like he's Yann LeCun or someone else in a position to make their silly opinions matter.


Love the image of the barely significant p-value!


"Certainly they are a long way from ‘beat OpenAI’ but this is the first and only case where someone might be in the game."

Maybe -- but what we have right now is benchmarks, and it's worth remembering how many models we've seen that looked good on benchmarks but turned out to be pretty bad in practice. I think we can't be very confident that they have a serious contender until we can actually start playing with Ultra in the wild.


Says a lot that OpenAI has an API and Gemini does not.

One is letting everyone kick their tires, the other is not.


To be fair, I don't think GPT-4 had day-one API access either.


Yeah, but Bard has been around since March and has never had an API.


General style question: when you are putting two quotes from separate threads one after the other, why do you put them in a single quote block rather than each in its own quote block? It makes them harder to parse, even with the written explanation.

Similarly with quote tweets coming after the response tweet with no indication beforehand.


When the alternative seems worse, I do it and note I'm doing it, for flow.


The second case seems harder, but would two successive quote blocks really look bad? I feel like I see that all the time.


Benchmark hacking. Fake videos. No details on how the model actually works. I love AI in 2023 so much!!

Google has apparently trained a model about equal to (perhaps slightly better than) GPT-4. Depending on how you look at it, the glass is either half full ("OpenAI now has competition!") or half empty ("Google has equalled OpenAI's progress as of August 2022").

Considering the hype (Gemini was supposed to use 4x the compute of GPT-4 and implement all kinds of nifty MCTS tricks), this is substantially less than promised, and the release has really soured me on the whole thing.

>I love that ‘above 90%’ turns out to be exactly 90.04%

It's like when the teacher assigns a 2000-word report and you turn in one that's precisely 2001 words long. I'm imagining Sundar Pichai screaming at the eng team "I don't care what you do! Get a 90% MMLU out of this model or no pay bonus!"
