when you say the video demo wasn’t faked, what do you mean? i thought the video demo was a dishonest, dramatic reenactment of the actual, much more boring behavior. they did not have a video feed as input with gemini ultra responding live with low latency. they fed in still photos and text prompts together offline and then created this video after the fact in post
edit: ok, rereading your words, i think i misunderstood you. you’re saying you did indeed see the claims that it was faked, and their blog post describing the multimodal prompts, but you are doubtful that the video really is fake because why would google do that? is my interpretation correct?
Ah, that would be... more fake than I realized, if that's right, in a 'wow that's really bad' kind of way.
gotcha
here’s the blog post that people think is the actual demo: https://developers.googleblog.com/2023/12/how-its-made-gemini-multimodal-prompting.html
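For the curious, here is a minimal sketch of what that kind of stills-plus-text prompting looks like, assuming the google-generativeai Python SDK. The model name, filename, and API key are placeholders (Ultra had no public API at launch, so gemini-pro-vision stands in); this is an illustration of the prompting style the blog post describes, not the exact code Google used.

```python
# Minimal sketch: one still frame plus a guiding text prompt, sent together.
# This mirrors the offline stills-and-text prompting described in the blog
# post, not a live low-latency video feed. Names below are assumptions.
import google.generativeai as genai
import PIL.Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

model = genai.GenerativeModel("gemini-pro-vision")  # stand-in model name

frame = PIL.Image.open("planets_frame.png")  # hypothetical frame grab
prompt = ("Is this the right order? Consider the distance from the sun "
          "and explain your reasoning.")

# Image and text go in as a single multimodal prompt.
response = model.generate_content([frame, prompt])
print(response.text)
```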
I was confused by the video where the tester shows Gemini pictures of the sun, Saturn, and Earth, and asks "Is this the right order?"
This puzzle is poorly-worded and has no correct answer. It's the wrong order if you want the planets arranged by proximity to the sun. It's the right order if you want them arranged by mass (heavier --> lighter). Gemini doesn't even know that the left image represents OUR sun. It could be a star outside the solar system.
But then we see the text prompt, which fills in the blanks ("Consider the distance from the sun and explain your reasoning"). But I'm sure GPT4 can solve puzzles like this, so what new ability is demonstrated?
"I love that this is saying that OpenAI isn’t valuable both because Gemini is so good and also because Gemini is not good enough."
Why do you continue quoting Gary Marcus? I feel like he passed the "not a serious person" threshold a while ago, and while it's occasionally fun to dunk on him like this, there are uncountably many posters with dunkable takes; it's not like this is Yann LeCun or someone else in a position to make their silly opinions matter.
Love the image of the barely significant p-value!
"Certainly they are a long way from ‘beat OpenAI’ but this is the first and only case where someone might be in the game."
Maybe -- but what we have right now is benchmarks, and it's worth remembering how many models we've seen that looked good on benchmarks but turned out to be pretty bad in practice. I think we can't be very confident that they have a serious contender until we can actually start playing with Ultra in the wild.
Says a lot that OpenAI has an API and Gemini does not.
One is letting everyone kick their tires; the other is not.
To be fair, I don't think GPT4 had day-one API access either.
Yeah, but Bard has been around since March and has never had an API.
General style question: when you are putting two quotes from separate threads one after the other, why do you put them in a single quote block rather than each in its own quote block? It makes it harder to parse, even with the written explanation.
Similarly with quote tweets coming after the response tweet, with no indication beforehand.
When the alternative seems worse, I do it and note I'm doing it, for flow.
The second case seems harder, but would two successive quote blocks really look bad? I feel like I see that all the time.
Or maybe something like in this post? https://www.astralcodexten.com/p/highlights-from-the-comments-on-elon
Benchmark hacking. Fake videos. No details on how the model actually works. I love AI in 2023 so much!!
Google has apparently trained a model about equal to (perhaps slightly better than) GPT4. Depending on how you look at it, the glass is either half full ("OpenAI now has competition!") or half empty ("Google has equalled OpenAI's progress as of August 2022").
Considering the hype (Gemini was supposed to use 4x the compute of GPT4 and implement all kinds of nifty MCTS tricks), this is substantially less than promised, and the release has really soured me on the whole thing.
"I love that ‘above 90%’ turns out to be exactly 90.04%"
It's like when the teacher assigns a 2000-word report and you turn in one that's precisely 2001 words long. I'm imagining Sundar Pichai screaming at the eng team: "I don't care what you do! Get a 90% MMLU out of this model or no bonus!"