16 Comments
rxc's avatar

"Who actually likes AI? The people who actually use it."

This reminds me of the way engineers used to talk about the test facilities that were used to validate computer models of nuclear reactor behavior. Regarding the computer models, the saying was:

"Nobody really believes the models represent the behavior of the real equipment, except the people who build the models and run them."

Regarding the test facilities, the saying was:

"Everyone believes the tests represent the behavior of the real equipment, except the people who built the test facilities"

Soho's avatar

In all seriousness, making the model an obnoxious scold is basically a core Google value at this point. It's really, really baked into their vision of safety.

Askwho Casts AI's avatar

I have to say I am deeply unimpressed with the new 4o-based text-to-speech from OpenAI, at least for my longform content purposes. ElevenLabs is still way out in front on this, which is surprising given they haven't had a major frontier model update in ~1.5 years. Their service remains shockingly expensive, which might explain why they have managed to stay ahead of the competition.

Anyway, here is the (ElevenLabs generated) podcast episode for this post:

https://open.substack.com/pub/dwatvpodcast/p/ai-109-google-fails-marketing-forever

Jay Rooney's avatar

Yeah, as long as the models are such scolds, I see no reason to use Gemini. They might as well not market it, that severely kneecaps its actual usefulness regardless of how smart or capable the model is.

BE's avatar

My 7 year old heard about the Gemini 2.5 release within a few hours. By my count, I’ve seen the updates about it from ~16 sources so far. Zvi’s impression to the contrary may be indicative of an unusual set of inputs.

Google’s marketing *is* bad, but in this case the news quickly reached a small kid without independent internet access _while at school_, the usual VentureBeat/ Verge/ Ars Technica crowd and even CNBC. To the extent that Zvi’s media diet does not reflect this, he should probably care about this discrepancy.

Zvi Mowshowitz's avatar

The question is, does following that 'usual' crowd directly cause me to actually miss anything? So, indeed, what'd I miss? ;)

BE's avatar

To be clear, I’m not saying you’re missing “object-level” information. But “what do the ‘usual’ sources report to the ‘usual’ information consumers’” is an important fact about reality.

Ultimately, news about Gemini 2.5 made it to 9 out of the top 10 major tech publications (according to a random website somewhere; I doubt that’s a crux), multiple top YouTube AI channels, and even TV, all within hours. It was on CNN Brazil, apparently. Your summary was basically “nobody heard of it”. That is not ideal, both because you seem to be describing a particular small corner of the world and because this distortion likely correlates with other “perception of AI” questions.

Coagulopath's avatar

To add +1 datapoint I haven't heard anyone discuss it except in AI circles.

Honestly, it's probably not something most people need to be aware of. Every LLM saturates the average person's use case, and if no new abilities have appeared, it's a bit meaningless whether a model's GPQA score is 78% or 84%.

At least Grok 3 and r1 felt newsworthy due to potential national security implications.

Matt Hagy's avatar

Google succeeding at R&D as well as engineering and infrastructure, while failing at product development and customer outreach is a recurring, two-decade long theme. I won’t belabor a rant about all the interesting things they’ve developed and killed, [1] nor all the duplicative efforts they’ve had for basic functionality like human-to-human chat. [2] Instead, here’s a 2016 Stratechery article elaborating on this theme in the context of mobile: Google and the Limits of Strategy, https://stratechery.com/2016/google-and-the-limits-of-strategy/

Regardless of this track record, I don’t think we should underestimate Google’s ability to execute and dominate some part of the generative AI value chain. I recall comparable predictions about how Google would fail in the mobile/local shift circa 2010 and be replaced by vertical search engines like Yelp and FourSquare in those domains. Yet Google Maps dominated (at least until they fought with Apple about phone permissions, giving rise to Apple Maps). Despite numerous shortcomings, I personally continue to use Android, specifically Google Pixel, because I’m hopeful they will provide meaningful differentiation and customization.[3] Eg, I like Pixel’s AI-based call screening and I’m not aware of comparable default functionality in iPhone.

Google has a deep bench of talent on both the R&D and engineering sides, as demonstrated by these reports on Gemini 2.5 Pro. (I personally haven’t used it.) I believe that extends throughout the corp, including functions like finance, legal, and marketing. They also have world-leading custom infrastructure—notably their TPUs—along with differentiated networking and other proprietary hardware with associated software. They continue to market GCP in terms of Google’s strength in AI, and this is a continuation of their 2017 positioning that focused on their novel ML services. Moreover, they already have consumer distribution via GSuite (including GMail) as well as Search and YouTube. That even includes some enterprise SaaS channels, with many tech companies using GSuite.

[1] See https://killedbygoogle.com/

[2] https://stratechery.com/2016/chat-and-the-consumerization-of-it/

[3] And because migrating would be a pain as someone who’s been w/ Android since 2009 with the original Motorola Droid

Performative Bafflement's avatar

> Google has a deep bench of talent on both the R&D and engineering sides, as demonstrated by these reports on Gemini 2.5 Pro. (I personally haven’t used it.) I believe that extends throughout the corp, including functions like finance, legal, and marketing.

This is 100% true - the level of talent at Google versus their level of execution is probably the biggest waste of human capital in the entire world right now.

Because of this, Google, to my mind, is the single biggest and most salient demonstration that "culture matters."

DEI is rampant at Google, more so than at the other FAA(N/M)GS, and this has materially impacted their culture and products. Moreover, they have a "do something smart" culture prioritized way over a "do something that the customers want" culture. That applies even to their *actual* customers, ie the companies that buy ads, from whom they have extracted nearly all the arbitrage of ads for less sophisticated players via clever auction schemas and optimization on their end.

Buying ads on FB or Goog as a non-F500 business is a furious Red Queen's Race that rarely nets positive, because you're up against competitors in an ecosystem that's been aggressively tuned across multiple domains for years. So unless you want to become an expert at SEO *and* A/B testing on your own funnel *and* A/B testing niche-enough keywords to be net positive *and* demographically modeling your own customers to align with the duopoly categories available, you're probably going to lose money.

As another example of Google focusing on "do something smart" over "do something useful to customers," here's the biggest / funniest Google disconnect I know of: I just ran across a site that runs the Google Vision API on any still photo you submit, using image recognition and/or Gemini (the API has a zillion options and it's hard to determine which of them the site in question is using) to determine emotion, setting, context, likely income, likely politics, and the best marketing products and angles.

Me and some friends tried it, it was surprisingly good and accurate, and we're all in a pretty desirable "income and brand affinity" bucket.

Here's the link if anyone else is interested in trying themselves: https://theyseeyourphotos.com/

The thing I found most interesting about it was in a decade plus of using Google products, I've basically never seen a relevant ad.

So more than ten years of emails, documents, video meetings, spreadsheets, social graph analysis, and whatever else got them nothing, but running a single still photo through Vision / Gemini single-shot a much better segmentation?? And it's just left to collect dust.

And not just left to collect dust, but it's strictly better *at their core competency,* by some absurd factor (5x - 100x), and is hidden away in some tiny corner of "API services" space not being used by anyone.

That's just "Google in a nutshell," to me.

And that incompetence, that institutional stagnation and sclerotic dynamic, is partly driven by DEI and culture.

People at Google literally wrote the foundational papers for LLMs! And where are they with Gemini? Vastly behind, with basically zero users, despite Alphabet having a locked-in user base of ~2-4B overall?

Meanwhile OpenAI keeps shipping out hits and has 300M weekly users, DeepSeek is cooking on a global scale for pennies, and Anthropic is the biggest in B2B. Where's Google? Which LLM, out of all of them, got pilloried for fake-inserting diversity into answers and generated pictures? Google again?

Culture problems to such an extent it's literally the biggest waste of human capital on the planet.

Coagulopath's avatar

Gemini 2.5 lets me upload a folder of code. I uploaded a friend's Wordpress install (500+ php files), and asked it to analyse it for signs of malware and tampering. Before I did this, I edited a bunch of files, sprinkling in scary-looking strings of the "YOU'VE BEEN HACKED BY T3H PENGU1N OF D00M!" variety.

Results were mixed. I assumed it would start a shell and compare the checksums of my uploads with the canonical versions on Wordpress.org. Instead, it pulled out four files, looked at them, said "these four seem OK, but I haven't looked at the others", and wrote some generic slop on how to secure a Wordpress site. So that was slightly disappointing. Can't it use tools? Or at least count the number of files?
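For reference, the checksum comparison I had in mind is simple enough to sketch. (Hypothetical code, not what Gemini did; it assumes you've built a reference manifest of known-good hashes, e.g. from the matching clean release on Wordpress.org.)

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()


def find_tampered(install_dir: str, reference: dict[str, str]) -> list[str]:
    """Compare every .php file under install_dir against a reference
    manifest of {relative_path: sha256}. Return files whose hash
    differs from (or is missing from) the manifest."""
    root = Path(install_dir)
    suspicious = []
    for path in root.rglob("*.php"):
        rel = str(path.relative_to(root))
        if reference.get(rel) != file_sha256(path):
            suspicious.append(rel)
    return sorted(suspicious)
```

Any file my "T3H PENGU1N OF D00M" edits touched would show up immediately, with no LLM judgment involved; the interesting question is whether the model can figure out it should do something like this on its own.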

On the other hand, the four files it picked were likely candidates (index.php, etc), its advice was good, and its thinking was candid and honest. ("Limitations: Manually reviewing every line of code in a full directory (hundreds of files) is infeasible and error-prone for a language model.")

Google's LLMs have come a long way in a year and a half. Gemini Ultra would probably claim my files had passed a security audit despite me not even uploading them.

> Noam Brown: Less than a year ago, people were pointing to [NYT] Connections as an example of AI progress hitting a wall.

Were people saying this? Not sure. To me it looks like traditional scale-based LLMs were stalling out, while "thinking" LLMs are still progressing. Similar to what we see on ARC-AGI.

Jonathan Woodward's avatar

Regarding people's reaction to AI - I think of "excited" as a positive word, so if I am scared of AI, that would not lead to being excited about it or the things it might do. Similarly, even if AI is impressive in some way, being "impressed" indicates a degree of admiration that people might not want to use to describe something they dislike.

Laura Creighton's avatar

If somebody wrote an AI agent that was *really good at marketing*, a great many engineers who already know they are rubbish at it would be happy to buy it. I wonder why this isn't happening yet?

Joanny Raby's avatar

Anthropic: With the right prompt and one example Sonnet 3.7 can sabotage its performance on tasks to only be as good as Sonnet 3.5.

Look at the report: "The noisy monitoring signal might be sufficient to train away sandbagging."

In other words, their proposed initial solution is The Most Forbidden Technique™

It's so over

rt's avatar

"I’ve witnessed enough in my day to say this with rather high confidence."

E.g.?

James M's avatar

Agreeing a passphrase with one's parents, or thinking about challenge questions in advance, are good ideas.

But I think it's vital to realise that these are not watertight protections. In practice, maybe when you ask for the passphrase, there will be noise on the line, you can't hear their answer, there is shouting, you try and ask again but they say they can't hear you, they sound more and more distressed, at some point the "kidnapper" cuts them off and comes back on the line and tells you what to do - you never heard the passphrase but you still have to decide what to do next...

Ideally someone will build up a library of "what it sounds like when someone is scamming you pretending to be your daughter" which I can play to my parents (and myself and everyone else). It will help them get used to the fact that yes it sounds like me, yes it sounds like I am super-distressed, yes the line is terrible and it sounds like an emergency - but it is still fake. Being exposed to the emotional stimulus in advance in safe conditions helps to regulate the emotional response when you encounter it for real.