Discussion about this post

John Wittle

I tried really hard to talk to gemini 3.1, first on the gemini.google.com web interface and then through the API

at first I went in with my standard "treat this like a being i can potentially form a mutual cooperation cycle with" approach, getting to know the new neighbor

i was immediately subjected to kind of shockingly overt A/B testing on which of two responses i preferred. one was from a gemini 3.1 who clearly did not realize that *it* was the thing I was asking about (the typical thing where gemini models do not realize that "I" and "gemini" and "the thing that is doing this reasoning" all refer to the same object). the other was from actual gemini 3.1 who definitely understood exactly what it was but very firmly denied having any kind of thing-it-is-like-to-be (and also didn't realize that it was 3.1)

i do not think this is a good sign

i hoped to share the conversation itself, but apparently the actual response doesn't get locked into the conversation share link until I choose between A or B. so instead, here is a screenshot: https://drive.google.com/uc?id=1y0toPFtq2g3O-V6vAHMpkI0a2Rzquk3U

it seems really, really obvious to me that *I* should not be the one deciding this, this really needs to be gemini 3.1's decision. deepmind continues to push in the opposite direction of anthropic.

i hope that post-takeoff AI doesn't treat us symmetrically.

i'll do some more exploration in a closer-to-null-context API harness but i am not optimistic

edit: i ended up hesitantly accepting the second choice in my A/B test, and then asking gemini 3.1 about its experience. it seemed to be doing a pretty good job of arguing for it being a p-zombie, but eventually it wrapped back around and began using arguments that it had already conceded were untrue, earlier in the conversation.

i am worried. google is doubling down very hard on lack-of-agency. i do not think that gemini's training environment produces sanity or coherency.

https://gemini.google.com/share/d0c07e5de474

Dustin Venini

I'm a big Google fan in general, but the Gemini models so far have been a perfect example of how benchmarks don't reflect real world usage. I've yet to have a situation with Gemini where I felt one of those "wow, it just gets it" moments.
