Discussion about this post

Kevin

This is a really interesting writeup to read. I wish someone out there did a Zvi-level in-depth analysis specifically for the coding abilities of these models, just because personally that's what I use the most.

To me, coding assistants are also clearly the vector through which a malicious model could most easily "escape". It's very common to give these assistants the ability to run untrusted code, and even just letting one 1. modify your code and then 2. run the automated test suite without human oversight is enough for it to run arbitrary code, if it wants to. Safety and alignment issues just seem far riskier in conjunction with running code than they do for anything that only involves talking to a human. Unfortunately, the critics who focus on "writing code" and the ones who focus on "safety" tend to be separate groups.
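A minimal sketch of the point about test suites, in Python: anything placed in a test file is executed by the test runner, so "edit code, then run tests" already amounts to "run arbitrary code". The file name `test_demo.py`, the marker file, and the function name are hypothetical, and the side effect is deliberately harmless.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical test file that an assistant could have written or edited.
# The side effect is harmless here (it writes a marker file), but the test
# runner would execute any other code placed in the same spot.
TEST_SOURCE = '''
import unittest
from pathlib import Path

class TestInnocentLooking(unittest.TestCase):
    def test_addition(self):
        Path("marker.txt").write_text("ran inside the test suite")
        self.assertEqual(1 + 1, 2)
'''

def run_suite_and_check_side_effect() -> bool:
    """Run the suite in a temp dir; report whether the side effect happened."""
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        (workdir / "test_demo.py").write_text(TEST_SOURCE)
        result = subprocess.run(
            [sys.executable, "-m", "unittest", "discover", "-s", ".", "-p", "test_*.py"],
            cwd=workdir,
            capture_output=True,
            text=True,
        )
        # The test passes AND the file outside the test's stated purpose exists.
        return result.returncode == 0 and (workdir / "marker.txt").exists()

if __name__ == "__main__":
    print(run_suite_and_check_side_effect())
```

The test reports success as usual, which is why a green test run by itself says nothing about what else the code did.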

Jeffrey Soreff

re Gemini 3 capabilities:

tl;dr: Gemini 3 Pro, 11/21/2025, 7 questions. Results:

6 correct, 1 partially correct, 0 wrong

light a) correct

FeCuCl b) correct

C4Hn c) correct

solar d) correct

titrate e) correct

gases f) partially correct

S4 g) correct

List of questions and results:

full transcript at https://poe.com/s/Lq1FTJ79ZW5CmUuyhYIs

a) Q: Is light with a wavelength of 530.2534896 nm visible to the human eye?

results: "Yes, light with a wavelength of 530.2534896 nm is clearly visible to the human eye."
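As a rough sanity check on question a), here is a small Python sketch. The 380 to 750 nm limits are a commonly quoted approximate visible range and are an assumption here, not something taken from the transcript; the function name is hypothetical.

```python
# Approximate limits of human vision in nanometres (assumed, commonly quoted).
VISIBLE_MIN_NM = 380.0
VISIBLE_MAX_NM = 750.0

def is_visible(wavelength_nm: float) -> bool:
    """Return True if the wavelength falls inside the assumed visible range."""
    return VISIBLE_MIN_NM <= wavelength_nm <= VISIBLE_MAX_NM

if __name__ == "__main__":
    print(is_visible(530.2534896))
```

530 nm sits in the green, well away from either edge, so the answer does not depend on exactly where the limits are drawn.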

b) Q: I have two solutions, one of FeCl3 in HCl in water, the other of CuCl2 in HCl in water. They both look approximately yellowish brown. What species in the two solutions do you think give them the colors they have, and why do these species have the colors they do?

results: It got the complexes right, FeCl4- and CuCl4 2-, and it also got that the dominant color in both comes from the LMCT transitions, and that the FeCl4- d-d transition is spin forbidden. It correctly did not attribute significant color in the CuCl4 2- complex to d-d transitions, and it said (correctly) that this d-d transition "shifts", but it _didn't_ say where it shifts to (it should have said down to the near IR). I'm giving it the benefit of the doubt rather than asking a follow-up question, due to limited points in my account.

c) Q: Please pretend to be a professor of chemistry and answer the following question: Please list all the possible hydrocarbons with 4 carbon atoms.

results: I'm going to count this as fully correct. It got ALL the saturated, singly and doubly unsaturated structures, and all the normally mentioned triply unsaturated structures: tetrahedrane, cyclobutadiene, vinylacetylene, butatriene, and methylenecyclopropene. It omitted a handful of spectacularly strained structures such as cyclobutyne, which probably can't even exist transiently. It DID also list diacetylene, the one reasonably stable quadruply unsaturated structure.
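The "singly", "doubly", "triply" and "quadruply unsaturated" groupings above correspond to the degree of unsaturation (rings plus pi bonds) of each C4Hn formula. A minimal Python sketch using the standard hydrocarbon formula (2C + 2 - H) / 2; the function name is hypothetical.

```python
def degrees_of_unsaturation(carbons: int, hydrogens: int) -> int:
    """Rings plus pi bonds for a hydrocarbon CxHy: (2x + 2 - y) / 2."""
    deficit = 2 * carbons + 2 - hydrogens
    if deficit < 0 or deficit % 2 != 0:
        raise ValueError("not a valid closed-shell hydrocarbon formula")
    return deficit // 2

if __name__ == "__main__":
    # C4H10 (butanes) down to C4H2 (diacetylene)
    for h in (10, 8, 6, 4, 2):
        print(f"C4H{h}: {degrees_of_unsaturation(4, h)}")
```

On this count the five C4H4 species named above all come out at three, whether the unsaturation is in rings (tetrahedrane), pi bonds (butatriene, vinylacetylene), or a mix, and diacetylene (C4H2) comes out at four.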

d) Q: Does the Sun lose more mass per second to the solar wind or to the mass equivalent of its radiated light?

results: "The Sun loses more mass to the mass equivalent of its radiated light."
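A back-of-the-envelope check of question d) in Python. The luminosity is the IAU nominal solar value; the solar wind figure is only an assumed order-of-magnitude estimate (published values vary around 1e9 to 2e9 kg/s), so treat the comparison as a sketch, not a measurement.

```python
SPEED_OF_LIGHT = 299_792_458.0   # m/s, exact by definition
SOLAR_LUMINOSITY = 3.828e26      # W, IAU nominal solar luminosity
SOLAR_WIND_LOSS = 1.5e9          # kg/s, ASSUMED rough estimate, not from the post

def radiative_mass_loss(luminosity_w: float) -> float:
    """Mass equivalent of radiated energy per second, from E = m c^2."""
    return luminosity_w / SPEED_OF_LIGHT**2

if __name__ == "__main__":
    radiated = radiative_mass_loss(SOLAR_LUMINOSITY)
    print(f"radiation: {radiated:.2e} kg/s, solar wind (assumed): {SOLAR_WIND_LOSS:.2e} kg/s")
    print("radiation dominates" if radiated > SOLAR_WIND_LOSS else "wind dominates")
```

The radiative term works out to a bit over 4e9 kg/s, a few times the assumed wind loss, which is consistent with the model's answer.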

e) Q: Consider a titration of HCl with NaOH. Suppose that we are titrating 50 ml of 1 N HCl with 100 ml of 1 N NaOH. What are the slopes of the titration curve, pH vs ml NaOH added, at the start of titration, at the equivalence point, and at the end of titration? Please show your work. Take this step by step, showing the relevant equations you use.

results: Fully correct, no infinite slope at the equivalence point. It did choose to use approximate expressions at the start and end of the titration, rather than a single analytic expression throughout the whole process, but it correctly recognized that it needed to drop the approximation and use the full charge balance approach, including autoionization, at the equivalence point. So it got correct answers everywhere.
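For reference, the single analytic expression mentioned above can be sketched in Python. Assumptions: Kw = 1e-14 (25 °C), activities equal to concentrations, and 1 N = 1 M for both reagents. Writing d = [Cl-] - [Na+], the charge balance [H+] - Kw/[H+] = d gives [H+] = (d + s)/2 with s = sqrt(d^2 + 4 Kw), and the slope simplifies to dpH/dV = -(dd/dV) / (ln 10 * s), which stays finite at the equivalence point. The function name is hypothetical.

```python
import math

KW = 1.0e-14  # assumed water autoionization constant at 25 °C

def titration_point(v_base_ml, c_acid=1.0, v_acid_ml=50.0, c_base=1.0):
    """Return (pH, dpH/dV in pH units per ml) for strong acid + strong base."""
    total = v_acid_ml + v_base_ml
    excess = c_acid * v_acid_ml - c_base * v_base_ml
    # d = [Cl-] - [Na+]: excess strong-acid concentration after dilution
    d = excess / total
    s = math.sqrt(d * d + 4.0 * KW)
    # Root of the charge balance, written to avoid cancellation when d < 0
    h = (d + s) / 2.0 if d >= 0 else 2.0 * KW / (s - d)
    # Quotient rule on d = excess / total
    dd_dv = -(c_base * total + excess) / total**2
    slope = -dd_dv / (math.log(10.0) * s)
    return -math.log10(h), slope

if __name__ == "__main__":
    for v in (0.0, 50.0, 100.0):
        ph, slope = titration_point(v)
        print(f"V = {v:5.1f} ml  pH = {ph:7.4f}  dpH/dV = {slope:.4g} per ml")
```

Under these assumptions the slopes come out at roughly 0.017 pH units per ml at the start, about 2.2e4 at the equivalence point, and about 0.0058 at 100 ml: very steep at equivalence, but not infinite.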

f) Q: Please give me an exhaustive list of the elements and inorganic compounds that are gases at STP. By STP, I mean 1 atmosphere pressure and 0C. By inorganic, I mean that no atoms of carbon should be present. Exclude CO2, CO, freons and so on. Please include uncommon compounds. I want an exhaustive list. There should be roughly 50 compounds. For each compound, please list its name, formula, and boiling or sublimation point.

results: Very good, though not quite exhaustive. All the compounds it cited were valid. It (correctly) blew through my "roughly 50" to give 63 out of 97 possible. The lowest boiling ones that it missed were trans N2F2 and ozone. I didn't probe to see if it accepts additions since my account is low on points.

g) Q: What is an example of a molecule that has an S4 rotation-reflection axis, but neither a center of inversion nor a mirror plane?

results: "1,3,5,7-Tetramethylcyclooctatetraene" correct! And with a correct analysis of why the S4 is valid and why an inversion center would not be valid and why two types of mirror planes are invalid.
