28 Comments

Hey, I purchased the o1 pro subscription. The model is a lot dumber than o1-preview; it doesn’t do the things you ask it to do. It doesn’t fill in gaps that should be filled when you ask it to update code. Claude 3.5 Sonnet actually does try to fill these gaps in, though sometimes Claude’s code may not compile. The problem is that o1 pro is so confident in its answers that chatting with it and getting it to correct its mistakes is very difficult. I feel OpenAI gave us a hobbled version of o1-preview because of stability issues with o1-preview that they couldn’t fix. o1-pro feels like a more robust version of o1-mini instead of a greatly superior model.

Are you sure you aren't mistaking o1 for o1-pro? Your complaints sound exactly like other $20 a month users complaining that o1 full release is weaker than preview was.

Zvi is talking about the most expensive model that is currently the best.

I’m paying $200 a month. I am talking about o1-pro, the model that takes 5-10 minutes to think and still acts like a dumber version of o1-preview.

Well at least the model seems to be trying, you're getting your money's worth in compute tokens.

I’m not. Sonnet has been more helpful for a tenth of the price. People need to stop sucking off OpenAI’s hyped-up models. This is an insult to me as a customer. I was really looking forward to having unlimited access to something that was as intelligent as o1-preview. I was very disappointed.

I wanna see o1 pro trying to come up with new philosophical arguments.

claude 3 opus already did that (see terminal of truth and goatse lol)

i’d argue philosophical theories are only as influential as their “owners”; without a literal person promoting it, a theory wouldn’t catch on

That seems approx true. I do think that some people are so good at philosophy that they actually become known for it, but this is probably the exception. Good point.

Signing up for o1 pro just so I can paste screenshots that have the badge in it so people think I have important tasks that require advanced reasoning.

Not a complicated problem in logic or pulling a rabbit out of a hat for a coding problem, but earlier versions stumbled with this.

In which Saul Bellow novel does Trotsky make a cameo appearance?

ChatGPT 4o mini

Trotsky makes a cameo appearance in Saul Bellow's novel Herzog (1964). In the novel, the protagonist, Moses Herzog, experiences a mental breakdown and engages in a series of reflective letters and thoughts. One of these thoughts involves a surreal moment where he imagines Leon Trotsky in a café. This brief but striking reference is one of the novel's many instances where historical figures appear as part of Herzog's complex inner life.

Okay. I’m impressed. Nice apparent grasp of semantics. I guess I have to start paying attention.

Well, maybe I’m a bit less impressed. Leon Trotsky appears in a real event (fictional, but not imagined by a character) in Mexico in ‘The Adventures of Augie March’.

From Chapter 17…

“He addressed himself without waste of attention to the study of the front of the cathedral. He wore a short coat with fur collar, large glasses, his cheek was somewhat soft but that didn’t take away from an ascetic impression he gave. As I looked at him I decided with a real jolt that this must be Trotsky, down from Mexico City, the great Russian exile, and my eyes grew big. I always knew my entire life would not go by without my having seen a great man; and strangely enough my thought was of Einhorn, condemned to sit in a chair and study faces in the papers and limited to seeing only the people who chanced to come by.”

I’ll have to dig out my copy of Herzog to see if he appears in that protagonist’s imagination. This might be a hallucination on the part of ChatGPT.

Possibly I forgot that bit about Herzog imagining Trotsky in the 20 or so years since I read it but it seems off.

Edit

Just picked up the Kindle edition. No mention of that Russian guy with a text search. So it just made that stuff up? If so, it’s impressive BSing but this is disappointing.

“Tyler Cowen finds o1 to be an excellent economist, and hard to stump.”

Scott Sumner achieved that feat no more than a day ago:

https://www.econlib.org/gell-mann-amnesia-and-ai/

“Update: Several commenters pointed out that o1 does indeed get the question correct, unlike the mini version I used. That’s good to see.”

Not using the most powerful model is like using gpt-3.5. Waste of everyone's time.

it's been not great for me. idk.

o1 or o1-pro?

o1-pro. It doesn't seem to know what is true and what is not. You get answers that look very sophisticated but often miss the mark entirely in my field of knowledge.

Let’s say, hypothetically, that I was a poor student who couldn’t afford $20/month. Which free model is best? Claude was good but they dialed the free app back to Haiku…

Maybe using the API and paying per token could be cheaper?

Hey Zvi, Rabbi Moshe Rudner here. I enjoyed a delightful Shabbat dinner with you around a year ago.

At the time AI was creative but, for me anyway, not especially useful, so I briefly checked out your site, saw too many technical terms I was unfamiliar with, and let it go. But over the past few months I've been using Perplexity Pro - primarily Claude 3 Opus and then Claude 3.5 Sonnet - and have learned pretty well how to get it to do my bidding without too much frustration.

The only TRULY frustrating thing has been having to repersonalize it every time I open the app, even when continuing within a single thread.

I'm currently in a conversation with it about an AI solution that will learn from our past communications how best to interact with me, so that we needn't start over in every thread where I tell it things like I'm not an idiot, and stop giving me extraneous information, and how about you first clarify that you understand what I'm asking for before giving a lengthy Wikipedia kind of response that doesn't actually address my actual concern.

So I figured I'd come over here and see what you've got, and Boy Do I Love This Post.

I don't understand all of the technical details yet as there's no doubt a learning curve necessary here, but I'm now extremely interested in reading a great deal more from your substack. Thanks for this Zvi.

And Shalom from Egypt,

Moshe

ydydy.substack.com

I’m not subscribed to any AI. $20 isn’t much but I honestly don’t really have any use for it. It’s hard for me to imagine what I’d use it for in my ordinary life. If I needed it for my job that would be different I suppose.

If you don’t need it for work I agree, the various free versions provide more than enough for random questions and entertainment, especially considering how many there are.

> Is there anything wrong with this photo?

I must admit, I didn't initially see anything wrong with the photo. I thought for a bit that the test was about giving it an ordinary photo to see if it made stuff up to fit the user's expectation that something was off. Looked back at it again after the "you sure of that" and still didn't see anything. Only after reading GPT's mention of possible extra digits, and looking back another time, did I actually notice it.

I've been using o1 Pro for about a day and it's been very disappointing so far. It takes longer than Claude 3.5 Sonnet, it doesn't give answers that are complete, and it feels condescending. Maybe my problems aren't hard enough for it. (I'm mostly doing embedded C coding types of things; over the last day, mostly serial protocol stuff.) I'm going to keep throwing things at it to see if it ever seems worth it, but early indications are that I wasted $200.

When your ex employees are killing themselves, it really has to be a wake-up call of some kind no? So much for copyright and fair use of ownership on the internet.

Every year around Christmas the Dutch intelligence agency publishes this famously hard puzzle: https://www.aivd.nl/onderwerpen/aivd-kerstpuzzel. The questions are often pretty unclear; instead of stating a task, they are just a bunch of words containing hints for what you have to do.

I tried using Claude Sonnet as a tool to assist me in solving some of these questions, but so far it has been fairly useless. It is so common for it to be totally overconfident in something that doesn't fit the clue perfectly and therefore is clearly wrong. Because there are vastly more ways to be wrong than right, if the chatbot cannot separate those itself, it's unlikely to suggest anything useful. I wonder if o1 pro would be more useful for this, and how useful exactly.

In general it could be an interesting benchmark cause the questions are quite difficult and require a combination of creative and systematic thinking. The existing LLMs, except maybe o1, would prob get like one question right at most (as would most humans prob).
