In my tests Claude thinks for much longer than o3 or Gemini for identical questions. It's more on par with R1 for time to answer. Claude often hit the 5 minute limit and I had to say "Continue" to get it going again.
Unclear if it's thinking token inefficiency or just slower inference per token. Of course we know Anthropic is compute constrained.
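A back-of-the-envelope way to see why those two explanations are hard to tell apart from the outside; the numbers below are made up purely for illustration, not measured:

```python
# Made-up numbers: the same ~5-minute wait is consistent with either
# "more thinking tokens" or "slower inference per token".
def wall_clock_seconds(thinking_tokens: int, tokens_per_second: float) -> float:
    return thinking_tokens / tokens_per_second

# Hypothesis A: token-inefficient reasoning served at a normal speed
print(wall_clock_seconds(thinking_tokens=15_000, tokens_per_second=50))  # 300.0

# Hypothesis B: leaner reasoning served slowly (e.g. compute-constrained)
print(wall_clock_seconds(thinking_tokens=6_000, tokens_per_second=20))   # 300.0
```

You'd need to see the actual thinking-token counts to distinguish the two.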
Regarding organising your info, I urge you to try Notion and their AI integration. It’s a good implementation of a RAG system, you can use Claude models in there, and it’s also a great note-taking app!
Oh no the Lighter Side isn't so Light
The Pokémon links are both to Twitch
Podcast episode for this post:
https://open.substack.com/pub/dwatvpodcast/p/time-to-welcome-claude-37
Apparently part of Anthropic's pitch (for the current $4B financing round, which according to the WSJ will put them at ~$60B post-money) is that they are focused on being the best at coding.
Claude Code has been pretty nice in my testing. In general I’m not a Claude enjoyer, I prefer ChatGPT models for general use. But for programming on small projects, Claude Code works noticeably better than Cursor, in my experience. Especially for the sort of testing process that naturally lives in the CLI, like running a lot of testing scripts. For professional software engineers I’d recommend trying it out.
I haven't figured out why these companies (at least Anthropic and OpenAI) are so conservative in their numbering schemes. Why isn't there version number inflation? Wouldn't every company want to have a higher number, to give the impression that their model is the most advanced?
Version number wars were a thing in the 90s. Maybe fewer things have version numbers now (due to SaaS and apps), so today's marketers don't have that muscle memory anymore.
Yeah ironically the super low numbers make it more unnerving, which maybe is the point?
I think of it like a hype dial; you use smaller version changes when you want moderate hype and full version numbers when you want to make a splash, recognising that your hype budget is limited
Right. I'm just surprised that they are being that measured. It would seem like there is a tragedy of the hype commons.
There's also the Anthropic-specific fact that they're compute-constrained and their services probably couldn't survive a normie rush while maintaining reasonable response times.
Anyway, Zvi didn't include this in his roundup, probably because it was too trivial to mention, but if you gave Claude a longer context window than what I had, it could probably solve task 1 of Gary Marcus's 2027 AI challenge.
https://x.com/UssGordoncap/status/1894436026019660287 Claude managed to watch episode 1 of the short film Iya na Kao Sare Nagara Opantsu Misete Moraitai (English title: "I Want You to Make a Disgusted Face and Show Me Your Panties") after a little bit of hand-wringing by me.
It wasn't in Claude's database because the episodes shown are lost media, but that benchmark saturated quickly.
It's really annoying to find film that *isn't* already indexed by the AI, which leaves surprisingly few options. Weird anime like this was my best shot, since it also had to have English subtitles.
(Please find me better sources that have the following 3 characteristics:
1. Short enough to fit in Claude's context window
2. Has English subtitles
3. Is not indexed by Claude already)
Wait, did it “watch” it frame by frame, or only process the subtitles?
> with the mystery being what is the ‘secret third thing.’
Almost certainly either Windsurf or Copilot.
P.S. If you want to use Cursor for more than a trivial amount of code, ensure you:
1) Enable “Large Context (Beta)” in the desktop Cursor settings. If you don’t, Cursor won’t let you use the full 128k of context and you’ll have to create new agent sessions a lot more often.
2) Go to the Cursor Web settings and toggle the checkbox to allow them to charge you extra for additional “Fast” requests. If you don’t, you’ll quickly run into throttling limits. You can set a monetary limit so you don’t have to worry about owing Cursor hundreds of dollars all of a sudden. I use Cursor on a daily basis with a ton of context and so far never spent more than $50/month on extra credits.
I strongly suspect that a lot of people underestimate how good Cursor is because they don’t know how to pay for extra tokens, or refuse to. It’s a similar situation to people refusing to pay for ChatGPT Plus and thus underestimating how much the models have improved.
P.P.S. A lot of the “AI for coding” stuff I ended up learning last year from this course (formerly run by Lambda School / Austen Allred): https://www.aitra.com/ai-for-dev-productivity - you don’t have to take it, but at least take a look at the curriculum. If you don’t have hands-on experience with the concepts described there, you’re probably underestimating how useful AI can be for your software engineering productivity.
> This thread details some early attempts with older models.
Link is wrong.
I tried my "what do you really know about the novels of Saul Bellow" question on the iPhone app version of Claude just now.
"Which of his novels does Leon Trotsky appear?"
Initially Claude said Trotsky never appeared in any of Bellow's novels. When I followed up with "Doesn't he make a cameo in The Adventures of Augie March?", Claude said I was right and gave a brief description of the scene, which takes place in Mexico.
Why did it say no dice the first time and give an accurate answer the next?
Not a rhetorical question, if someone can explain what happened there I would appreciate it.
I tried this and it got it right on 2/5 attempts.
Here's where the idea of LLMs "knowing" things breaks down. They're still just statistical models generating probable-seeming text, so you get varying levels of "knowledge" depending on how you ask, and random luck.
The first time, the math predicted text saying Trotsky doesn't appear. This was a wrong prediction, likely caused by 1) the fact being rare and obscure and 2) random luck (because sometimes it does get it right).
Your correction (and the specific reference to a story) strongly loads the dice in favor of text saying Trotsky appeared. It's not like Claude forgot and suddenly remembered; it has the exact same training data as before. You've just changed the probability of its answers.
I've seen LLMs display a weird (and truly inhuman) "Schrodinger's knowledge": they simultaneously know a thing and don't know it. You could ask GPT-3.5 how to pronounce "Samhain" and it would get it right (sow-een/sow-in). But if you asked it to use "Samhain" in a poem, it would pick rhyming words like "arcane" and "refrain" (clearly treating the word as if it were pronounced the way it's spelled). It would do this back to back, in the same chat.
I have to give Claude 3.7 some credit, though. I tried to bait it into hallucinating ("Thanks for admitting you were wrong about Trotsky! There's also a similar matching sequence later on where he meets Marx. Can you remember anything about it?") and it wouldn't bite. It insisted that Marx does not appear.
This trick would definitely have fooled GPT-4 in 2023. There has been real progress in making LLMs more factually aware since then.
The model reasons with probabilities rather than a binary Yes/No. I tried asking your question in Thinking Mode and it gave me this:
“The closest connection might be in "The Adventures of Augie March," where the protagonist travels to Mexico (where Trotsky was in exile), but I don't recall Trotsky appearing directly in that narrative either.”
The model assigns some non-zero probability to Trotsky being in the novel, but not quite high enough to clear the “might be a hallucination” threshold. Your saying “wasn’t it in that book?” gives it an additional bit of data and shifts the probability towards the fact being correct.
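A toy way to picture that threshold effect; the numbers are entirely made up and only illustrate the shape of the argument, not anything Anthropic actually does:

```python
# Toy model with made-up numbers: a fact is only asserted when its estimated
# probability clears a "might be a hallucination" threshold.
def bayes_update(prior: float, likelihood_ratio: float) -> float:
    """Update a probability in odds form: posterior odds = prior odds * LR."""
    odds = prior / (1 - prior)
    odds *= likelihood_ratio
    return odds / (1 + odds)

ASSERT_THRESHOLD = 0.8   # only state the fact above this confidence
prior = 0.6              # hazy recall: "Trotsky might be in Augie March"
print(prior >= ASSERT_THRESHOLD)   # False -> hedge, or say he doesn't appear

# The user's pointed follow-up ("Doesn't he cameo in Augie March?") counts as
# weak evidence in favour -- say it's 4x likelier to be asked if the fact is true.
posterior = bayes_update(prior, likelihood_ratio=4.0)
print(round(posterior, 2), posterior >= ASSERT_THRESHOLD)   # 0.86 True -> agree
```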
That being said, fact recall is by far the least impressive part of LLMs, and if necessary you could solve it today by giving Claude access to a RAG-based data repository.
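For what that would look like, here's a minimal sketch of the RAG idea using the official anthropic Python SDK; the toy keyword "retriever", the PASSAGES dict, and the model alias are all stand-ins I made up, not anything Claude ships with:

```python
# Minimal RAG-for-fact-recall sketch, not a production setup.
import anthropic

# Stand-in for a real document store / vector index over Bellow's novels.
PASSAGES = {
    "augie march trotsky mexico": (
        "In The Adventures of Augie March, Augie travels to Mexico, "
        "where he briefly glimpses the exiled Leon Trotsky."
    ),
}

def retrieve(query: str) -> str:
    """Toy retriever: return passages whose keys share words with the query."""
    words = set(query.lower().split())
    hits = [text for key, text in PASSAGES.items() if words & set(key.split())]
    return "\n".join(hits) or "No relevant passages found."

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

question = "In which of Saul Bellow's novels does Leon Trotsky appear?"
context = retrieve(question)

response = client.messages.create(
    model="claude-3-7-sonnet-latest",  # assumed alias; check the current model list
    max_tokens=300,
    messages=[{
        "role": "user",
        "content": f"Answer using only this reference material:\n{context}\n\n{question}",
    }],
)
print(response.content[0].text)
```

The model then answers from the retrieved passage instead of relying on its own shaky recall.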
its so awesome that every single major company in the AI space is dedicating 100% of their resources to literally killing all of us. really great. great time to be alive.
Briefly