Discussion about this post

Hastings

It's strange. I understand the sentiment that in 20 years, line-by-line coding will be dead. However, I have to do a lot of line-by-line coding now, and the models I have access to (Claude 3.5, Flash Thinking 2 experimental) have an approximately 0% hit rate on helping me program solutions to the problems I have day to day. Chatting with Claude is the closest thing to helpful, but I've tried chatting with a notepad.txt window instead and it's equally helpful: Claude is basically serving as a really polite and friendly rubber duck. It feels like I'm in a 2,000-mile race, riding a bicycle against a car. Right now I have a 1,000-mile head start, and the advice I'm getting is "try drafting behind the car."

Steve Newman

"If o3 was as good on most tasks as it is at coding or math, then it would be AGI."

A literal interpretation of this statement is true (more or less by definition; "emulating an AGI" is a task). But I think the more natural interpretation is false. o3 isn't designed to address messy open-world tasks that require interacting with the Internet (or even an intranet), selectively retrieving information, ignoring spurious information, etc. It doesn't have long-term memory or the ability to learn continuously. There are presumably still limits on how long it can maintain task coherence (and this would probably be harder for open-world tasks). We also don't know how reliably good it is at coding and math, or across what breadth of tasks. For instance, the FrontierMath result is crazy impressive, but at only 25.2% in absolute terms it leaves a lot of wiggle room for the score to have been achieved in a fluky or non-generalizable way (I've heard mutterings).

I think o3-but-for-all-task-domains would still be far short of AGI. I could imagine that we get a lot of the former before 2025 is out but are still years away from transformational impact. (I've just finished drafting a blog post about more or less this topic and will publish it within a week.) Very interested in your thoughts on this.
