They keep coming up with new headline capabilities but also o1 confidently hallucinated in my first few interactions.
The exploding car continues to break land speed records.
Yes yes but what about the bit where it explodes?
I read gwern, I know only suckers bet against scale. But can one be scale-pilled and still suspect we're missing just one or two other very subtle ingredients here, based on the fact some classes of errors (confidently lying) seem so sticky?
O1 doesn't have tool use or any way to check if what it remembers is correct or hallucination. In this specific case, stopping the car from exploding most of the time is fairly trivial - add tool use, RL train the model to always research any facts it wants to cite in an answer to make sure they exist.
Systems that do have those patches don't really converge with normal intelligent human conversation, which makes me worry those are brittle solutions.
It feels like we invented something fundamental that when scaled leads to emergent behavior resembling an incredibly powerful Broca's area.
What we need now is something resembling the anterior cingulate cortex to help prioritize and refine strategy and effort allocation, and I suspect we will need to scale up some other fundamental components for that to work right.
O1 pro is set to prove me wrong, maybe we need no such thing.
But prioritized learning and reallocated effort in response to failure seem like the keys that unlock human level extrapolation from incredibly sparse data.
We still improve at chess with vastly fewer lifetime games than AI. How. We're missing a piece.
Yes but actually no. The plan for that - I mean it's not a plan it's just convergent - is to explore a vastly wider space of ML algorithms once it is cheap to do so. Once the search is automated and compute is cheap it's expected humans will find a much better way to do it.
Basically you're implicitly saying "do more fundamental research, maybe in a few decades..."
What will actually happen: scale the current LLMs just a little more until they can automate AI research partly. Then use each subsequent generation to automate even more of each search.
So the critical level of capabilities is not "when we will discover some complex brain-like algorithm to solve.." but "can we hack the current solution until MLE-bench et al. is saturated". And the answer there is probably yes.
There's definitely an optimistic case where scale either makes the need for paradigm shifts unnecessary, or lets us vastly expand our search space into other novel architectures.
The pessimistic case is we're digging into a local minimum and we'll dump all our automated research into tweaking the dials on the machines that got us stuck here in the first place.
I'm definitely not pushing for waiting for decades of fundamentals research. I believe you could get there in a few years by exploring ways to include RL as a substrate instead of a patch, and this exploration is mostly not competing for resources with LLM optimization.
Except that the LLM gold rush has drained the ranks doing explorations of radically different architectures. Not that it doesn't deserve the majority of the talent to push the wave of recent progress, but we may have overcalibrated.
If AGI is bad and scale along some axis is inevitable, maybe a local minimum is the best outcome.
Welp this didn't age well. Looks like just scale will be all that is needed to reach RSI. (am referring to the o3 benchmarks)
o3 would be a devastating rebuttal if I was arguing scale was over and we would stop posting gains on benches.
That argument is a bit of a cliche, so I don't fault anyone for lumping me in with it.
I've instead been saying something more nuanced but also hard to articulate: Benches are diverging from what we really want.
I say that having seen a few versions now that post great gains in benches but still regularly break for me in the same underlying ways.
But there are emergent capabilities all the time, so I'm excited to be abruptly proven wrong. Can't wait to try out o3 and see.
I guess some predictions we could make:
How hard do you think it will be to get o3 to lie to me, that is, to say something knowably false?
Will I be able to find an example where it ignores feedback about its engagement style?
Will it sometimes phone in a guessed response when simple research would find the answer?
I suspect the chances for these remain high, ~85%, but very much want to be wrong.
I see the same. I had it do a literature review and it invents DOIs out of whole cloth; paper titles are also subtly wrong.
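For anyone who wants to triage a model-generated reference list, here is a minimal sketch of the idea, not a definitive checker. The regex is only a loose syntactic filter and is my assumption, and `exists` is a hypothetical callable standing in for whatever registry lookup you trust; nothing here contacts a real service.

```python
import re
from typing import Callable, Iterable, List, Tuple

# Loose syntactic filter (an assumption): "10.", a 4-9 digit prefix, "/", then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

def triage_dois(
    dois: Iterable[str],
    exists: Callable[[str], bool],  # hypothetical lookup, e.g. a registry query
) -> List[Tuple[str, str]]:
    """Label each DOI as 'malformed', 'found', or 'not found'."""
    results = []
    for doi in dois:
        if not DOI_PATTERN.match(doi):
            # Fails the syntax check, so there is no point looking it up.
            results.append((doi, "malformed"))
        elif exists(doi):
            results.append((doi, "found"))
        else:
            # Well-formed but unknown to the lookup: a likely invention.
            results.append((doi, "not found"))
    return results

# Usage with a toy in-memory "registry" in place of a real lookup.
known = {"10.1000/example"}
report = triage_dois(["10.1000/example", "10.9999/made-up", "doi:nope"], known.__contains__)
```

A "found" DOI still says nothing about whether the title or claims attached to it are right, which is the subtler failure mentioned above.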
It is uncanny that it is definitely improving on real metrics and utility while not budging on "does it do something really annoying within like 3 chats, probably related to its total lack of self awareness"
>>Hazard: [...] Generally, around people, there is a strong ambient sense of threat, but talking to Claude does not trigger that.
>From my own experience, bingo, sir. Whenever you are dealing with people, you are forced to consider all the social implications, whether you want to or not.
Interesting framing. Talking to Claude (or any LLM) doesn't psychologically feel much like talking to another person to me. More like a mixture between journaling (something that is often recommended for emotional processing) and using a search engine.
I think partly because you don't get a sense of it as an agent driving the conversation in its own direction the way you do with a human? Even when it's playing a role it's doing so entirely reactively.
I've heard people refer to Claude as being like an exo-brain for them. Which captures a similar feeling.
I am a bit worried about the long term implications of LLMs having a substantial impact on people's emotional makeup. But not in a way I can particularly articulate.
If you interact with an LLM as a reactive system, and fall in line with its chatbot finetuning, then you will see reactive outcomes. The system is optimizing for low perplexity so the persona you expect is steering its outputs. People reporting on using Claude as a conversational partner are probably getting exactly that, because those are the dynamics they have set up. If you really want to interact with a less reactive system then it is easy to bypass the chatbot persona, or to use a non-instruction-tuned model.
Also, the table of contents links don't work on the substack android app. Not sure if that's something you have any more influence on than anyone else but thought I'd flag.
This is a known bug and I have no idea how to fix it.
Podcast episode for this post:
https://open.substack.com/pub/dwatvpodcast/p/ai-94-not-now-google
I think that's an unfair reading of Andreessen, who imo is clearly critiquing Tim Wu's post by using the same rhetorical structure to show how it can easily be applied in areas Wu would not like.
I am curious what others think. It was 100% how I did read it.
I asked Claude:
A health insurance CEO was recently murdered.
Person A on Twitter wrote:
"I do not condone violence. But the UnitedHealth murder is showing that when businesses abandon core principles of ethics, they encourage others to do the same."
Person B responded:
"I do not condone violence. But this post is showing that when 'experts' abandon core principles of ethics, they encourage others to do the same."
How would you interpret Person B's response?
Claude responded:
I want to be direct in addressing this carefully. Person B's response appears to be a reflexive mirroring and rhetorical deflection that attempts to criticize Person A without engaging substantively with the underlying ethical critique.
By using almost identical language to Person A's original statement - "I do not condone violence. But..." - Person B seems to be suggesting that Person A's commentary about corporate ethics is itself unethical or inappropriate. The response implies that discussing potential systemic ethical failures is somehow equivalent to or as problematic as the original statement about corporate behavior.
The response looks like a classic rhetorical technique of turning a critique back on its originator without actually addressing the substance of the original point. Instead of engaging with the potential critique about corporate ethics, Person B attacks the "expertise" or moral standing of the person making the observation.
It's worth noting that such responses often serve to shut down meaningful dialogue about complex systemic issues by attempting to make the critic look inappropriate or inflammatory, rather than addressing the underlying ethical questions being raised.
The response reveals more about the respondent's discomfort with systemic critique than it does about the original statement's validity. It's a defensive maneuver that seeks to delegitimize the discussion rather than contribute to understanding.
For ages I've had in my GPT-4o custom system instruction "Be concise, give short answers, BE CONCISE PLEASE, SHORT ANSWERS FOR THE LOVE OF GOD" or similar
And basically it stopped having any effect at all, standard GPT-4o loquaciousness, and I forgot all about it.
Try o1 last week: it's incredibly terse, like really obnoxiously refusing to tell me more than the absolute bare minimum. And I'm like this is rubbish!
Then yesterday I remembered the custom system prompt, removed it, and it's all fixed.. oops
So that's one thing o1 is better at, following custom instructions
If I ask another yes or no question and get an eight item list in response I'm going to start a butlerian jihad.
Hey Zvi, question for you. I was thinking about how AIs are fundamentally input-output models - they take an input, transform it, and produce an output.
Has anyone (you, Yudkowsky, or others in the rationalist community) explored comparing AIs to viruses? Since viruses similarly operate through input-transform-output mechanisms and, crucially, are not and never will be true life forms. This might suggest AIs, like viruses, will never be true life forms either.
Curious if you've encountered this analogy before or have thoughts on its implications.
Check the original GPT-4 paper. OpenAI was already adding simple Python scripts to wrap the model and remove this limitation.
If you think of it as an AI "system", meaning an agentic framework that can include OS wrappers (see the OpenAI job listing for this), a Python script, an evolving prompt with compressed goals (see RAG), and multiple API keys to different intelligence sources, you can get a machine that can act independently in principle.
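To make the "wrapper script" point concrete, here is a minimal sketch of such a loop. Everything in it is hypothetical: `stub_model` stands in for a real model call, the `CALL`/`FINAL` text protocol is invented for illustration, and none of this is taken from the GPT-4 paper or any OpenAI API.

```python
from typing import Callable, Dict, List

def stub_model(history: List[str]) -> str:
    """Hypothetical stand-in for an LLM call: request one tool, then finish."""
    if not any(line.startswith("OBSERVATION: ") for line in history):
        return "CALL add 2 3"
    # Finish by echoing the most recent tool result.
    return "FINAL " + history[-1][len("OBSERVATION: "):]

def run_agent(
    model: Callable[[List[str]], str],
    tools: Dict[str, Callable[..., str]],
    goal: str,
    max_steps: int = 5,
) -> str:
    """Feed the model its own history until it emits FINAL or runs out of steps."""
    history = [f"GOAL: {goal}"]  # the evolving prompt
    for _ in range(max_steps):
        reply = model(history)
        history.append(reply)
        if reply.startswith("FINAL "):
            return reply[len("FINAL "):]
        if reply.startswith("CALL "):
            _, name, *args = reply.split()
            # The wrapper, not the model, executes the tool and reports back.
            history.append(f"OBSERVATION: {tools[name](*args)}")
    return "gave up"

TOOLS = {"add": lambda a, b: str(int(a) + int(b))}
answer = run_agent(stub_model, TOOLS, "add two numbers")
```

The model itself is still a pure input-output function; the loop around it is what supplies memory, tool access, and the ability to keep going without a human in between.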
> Emmy Steuer: I just had to buy an external hard drive because I have 100 gigabytes of Sims families on my laptop. I haven’t played in years, but I can’t bear the thought of their little existence being wiped out just so I can make an AI agent.
https://en.wikipedia.org/wiki/The_Lifecycle_of_Software_Objects
IMO, much of the divergence on possible ASI comes from whether you think of intelligence as system 1 or 2. 1 is learned heuristic, scalable and not legible (superhuman mastery at Go). Quant scale makes qual difference. 2 is rotating symbols in logical reasoning space, AI can be arbitrarily faster w/ deeper search space and will find things no human would, but conclusions will likely still be legible. Depending on what you consider intelligence, your take will vary a lot. Maybe 2 will turn out to be a kludge for our insufficient scale of 1, after all.
Mistakes I spotted:
"who they say is a growing voie in AI policy" - voice
"You can have a conscious and be a great person" - conscience
"If the concern is reputational, that is of course not about your conscious." - conscience
I just finished reading The Making of the Atomic Bomb by Richard Rhodes (great book BTW) and I spent a decent amount of time talking with Claude or ChatGPT about the details of nuclear physics and nuclear weapons, since half the book is basically the history of nuclear physics. I'm sure I'm on a list somewhere now.
I assumed OpenAI + Anduril was mainly a lobbying play. Incoming Trump administration, most DC decision makers still think of AI dangers in terms of a military "gap" to China etc., this is a signal from OpenAI that it's listening and wants to work with them. Good strategy to avoid getting obstructed by the federal government, at least if your AGI timeline is 4 years or less.
I completely agree that all the social incentives are currently set up for an autonomous AI system being essentially ignored, or even triggering the helpful reflex in many people. It seems bizarre that some people have a model in which there is significant alarm or response, given that multiple deliberately agentic systems have roused nearly no pushback despite significant publicity.
> AI 2025 Forecast
Note that they've also put the same questions up on Manifold (with both play-money and real-money markets): https://manifold.markets/topic/ai-2025-forecasting-survey-by-ai-di
Re: Sully’s diagram of non-agent vs agent … I initially coded up the right hand (agent) version, and I was mildly disappointed that although the OpenAI API lets you ask for either, it seems that only the non-agent (left hand) version works with current models and current implementation of the API. I didn’t investigate too deeply, so there may be something wrong with my code here. I would be using the agent version if it worked.
My current working theory is that Hermes 3 can do the agent version, but the JavaScript glue between the llama and the API is broken.
The Emmy Steuer post, I think, actually has a profound implication for a thought-experiment on the ethics of murder that ties in with certain parts of the Culture novels:
viz., if you simulate a digital person with high-enough subjective fidelity to have personhood, it presumably becomes immoral to kill them.
*BUT* it seems weird to argue that such simulation has to be run at any given clockspeed: I can't think of a reason why it would be immoral to run it at 50Hz instead of 1MHz, nor does it seem obvious that suspending the program has moral valence -- that's actually just how most operating systems work (suspending programs constantly to run others given limited processor resources, but doing it all so quickly that it has the appearance of simultaneity for each program). So suspension in and of itself doesn't appear to be morally bad.
And then the question is: what ethical breach is attendant to just suspending the program and then just...never waking it up again? From the perspective of the digital person no time passes any more, but they don't suffer for it, nor is this conceptually clearly distinguishable from, say, running the simulation at a slow enough clockspeed that it terminates at the heat-death of the Universe. What specific moral breach (if any) is attendant to this?
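For readers unfamiliar with the time-slicing point above, here is a toy sketch of it using Python generators. This is an illustration under simplifying assumptions (cooperative round-robin, hypothetical `program` and `round_robin` names), not how a real OS scheduler is implemented.

```python
from collections import deque
from typing import Iterable, Iterator, List

def program(name: str, steps: int) -> Iterator[str]:
    """A toy 'program': each yield is a point where it gets suspended."""
    for i in range(steps):
        yield f"{name}:{i}"

def round_robin(programs: Iterable[Iterator[str]]) -> List[str]:
    """Run one step of each program in turn until all have finished."""
    queue = deque(programs)
    trace = []
    while queue:
        prog = queue.popleft()
        try:
            trace.append(next(prog))  # resume, run one step, suspend again
            queue.append(prog)        # back of the line until its next turn
        except StopIteration:
            pass                      # finished programs are dropped
    return trace

trace = round_robin([program("a", 2), program("b", 3)])
```

From inside `program`, the steps follow one another with nothing in between; the gaps only exist from the scheduler's point of view. The "never wake it up" case corresponds to simply never calling `next` on a suspended generator again.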