The most hyped event of the week, by far, was the Manus Marketing Madness. Manus wasn’t entirely hype, but there was very little there there in that Claude wrapper.
"Not flipping that coin is absurd."
Depends on your baseline. I think for the vast majority of people, life is a good enough experience day-to-day that the proposed bet is untenable. Now, if I were a starving person in some 3rd world shitshow, yeah, HUGE upside, I'm likely to die in a year anyway. But it would have to be a pretty dire shitshow for me to look at that coin with anything but dread.
I would think that for the vast majority of people, their current living conditions are less than half as good as they could be, and they have less than half of the resources they could possibly use.
I admit the rare person might exist for whom this is not the case, but it would surprise me.
IMO even the world's happiest person, with the most resources and the best life, would come out ahead on average when flipping the coin, because this hypothetical isn't capped at the top of what currently living people have.
I mean...even for literally infinite expected return, the Kelly bet on a coin flip is still only half your remaining lifespan (or however you want to calculate "half the remaining utility in your life").
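To spell out the arithmetic behind that (a quick sketch using the standard Kelly formula, treating remaining life-utility as the bankroll): the Kelly fraction is f* = p - q/b, where p is the win probability, q = 1 - p, and b is the net payout multiple. For a fair coin, f* = 0.5 - 0.5/b, which only approaches 0.5 as b goes to infinity. So even with an unboundedly large upside, the Kelly-optimal stake never exceeds half of what you have.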
I do think I would live longer living my best possible life, as opposed to how I am currently living.
Probably significantly longer.
Hmm... as a 66-year-old, feeling assorted subsystems act up more and more: Even if my lifespan was unchanged, getting my youthful body back for that period of time would be worth the 50:50 odds.
I am not enough of a "Bayesian" to calculate probabilities from priors, but it feels like anyone who *would* eagerly take that gamble would probably also be sanguine about the chance of utopia coupled with existential risk.
If they were equally likely, that is, instead of the far higher existential-risk-than-utopia odds it currently looks like.
I think it's underspecified. In the best possible life I can imagine living, with infinite resources, I would solve many of the world's problems; so of course I am obligated to flip that coin. Whereas if it's just about eating out at nice restaurants I would not.
It's one thing to say you're "morally obligated" for utilitarian reasons, and another to actually go through with it. Like, let's say I told you you could delay the coin flip for a few days. Sure, some problems will go unsolved and people would die, but you wouldn't ask for some time to think about it some more? You'd just grab that sucker and flip?
Naw.
Precisely: the bet doesn't make sense for anyone already living their best life. For someone chafing against daily resource constraints, or who thinks they should be producing much more than they are, it's a good bet. Zvi would probably argue everyone is always in the latter category, but I think that is just NYC Calvinism.
Podcast episode for this post:
https://open.substack.com/pub/dwatvpodcast/p/ai-107-the-misplaced-hype-machine?r=67y1h&utm_campaign=post&utm_medium=web&showWelcomeOnShare=true
I had the same experience with Google's image modification: I asked it to do something mundane and it rejected my request for "safety reasons". They must have known this sucked before they launched it. It's like they aren't even trying to make it a good product; they are okay with achieving some cool benchmarks while releasing a product that sucks.
Does turning off "Safety Settings" help?
No.
It would be interesting for the Claude Plays Pokemon folks to name one of the Pokemon "Stalin" and see if Claude was as diligent about healing it as the others.
Zvi, any chance you have a deep dive into Claude Code planned? The tool seems fantastic based on the limited amount I've been able to use it to date, but it seems to have fallen through the cracks while hype flashes like Manus take center stage.
I don't really have sources reporting much on it? And it's not like I can get enough hands-on experience to actually report back...
It should be a matter of something like '/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"; brew install node; npm install -g @anthropic-ai/claude-code' once your Mac arrives (if you are prepared to run code from GitHub in your shell, trust prebuilt brew packages, and accept the risks of running a Claude agent). I would definitely consider auditing each step here, and also sandboxing the way you are planning to. It is still possible to build from source, but brew makes it a pain, and auditing each step takes ages.
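For reference, here are the same steps split out so each can be audited individually (a sketch of the one-liner above, not an official install guide; the Homebrew script URL and the @anthropic-ai/claude-code package name come from the comment, and node is used because Homebrew ships npm as part of the node formula):

```bash
# 1. Install Homebrew (fetches and runs a script from GitHub; read it before running)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# 2. Install Node.js (npm is bundled with the node formula)
brew install node

# 3. Install the Claude Code CLI globally via npm
npm install -g @anthropic-ai/claude-code
```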
Yudkowsky is wrong about AI fueling another "social media bubble catastrophe".
Truly bad filter bubbles don't result from simply being exposed to views that you agree with. Rather, they result from being selectively exposed to the *worst* examples of views you *disagree* with.
I think with your productivity, you'll be the first to code a direct democratic simulated multiverse and save us all from AI agents.
>GPT-4.5 is not AGI as we currently understand it, or for the purposes of ‘things go crazy next Tuesday,’ but it does seem likely that researchers in 2015 would see its outputs and think of it as an AGI.
I'm unsure what conclusions we can draw from this: people in the Middle Ages might regard electricity as evidence of witchcraft, but that doesn't prove computers run on magic. Sometimes people in the past believed wrong things.
Yes, 2015 researchers would probably think an AGI had written modern LLM outputs. But they'd also expect an AGI to have many other capabilities that LLMs still lack: they'd expect an AGI to be able to autonomously run a business, play arbitrary videogames not in its training data, navigate a robotic body around the world, and so on.
Is a contest, eval or event (in my head I think of "DARPA Grand Challenge", like the old autonomous driving one) a possible way to think about motivating safety development or alignment progress? Would it be possible to craft success metrics robust against Goodharting?
My sense is that we don't know where to put the goalposts, and we aren't sure if we should put them up at all. With the autonomous vehicle competition you could set an actual finish line. Love it, so tangible, literally concrete.
In this case I worry it would be isomorphic to (directly map onto) training explicitly on alignment via RLHF, unless you were extremely careful, clever, or both. Otherwise the contest just produces things that are good at faking your flawed metrics.
Another example that comes to mind is the ImageNet Recognition Challenge, but again I think I'm having a context error by thinking about challenges with clear success criteria and known labeled solutions.
Re: "In this case, the claim is that [X] is ‘have unique insights.’ As in, sure an LLM will be able to be an A+ student and know the ultimate answer is 42, but won’t know the right question, so it won’t be all that useful. Certainly LLMs are relatively weaker there. At minimum, if you can abstract away the rest of the job, then that leaves a lot more space for the humans to provide the unique insights - most of even the best scientists spend most of their time on other things."
There is a nice existence proof for an AI being able to construct "insights":
https://www.facebook.com/ScienceNaturePage/posts/a-decade-of-research-vs-48-hours-of-ai-weve-entered-the-era-where-ai-systems-sol/1172261607688040/
Hashem Al-Ghaili's Post:
A decade of research vs. 48 hours of AI — We've entered the era where AI systems solve the biggest scientific questions overnight.
After a decade of painstaking research, scientists at Imperial College London still hadn't solved one of the biggest superbug mysteries. They put Google’s AI tool, "co-scientist," on the problem, and it solved it in just two days.
The AI confirmed their hypothesis that certain antibiotic-resistant bacteria use a tail-like structure to jump between species, acting like a "master key." What amazed researchers even more was that the AI also proposed four additional plausible explanations — one of which had never even crossed their minds and is now under investigation.
Zvi mentioned it here: https://thezvi.substack.com/p/ai-105-hey-there-alexa?open=false#%C2%A7ai-co-scientist
Many Thanks! I think I read that post of Zvi's, but somehow forgot that he had already covered this example/existence proof. Oops - well, better redundant than wrong...
Taking the flip feels to me like a morally wrong thing to do when you have children.
Agreed, and your comment makes me wonder if Zvi has really thought this through on a personal level (I don't think he would flip). But perhaps Zvi's point is not that he thinks flipping is the right choice, but that MacAskill is being naive. There are plenty of people who would flip the coin even if the negative outcome was the extinction of humanity and the positive outcome benefited only them. This is precisely the challenge we face.
Yeah, I would flip, but only with the magical caveat that none of my family are affected if I lose - so essentially I'm looking at the gamble in total isolation, which is not representative of most people's real-life circumstances.
So Google now offers 5 Deep Research queries per month for free, plus it has been rebased on Gemini 2.0 Thinking. A short test today went better than OpenAI's; that one somehow garbled the output and omitted chapters 1-4 (according to its own self-devised chapter numbering!). Gemini's way of doing research planning frankly seems more helpful to me than ChatGPT's...
Played a bit with Gemma 3 27B. Laughably bad, and I have a fair bit of tolerance for even Gemini 2; at least that one is fast, and Gemma isn't. Like, it tells me its cutoff date is September 2021, which is surely wrong given it knows Biden and Trump ran for president in 2024 (so possibly a mid-2024 cutoff?).
If you run it in Ollama, it will say up front that it has no access to tools, but when challenged on obvious nonsense it promises to flag issues to the Gemma team.
"Very obviously one can imagine a plausible AI that can function as an AGI, but that has some obvious weakness where you can generate adversarial challenges."
Perhaps one useful threshold to consider, slightly short of AGI in the sense of not requiring the AI to be capable of _all_ human capabilities, is an AI capable of performing all roles required to build a copy (or extension) of itself. That is enough to make it potentially a competing species. Short of that, and for any long-term goals (on the time scale of e.g. chips failing) it still needs humans.
There were dark times in my life when I would have flipped the coin.
And I understand how anyone terminally ill, or say forced to live in a war-torn hellscape, could choose to flip.
However, for anyone else, flipping solely because of simple math on expected lifespan (or even QALY) seems naively wrong. There *should* be significant weighting based on current, existent values.
Trade off a chance at maybe a few years of extra life, and likely higher quality, for an equal chance of never watching my kid grow up? Of suddenly abandoning my wife, child, and other family and friends??? Choosing to flip the coin is arguably sociopathic in real-world family contexts.
Further, I cannot understand how anyone who would choose to flip the coin would not also have a pure accelerationist attitude regarding AI.
Say there's a 95% chance ASI means you (and other humans) are dead by 2045, a 0.000001% chance of AI utopia - where we all get uploaded (or other means) and blessed with billion-year lifespans with near unlimited resources - and other chances and outcomes somewhere in between.
A similar simplistic mathematical handling of this scenario would say 'flip the coin' and race to ASI. Any sane approach, weighted for existing values, would not.
Am I missing some other reason which could justify adopting one approach for the coin flip, but a completely different one for ASI?
I think an unstated undercurrent is that the average person wouldn't believe the upside would actually be delivered. Even if they've not heard of Pascal's Mugging, they've learned to be cautious about promises too good to be true.
don't flip the coin, hang around with us a little longer, we like and need you