34 Comments
Rapa-Nui:

"Not flipping that coin is absurd."

Depends on your baseline. I think for the vast majority of people, life is a good enough experience day-to-day that the proposed bet is untenable. Now, if I were a starving person in some 3rd world shitshow, yeah, HUGE upside; I'm likely to die in a year anyway. But it would have to be a pretty dire shitshow for me to look at that coin with anything but dread.

Sanoy1997:

I would think for the vast majority of people, their current living condition is less than half as good as it could be, and they have less than half of the resources they could possibly use.

I admit the rare person for whom this is not the case might exist, but it would surprise me.

IMO even the world's happiest person, with the most resources and the best life, would come out on top on average when flipping the coin, because the hypothetical's upside is not capped at the best life of anyone currently living.

Shockz:

I mean...even for literally infinite expected return, the Kelly bet on a coin flip is still only half your remaining lifespan (or however you want to calculate "half the remaining utility in your life").
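
A minimal sketch of that limit, assuming the textbook Kelly formula f* = p - q/b for win probability p, loss probability q = 1 - p, and a b:1 payout (the helper name below is mine):

```python
def kelly_fraction(p: float, b: float) -> float:
    """Fraction of bankroll to stake under the Kelly criterion:
    f* = p - q/b for win probability p and a b-to-1 payout."""
    q = 1.0 - p
    return p - q / b

# As the payout b grows without bound, f* -> p. For a fair coin
# (p = 0.5), even "infinite" upside never justifies staking more
# than half the bankroll, here read as half your remaining utility.
for b in [2, 10, 1_000, 1_000_000]:
    print(f"payout {b:>9}:1 -> stake {kelly_fraction(0.5, b):.7f}")
# 0.2500000, 0.4500000, 0.4995000, 0.4999995 -> the limit is 0.5
```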

Sanoy1997:

I do think I would live longer living my best possible life, as opposed to how I am currently living.

Probably significantly longer.

Jeffrey Soreff:

Hmm... as a 66-year-old, feeling assorted subsystems act up more and more: even if my lifespan were unchanged, getting my youthful body back for that period of time would be worth the 50:50 odds.

The Last Moderate:

I am not a "Bayesian" who calculates probabilities from priors, but it feels like anyone who *would* eagerly take that gamble would probably also be sanguine about the chance for utopia coupled with existential risk.

Sanoy1997:

If they are equally likely, that is, instead of the far higher existential risk than utopia chance it currently looks like we face.

jpr:

I think it's underspecified. In the best possible life I can imagine living, with infinite resources, I would solve many of the world's problems; so of course I am obligated to flip that coin. Whereas if it's just about eating out at nice restaurants, I would not.

Rapa-Nui:

It's one thing to say you're "morally obligated" for Utilitarian reasons, and another to actually go through with it. Like, let's say I told you you could delay the coin flip for a few days. Sure, some problems would go unsolved and people would die, but you wouldn't ask for some time to think about it some more? You'd just grab that sucker and flip?

Naw.

Victualis:

Precisely: the bet doesn't make sense for anyone already living their best life. For someone chafing against daily resource constraints, or who thinks they should be producing much more than they are, it's a good bet. Zvi would probably argue everyone is always in the latter category, but I think that is just NYC Calvinism.

Kevin:

I had the same experience with Google's image modification: I asked it to do something mundane and it rejected my request for "safety reasons". They must have known this sucked before they launched it. It's like they aren't even trying to make it a good product; they are okay with achieving some cool benchmarks while releasing a product that sucks.

loonloozook:

Does turning off "Safety Settings" help?

Kevin:

No.

Matt Wigdahl:

It would be interesting for the Claude Plays Pokemon folks to name one of the Pokemon "Stalin" and see if Claude was as diligent healing it as the others.

Matt Wigdahl:

Zvi, any chance you have a deep dive into Claude Code planned? The tool seems fantastic based on the limited amount I've been able to use it to date, but it seems to have fallen through the cracks while hypeflash like Manus takes center stage.

Zvi Mowshowitz:

I don't really have sources reporting much on it? And it's not like I can get enough hands-on experience to actually report back...

Victualis:

It should be a matter of something like '/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"; brew install node; npm install -g @anthropic-ai/claude-code' once your Mac arrives (note npm ships with the node formula rather than as its own package, and this assumes you are prepared to run code from GitHub in your shell, trust prebuilt brew bottles, and are prepared to accept the risks of running a Claude agent). I would definitely consider auditing each step here, and also sandboxing the way you are planning to. It is still possible to build from source, but brew makes it a pain, and auditing each step takes ages.

Auspicious:

Yudkowsky is wrong about AI fueling another "social media bubble catastrophe".

Truly bad filter bubbles don't result from simply being exposed to views that you agree with. Rather, they result from being selectively exposed to the *worst* examples of views you *disagree* with.

Melon Usk - e/uto:

I think with your productivity, you'll be the first to code a direct democratic simulated multiverse and save us all from AI agents.

Coagulopath:

>GPT-4.5 is not AGI as we currently understand it, or for the purposes of ‘things go crazy next Tuesday,’ but it does seem likely that researchers in 2015 would see its outputs and think of it as an AGI.

I'm unsure what conclusions we can draw from this: people in the Middle Ages might regard electricity as evidence of witchcraft, but that doesn't prove computers run on magic. Sometimes people in the past believed wrong things.

Yes, 2015 researchers would probably think an AGI had written modern LLM outputs. But they'd also expect an AGI to have many other capabilities that LLMs currently lack: they'd expect an AGI to be able to autonomously run a business, play arbitrary videogames not in its training data, navigate a robotic body around the world, and so on.

Sean:

Is a contest, eval or event (in my head I think of "DARPA Grand Challenge", like the old autonomous driving one) a possible way to think about motivating safety development or alignment progress? Would it be possible to craft success metrics robust against Goodharting?

My sense is that we don't know where to put the goalposts, and we aren't sure if we should put them up at all. With the autonomous vehicle competition you could set an actual finish line. Love it, so tangible, literally concrete.

In this case I worry it would be isomorphic to (directly map onto) training explicitly on alignment via RLHF, unless you were extremely careful, clever, or both. Otherwise the contest just produces things that are good at faking your flawed metrics.

Another example that comes to mind is the ImageNet Recognition Challenge, but again I think I'm having a context error by thinking about challenges with clear success criteria and known labeled solutions.

Jeffrey Soreff:

Re: "In this case, the claim is that [X] is ‘have unique insights.’ As in, sure an LLM will be able to be an A+ student and know the ultimate answer is 42, but won’t know the right question, so it won’t be all that useful. Certainly LLMs are relatively weaker there. At minimum, if you can abstract away the rest of the job, then that leaves a lot more space for the humans to provide the unique insights - most of even the best scientists spend most of their time on other things."

There is a nice existence proof for an AI being able to construct "insights":

https://www.facebook.com/ScienceNaturePage/posts/a-decade-of-research-vs-48-hours-of-ai-weve-entered-the-era-where-ai-systems-sol/1172261607688040/

Hashem Al-Ghaili's Post:

A decade of research vs. 48 hours of AI — We've entered the era where AI systems solve the biggest scientific questions overnight.

After a decade of painstaking research, scientists at Imperial College London still hadn't solved one of the biggest superbug mysteries. They put Google’s AI tool, "co-scientist," on the problem, and it solved it in just two days.

The AI confirmed their hypothesis that certain antibiotic-resistant bacteria use a tail-like structure to jump between species, acting like a "master key." What amazed researchers even more was that the AI also proposed four additional plausible explanations — one of which had never even crossed their minds and is now under investigation.

Jeffrey Soreff:

Many Thanks! I think I read that post of Zvi's, but somehow forgot that he had already covered this example/existence proof. Oops - well, better redundant than wrong...

Sylvain Ribes:

Taking the flip feels to me like a morally wrong thing to do when you have children.

Curious mathematician:

Agreed, and your comment makes me wonder if Zvi has really thought this through on a personal level (I don't think he would flip). But perhaps Zvi's point is not that he thinks flipping is the right choice, but that MacAskill is being naive. There are plenty of people who would flip the coin even if the negative outcome was the extinction of humanity and the positive outcome benefited only them. This is precisely the challenge we face.

Alex Scorer:

Yeah, I would flip, but only with the magical caveat that none of my family are affected if I lose - so essentially I'd be looking at the gamble in total isolation, which is not representative of most people's real-life circumstances.

SOMEONE:

So Google now offers 5 Deep Research queries per month for free, plus rebased it on Gemini 2.0 Thinking. A short test today went better than OpenAI's; that one somehow garbled the output and omitted chapters 1-4 (according to its self-devised chapter numbering!). Gemini's way of doing research planning frankly seems more helpful to me than ChatGPT's...

SOMEONE:

Played a bit with Gemma 3 27B. Laughably bad, and I have a fair bit of tolerance; even Gemini 2 is at least fast, which Gemma isn't. Like, it tells me its cutoff date is September 2021, which is surely wrong given it knows Biden and Trump ran for president in 2024 (so possibly a mid-2024 cutoff?).

If you run it in Ollama, it will say upfront that it has no access to tools, but when challenged on obvious nonsense it promises to flag issues to the Gemma team.

Jeffrey Soreff:

"Very obviously one can imagine a plausible AI that can function as an AGI, but that has some obvious weakness where you can generate adversarial challenges."

Perhaps one useful threshold to consider, slightly short of AGI in the sense of not requiring the AI to be capable of _all_ human capabilities, is an AI capable of performing all roles required to build a copy (or extension) of itself. That is enough to make it potentially a competing species. Short of that, for any long-term goals (on the time scale of, e.g., chips failing), it still needs humans.

Aaron Bilger:

There were dark times in my life when I would have flipped the coin.

And I understand how anyone terminally ill, or say forced to live in a war-torn hellscape, could choose to flip.

However, for anyone else, flipping solely because of simple math on expected lifespan (or even QALYs) seems naively wrong. There *should* be significant weighting based on current, existing values.

Trade off a chance at maybe a few extra years of life, and likely higher quality, for an equal chance of never watching my kid grow up? Of suddenly abandoning my wife, child, and other family and friends??? Choosing to flip the coin is arguably sociopathic in real-world family contexts.

Further, I cannot understand how anyone who would choose to flip the coin would not also have a pure accelerationist attitude regarding AI.

Say there's a 95% chance ASI means you (and other humans) are dead by 2045, a 0.000001% chance of AI utopia - where we all get uploaded (or other means) and blessed with billion-year lifespans with near unlimited resources - and other chances and outcomes somewhere in between.

A similarly simplistic mathematical handling of this scenario would say 'flip the coin' and race to ASI. Any sane approach, weighted for existing values, would not.
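
To make the arithmetic concrete, here is a toy sketch of that simplistic expected-value handling (the probabilities are from this comment; every utility number is an illustrative assumption I made up):

```python
# Toy numbers: the 95% and 0.000001% figures are from the comment;
# all utility values below are made-up assumptions for illustration.
P_DOOM = 0.95               # dead by 2045
P_UTOPIA = 0.000001 / 100   # "0.000001%" as a probability (1e-8)
P_MIDDLE = 1.0 - P_DOOM - P_UTOPIA  # everything in between

U_DOOM = 0.0         # all remaining QALYs lost
U_UTOPIA = 1e10      # assumed: billion years x 10 for near-unlimited resources
U_MIDDLE = 100.0     # assumed: modest improvement over the status quo
U_STATUS_QUO = 40.0  # assumed: remaining QALYs if we don't race

ev_race = P_DOOM * U_DOOM + P_UTOPIA * U_UTOPIA + P_MIDDLE * U_MIDDLE
print(f"EV(race) = {ev_race:.1f} QALYs vs. {U_STATUS_QUO} status quo")
# ~105 vs. 40: unweighted EV says race, because the utopia term
# swamps everything; a values-weighted approach would not.
```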

Am I missing some other reason which could justify adopting one approach for the coin flip, but a completely different one for ASI?

Rick Schwall:

I think an unstated undercurrent is that the average person wouldn't believe the upside would actually be delivered. Even if they've not heard of Pascal's Mugging, they've learned to be cautious about promises too good to be true.

Mark Russell:

don't flip the coin, hang around with us a little longer, we like and need you
