New year, new horrors. Well, old horrors. Horrors anyone could've predicted.
Podcast episode for this post:
https://open.substack.com/pub/dwatvpodcast/p/ai-149-3
Zvi is so often inscrutable to those who don't follow super carefully. What is "3" referencing?
I want to say it's Babylon 5 Season 3...but last year he labeled the post "4" and used the Season 2 monologue, so something about that thought is wrong.
(Also, the last line is supposed to be "The year is 2026. The place: Lighthaven." My fandom OCD is making whining noises)
I think it’s “years left to High Weirdness”. Last year it was 4. He admits in the article that after this year, 2029 may no longer be his median outcome timeline, but it’s close enough that he feels no need to abandon his commitment to the bit.
Happy New Year!
Well, GPQA Diamond seems saturated, and so is my tiny benchmark-ette ( https://thezvi.substack.com/p/gemini-3-model-card-and-safety-framework/comment/179830167 ). The next large challenge for AI appears to be continual/incremental learning. Any suggestions for following progress in it? Are there benchmarks for this capability (and maybe a leaderboard?) yet?
Continual/incremental learning, if achieved, seems like it might plausibly clean up remaining hallucinations and reasoning failures. "I won't make _that_ mistake again." can go a long way towards filling in the gaps in the spiky capabilities of AI.
Note that Opus in Cursor is insanely productive as well. AI writes well over 90% of my code now; it was the reverse merely 1 year ago. I problem-solve at a higher level now: making sure the AI respects invariants, keeping the layering clean, helping it design novel algorithms, etc.
> for $20/month you can get essentially unlimited chat access
I question this, at least for Claude. I have a Claude Pro subscription and lately seem to burn through my usage limit *extremely* quickly, to the tune of 5-15%/message. A few searches seem to suggest it's a known issue, but I gather from this comment that *you* haven't run into it, so...?
In India (and probably other non-US countries), when you get a meal delivery, you have to provide the delivery driver with a one-time code for the transaction to be completed. Four digits, no need for any deepfake detectors. I'm not sure why the US can't just use this method. Innumeracy? Arithmaphobia?
I get this occasionally on my orders via Grubhub in SF, but it's not a consistently enabled feature (A/B testing?) and drivers often don't seem to care about the code anyway. Which does point to some fear of math, since you'd figure if I have to give the code or they don't get paid, they'd be jonesing for it. But in fairness, the interface being inconsistent and not emphasizing the function of the code (it has some vague arglebargle about "for your safety" or whatever) could lead one or both parties to decide it's useless frip.
I will admit, today I used an LLM for medical advice.
(don’t try this at home, this is not medical advice etc.)
Story: I have a thyroid condition. The endocrinology department changed my dosage of carbimazole and authorized a blood test of my T3/T4 levels in “about two weeks”. So now I have to phone the phlebotomy department and schedule an appointment for the blood draw. Likely, the phlebotomy department will be booked up and can’t do the exact day requested. So: before I pick up the phone and start negotiating appointment dates, I pull up my favorite LLM’s opinion on that two weeks, and on how much latitude there really is for doing the blood draw a bit earlier if the nurse can’t do any other days.
So he’s saying we *should* be polite to chatbots, right?
Thanks in advance for any clarification.
> I wish people with bullish AGI timelines at Anthropic tried harder to argue for their timelines in public.
Does Greenblatt wish for this because it would delay AGI?
Well, on the plus side, at least we'll median get another fun historic election before High Weirdness.
Did deepfates not see the 150 adversarial examples poisoning the data thing?
I appreciate Mr. Pike for indirectly teaching me the Britishism "for toffee". Somehow I suspect he wouldn't be any less angry if computer hardware were 100% sustainably sourced pre- and post-consumer recyclable. When people tell you they're degrowthers...
FWIW, the bit on Britain is Gell-Mann flag-raising. Not to dispute that we're notably ill-governed, but that's some Daily Mail grade material.
I don't agree. Also, the Daily Mail is Britain's best-selling newspaper. I don't think the Daily Mail's views or reporting are any more misleading than those of, for example, the BBC or The Guardian.
Fortunately we can refer to Zvi's Bounded Distrust article where he tells us exactly how much stock to put into narratives like this:
"If there’s an editorial, there are no rules.
If it’s in any way subjective, there are no rules.
Words mean whatever the Narrative decided they mean this week.
And that’s it."
Turns out rationalists aren't immune to buying into things uncritically if it suits what they already want to believe, or to passing on Narratives that aren't objectively true as if they were objective facts.
Economics teachers explain comparative advantage incredibly badly.
Comparative advantage is about choosing among the actions available to you, not about comparing yourself to others.
In David Ricardo's canonical example of Portugal wine and Britain cloth, the comparative advantage question for Portugal is: given the things I can spend my time doing, what will make me the most money? That activity is advantageous compared to the other things I might spend my time doing.
**Separately**, Britain looks at what it can spend its working hours on, and decides that it would be best off if it specialises in making cloth and trading cloth for the wine it wants.
Now you might see some pitfalls in the 'dumb' economics answer. For one thing, Ricardian comparative advantage fails to consider real option value (or the future at all).
It's more sensible to specialise in cloth (or similar) for nearly every country, because once you can manufacture one thing, you can manufacture a wide range of other things at little extra R&D cost (clocks, steam engines, cutlery, electric motors), with few constraints, whereas climate and soil greatly constrain your alternatives if you specialise in agriculture.
The real option value of manufacturing is immense compared to that of agriculture. This is ignored in the undergrad treatment of comparative advantage. But that's a minor failing compared to the confusion caused by not explaining the core concept clearly.
Edit: there may not be zero marginal product workers, but there definitely can be workers for whom the value of their product is less than the cost of trading with them. Negative net marginal product.
The “£140m per salmon” framing commits a category error. It treats a program aimed at maintaining a viable population as a consumer purchase of discrete animals. The state is not buying fish; it is trying to stay on the safe side of nonlinear, potentially irreversible thresholds, e.g. stock collapse, genetic erosion, loss of a keystone species. The relevant unit is not an individual salmon but the persistence of a functioning salmon-bearing river over decades.
The opportunity-cost comparison is incomplete because the largest costs of species decline are unpriced and therefore invisible to cost-benefit analysis.
re: Buddy
Has anyone ever set up an experiment to try to get Claude or any LLM to actually torture a live animal (say, a chicken) in exchange for "points", with a direction to maximize "points"? Making it clear there's no utility being increased to anyone but "itself" (through points), I mean, no trolley-trolling.
I've got the weak belief that basically all (even Grok!) would refuse.