> Be me
> Be working on my own blog update over @ klob.substack.com
> Be checking email for a particular subtopic (predictions for 2025 thread)
> See Zvi's email pop into inbox
Zvi.
You madman.
Respect.
(I'll read this later; #CantStopTheWork. I'd love to interview you for my podcast, or just hear more about how you keep up the INSANE PACE of work that you do. You're incredibly organized and I just... well, I'd like to learn from ya, Zvi. Cheers, friend.)
- klob // Caleb Oscar Ditchfield.
P.S. - #first.
Re o1 pro and fiction: I have experimented with this. Some good output, some slop. https://open.substack.com/pub/aiscifi
Which of these stories do you think are the strongest? I read "Echoes of Genesis" and "In the Wake of Old Waters".
To speak honestly, I find LLM fiction to be extremely poor, and o1 pro no better than regular GPT-4o (which seems to be what OpenAI's internal benchmarks found, too). If that's all you're using o1 for, save your money.
Most AI-generated stories fundamentally don't *feel* like stories; they're a sequence of events, described one after the next, with the emotional engagement of a police report. "This happens. Then this happens. Then something else happens." Characters are drawn in the most generic and banal cliches. The "theme" is invariably some heavy-handed moral lesson, loudly preached at the reader ("Nahiro people would carry forward this lesson: that knowledge must coexist with compassion, and that the echo of old waters could guide them toward a future both bold and humane.")
Maybe I'm being harsher because I know they're AI-generated, but to be honest, I don't think so. There's a storytelling spark that AI fiction just doesn't appear to have. I used to read fanfic written by literal children (with spelling mistakes everywhere) that was nevertheless gripping and absorbing and made me care. This just...doesn't do those things. It's like trudging through a dry desert made of words.
You'd expect o1's "overthinking" to offer *some* advantages over just blurting out text, like a better structure. But it's still messy and contradictory on a technical plot level. (At the start of "In the Wake" Rhea is "only about five cycles pregnant", but later she's described as being in her third cycle. Dr Virgil's methods are described as non-invasive yet everyone in the story acts like she's trying to kill the baby.)
Thanks for the comment. Good questions. I'm not sure which story is strongest! I like Bartered Reflections. As for the stories being very plot-driven: in some sense I think this is inherent to the format. Short stories are necessarily plot-driven relative to novels, which have more room for narrative exposition. That said, maybe it's possible to coax less plot-driven output from o1 pro? I haven't tried, so I'm not sure whether being plot-driven is a limitation inherent to o1 pro. As far as I'm aware, o1 pro can't generate novel-length fiction, nor can any other model. At least not yet. As the saying goes, today's AI is the worst you'll ever use: perhaps future models (o3?) will generate novel-length fiction.
Just never ask what happened to Lighthaven 4.
Typo: "Correctly realize that no, there is no Encarto 2." - Encanto
Podcast episode for this post:
https://open.substack.com/pub/dwatvpodcast/p/ai-97-4?r=67y1h&utm_campaign=post&utm_medium=web&showWelcomeOnShare=true
Loved the intro reference. Thank you.
Re the death of the ex-OpenAI employee, I’m not clear on what one does if one thinks someone was assassinated. Try to get the police to spend resources on double-checking the case? How did they come to different conclusions?
Escalate to the next layer: state attorney general or FBI.
In this case, his mother hired a PI to investigate independently. That information could then be either publicly released or shared with certain news agencies. Although this is mostly an indirect way to escalate to the next layer... i.e. to get the attorney general or FBI to do an investigation.
"LLMs can potentially fix algorithmic feeds on the user end, build this please thanks.
Otherwise I’ll have to, and that might take a whole week to MVP. Maybe two." Yes, but there is an augmentation to it called Communication Currency that would make it work better: a tool for your personal AI to use with other personal AIs. https://www.dropbox.com/scl/fi/mvranm2d7ece6yp08rmth/Contact-Credits.pdf?rlkey=7twdum8cecqz5m4r1l70bk78k&dl=0
OH, and Zvi, I just re-sent you what I hope is the favorite email, ever!
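For what it's worth, here is a minimal sketch of the user-end feed fix quoted above, assuming an OpenAI-style chat API: an LLM scores each post against your stated preferences before it ever renders, client-side. The model name, prompt, threshold, and fetch_feed stub are all illustrative assumptions, not anything from Zvi or the linked PDF.

```python
# Hypothetical sketch: client-side LLM re-ranking of an algorithmic feed.
# The prompt, threshold, and fetch_feed stub are illustrative, not a real product.
from openai import OpenAI

client = OpenAI()  # needs OPENAI_API_KEY in the environment

MY_PREFERENCES = "Long-form writing about AI; no outrage bait, no engagement slop."

def fetch_feed() -> list[str]:
    # Stand-in for whatever the platform's API (or a scraper) returns.
    return ["30 AI tools you MUST try before 2026!!!", "A post on new interpretability results"]

def keep(post: str) -> bool:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"Score 0-10 how well this post matches: {MY_PREFERENCES}. Reply with the number only."},
            {"role": "user", "content": post},
        ],
    )
    try:
        return int(resp.choices[0].message.content.strip()) >= 6
    except ValueError:
        return True  # fail open rather than silently hide posts

filtered = [p for p in fetch_feed() if keep(p)]
print("\n".join(filtered))
```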
> Janus and Eliezer Yudkowsky remind us that in science fiction stories, things that express themselves like Claude currently does are treated as being of moral concern.
Not in Blindsight they aren't.
Regardless, science fiction writers are simply among everyone else who couldn't properly imagine a system that could output human-appearing text via non-human internal processes.
Or, no matter how many times Janet tells you she doesn't have feelings, people will still trick themselves with the anthropomorphized exterior: https://www.youtube.com/watch?v=etJ6RmMPGko.
In a practical sense, though, what are we doing with current LLMs that qualifies as immoral? In what ways would we need to treat them differently in order to treat them as being of moral concern?
Eh, Blindsight misunderstood some of the science involved. https://hopefullyintersting.blogspot.com/2019/11/the-limitations-of-blindsight.html
However, it's pretty clear that Claude isn't conscious in this sense. It has nothing like a working memory, just a context window, which seems more analogous to an animal's sense impressions. For o3, who knows.
Interesting, thanks, but I think it's flawed. This conclusion specifically from the article seems reductive:
"It seems that everything that goes into your memory gets there by going through conscious experience. And that's the reason you can't have a creature without consciousness and expect it to interact productively with the world."
It seems to hold true for the biological intelligences that we know about, but that doesn't mean it must hold true for different types of intelligence architectures. Consciousness may be required to produce some result X in some architectures, but there may be many ways to achieve X without it in different setups.
This is making the same mistake as taking the output of an LLM, noticing that it sounds similar to outputs produced by humans, and inferring that there must be an internal process to the LLM that's meaningfully similar to the human process. This sometimes results in people going a step further and asserting that we should treat LLMs with some unspecified form of moral concern.
> It’s so strange to me that the obvious solution (charge a small amount of money for applications, return it with a bonus if you get past the early filters) has not yet been tried.
This seems like the kind of thing that might work if normalized, and with the money held in escrow, but it's hard to get from the current world over to that one. Today an employer that asks for money to process an application is going to be perceived as a scammer by most applicants, and probably also featured in the NYT as "taking advantage" of poor workers who can't afford to apply for a job. Many companies are reluctant even to require coding tests, because the best experienced candidates don't want to waste their time with that stuff.
Instead I predict we'll see less legible, more unfair behaviors like greater reliance on referrals, social networks, and recruiters reaching out directly to candidates.
If a company wants to try something like this, I would suggest having an option to apply in person or by mail instead of making a deposit (or some other way that is inconvenient but free), which at least offers the appearance of equitability. This is actually what the post office does with change-of-address verification.
I had other thoughts about that proposal:
It would be easier for a hiring manager to trial this in a year when the company doesn't need many new hires.
Using money from filter-failers to pay the bonus to filter-passers could be bad PR even if it doesn't seem scam-like. To mitigate this impression, I would donate the filter-failers' money to charity (and give them a receipt for 'their' donation), then pay the filter-passers their bonuses separately.
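In case someone does want to pilot it, here is a minimal sketch of the mechanics under the charity variant above; the deposit and bonus amounts, class names, and receipt step are illustrative assumptions, not part of the original proposal.

```python
# Hypothetical sketch of the application-deposit idea: deposits sit in escrow,
# filter-passers get their deposit back plus a bonus paid from the employer's
# own hiring budget, and filter-failers' deposits go to charity (with a receipt),
# so the employer never pockets applicant money.
from dataclasses import dataclass

DEPOSIT = 20  # illustrative amounts, not a real recommendation
BONUS = 30

@dataclass
class Application:
    applicant: str
    passed_filter: bool

def settle(apps: list[Application]) -> None:
    charity_pot = 0
    for app in apps:
        if app.passed_filter:
            # refund comes from escrow, bonus from the hiring budget
            print(f"{app.applicant}: refund ${DEPOSIT} + bonus ${BONUS}")
        else:
            charity_pot += DEPOSIT
            print(f"{app.applicant}: ${DEPOSIT} donated to charity, receipt issued")
    print(f"Total donated: ${charity_pot}")

settle([Application("Alice", True), Application("Bob", False)])
```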
With all due respect to the bettors
> Can AI do 8 of these 10 by the end of 2027?
9 is the only one that matters:
> With little or no human involvement, come up with paradigm-shifting, Nobel-caliber scientific discoveries.
The rest is wordcel stuff that LLMs are already well suited for and just need more scaling/training to handle.
So 8 out of 10 without 9 in the mix is meaningless.
On AI and capital/resources, I genuinely just don't understand the position Zvi appears to be endorsing here. Perhaps there is a standard argument for this somewhere that I just haven't seen?
The position seems to be that super-intelligence somehow magically leads to super-abundance; that if only AI is "smart" enough then physical constraints like scarce resources stop binding?
I can imagine hand-waving an argument for how this might happen *eventually* -- that is, ASI will figure out a way to mine asteroids (or other planets) for rare elements, build Dyson spheres for energy, etc. But even if that's the long-run plan, there are going to be resources that are scarce on Earth in the meantime. And the resources you would need to set up an asteroid mining operation or build a Dyson sphere are very much the same ones you need to build things to provide for humanity. How, then, within the medium term (which appears to be the target, given the reference to people who are alive right now saving money) do you get to a point of functionally unlimited abundance?
And, of course, that sets aside the fact that human desires seem to scale pretty directly in proportion to our productive capacity. An upper middle class American today has super-abundant resources in comparison to the vast majority of humans ever to live, but they don't feel that way or live that way.
The way many people are talking about the economic consequences of AI in the early stages of the coming information revolution really reminds me of the early Marxists -- this vague sensibility that technological change will *somehow* produce a big and positive political economic shift without any real attention to how that will happen or what the intervening steps might look like. In fact, it's even the *same* prediction (the rough orthodox Marxist position is that someday we'd end up in a post-scarcity society as the result of technology and then we'd live in a utopia where everyone's needs are met). Then the Bolsheviks and the Maoists showed up and proved that it matters a great deal *how* those changes happen (and not in a good way).
This strikes me as staggeringly ill-informed. Perhaps I'm not imaginative enough, but I do think that your prediction has to find some way to draw a line between the present and the future. And the present is that AGI is being built under a capitalist system by companies that are aiming to make money building it (and the only exception is -- as noted above -- working hard to adopt that model). And what "alignment" really means in any kind of practical sense is that the AI gets the values given to it by its creators -- which means that any AGI built on the current pathway is going to have "capitalist" values in its DNA and that will guide all that comes later.
Caveat: I don't know if this is Zvi's view at all. This is mostly my view.
I tend to believe that despite quite a few eyeballs on various problems, when a smart-enough AI is able to read all research, it will be able to:
* Identify low-hanging fruit that we haven't pieced together.
* Create additional cohesive theories in many domains of knowledge.
* More rapidly solve existing technical challenges.
Together, these will bring about rapid technological change and reduce costs across a broad range of activities.
For example, we have lots of medical research and knowledge, but I'd guess there is a much deeper understanding that an AI could get, that would lead to cures for many diseases, or simply better general health. This in itself could free up large amounts of capital for other purposes. What would I do if I could get away with spending 1/10 the dollars on medical insurance? (And that would be in addition to just spending less on Medicare and thus taxes.)
> Right now, yes, humans are addicted to TikTok and related offerings, but they are fully aware of this, and could take a step back and decide not to be.
This seems to undersell the existing concerns around algorithmic content and media addiction.
I think a better model is: most people are not fully self-aware, and most people don’t have the willpower to self-modify addictive habits. You just need to look at the obesity or opiate epidemics for clear evidence here.
It seems more likely to me that content addiction will become an increasingly important issue.
Agreed. If they could decide not to be, why haven't they done it already? Nobody ever thinks "I'll spend the next half hour scrolling slop instead of doing something meaningful", they open the app out of habit/compulsion and it inevitably spirals from there. The advances (hate to use the word) in algorithms and app design have made the compulsion worse, and I don't see a compelling reason why AI-ifying them will lead to anything other than the default conclusion of more slop consumption.
Why is economic growth not changed by current AI models?
Because integrating AI models into existing business processes is way harder than training new models. Businesses are slow, and you can't really just take an AI model and make it do useful work; a lot of human tinkering is still required, and there aren't many people qualified to do that in most companies. Meanwhile, AI labs already have everything they need to train stronger and stronger models.
Also, current AI models don't really offer a big qualitative change like the printing press or the Internet did. They don't let you do anything you weren't able to do before; they just let you do it more cheaply by taking humans out of the equation. And in the current economic landscape, all that does is increase unemployment and the company's profits, not its useful economic output.
That's not the reason.
I find it weird that 7 and 8 were included in the bet at all. I don't think they're necessary or sufficient for AGI or even ASI. Literary work is not something frontier labs seem to be working on or particularly care about, aside from "write a poem about..." prompts, where it's more like another kind of language task you can benchmark. The economic incentives are to make serviceable prose that complies with the content policy. That is not going to be a recipe for top quality prose writing, pretty much by definition. It seems perfectly consistent that you could have a system which is superhuman in science, engineering, medicine, law, business and many other fields which writes serviceable prose that complies with the content policy. Maybe brilliant writing will come about as some emergent property, but it seems orthogonal to the important questions.
Separately, I think 10 (automatically formalizing math from human inputs) is both almost certainly feasible in the next three years and a huge deal for automated takeoff of capabilities. It's probably 90+% of the way to 9 (Nobel-level work), assuming Nobel-level includes the Turing Award or Abel Prize. If you can get it to the point where it can fill in and correct gaps in human proofs (formalization is hard because these are always slightly incorrect or incomplete), then it will be able to do that on LLM-generated proofs. Even if those are by default much worse than human mathematicians' proofs, quantity will have a quality of its own. Do tree search/mega-CoT/annealing/whatever to make creative output, convert it to formally checked outputs, and feed successful outputs back into RL. Completely automated formal proofs will allow for much faster software (proving compiler optimizations, finding new classes of compiler optimizations, removing redundant checks, finding faster algorithms), more optimally laid-out computer chips with higher fabrication yields, etc. It's also plausible that effectively unlimited math could lead to breakthroughs in physics, most relevantly condensed matter physics and quantum computing, that would accelerate things even more.
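As a toy illustration of what "automatically formalizing math" means in practice, here is a Lean 4 sketch (assuming Mathlib) of a human claim turned into a machine-checked statement and proof. The theorem is my own trivial example, but it shows why the loop works: a generated proof either type-checks or it doesn't, so candidate proofs can be filtered and fed back into RL with no human review.

```lean
-- Toy example of a formalized human claim: "the sum of two even numbers is even".
-- Assumes Lean 4 with Mathlib. An automated formalizer would have to produce both
-- the statement and the proof; the kernel then checks it mechanically.
import Mathlib

theorem even_add_even {m n : ℕ} (hm : Even m) (hn : Even n) : Even (m + n) := by
  obtain ⟨a, ha⟩ := hm   -- m = a + a
  obtain ⟨b, hb⟩ := hn   -- n = b + b
  exact ⟨a + b, by omega⟩
```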