> If John really does dive deep into these questions, seems like he’ll get it.
How honest is John being on the podcast though? He's probably carefully thinking about every single sentence and the interview strategy was likely pre-vetted by the exec team. We have no idea what he's really saying/thinking behind closed doors.
Yeah, although there are probably only a handful of areas where he can give fully thought out "strategic" answers (unlike an AGI!), so one could probably reverse engineer it a bit by figuring out which areas OpenAI feels it needs to be strategic about. For example, I take the "5 years to AGI" answer as translating into "we don't really know" rather than "we expect to have it in two years, but we are misleading competitors and regulators with this answer." Could be wrong.
It's interesting to decompose AI x-risk. I would guess an 80% chance we miss early AGI and deploy it as usual, since we don't really know what we're looking for. Maybe 20% chance it self-improves rapidly in a way that we can't detect and stop. Then a 30% chance of it doing x-risky stuff? That part feels even harder to gauge. I kind of do expect a "nice" AI by default based on training on human data and RLHF, but even human to human interactions often spiral out of control when there's a power struggle and lack of trust involved. Overall, I guess that gets me to 5% x-risk, although I didn't follow all the branches of the decision tree fully.
So yeah, getting better at figuring out whether you have an AGI on your hands before you deploy it seems pretty important.
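Multiplying the three guesses above straight through (treating the stages as independent, which is itself a simplification) does land near the 5% figure; a quick sketch using only the numbers from the comment:

```python
# The probabilities are the guesses from the comment above, treated as independent stages.
p_miss_early_agi = 0.80  # we fail to recognize early AGI and deploy it as usual
p_self_improve = 0.20    # it then self-improves in a way we can't detect and stop
p_x_risky = 0.30         # it then actually does x-risky stuff

p_doom = p_miss_early_agi * p_self_improve * p_x_risky
print(p_doom)  # 0.048, i.e. roughly the 5% x-risk mentioned above
```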
> I would guess an 80% chance we miss early AGI and deploy it as usual, since we don't really know what we're looking for. Maybe 20% chance it self-improves rapidly in a way that we can't detect and stop. Then a 30% chance of it doing x-risky stuff?
Each of these individual probabilities sounds incredibly unlikely to me. We have a track record of overestimating model intelligence. Algorithmic self-improvement is basically impossible since you are almost fully bound by hardware and data. Hardware self-improvement seems pretty difficult too.
To level set, I'm certainly just guessing. Why do you think we've been overestimating model intelligence?
Regarding self-improvement, is algorithmic improvement necessarily bound by hardware and data? Once you have an AGI, why couldn't it come up with new tools or training/fine-tuning/distillation processes? My prior is that there is always some low-hanging fruit left. Hardware improvement seems impossible in the short run because you actually need to build it, although perhaps just getting control of more hardware resources isn't that difficult.
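On the distillation point specifically, the standard recipe (a smaller student trained to match a larger teacher's output distribution) is simple enough to count as plausible low-hanging fruit. A minimal sketch of the usual loss, assuming generic `student_logits`/`teacher_logits` tensors (placeholder names, not anything from the interview):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Standard knowledge-distillation loss: KL divergence between the
    # temperature-softened teacher and student output distributions.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature ** 2
```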
I think GPT-4 hasn't proven as economically useful as most people predicted. And hardware gets saturated pretty quickly at a given compute budget, so there's only so much you can do with it. The only way we made GPT-3-sized models smarter was by training them more. All the other little tricks eked out performance, but it was just a one-time thing.
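For what it's worth, the "training them more" point lines up with the Chinchilla-style rule of thumb of roughly 20 training tokens per parameter (a heuristic, not a law); a rough sketch with ballpark public figures:

```python
TOKENS_PER_PARAM = 20        # rough Chinchilla-style compute-optimal heuristic

gpt3_params = 175e9          # GPT-3-scale model
gpt3_train_tokens = 300e9    # roughly what GPT-3 was reportedly trained on

optimal_tokens = gpt3_params * TOKENS_PER_PARAM
print(f"compute-optimal tokens ~ {optimal_tokens:.1e}")  # ~3.5e+12
print(f"actual training tokens ~ {gpt3_train_tokens:.1e}")  # ~3.0e+11
# Under this heuristic a GPT-3-sized model was undertrained by roughly an order
# of magnitude, consistent with "training them more" being the main lever left.
```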
That would be like saying the web was disappointing in 1998. Most of the time and energy go into adoption, and we're still in the early stages of adoption. So I disagree on the economically useful point. Even without further fundamental advances, I agree with people like Ethan Mollick that lots of jobs will be different in 5-10 years because of gen AI.
Humans haven't gotten meaningfully smarter either, but we're now much more capable than 10,000 years ago because of tool usage, intellectual capital, and technological progress. There's no reason that dynamic wouldn't apply to an AGI as well.
Can anyone explain why people like John don’t seem to grok the AI safety themes? Like clearly the guy is a world class researcher with impeccable bona fides and probably a high IQ but he still doesn’t get it? Is there some grokking gap that attending a top CS PhD program creates? Are these people universally less intelligent than we think? Do they actually grok it and are just shockingly dishonest and self serving? I genuinely don’t understand.
I feel like your list is not exhaustive, in a way that kinda rubs me the wrong way tbh. it feels kinda bad faith to disagree with someone and immediately assume they are hilariously obviously wrong and begin guessing about what could be going so wrong in their screwed up brain, instead of engaging with their actual model
perhaps it's possible that the current LLM architecture has really sharp diminishing returns on additional scaling, such that threatening/'true' AGI will not and cannot be achieved via language models, and such that John's pessimism gets funneled into "our next product will not live up to the hype" rather than "our next product will foom"... and that this becomes readily apparent when you can see all the non-public incremental steps between gpt-3, 3.5, 4, and 4o, but is not apparent to us outsiders
or perhaps the human race got lucky and they've secretly already hit the data wall with LLMs, and so they've temporarily exhausted the local search space's risk potential. they know that training a new AI on, say, wargame sims, or maybe random public camera livestreams, or in any case something other than internet text, is another coinflip with a chance to kill us, and that's why they are *not* doing that and instead focusing on adding new capacity scaffolding to language models, which are already somewhat known-safe
or there's the possibility that actually they've already achieved something significantly stronger than gpt-4, have no plans to build anything stronger since what they've already got is transformative enough, and have successfully tested its capabilities as being not powerful enough to instawin
or perhaps the above, except it *is* powerful enough to win immediately, and of course this necessarily means that all openai engineers (and anyone else capable of stopping it) have been tricked into believing it's safe, no matter how ludicrous that is
or, you know, maybe, just maybe, we are wrong and they are right. there is uncertainty here and reasonable people can disagree. my own p(doom) is like, 0.5 or so, and this implies an incredible amount of urgency... but the error bars are wide, and might well extend all the way down to 0.0001
Well, I'm writing a comment, not an essay, so it should hardly be expected to be exhaustive. Generally, though, it's not so much a disagreement with John specifically as with many prominent people in AI who don't seem to even attempt to understand basic AI safety issues. I'm talking about the Yann LeCuns and Andrew Ngs, etc. There's an entire group of prominent, high-level AI researchers who do this continually. I fall back to "you can't get a man to understand something when his paycheck depends on him not understanding it" and assume they are mostly incredibly cynical money-grubbers, capable of understanding (or already understanding) the issues but publicly downplaying them. Which is disturbing and sad. I'd much rather learn they are actually just stupid.
But the point remains: maybe they are right!
Think of the advantage Llama (so, Yann LeCun) got from extra training on the 8B model. Then consider that maybe humans have 2 or 3 OOMs more parameters, with multiple OOMs more training data, and that we do not yet have the tools to even approach closing that gap. I find this a plausible argument, for example.
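As a rough illustration of how sensitive that gap estimate is to assumptions (treating a synapse as roughly one parameter is itself a loose analogy, and the brain-side figures below are ballpark guesses, not settled numbers):

```python
import math

llama_params = 8e9  # the 8B Llama model mentioned above

# Sweep a range of assumed human "parameter" counts (synapse-count ballparks).
for synapses in (1e13, 1e14, 1e15):
    gap_ooms = math.log10(synapses / llama_params)
    print(f"{synapses:.0e} synapses -> ~{gap_ooms:.1f} OOMs more 'parameters' than 8B")
# Prints roughly 3.1, 4.1 and 5.1 OOMs depending on the assumption.
```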
Maybe agency really is a subtle chemical process that silicon cannot reproduce? I don't find this a very plausible argument but I am not competent to evaluate it properly.
Re: the "improvement in coherence, but not in other underlying capabilities" bit, I just assumed he was talking about 4o and multimodal models that stay at a similar scale to their other model (4 Turbo).
But that might be a misreading of what he was saying. Could be he was referring to some other, future model(s).
"What makes a good ML researcher? Diverse experience. Knows what to look for. Emperia and techne, rather than metis"
This remains obscure to me even after spending some time trying to figure out what "emperia," "techne," and "metis" mean in this context. "Techne" is apparently "making or doing," something associated with practical knowledge, while "metis" is something like "wisdom" or "deep thought." So I guess more practical than theoretical?
In this context, "metis" is "specialised local knowledge". The term was popularised by the book Seeing Like a State; I'm not sure precisely where it originates (it has clearly deviated in meaning somewhat from the original Greek).
Basically, they want someone who can pick up a new language and codebase and learn it, rather than someone with decades of experience knowing the precise nitty gritty oddities of a specific complex system. Adaptability and big picture knowledge, over hyper specialised detail knowledge.
Different jobs want one or the other; it makes sense that, as a new field, they prefer the fluid kind, but a lot of our society fundamentally depends on people who know all the undocumented little details of the specific system they maintain.
We have a very low bar for "good," but I agree he seems to be the best person in this situation.
thanks for the summary!