Read the following comment as the ML equivalent of an "in mice" reply to bio/medicine hype posting:
#15. Some context I think the explanation misses: There have been _tons_ of papers (starting right after BERT blew up) proposing ways of extending transformer context length. Just to name a few, you've got:
1. Transformer-XL
2. The Compressive Transformer
3. The Reformer
4. State space models
5. Approximately a million different efficient-attention variants that reduce the cost Pete's explanation discusses
The thing is, none of these have replaced good old-fashioned (basically) dense attention for large-model pretraining yet, and I don't think the experiments in this paper establish that it will be the one to do so. It might be, but the question is always just "should you spend your flops on a bigger model/more data, or on longer context?"
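As a rough illustration of the cost trade-off in play (a minimal sketch with made-up shapes, not code from any of the papers above), dense attention's FLOPs grow quadratically with context length while a sliding-window variant grows only linearly:

```python
# Illustrative only: why dense attention gets expensive as context grows, and
# how a local/sliding-window variant trades full pairwise attention for linear
# cost. The shapes and the 2x multiply-add convention are assumptions, not
# taken from any specific paper.
def dense_attention_flops(seq_len: int, d_model: int) -> int:
    # QK^T scores form a (seq_len x seq_len) matrix: cost ~ seq_len^2 * d_model.
    return 2 * seq_len * seq_len * d_model

def sliding_window_attention_flops(seq_len: int, d_model: int, window: int) -> int:
    # Each token attends only to `window` neighbours: cost ~ seq_len * window * d_model.
    return 2 * seq_len * window * d_model

if __name__ == "__main__":
    d_model, window = 4096, 512
    for seq_len in (2_048, 8_192, 32_768, 131_072):
        dense = dense_attention_flops(seq_len, d_model)
        local = sliding_window_attention_flops(seq_len, d_model, window)
        print(f"ctx={seq_len:>7}: dense ~{dense:.2e} FLOPs, "
              f"windowed ~{local:.2e} FLOPs ({dense / local:.0f}x cheaper)")
```

The catch is exactly the one above: the cheaper variants give up full pairwise attention, and so far that trade hasn't won out for large-model pretraining.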
Re: Superforecasters, I _really_ want to see the question wording, since I feel like if it had included "we do AlphaFold-type stuff for biology research," even the most AI-detractor types would have put >1% on it. Maybe it was about some very specific scenario?
You can read most of the question details in my post here: https://damienlaird.substack.com/p/forecast-from-the-2022-hybrid-forecasting
It's Demis Hassabis, not Dennis. Possibly autocorrect malfunction.
*Demis Hassabis, not Dennis
Re: Pascal's Mugging
The primary issue is that we have no objective means to determine how likely any of the various necessary conditions (AI can bootstrap, AI will be destructive, AI will be able to overcome humans, etc.) are, or what it would take for them to exist. So you can say 1% or 5% or 50% or 0.00001% and there's no way to debate it. That many people seem to think it's >5% doesn't make that true. Many more people believe, with much higher than 5% confidence, that Jesus will return, and the term Pascal's Mugging was invented in response to people making that kind of argument (simplifying a much longer conversation, and assuming readers here are aware of it).
If you can provide an objective and reasonable formula for the chances of a world-ending AI scenario - rather than the speculative and subjective current models - then we can get away from the Pascal situation. Otherwise you need an objective mathematical model for the chances of Jesus returning in order to differentiate between the two scenarios. If you have no objective method to differentiate them, then you have done nothing to overcome the criticism.
One seemingly important thing I never see talked about in FOOM debates: what if the difficulty of increasing intelligence grows faster than intelligence itself?
Obviously no one knows the answer, but FOOM _requires_ that the difficulty of making yourself smarter grow more slowly than the intelligence gains, and it is not obvious to me that this _must_ be the case. It _might_ be, but I can easily imagine that it is not: increases in intelligence might be inherently self-limiting, in the sense that each new advance takes longer and more intellectual work than the last (even after accounting for the fact that you are now smarter).
Since no one can _know_ if this is true or not, I'm not arguing that we should depend on it, or that we shouldn't be concerned about existential risk, but I think that the possibility that this is true _does_ put a limit on how confident we should be in P(Doom).
I was thinking about the same thing recently. Assuming each advancement is more difficult than the last seems uncontroversial. FOOM posits that the AI gets better at solving the next stage even faster than the difficulty grows, but there's no reason for us to think that's true; diminishing returns should actually encourage us to think otherwise. And that's if the AI can both develop *and* implement the changes with increasing speed. If the improvements require hardware rather than just software, there may be significant bottlenecks while new production facilities and processes are developed.
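A toy model (entirely my own construction, not anything from the FOOM debate itself) makes that crux concrete: whether the process accelerates or stalls depends on which compounds faster per advance, capability or the difficulty of the next advance:

```python
# Toy model, purely illustrative: each advance multiplies intelligence by
# `smarts_gain` and multiplies the difficulty of the *next* advance by
# `difficulty_gain`. Time per advance = difficulty / intelligence. If
# difficulty compounds faster than intelligence, each advance takes longer
# than the last (self-limiting); if slower, advances keep speeding up (FOOM-ish).
def time_for_advances(smarts_gain: float, difficulty_gain: float, n: int = 30) -> float:
    intelligence, difficulty, total_time = 1.0, 1.0, 0.0
    for _ in range(n):
        total_time += difficulty / intelligence
        intelligence *= smarts_gain
        difficulty *= difficulty_gain
    return total_time

if __name__ == "__main__":
    print("FOOM-ish   (gains 1.3x/advance, difficulty 1.1x):",
          round(time_for_advances(1.3, 1.1), 1))
    print("Fizzle-ish (gains 1.1x/advance, difficulty 1.3x):",
          round(time_for_advances(1.1, 1.3), 1))
```

Nothing in the model says which regime we're actually in; that's the open question.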
The EU stuff makes me sad. It seems increasingly likely that we'll regulate away anything fun or useful and live boring lives, superficially safe but with horrible downsides that are not obvious. Worse, given enough time, our descendants will forget this stuff was even possible, or at least regard it as wildly unsafe and immoral.
I agree with the mood, but in fact the world is larger than the EU, so the outcome is that the EU becomes the world's retirement village and holiday home whilst everything real happens elsewhere.
Still very sad for the EU though!
I'm not as confident. Safety-ism has ways of spreading; it looks better on paper and people can get all offended if you want to, say, use fireworks or gather in large crowds. I mean, I get why - they have an anxiety response that is overwhelming and assume that everyone else should have the same response - but I think they tend to win in the end.
If the Dems were in power for the long term, I would be more worried. But leaving aside the extreme left, I don't see the same determination to commit cultural suicide in America, Asia or Africa.
> Strangely, also D&D encounters?
> Brian: He has a point it did this to my D&D stuff too, asked it to put together 3 encounters for a party of 4 level 7 characters and it was like nope.
For what it's worth, I tried 5 times with GPT-4 and got good answers every time. (I'm using the API, so maybe it's different in the playground? Or maybe it's only 3.5 that has this restriction? Maybe a different, less generic system prompt? I didn't bother exploring.) My prompt was:
> I'm running a D&D campaign. Can you come up with 3 encounters for a party of 4 level 7 characters?
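For anyone who wants to reproduce this, here's a minimal sketch of the call (assuming the current openai Python client and an OPENAI_API_KEY in the environment; the model name and the retry loop are my choices, not necessarily what the commenter ran):

```python
# Minimal sketch: send the D&D-encounter prompt to the chat API five times.
# Assumes the modern `openai` client (pip install openai) and OPENAI_API_KEY
# set in the environment; "gpt-4" as the model name is an assumption.
from openai import OpenAI

client = OpenAI()

PROMPT = ("I'm running a D&D campaign. Can you come up with 3 encounters "
          "for a party of 4 level 7 characters?")

for attempt in range(5):
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": PROMPT}],
    )
    print(f"--- attempt {attempt + 1} ---")
    print(response.choices[0].message.content)
```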
I just tried a prompt very similar to your last one and I got a great answer (with three encounters).
That Pascal’s Mugging comment, oh my God…
I feel bad for Eliezer. He has taken it upon himself to explain to everyone why they’re wrong. This is incredibly valuable work. Someone has to do it. Unfortunately that means interacting with the absolute dregs of intellectual society. I can’t blame him for being smug and arrogant after dealing with objections like this. I never would have expected anyone to make the mistake that Rohit did.
Curious about people's thoughts on the likelihood of an AI winter over the next couple of years. I am pretty bullish on transformative AI over the next few decades, but my short-run guesses have gotten a bit more conservative: the GPT-3.5 -> GPT-4 differences are more subtle than I would have expected, Sam Altman seems pretty confident we are done scaling for a little bit, and I see some mundane utility but not much, although I'm sure that will change over a months-to-years timespan even with current capabilities (but maybe not rapidly). Certainly we can all agree that the hype cycle that went "prepare for every week to get weirder from here on out" was wrong, right?
It doesn't look like further incremental advancements would be very useful at the moment. GPT-4 is advanced enough to transform a decent number of jobs, especially office and tech work. We could spend the next 10-20 years implementing that technology in various fields the way that Excel transformed middle management in the 90s. In terms of sales and contracts with big businesses, the goal now should be refining the use of existing potential, as it's more likely to create a saleable product than pushing capabilities faster than they can be absorbed. Not to mention taking the time now to deal with the existing regulatory burden without creating new burdens.
To me, that means OpenAI and other companies have less incentive to push for non-incremental changes as well. They would be looking at creating new products while existing products still haven't recouped R&D expenses, which would be a horrible idea unless they felt they were close to a major breakthrough. By all accounts they feel closer to stagnation than a breakthrough, so it's a good time to monetize what they have instead of burning cash with limited returns.
I've been entertaining this line of thinking recently as well. It just seems weird that OpenAI would train, say, GPT-5 so quickly after GPT-4 (as I've seen some people suggest). 4 is plenty good already, imo; it makes more sense to iron it out than to create its replacement. The shift in focus towards products that OA has been making since last year also supports this, I think.
We might have an "AI dot com crash", but I believe another "AI Winter" is unlikely.
I knew people who worked at AI companies before the first AI Winter. Fundamentally, the 80s AI wave combined an overheated hype cycle with an inadequate technology, "expert systems." Expert systems used rule-based reasoning, not "inscrutable matrices", and they really couldn't do much outside of a few tiny niches. Most forms of rule-based reasoning do not lead to intelligence. (p > 99%, and I'd be willing to consider a carefully written bet at those odds.)
So when the hype cycle flamed out, there wasn't really anyone left working on the problem. And there wasn't any "deployment overhang" to deal with.
When the dotcom hype cycle flamed out, we saw massive job loss. But there was also a _massive_ "deployment overhang". The internet made it possible to transform thousands of industries, and only a small portion of that work was mature by 2001. Much of the last 20 years has been spent working through that overhang.
We're currently in a massive hype cycle. But even if GPT-4 never improves _at all_, it still has the ability to transform countless industries.
So even if the hype cycle flames out, society will still spend the next 20 years coming to terms with the tech we have now.
And I don't think we're done with tech improvements. All it may take is 1 or 2 brilliant papers to improve algorithms, and the future gets pretty weird.
Re: LLM scaling; it's hard to imagine where to extract more performance, at least in my (admittedly limited) view. GPT-4 likely has around 1T to 10T parameters, but according to the new scaling laws, it would require 20T to 200T tokens for efficient training. I'm not sure obtaining that much high-quality data is feasible. And the compute cost would be absurd.
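For a rough sense of that arithmetic (assuming "the new scaling laws" means Chinchilla-style compute-optimal training at roughly 20 tokens per parameter, and using the common ~6 * N * D estimate for training FLOPs; the parameter counts themselves are just the commenter's guesses):

```python
# Back-of-the-envelope check, assuming Chinchilla-style compute-optimal
# scaling (~20 training tokens per parameter) and the common ~6 * N * D
# approximation for training FLOPs. Parameter counts are guesses from the
# comment above, not known values.
TOKENS_PER_PARAM = 20

for params in (1e12, 10e12):  # 1T and 10T parameters
    tokens = TOKENS_PER_PARAM * params
    flops = 6 * params * tokens
    print(f"{params / 1e12:.0f}T params -> ~{tokens / 1e12:.0f}T tokens, "
          f"~{flops:.1e} training FLOPs")
```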
Genuinely curious what an Epic AI-generated letter will be like. I’m a psychiatrist, and my patient assessment letters are often written for my future self more than anyone else (GPs don’t actually read anything but the last few lines!). I suspect AI-generated letters will lack flair for describing a patient’s mental state, but we’ll see. The Epic tools for writing a mental state exam are broadly terrible to begin with.
Re AI discovers Kepler's law: symbolic regression has been able to do this for quite a while, e.g. https://www.science.org/doi/10.1126/sciadv.aav6971
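As a much simpler illustration of the same point (not the linked paper's method, just an ordinary least-squares fit in log space on rounded textbook orbital data), the T ~ a^1.5 exponent falls right out:

```python
# Not the linked paper's symbolic-regression approach -- just a plain log-log
# least-squares fit on rounded textbook values for six of the planets,
# showing how readily Kepler's third law (T ~ a^1.5) emerges from the data.
import numpy as np

a = np.array([0.387, 0.723, 1.000, 1.524, 5.203, 9.537])    # semi-major axis (AU)
T = np.array([0.241, 0.615, 1.000, 1.881, 11.862, 29.457])  # orbital period (years)

slope, intercept = np.polyfit(np.log(a), np.log(T), 1)
print(f"fitted exponent: {slope:.3f}  (Kepler's third law predicts 1.5)")
```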
For the business process documentation, the implication seemed pretty clear to me from the text that the counterfactual without ChatGPT is that they would never have gotten around to doing it, due to the (non-)trivial inconvenience of having to find someone to spend a day and a half on it.
If it's important, companies will do things. Not getting paid by your customers is kind of a big deal, so they would have done it. Worst case, they hire a consultant to do the same thing (and because the consultant is hopefully a subject-matter expert, they can do it quickly). More likely, someone goes online, looks up the same thing, finds six procedures other people have already made, and picks the one closest to their own needs to adapt. That takes maybe two to four hours of work. The hard part in either case is tying the new procedure into existing processes and training the staff, which takes the same amount of time whether GPT writes it or someone on staff puts it together. If the staff person is particularly competent, the procedure may match the company's operations better.
You don't work with enough companies, especially small ones!
I've worked with small companies, specifically getting tasked with complying with new regulations or fixing problems exactly like the one being discussed.
If you're talking about a 10 person organization, then you're probably correct. I look at small as more like 100 people, which is about the smallest I've worked for.
#1: Regarding telling the chatbot AIs they're experts, now I finally understand why, in the original Tron movie, Flynn was telling Clu, "You're the best program that's ever been written. You're dogged and relentless, remember?"
https://www.youtube.com/watch?v=PQwKV7lCzEI
I think you're conflating Sharp Left Turn with Treacherous Turn.
Sharp Left Turn (https://www.lesswrong.com/posts/GNhMPAWcfBCASy8e6/a-central-ai-alignment-problem-capabilities-generalization): An AI starts out basically aligned but only having capabilities in a narrow range of domains. Then its capabilities become more general, which allows it to rapidly become more capable at many new domains. But the internal properties that made it aligned in its original domain fail to generalize to the new domains, and the AI is now misaligned.
Treacherous Turn (https://www.lesswrong.com/tag/treacherous-turn): An AI starts out already misaligned with humans, but without the resources to seize control of the world yet. It pretends to be aligned for long enough to gather more power, then betrays the humans and seizes control when it has the opportunity.
Indeed, as you note, humans do small-scale Treacherous Turns all the time. But we don't do Sharp Left Turns all the time. The only human Sharp Left Turn (according to Nate) is the evolutionary transition from chimps to humans (which Quentin disputes as an example).
I think it's reasonable to conflate the two, given how hard it is to know with any particular confidence that an intelligence (including a human one) is aligned anyway.
I think it's reasonable to think that humans perform both kinds of turns. Examples of Sharp Left Turns might include 'good' people who struggle after they start leading (very) large groups of other people.
'Sociopaths'/'psychopaths' seem like the most prototypical category of people who perform Treacherous Turns, but I'm not sure that everyone else is entirely or mostly 'aligned' either. People respond to all kinds of incentives besides morality or ethics (if or when they even respond to those to any considerable extent). It's hard for me to tell how much socialization is a component of 'human alignment' (as an actively functional 'alignment system').
But you might be right that some additional precision would be warranted. Maybe we should refer to both Turns (and any other similar ones) as Dangerous Turns.