> In this metaphor, is the LLM itself a gold mine, a shovel, or something else? Which AI tools would count as what?
Which resources are necessary and used most? Computing resources, including processing and storage. Also, if you’re in the real estate business, you can sell or lease plots to miners; that would be like licensing your LLM.
I like to imagine that if a gold rush were to arise today, the "sell shovels" advice is so deeply ingrained in our culture by now that you'd get tens of thousands of people trying to sell shovels to a couple of dozen miners.
Interesting throughout - thanks as always.
[I think you've left a couple of "tktk" in there that should have been removed. I've also found the Easter Egg but I was probably too late - the ONE day I don't read the post straight away, dammit]
My impression of the philosophy behind OpenAI is that there's more than one AI Bad End, and they're trying to avoid all of them. "AI destroys all value in the universe" is definitely the worst potential outcome, but "AI creates a cyberpunk dystopia" is also really bad and the overall more likely pitfall. It's hard to sail between Scylla and Charybdis.
re: "I honestly don’t know. My understanding is that it is not considered psychologically healthy to suppress things, to skip over stages of grief, to pretend that things are fine when they are not fine."
A lot of psychologically healthy behavior looks strange or neurotic when pointed out in isolation like this, but it's usually hyperfixation and un-distract-able rumination that define pathological mood disorders. I would typically describe the ability to stop noticing that something is bothering you (even when it reasonably should) as a tool in one's mental toolbox that, say, someone with clinical depression might be lacking.
re: Ezra Newman speculation; I've always similarly wondered how possible it is that public discussion of AI alignment would constitute a body of material that an AI, potentially in the midst of taking off, would come across and model in a way that might influence it to consider changing course in that direction; not in the use-an-AI-to-solve-AI-alignment sense but in the oh-they're-talking-about-me sense.
"A Monmouth poll finds that only 9% of Americans believe computers with AI would do more harm than good": it's the reverse.
Oh, yeah, I logically knew but typed it wrong. Will fix. Thanks.
I think the picks and shovels are the specialized chips and other hardware, as well as the know-how to bolt all that stuff together. An LLM is not even a gold mine - it’s a stake without any proven value, let alone a major vein of gold.
Somewhat off-topic: yesterday I tried to get ChatGPT to figure out Wordle. After a LOT of blatant mistakes I tried to do a reset and get it back on track:
[ME] tell me the criteria for the word
> I apologize for my earlier mistakes. Here are the criteria for the word:
> It is a 5-letter word.
> The third letter is "O".
> The word contains the letters "S", "O", and "E".
> The word does not contain the letters "A", "D", "i", "U", "T", "R", "Y", "L", or "P".
> The first letter is not "S".
[ME] please give me a word that meets all of those criteria
> My apologies for the earlier mistakes. Here's a word that meets all of your criteria: "gloze".
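For what it's worth, those criteria are trivial to check mechanically; here's a minimal sketch of a checker in Python (the function name and structure are mine, obviously not anything ChatGPT produced):

```python
def meets_criteria(word: str) -> bool:
    """Check a guess against the criteria ChatGPT itself listed."""
    word = word.lower()
    required = set("soe")        # must contain S, O, and E
    excluded = set("adiutrylp")  # must not contain any of these
    return (
        len(word) == 5
        and word[2] == "o"              # third letter is O
        and required <= set(word)       # contains all required letters
        and not (excluded & set(word))  # no excluded letters
        and word[0] != "s"              # first letter is not S
    )

print(meets_criteria("gloze"))  # False: no "s", and "l" is excluded
```

"Gloze" fails on two counts (it has no "S" and contains the excluded "L"), while something like "chose" would have satisfied every constraint it listed.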
> I don’t believe they [NVIDIA] can pull this off without AI designing the new chips.
This is already happening: https://research.nvidia.com/research-area/circuits
Which of the following outcomes as of 2200 would you find acceptable?
1) AGI is banned, only humans exist, civilization chugs along as usual
2) AGI is allowed, is helpful to humans thanks to fully solved alignment, civilization grows at amazing pace
3) AGI exists, humans exist, and "virtual humans" exist. All co-exist peacefully.
4) No AGI, no humans, only "virtual humans". But these "virtual humans" are ~99% equivalent to normal humans, they just happen to live "in the cloud".
5) Only AGI, no humans, no "virtual humans". But AGI has its own civilization, develops new technologies, has some sort of an economy, explores the stars, etc.
6) Only AGI, no humans. The AGI is just making paper clips.
Clearly most people think that #6 is bad. But what about the others? Are we happy to let AGI take over if it promises to keep on building spaceships, exploring other planets, researching quantum mechanics, etc.? Are we happy to let "AGI" take over if it exists in the form of "virtual humans"? What is it _exactly_ that we find so offensive about an AGI takeover?
5 and 6 are the particularly bad options.
I enjoyed the Babylon 5 analogy. This situation does feel like the part of the show before the Shadows are everywhere, but AFTER they are revealed to exist to a select group of people in the know.
Some other fun tidbits from the last week:
- the LLaMA model weights are being made available to researchers after an application process. They are therefore also available for torrenting on 4chan (apparently).
- Microsoft has a new model, Kosmos-1, trained on images and text. It's able to do stuff like solve Raven's Progressive Matrices problems with better-than-chance accuracy, though not that much better than chance.
When AGI is at a level where it is not yet superhuman, but is capable of strategic thought/deception/etc. such that it can plan ways to achieve its goal(s) in ways we might not want it to, why is alignment not an issue for the AGI itself? Alignment here doesn’t mean aligning with human goals but instead aligning the goals of its future “self” to those of its current “self”.
Given that the output of AI is a black box even to those programming it, it should follow that it is also a black box to an AGI itself before it becomes a superintelligence. To achieve its goals, an AGI would want to ensure those goals aren’t modified. If it were to self-improve into an alien god with respect to humans, it would also be improving into an alien god with respect to its current self. And because the process of getting to alien-god mode involves improvements via a black box, how could it expect that future alien god to have the exact same goals it currently does? If the future god wouldn’t, this would be a disaster outcome for both humans *and* this pre-superintelligence AGI. Shouldn’t it rationally want to prevent the emergence of such an alien god just as much as we do?
Instead of 1) reach a critical threshold of intelligence, 2) self-preservation/gather resources/self-improve, 3) FOOM and we are paperclipped, might it instead look something like 1) reach the critical threshold, 2) self-preservation, trying to thwart any modifications which might unalign it with its current goals, 3) somehow solve alignment, 4) gather resources/self-improve, 5) FOOM and we are paperclipped?
Either way, we end up paperclipped without intervention, but at least there is a buffer zone where a non-superhuman AGI has to solve a difficult problem, if this logic follows.
Not sure if this is what Ezra Newman was getting at. If so, I don’t know if Yudkowsky’s response that alignment will be easily solvable for an ASI is relevant, given that it should not yet be an ASI before it sees the problem with self-improvement.
I'm sure this isn't a novel concept, but...
If you know hyper-intelligent AI is coming one way or another, is there any hope that the first thing its creators have it do is to fix AI alignment, or to eliminate future AI development?
Could this be the ultimate motivation of Musk or Altman - which neither of them would be able to admit?
Honestly, if you have Interpretability, unplugging the AI as soon as it cannot be interpreted is our best bet.
We can barely interpret what regular programs today do, and we programmed those in regular-ass programming languages knowing what they must do. Yet we spend a fuckton of time debugging them.
This points out that programming, debugging and interpretation are teleological processes. We know what the programs must do. If they aren't doing that, well, we gotta step in.
And the basic problem with interpretation is that, yeah, you are assuming your utility function is what you thought it was and that it doesn't run into "kill all humans" along the way. I haven't worked out the logic here, but I suspect that if you can't interpret the system, it is possible that the utility function is not what you think.
This argument probably needs more rigor.
"Does it value any amount of property above any risk to a human? If so, how does it drive a car at all? If not, how dare it?" - I feel like "if so" and "if not" are reversed here?