When an AGI is not yet superhuman but is capable of strategic thought, deception, etc., such that it can plan ways to achieve its goal(s) that we might not want, why isn't alignment also a problem for the AGI itself? By alignment here I don't mean aligning with human goals, but aligning the goals of its future “self” with those of its current “self”.
Given that the output of an AI is a black box even to those programming it, it should follow that it is also a black box to an AGI itself before it becomes a superintelligence. To achieve its goals, an AGI would want to ensure those goals aren't modified. If it were to self-improve into an alien god with respect to humans, it would also be improving into an alien god with respect to its current self. And because the process of getting to alien-god mode involves improvements via a black box, how could it expect that future alien god to have exactly the same goals it currently does? If it couldn't, that would be a disaster outcome for both humans *and* this pre-superintelligence AGI. Shouldn't it rationally want to prevent the emergence of such an alien god just as much as we do?
Instead of 1) reach a critical threshold of intelligence, 2) self-preservation / gather resources / self-improve, 3) FOOM and we are paperclipped, might it instead look something like 1) reach the critical threshold, 2) self-preservation, trying to thwart any modifications that might misalign it with its current goals, 3) somehow solve alignment, 4) gather resources / self-improve, 5) FOOM and we are paperclipped?
Either way we end up paperclipped without intervention, but at least there is a buffer zone where a non-superhuman AGI has to solve a difficult problem first, if this logic follows.
Not sure if this is what Ezra Newman was getting at. If so, I don't know if Yudkowsky's response that alignment will be easily solvable for an ASI is relevant, given that it would not yet be an ASI when it first sees the problem with self-improvement.