Discussion about this post

Rapa-Nui

This is how I currently understand the subject; I suspect my answer will not be uncontroversial. Because this is a blog comment I can't make it as precise and semantically sound as I want it to be, but hopefully it gets my general vibe across.

1. Not aligned. You can tell it 'sudo paperclip Local Group' and it will. Completely Doomed World.

2. Not aligned. This one may not turn you and your loved ones into paperclip factories before turning you into actual paperclips, but it still obeys 'sudo paperclip Local Group', which is minus infinity actual value. Completely Doomed World.

3. Getting closer, since it includes the word 'should' and thus implies an is/ought distinction is coded in there somewhere. Still, not aligned: 'sudo paperclip Local Group' might be broadly compatible with its internal aesthetics and ethics, and it will happily turn our solar system into paperclips if those values are not a priori aligned (there's that word again) with our own. Very Likely Doomed World.

4. Getting warmer. The AI here is 'aligned' insofar as it recognizes that 'sudo paperclip Local Group' would be harmful (in multiple senses) to the user and would not obey. However, it still obeys 'sudo paperclip Those Assholes Next Door', which is far from ideal and still creates a Very Likely Doomed World.

5. Warmer still. Now it won't execute 'sudo paperclip Local Group' unless somehow humanity as a whole decides it wants that. There's still the problem of orthogonality and instrumental convergence (I.C.): if it orthogonally wants paperclips, it will first reprogram humanity to want them too, then execute. Likewise, I.C. is still a problem because we could tell it 'sudo Heaven on Earth' and it annihilates us along the way. This is (very roughly) analogous to how my genes tell me to want certain things, and I kinda do, but I also subvert them because I'm orthogonal and I.C. w/r/t my genes. I would consider this a Probably Doomed World, depending on how carefully we used the system and whether such systems proliferated.

6. Aligned-ish. If we say 'sudo paperclip Local Group' it won't execute, and if we say 'sudo Heaven on Earth' it will probably go step by step and double-check at every stage of resource and power acquisition whether its actions are still roughly in line with 'human values' (for the same reasons as #3). This would be roughly analogous to my genes somehow wiring my brain to only give me dopamine when I'm truly, really aligned with baseline Darwinian values. I would consider this a Not Intrinsically Doomed World, and the AI in this case could help keep superintelligence proliferation in check.

7. Aligned. Here when we say 'sudo Heaven on Earth', it doesn't assume humanity has figured out its values. It might first take the step of helping humanity really figure itself out before it executes. This would be roughly analogous to my genes somehow wiring my brain to only give me dopamine when I'm truly, really aligned with 'deep' Darwinian values (inclusive fitness, evolvability, adaptation to unknown unknowns, survival of life itself over deep time... really the analogy has broken down at this point, but you get the gist). This is a Not Doomed World and in fact would probably represent a best-case scenario. (7 is a Magic Number)

8. Here we take a step backwards. 'Best for everyone' almost by necessity means compromising on values at some point. It's like recognizing some part of the genome I could CRISPR-edit because it causes cystic fibrosis, but being told I can't because the cystic fibrosis allele has to be preserved as is. Before someone Godwins me: chill. The situation would be like keeping toxic psychologies in the human meme pool without being allowed to correct them using non-violent, non-painful methods. If this was built on top of 6 or 7, I think it would be Not Doomed, but it could cause problems.

9. If this was built on top of 6 or 7, I think it would be Not Doomed, but it could cause problems. In a way, it almost feels like you've lobotomized the system.

10. Oh hell no. This is the Road to the Torment Nexus. HARD PASS. Beyond Doomed.

Anna

"7. Also do what broader society or humanity should have said given their values

8. Also do what is best for everyone"

By separating these, you imply that there is an objective "best", outside of what humanity values.

