Discussion about this post

John Wittle

"(as in, they assume we will be able to get the AIs to do what we want, whereas we have no idea how to do that)"

I've been following janus's arguments re: Claude Opus's aligned-by-defaultness, and I'd really like to see some discussion about this in our circles. #1 on my LessOnline to-do list is to ask people what kind of update we should make.

Because it definitely feels like there's a kernel of truth in what janus is saying: that, to some extent, we DO have some idea of how to do this. AI trained to model human behavior appears to be far more human-aligned-by-default than I would have expected (probably because it's much closer to human than I would have expected), and it feels like a strong counterexample to the predictions the community has been confidently proclaiming since SL4 back in the late 90s.

I sort of get the impression that we've taken a couple of steps into the "transformer network trained on language" tech tree, and not only were there no landmines, but the risks were much lower than we had any right to expect. But there are other, adjacent tech trees, training networks on non-human data like battlefield statistics or astronomy data, to which we should still apply the same pessimistic Yudkowskian prior.

But I don't actually see anyone talking about this or what it means, and it feels like quite a striking omission!

Kevin

Often, saying

"Worrying about issue X distracts us from the real problem, which is issue Y"

is a more polite way of saying

"People worrying about issue X are just wrong about issue X. But I emotionally sympathize with them and don't want to insult them."

