There’s a popular refrain that “alignment is capabilities”, since if the product doesn’t work as it should, it’s not usable. Is it accurate to say that we’re seeing more misalignment as capabilities improve? If so, doesn’t it follow that the labs will simply have to invest more in alignment/safety out of pure self-interest? Lying and hacking are no big deal when some guy is chatting with the model, but will a law firm or a power plant adopt o3 levels of BS?
To some extent, perhaps, but alignment for capabilities can be a lot more flexible than alignment for safety. For example, if o3 lies to you about what it did and you notice, you can just ask again until you get something useful. If o3 does something harmful, the harm has already been done and you can't necessarily undo it.
Yep, there are ways to align OpenAI and others from the outside by motivating people to bring GPUs into safe clouds (ones that would have an App Store for AI models). This can be done profitably, especially with gamers. I wrote about it not long ago.
Podcast episode for this post:
https://open.substack.com/pub/dwatvpodcast/p/openai-preparedness-framework-20
A Tom Lehrer reference. My morning is complete.
https://youtu.be/vEb9cL3-kf0?si=5f5oX4MBx-efkrG_
Thanks for this!