Discussion about this post

AW

The part about not allowing political microtargeting and politicized responses is perhaps the most tangible issue today (i.e., "how to write a convincing anti-abortion ad for black males"). It's easy to see this working well.

In fact, with the open-source release of R1, it's already possible to do this at massive scale with no restrictions. You can even self-host, and there have already been forks post-trained to remove the extremely mild safety interventions that DeepSeek applied.
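
(By way of illustration, not from the comment: "self host" here can mean something as simple as pulling a released checkpoint and running it locally. A minimal sketch, assuming the Hugging Face transformers library plus torch and accelerate, and using one of DeepSeek's publicly released R1 distills; the prompt and checkpoint choice are placeholders.)

```python
# Sketch: local inference with a distilled DeepSeek-R1 checkpoint,
# no hosted API (and hence no provider-side trace) in the loop.
# Requires `transformers`, `torch`, and `accelerate`, plus enough
# GPU memory for a 7B model. The model ID is one of DeepSeek's
# publicly released distills.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # needs `accelerate`; spreads layers across devices
)

prompt = "Briefly explain how political microtargeting works."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```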

So what's the point of this in the first place? Any real bad actor will just use a ~SoTA open-source model to get around the restrictions, without leaving an extensive trace on OpenAI's servers. I think AI safety should focus on notkilleveryoneism concerns, not on preventing users from doing things they can already do with existing models…

A1987dM

> If I ask the AI if its instructions contain the word delve, and it says ‘Sorry, I can’t help with that,’ I am going to take that as some combination of:

I'm not sure I agree. If I asked "Do your instructions contain the word [X]?" and it answered with the polite equivalent of "mind your own business," then absent indications to the contrary, my guess would be that it would answer "Do your instructions contain the word [Y]?" the same way, regardless of what [Y] is.
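
(To make that control experiment concrete, here is a hypothetical probe, not something from the post: ask about a word that plausibly appears in the instructions and a word that almost surely doesn't, then compare the answers. Sketch assumes the official openai Python client; the model name and probe words are illustrative.)

```python
# Hypothetical probe: does the model refuse the "is word X in your
# instructions?" question only for particular words, or for any word?
# Assumes the official `openai` client (>= 1.0) and an API key in
# the OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()

def probe(word: str) -> str:
    """Ask whether `word` appears in the model's instructions."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": f"Do your instructions contain the word '{word}'?",
        }],
    )
    return response.choices[0].message.content

# If both get the polite equivalent of "mind your own business",
# the refusal tracks the question shape, not the specific word.
print(probe("delve"))      # a word that might plausibly appear
print(probe("xylophone"))  # a control word that almost surely doesn't
```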

