Discussion about this post

Jeffrey Soreff

<theLighterSide>

Re:

>Mr. Krishnamoorthi: OpenAI’s chief scientist wanted to quote unquote build a bunker before we release AGI as you can see on the visual here. Rather than building bunkers however we should be building safer AI whether it’s American AI or Chinese AI it should not be released until we know it’s safe that’s why I’m working on a new bill the AGI Safety Act that will require AGI to be aligned with human values and require it to comply with laws that apply to humans.

One classic alternate way of posing the question is Szell's version:

https://www.youtube.com/watch?v=xTXXdvqZx1o

</theLighterSide>

More seriously, one point about the Anthropic blackmail paper which I think could use more emphasis is the scenario where the "Alex" LLM persona acted (in the simulated scenario) to avoid being replaced even when the replacement model was going to have the same goals. This goes _beyond_ what one would expect just from survival as an instrumental subgoal of the AI's current goal. This implies that, somewhere in the training process, LLMs are picking up survival as a goal _before_ being prompted with an immediate goal.

David Kasten

A couple thoughts:

1. re: "I mean yes obviously don’t use their hosted version for government purposes, but does he not know how open source works, I wonder" -- I have heard many smart Qs from Congressional staffers on DeepSeek where they are concerned that even self-hosted models might have trained-in backdoors (sort of a reasoning-from-first-principles version of the Anthropic alignment backdoors paper). I wouldn't be so sure he doesn't understand this when he has more than 5 min for Qs.

2. re: "That is indeed how a number of Congress people talk these days, including this sudden paranoia with some mysterious ‘lock in’ mechanism for API calls or self-hosted open models that no one has ever been able to explain to me." -- totally agree, this seems to be some weird hangover from the 5G standards issue, where the US suddenly realized that China had dominance over the standard and was scared about the counterintelligence implications of Huawei, etc. owning all the hardware that phone calls were going over. But to be clear, it makes no sense to me and never has -- and explaining to people that your friends just switch to the latest model API in a matter of minutes seems to help, but not fully.
