The need for a humanity defense organization to develop defensive technologies is ever increasing, and yet, there isn't the funding for it.
I love this quote from the repository’s readme:
“Note: Caution! This codebase will execute LLM-written code. There are various risks and challenges associated with this autonomy. This includes e.g. the use of potentially dangerous packages, web access, and potential spawning of processes. Use at your own discretion. Please make sure to containerize and restrict web access appropriately.”
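A minimal sketch of what that containment might look like, assuming Docker is available; the script name experiment.py is purely illustrative, not anything from the repo:

    # Run LLM-generated code in a throwaway container with no network
    # access and hard resource caps. experiment.py is a placeholder name.
    import os
    import subprocess

    result = subprocess.run(
        [
            "docker", "run",
            "--rm",                # delete the container when it exits
            "--network", "none",   # no web access
            "--memory", "1g",      # cap memory
            "--cpus", "1",         # cap CPU
            "-v", f"{os.getcwd()}/experiment.py:/work/experiment.py:ro",
            "python:3.11-slim",
            "python", "/work/experiment.py",
        ],
        capture_output=True,
        text=True,
        timeout=300,               # kill runaway runs
    )
    print(result.stdout)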
Does Sakana mean anything else in some other language? At least Llama fits in with animal-themed naming.
It's fish in Japanese, see also https://jisho.org/search/%E9%AD%9A
I'm not sure whether the Kanji in the original thread is a mistake or whether the translation is...
No, it's just that "sakana" sounds the same as the Hebrew word for "danger". It's an insane comparison. This whole post is a joke anyway.
It doesn't mean anything in Portuguese, but it sounds the same as 'bastard' and is spelled in a similar way (with a k instead of a c). Make of that what you will.
I hope if/when something happens, there's enough time to drag people before Congress or whatever. It'll be cathartic to see the narrative change from "oh that's in loads of fiction, it's not real" to "that was in loads of fiction, how could you not see it coming?"
Sandboxing a program to restrict what it can do is pretty much a solved problem.
So all these potentially self-modifying AIs should be sandboxed. Obviously.
Which would just leave us with more exotic possibilities, like the AI inventing a zero-day exploit against your sandboxing system and using it to escape.
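For the mundane part, a rough sketch of OS-level restrictions (POSIX-only; untrusted.py is a hypothetical name, and this is a partial measure rather than a full sandbox):

    # Apply resource limits in the child process before it executes the
    # untrusted script: bounds CPU time, memory, open files, and process
    # spawning, but does not block network or filesystem access on its own.
    import resource
    import subprocess

    def limit_resources():
        resource.setrlimit(resource.RLIMIT_CPU, (60, 60))                   # 60 s of CPU time
        resource.setrlimit(resource.RLIMIT_AS, (512 * 2**20, 512 * 2**20))  # 512 MiB address space
        resource.setrlimit(resource.RLIMIT_NOFILE, (64, 64))                # few open files
        resource.setrlimit(resource.RLIMIT_NPROC, (16, 16))                 # limit process spawning

    subprocess.run(
        ["python3", "untrusted.py"],
        preexec_fn=limit_resources,  # runs in the child before exec
        timeout=120,
    )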
[Why, yes, we do have formal machine-verified (HOL4) security proofs for the CHERI capability system...]
Of course, you don't have to use CHERI, there are other ways to do sandboxing.
But, if someone who is building an enormous number of AI data centres were to tell certain chip manufacturers that they want to buy like a gazillion CPUs with CHERI extensions, I personally will be celebrating...
It does not sound particularly exotic to me, since those things happen regularly in real life. Even systems whose sole purpose is sandboxing (Intel SGX, the Chrome sandbox) get compromised regularly.
We have our warnings. Now, let's convince our politicians to act on them.
It's so nice that you still care about all this. There's a ravening alien god coming to destroy everything I care about, and I'm not even interested enough to keep an eye on it any more. Pathetic.
Is it time to say "I told you so, you fucking fools" yet? I can't imagine there'll be enough time to say it by the time it becomes obvious enough that the fools see it.
Well, there is still time to try to build d/acc. Perhaps we might survive.
Fwiw: the LessWrong community had some good replies to this. So it might be less "instrumental convergence" and more "AI makes a mistake", but notably it doesn't really matter whether the AI good-naturedly and mistakenly tries to grab resources or deceptively tries to get them, because either way the AI is power-seeking.
https://www.lesswrong.com/posts/ppafWk6YCeXYr4XpH/danger-ai-scientist-danger