Thanks for your solid commentary, as always. A few thoughts:
- It's worrying that they didn't do any red teaming for fine-tuning, only for queries. I'm basically counting down the months until someone uses this to cause real havoc. I'm in a constant "waiting for something bad to happen" mood, surprised that no one has exploited any of this stuff YET.
- Less than 12-18 months until we see GPT-4-level capabilities in open-source models?
- As an Italian, I notice you're a man of culture when it comes to pizza toppings!
1) Yes, although it might not take months. It's possible that Llama 2 simply isn't powerful enough to cause real havoc; Llama 1 wasn't. If they keep doing this, it will happen.
2) If I had to predict off the top of my head, 12-18 months seems reasonable. High variance; maybe the lower end.
3) Why thank you!
"Lama" means "why" in Hebrew. I assume that's intentional, but noting it for people who don't know.
Yep. It kills me that it's sitting right there and everyone continuously doesn't notice.
re: "They claim that ‘false-refusals’ are exceedingly rare, that the system will almost never refuse a request it should have accepted:"
This page details counterexamples:
https://societyandai.substack.com/p/llama-2-erasing-history-and-moats
"A chat interface to the 70B version of Meta’s Llama 2 refuses to write in the voice of the author of the Declaration of Independence, President Thomas Jefferson."
Are they seriously going to defend the idea that history students shouldn't be allowed to interact with what a prominent founder of this country, and former president, might have said?
I get the impression there's an implicit understanding in LLM development that "safety" is a pointless distraction, so you get goofy token gestures like this one, designed only to placate tech-non-savvy outsiders and delay the point where someone is willing and able to actually hold the labs properly accountable. I hope there's a behind-the-scenes, cynical-but-pragmatic attitude that safety is potentially important on the level of everything, and that the goofy token gestures are a smokescreen for the adults in the room building the Future Tech, but I definitely wouldn't be comfortable betting on that being the case.
Also, my banned-in-2010-after-saying-something-snarky-about-Islam Facebook account is very upset with Llama right now.
Do we think open source will remain a fixed absolute distance behind the closed-source options, or a proportional one?
I don't think the answer is written yet, but I am operating under "some factor of effective compute worse" as the default.
Isn't this exactly like open source? "I notice this is the opposite of the usual case with open source. If open source software has a security flaw that is undesirable to the user, being open source makes it easier to find, identify and fix that flaw. If the security flaw is instead desirable to the user, who will often desire this [unsafe content], they won’t agree to let you fix it."
Just, in this case, the "bugs" might actually be things desirable to the user (a bad actor).
The difference is that there's usually no point in 'adding a bug' to _your own copy_ of some open-source code/project, though you're right that this gets ambiguous/nebulous for some things, e.g. DRM.