10 Comments

Thanks for your solid commentary, as always. A few thoughts:

- It's kinda worrying that they didn't do any red teaming for finetuning, only for queries. I'm basically counting down the months until someone uses this to cause real havoc. I'm in a constant "waiting for something bad to happen" mood, surprised that no one has exploited any of this stuff YET.

- Less than 12-18 months until we see GPT-4-level capabilities in open-source models?

- As an Italian, I notice you're a man of culture when it comes to pizza toppings!

author

1) Yes, although it might not take months. It is possible that Llama-2 simply isn't powerful enough to cause real havoc; Llama-1 wasn't. If they keep doing this, it will happen.

2) If I had to predict off the top of my head, 12-18 months seems reasonable. High variance. Maybe the lower end.

3) Why thank you!


"Lama" means "why" in Hebrew. I assume that's intentional, but I'm noting it for people who don't know.

author

Yep. It kills me that it's sitting right there and everyone continuously doesn't notice.


re: "They claim that ‘false-refusals’ are exceedingly rare, that the system will almost never refuse a request it should have accepted:"

This page details a counterexample:

https://societyandai.substack.com/p/llama-2-erasing-history-and-moats

"A chat interface to the 70B version of Meta’s Llama 2 refuses to write in the voice of the author of the Declaration of Independence, President Thomas Jefferson."

Are they seriously going to defend the idea that history students shouldn't be allowed to interact with what a prominent founder of this country and former president might say?


I get the impression that there's an implicit understanding in LLM development that "safety" is a pointless distraction, so you get kinda goofy token-stuff like this, designed only to placate tech-non-savvy outsiders and delay the point at which someone might be willing and able to actually hold them properly accountable. I hope there's a behind-the-scenes, cynical-but-pragmatic attitude that safety is potentially important on the level of everything, and the goofy token-stuff is a smokescreen for the adults in the room building the Future Tech, but I definitely wouldn't be comfortable betting on that being the case.

Also, my banned-in-2010-after-saying-something-snarky-about-Islam Facebook account is very upset with LLaMa right now.


Do we think open source will remain, like, an absolute amount behind the closed-source options, or a proportionate amount?

author

I don't think the answer is written yet, but I am operating under 'some factor of effective compute worse' as the default.


Isn't this exactly like open source? "I notice this is the opposite of the usual case with open source. If open source software has a security flaw that is undesirable to the user, being open source makes it easier to find, identify and fix that flaw. If the security flaw is instead desirable to the user, who will often desire this [unsafe content], they won’t agree to let you fix it."

Just, in this case, "bugs" might actually be things desirable to the user (a bad actor).


The difference is that there's usually no point in 'adding a bug' to _your own copy_ of some open source code/project, though you're right that that's ambiguous/nebulous for some things, e.g. DRM.
