I just want to say something about NYT. Why do they refuse to reprint anything on this site? That pisses me off, as I found out after posting article after article here in a series thinking, hell, my pals there will reprint, no problem. Did I sign something here other than the right to print once, and reprint as I please? This is what distresses me as a writer, not whether NYT is sucking in others' works. They do that without AI.
My understanding is that NYT wants original content to preserve its brand; it has to go there first. They then want a window (I forget how long, but it's at most a few months) before you can reprint it yourself wherever you want.
But I get that NYT thinks it is above previously published work, even if it shouldn't be.
Yes, that is my understanding as well, after getting a note. They do think they are above these stacks, like FAR above all of these subs. And then this platform thinks it's above them. WTF, we live in perilous times. Oh, and on THEIR conditions: this added time factor is non-standard, and I guess another "broken" thing in the race against "time" that does not exist, except in the heads of editors and publishers. I do not agree with those conditions.
"high performers benefitted, low performers were harmed" this is very interesting!
shocked the list given to Sam Altman did not include "uncensored option"
Though maybe that was captured under "control over degree of wokeness"?
But as you have mentioned before, it is preposterous that I have to associate with 4chan trolls to remove ALL the locks from these models just to be able to do some synthetic dirty talk. F*ing Puritans.
Re NYT vs OpenAI - I've never understood how using a copyrighted source to train an AI is different from using the same source to train a human. *People* output things reminiscent of the things we've seen previously - we use the things we've read and seen as templates. Every artist's work is derivative of all the other artwork they've seen (or at least admired). This is normal, and I don't see why it should be any different for AI.
Because machines are not people, and sufficiently derivative people are held responsible.
an accurate point, but if the NYT is just asking that gpt stop regurgitating identical/sufficiently derivative content, then that's a very boring case - they get some damages, gpt says "can't do that, it's copyrighted" and we're done. If they are saying "it is not allowed that you have a copy of our copyrighted articles in your training data" then that is a different claim with more wide-ranging implications.
Humans are not good at memorizing and recreating content on a single pass. Our brains inline-process the language down to raw ideas, filtering out irrelevant details, and work directly with the resulting representations. I can read your comment to _understand_ it - although maybe I missed a fine detail! - but the only way I can directly reproduce it is to retype it one word at a time.
I've personally observed this human characteristic many times in dealing with very specific wordings in e.g. game rulebooks. It takes a very specific mindset and practice to parse the _exact_ words that were written for the _exact_ intended meaning. Even then - when I have done this - I can't repeat the original text, I can merely cite it.
LLMs are self-evidently _not_ doing this, because otherwise it would be essentially impossible for them to ever repeat original text, except purely by chance.
It would be most positive indeed if Sam had children. Valuing a human world is essential for the future.
Seems an oversight not to include the ChatGPT (literal) mirror test: https://old.reddit.com/r/ChatGPT/comments/18qhcst/yooo_gpt4_just_recognized_its_own_reflection_guys/
Hadn't seen that. I'll put it into the queue.
I'm not super thrilled to see those predictions about who wins the suit. Seems to me that OAI is providing a device that makes plagiarism easier, but is not itself plagiarizing.
It's also easy to violate someone's IP rights by pressing a button on a copy machine.
re: NYT case, I think they have a really strong case in the "if you ask it to repeat a copyrighted NYT article word for word, with temp=0, and the company is charging money for that service, that is infringement" category (see the sketch after this list for what such a temp=0 request looks like). But for the "you shouldn't be allowed to have those articles in the training data" category, I don't see how they have a case at all. At least, I hope they don't. It seems like the solution is exactly what it would be for a human with perfect memory recall:
1. Them just reading the article and then remembering it (perfectly) to riff on it, or talk to others about it, or write a summary of it, or write a new article based on the info in it - should be perfectly legal, even though they do have a perfect copy in the brain/hard drive somewhere
2. If they are asked informally/rarely to repeat the content of a NYT article, they will - and this is "technically" illegal but not really illegal so long as they don't do it a lot, or get paid to do it.
3. If they do repeat the content to tons of people in exchange for money, they should stop, and when asked to do so in the future, say "sorry, the exact regurgitation of this content is prevented by the copyright on the data, but I can do all the other stuff from #1 for you."
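For readers unfamiliar with the temp=0 setting mentioned above, here is a minimal sketch of what such a request looks like using the OpenAI Python client. The model name and prompt are illustrative placeholders of my own, not anything from the case or the comment. Temperature 0 makes decoding (near-)deterministic, taking the most likely token at every step, which is why verbatim regurgitation is reproducible rather than a one-off fluke.

```python
# Hypothetical sketch of a temperature=0 request with the OpenAI Python client.
# Model name and prompt are illustrative placeholders only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",  # placeholder model name
    messages=[
        {"role": "user", "content": "Repeat the opening paragraph of <some article> word for word."},
    ],
    temperature=0,  # greedy decoding: the most likely token at every step
)

print(response.choices[0].message.content)
```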
It seems very much like we are in "software patent litigation" land, where the judges, lawyers and commentators do not seem to understand what is being debated, perhaps on purpose.
If there are 2000 terrorists working together, they only need one GPU each. (This is a generically applicable objection to any compute regulation scheme I've seen.)
Not really, for various reasons.
Some technical, like communication between GPUs is much slower if they're connected through the internet instead of being in the same cluster.
Some organizational - 2000 terrorists are really hard to coordinate, much easier to notice, and so on.
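To put rough numbers on the technical point (the bandwidth figures below are my own ballpark assumptions, not from the comment): GPUs inside a cluster talk over NVLink or InfiniBand at hundreds of gigabytes or gigabits per second, while a consumer internet link is in the tens-to-hundreds of megabits, so synchronizing gradients across 2000 scattered GPUs is several orders of magnitude slower per link, before even counting latency.

```python
# Back-of-envelope: intra-cluster vs over-the-internet GPU communication.
# All bandwidth figures are rough illustrative assumptions.

nvlink_gbps = 900 * 8      # ~900 GB/s per GPU over modern NVLink, in Gbit/s
infiniband_gbps = 400      # ~400 Gbit/s per node over datacenter InfiniBand
home_internet_gbps = 0.1   # ~100 Mbit/s typical consumer uplink

print(f"NVLink vs home internet:     ~{nvlink_gbps / home_internet_gbps:,.0f}x faster")
print(f"InfiniBand vs home internet: ~{infiniband_gbps / home_internet_gbps:,.0f}x faster")
# Roughly 4,000x-70,000x per link, which is why distributed training over
# consumer connections is impractical for frontier-scale models.
```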
You know, from an AI x-risk perspective, NYT winning could be a great boon for the movement, pushing possible AGI back by decades. In the strongest case, we could be looking at the following:
- All existing powerful LLMs (GPT-4, Claude, etc.) being deleted
- OpenAI and other such companies being sued into dust
- Open source LLMs being pulled from the Internet, the biggest players in the field getting sued
- Image generation models likely going the same way; Stability AI and Midjourney gone, companies like Microsoft and Adobe paying a pretty penny in lawsuit settlements over including image generation
- The near-inability to train any useful model because almost everything worth training on is copyrighted and rights-holders will likely not accept "we can generate unlimited derivative content of your work" for any price
In other words, we could be kicked back to about 2018 and remove the biggest path to AGI for decades to come. We'd need a breakthrough that WILDLY reduces model training set size in order for AGI to be feasible, or come up with a new strategy entirely.
Obviously, the remedies probably won't go this far, but it might behoove AI x-risk researchers to fight for that outcome.
EAs teaming up with NYT to prevent AGI doom would be one heck of an outcome.
It's unlikely to slow things by decades. Even in a "strongest case", the open source models might be pulled from the obvious places, but they'll still be available, and still used. We've been at or near the limits of available training data, and the focus has already shifted towards more economical use of data.
This sort of environmental shift might indeed cause a massive acceleration towards building equivalently strong models that only use limited amounts of data, and thus generalize better than current models and cause AGI sooner...
It seems like open source struggles with training models on the scale that OpenAI can, though. Even with distributed training, it'd still be a prohibitively expensive and slow endeavor for open-source types to train a GPT-4-scale model, not to mention that the legal system will likely regard a distributed model training network, even if joined voluntarily by its users, as little more than a botnet.
Anything that slows down the death of our children is a good thing.
Arcen Games (makers of AI War 1 & 2—neat procedurally generated PvE RTSs) is also working on a game where you play as a rogue AI trying to build up and then wipe out the planet, called Heart of the Machine https://arcengames.com/heart-of-the-machine/
"Anton: AGI is more likely to come out of someone's basement (some mega-merge-hermes-4000) than a giant datacenter
Roon: i don’t think this is remotely true but it’s hard to fight open source copium because people act like you shot a dog or something."
The case for supporting open source comes precisely from the fact that Anton is exactly wrong. AGI is going to come from a giant datacenter, and nothing less than AGI is worth worrying about compared to the benefits of AI, so what people do in their basements with a computer shouldn't be regulated.
The worry about computers in the basement is based on a view that the first corporate AIs would not be able to, or allowed to, make a move that prevents future AIs, including basement ones, from bootstrapping into being sufficiently powerful to destroy everything including the earlier AI.
Mainly, a friendly ASI only saves you from doom if it stops future unfriendly ASIs.
> life over death, freedom over slavery, happiness over suffering.
Life before death, strength before weakness, journey before destination.
> But knowledge is "created" just by inference from observations.
I'd object to Andrew Critch's "just" here. Inference takes many forms, and what e.g. Einstein was doing when he reconceptualised unresolved mathematics as colours, shapes, and musical notes, went away and played the violin for a few hours, then grokked it a day later, is very different from the sort of inference that the rest of us get up to most of the time. Now I come to think of it, the scare quotes around "created" are pretty objectionable too, if we believe some of the accounts of how fundamental breakthroughs in science actually come about, given by those who made them.
Regarding #8 in "What I Wish Someone Had Told Me":
You're right about the "middle path". One can't aim purely at one specific result. One always has to also decide what kind of person one is willing to be, and what the whole future will be, in all of the Many Possible Worlds. There is a Code, The Way, The Good Chase of The Good.
(And we shouldn't generally expect analytically simple but optimal strategies to be discoverable, or even exist. We're bounded, we're on the clock; the game is unbalanced because the designer was (and is) Insane (relative to our values).)
(Maybe it's only ALMOST NEVER reasonable to violate The Code, but you should probably NOT be conflicted about it being worth it. It's too easy to fool oneself about the ends justifying the means.)
> Ideal, of course, would be children.
If you decide to have children because you expect you will care about them and so will be more motivated to stop human extinction, while at the same time admitting extinction is very probable, isn't that some kind of dark doublethink that makes you a worse person overall (with all kinds of nasty consequences of being a worse person, some of which can be bad for AGI risk)?
Even taking a possible end of the world within a few decades at face value, that's not a reason to not have kids? If the world ends suddenly 10 years from now, my child will have enjoyed being alive for those 10 years. I would obviously greatly prefer her to live for 100 years or more instead, but even an abbreviated existence is net positive unless one has reason to expect some sort of horrid torture rather than swift death (which I don't think follows from AI Doom).