Discussion about this post

Dave92f1

Re NYT vs OpenAI - I've never understood how using a copyrighted source to train an AI is different from using the same source to train a human. *People* output things reminiscent of the things we've seen previously - we use the things we've read and seen as templates. Every artist's work is derivative of all the other artwork they've seen (or at least admired). This is normal, and I don't see why it should be any different for AI.

Brian Moore

re: NYT case, I think they have a really strong case in the "if you ask it to repeat a copyrighted NYT article word for word, with temp=0, and the company is charging money for that service, that is infringement" category. But for the "you shouldn't be allowed to have those articles in the training data" category, I don't see how they have a case at all. At least, I hope they don't. It seems like the solution is exactly what it would be for a human with perfect memory recall:

1. Just reading the article and then remembering it (perfectly) to riff on it, or talk to others about it, or write a summary of it, or write a new article based on the info in it - should be perfectly legal, even though they do have a perfect copy in the brain/hard drive somewhere.

2. If they are asked informally/rarely to repeat the content of a NYT article, they will - and this is "technically" illegal but not really enforced so long as they don't do it a lot, or get paid to do it.

3. If they do repeat the content to tons of people in exchange for money, they should stop, and when asked to do so in the future, say "sorry, the exact regurgitation of this content is prevented by the copyright on the data, but I can do all the other stuff from #1 for you."

It seems very much like we are in "software patent litigation" land, where the judges, lawyers and commentators do not seem to understand what is being debated, perhaps on purpose.
