April 20, 2024

New York Times Copyright Lawsuit Could Take Down OpenAI

If you’re old enough to remember watching the hit children’s show Animaniacs, you probably also remember Napster. The peer-to-peer file-sharing service, which made it easy to download free music in an era before Spotify and Apple Music, took college campuses by storm in the late 1990s. This did not go unnoticed by record companies, and in 2001, a federal court ruled that Napster was liable for copyright infringement. Content producers fought against the technological platform and won.

But that was 2001, before the iPhone, before YouTube, and before generative AI. This generation’s great copyright battle pits journalists against artificially intelligent software that has learned from their reporting and can regurgitate it.

Late last year, the New York Times sued OpenAI and Microsoft, alleging that the companies are stealing its copyrighted content to train their large language models and then profit from it. In a point-by-point rebuttal of the lawsuit’s allegations, OpenAI claimed it had committed no wrongdoing. Meanwhile, the Senate Judiciary Subcommittee on Privacy, Technology and the Law held a hearing in which news executives implored lawmakers to force AI companies to pay publishers for using their content.

Depending on whom you ask, what’s at stake is the future of the news business, the future of copyright law, the future of innovation, or, specifically, the future of OpenAI and other generative AI companies. Or all of the above.

Ideally, Congress would intervene to resolve the debate, but as James Grimmelmann, a professor of digital and information law at Cornell Law School, told me: “Congress doesn’t like to legislate copyright unless there is a consensus of the majority of the actors in the room, and there is nothing resembling that consensus right now. So Congress can hold hearings and talk about it, but we are really far from any legislative action.”

So what will resolve it? Advocates of technological innovation would say that AI technology is full of promise and that we had better not stifle it in its early days of development. Media companies would say that even the coolest tech companies have to pay when they use copyrighted content, and that if we give AI a free pass, journalism as we know it could eventually cease to exist.

The consensus of both casual observers and legal experts is that this New York Times lawsuit is a big deal. Not only does the Times seem to have a strong case, but OpenAI has a lot to lose, perhaps its very existence.

The case against OpenAI, briefly explained

If you ask ChatGPT a question about, say, the fall of the Berlin Wall, there’s a good chance that some of the information in the answer was taken from New York Times articles. That’s because the large language model, or LLM, that powers ChatGPT has been trained on more than 500 gigabytes of data, including newspaper archives. Generative AI tools only work because this training data teaches them how to respond effectively to prompts. In other words, copyrighted data is, in part, what makes this new technology powerful and what makes OpenAI such a valuable company.

The New York Times claims that OpenAI trained its model on copyrighted Times content and failed to pay appropriate licensing fees. That, the lawsuit says, allows OpenAI to “closely compete with and imitate” the New York Times, perhaps by summarizing a news story based on Times reporting or summarizing a product recommendation based on Wirecutter reviews.

Even worse is what the lawsuit calls “regurgitation,” which is when OpenAI spits out text that matches Times articles verbatim. The Times provides 100 examples of such “regurgitation” in the lawsuit. In its rebuttal, OpenAI said regurgitation is a “rare bug” that the company is “working to reduce to zero.” It also claims that the Times “intentionally manipulated prompts” to make this happen and “cherry-picked its examples from many attempts.”

But at the end of the day, the New York Times maintains that OpenAI is making money from its content and costing the newspaper “billions of dollars in statutory and actual damages.” By one estimate, given the millions of articles potentially involved and the statutory cost per copy, the New York Times could be seeking $450 billion in damages.

OpenAI has a clear solution to this conflict: pay copyright owners upfront. The company has already announced licensing deals with the likes of the Associated Press and Axel Springer. OpenAI also claims that it was negotiating a settlement with the New York Times just before the newspaper filed its lawsuit.

It is unclear how much OpenAI is willing to pay media outlets. A Jan. 4 report in The Information said that OpenAI has offered some media companies “as little as $1 million to $5 million” to license their articles for use in training its large language models, which seems like a small amount of money for a company currently targeting a valuation of up to $100 billion. But the mounting lawsuits, if they go against the company, could prove far more costly than paying higher licensing fees.

The New York Times is also not the only party suing OpenAI and other tech companies for copyright infringement. A growing list of authors and artists have filed lawsuits since ChatGPT made its splashy debut in fall 2022, accusing these companies of copying their works to train their models. The copyright holders filing these lawsuits also go far beyond writers. Developers have sued OpenAI and Microsoft for allegedly stealing software code, while Getty Images is embroiled in a lawsuit against Stability AI, the creators of the Stable Diffusion imaging model, over its copyrighted photographs.

“When you’re talking about copyright and statutory damages,” said Corynne McSherry, legal director at the Electronic Frontier Foundation, “if you lose, the downside and financial risk are enormous.”

The case for innovation

While it’s easy to compare the Times case to Napster, the better precedent is the VCR, according to McSherry.

In 1984, a years-long copyright case between Sony and Universal Studios over the practice of using VCRs to record television programs reached the United States Supreme Court. The studio alleged that Sony’s Betamax machines could be used for copyright infringement, while Sony’s lawyers argued that recording shows was fair use, the doctrine that allows copyrighted material to be reused without permission or payment.

Sony won. The Court’s decision, which has never been overturned, held that if a machine, including the VCR, has substantial non-infringing uses, then the company that makes it cannot be held liable when customers use it to infringe copyrights.

The entertainment industry changed forever with this case. The VCR allowed people to watch whatever was broadcast on television whenever they wanted, and within just a few years, Hollywood studios saw their profits soar. The machine made people more excited about watching movies, and they watched more of them, both at home and in theaters.

“If you have to go to copyright owners to get permission for technological innovation, you’re going to get a lot less innovation,” McSherry told Vox.

With this in mind, there is another copyright lawsuit worth looking into: the Google Books case. In 2004, Google began scanning books, including copyrighted works, so that “snippets” of their text would appear in search results. It partnered with libraries at places like Harvard, Stanford, and the University of Michigan, as well as magazines like New York and Popular Mechanics, which wanted to digitize their archives.

Then came the lawsuits, including a 2005 class action from the Authors Guild. The authors claimed copyright infringement, and Google argued that making the books searchable amounted to fair use. As Judge Denny Chin wrote in a 2013 decision dismissing the authors’ lawsuit, Google Books is transformative because, thanks to the tool, “words in books are being used in a way they have not been used before.” It took about a decade, but Google ultimately prevailed, and Google Books is now legal.

Like Sony and Napster before it, the Google Books case is ultimately about the battle between new technology platforms and copyright holders. It also raises the question of innovation. Is it possible that giving too much power to copyright holders could stifle technological progress?

In that 2013 decision, Judge Chin wrote that the technology “advances the progress of the arts and sciences, while maintaining respectful consideration for the rights of authors and other creative individuals, and without adversely impacting the rights of copyright holders.” And a 2023 economic study of Google Books’ effects found that “digitization significantly boosts the demand for physical versions” and “allows independent publishers to introduce new editions for existing books, further increasing sales.” So consider it another point in favor of giving technology platforms room to innovate.

Few would disagree that technological progress has shaped the media business since the invention of the printing press. That’s basically why the first copyright laws were written more than 300 years ago: technology made copying easier, and authors needed some way to protect their intellectual property.

But AI represents a greater technological leap than the VCR, Napster, and Google Books combined. We don’t know exactly how yet, but AI looks set to transform our understanding of copyright and of how content creators are paid for their work. It will also take a while. A ruling in the New York Times v. OpenAI case will take years, and even then, questions will remain.

“I think generative AI could be as transformative for copyright as the printing press,” said Grimmelmann, the Cornell law professor. “But that will probably take a little longer.”

A version of this story was also published in the Vox Technology newsletter. Sign up here so you don’t miss the next one!
