April 15, 2024

This is how the Internet ends

The New York Times is suing OpenAI for copyright infringement, arguing that the way generative AI models such as ChatGPT are built inherently violates its rights in a way that does serious economic harm to the media.

Since I write for the media for a living, I should probably be mounting a rousing defense of the NYT here: taking action, taking on big, nasty tech, protecting our information ecosystem. The problem is that the lawsuit is largely nonsense, obviously self-serving and based on a poor understanding of how these models work. It risks strangling all the potential benefits of the technology, all to prop up profits (which, at the NYT at least, are pretty healthy as it is).

There are legitimate fights between media and technology over AI. AIs’ tendency to give you an answer directly, rather than linking you to a site, breaks the pact between search engines and websites: the former would scrape your content, yes, but in return they would send users your way, who would then see advertising and maybe even subscribe.

So there is reason for media companies to want compensation if Big Tech monetizes the fruits of their news gathering. But the NYT’s actual lawsuit is a much bigger power grab than that: when a human journalist reads a few articles from a rival publication and then puts together their own version of a story on that topic, that is entirely legal, provided they write it in their own words.

Broadly speaking, that is all an AI is trying to do, too. It may be “trained” on a huge dataset, but contrary to how most of us imagine it works, it does not retain that data or look it up when asked a question. Instead, modern generative AIs are best thought of as “spicy autocomplete,” a souped-up version of the way email and messaging apps suggest how to finish your sentences.

The AI builds weightings for the most likely combinations of words to follow the series of words that make up a query, and it is those weightings, not the original text, that get consulted. Stopping it from mixing together things that do not belong together is how AI developers try to minimize “hallucinations,” cases in which an AI gives a convincing but completely false answer.
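To make that concrete, here is a deliberately tiny sketch in Python. Everything in it is invented for illustration: the two-word context and the hand-written weightings stand in for the billions of learned parameters a real model uses, but the basic move of picking the next word by weight is the same.

```python
import random

# A toy "spicy autocomplete." The model stores weightings over possible
# next words, not the training text itself. These weightings (and the
# two-word context) are invented purely for illustration; a real model
# learns billions of parameters rather than a lookup table.
next_word_weights = {
    ("the", "fudge"): {"is": 0.6, "recipe": 0.3, "set": 0.1},
    ("fudge", "is"): {"delicious": 0.5, "setting": 0.3, "ready": 0.2},
}

def complete(prompt_words, max_new_words=3):
    words = list(prompt_words)
    for _ in range(max_new_words):
        context = tuple(words[-2:])             # condition on the last two words
        weights = next_word_weights.get(context)
        if weights is None:                     # nothing learned for this context
            break
        choices, probs = zip(*weights.items())
        words.append(random.choices(choices, weights=probs)[0])
    return " ".join(words)

print(complete(["the", "fudge"]))  # e.g. "the fudge is delicious"
```

Notice that nothing in that table is a copy of any source text, only statistical associations between words, which is the crux of the argument above.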

To guarantee that an AI never generates particular sentences, you would need some database to check against, and no such database exists: there is no central registry of copyrighted text. Nor could there ever be one: if you have ever jotted down notes to jog your memory, or briefly kept a diary, that too is protected by copyright.

The NYT’s main examples in the case seem to come from well-known articles that had already been replicated across the web: a sentence from Snow Fall, a multimedia story about an avalanche that the paper published to great acclaim in 2012, ended up heavily represented in ChatGPT’s weightings because it had already been copied so widely around the internet.

These lawsuits do not help the media, because tech has a habit of assuming that our coverage is self-serving: outsiders underestimate how independent newsroom reporters are from their parent companies. And by focusing too much on our own battles, we weaken our coverage of technology: coverage of social media platforms was hampered by the relentless (and often silly) battles between newsrooms’ parent companies and those same platforms.

So while it is the New York Times case that has sucked up most of the oxygen, there is a much more important issue being ignored, and it starts with online fudge recipes.

The problem, spotted by the writer Zoah Hedges-Stocks, is that much of the recipe content on the internet is now written by low-quality bots churning out regurgitated junk. What she had noticed was that alongside posts comparing the process of making fudge with making caramel, there were also posts comparing the process of making fudge with making “scrimgeour,” another supposed Scottish delicacy she had never heard of.

What had happened was that AI had muddled different uses of the word “fudge”: “Cornelius Fudge” is the Minister for Magic in JK Rowling’s Harry Potter series, until he is replaced by the grimmer “Rufus Scrimgeour.” There is no confection called “scrimgeour,” but shoddy AIs collapsed the contextual gap and generated nonsense, which is now replicated across numerous sites (and now referenced here, too). Other examples abound: The Hunger Games and the video game Baldur’s Gate 3 both feature characters called “Gale.”

By far the most entertaining example to date came from Donald Trump’s former lawyer Michael Cohen, who had to apologize to a US court after citing non-existent cases that an AI assistant had found for him (and which he had not checked at all). That sort of thing is generally frowned upon in court.


The current generation of AIs has already been “trained,” and trained on pre-AI internet content. But keeping future generations of AI products up to date (and improving their capabilities) will depend on drawing on the internet as it exists now.

The problem is that a growing proportion of the internet is polluted with low-quality AI content, and muddled further still by articles like this one (of which I predict there will be many more in the coming years) that try to explain the mess but reinforce the associations between the misleading words as they do so.

This has the potential to create a truly dangerous cycle, in which ever more degraded inputs mean AI outputs get worse even as the technology itself gets smarter, leading to a spiral of worse and worse content on the internet (a process Cory Doctorow calls “enshittification”) and, finally, the joy and creativity of the internet ground down into algorithmic gray goo: information reduced to a mulch of junk words.
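A toy simulation can make that loop concrete. This is a sketch under loudly stated assumptions, not a picture of any real training pipeline: the “model” below is nothing but a table of word frequencies, and each generation’s “internet” consists entirely of the previous model’s output. Researchers who study the real phenomenon call it “model collapse.”

```python
import random
from collections import Counter

random.seed(0)

# Hypothetical starting "internet": 50 distinct words, equally common.
corpus = [f"word{i}" for i in range(50)]

def train(texts):
    # "Training" here is just counting word frequencies.
    return Counter(texts)

def generate(model, n):
    # Sample n words in proportion to the model's learned frequencies.
    words, counts = zip(*model.items())
    return random.choices(words, weights=counts, k=n)

model = train(corpus)
for generation in range(1, 6):
    corpus = generate(model, len(corpus))   # the web fills up with model output...
    model = train(corpus)                   # ...and the next model trains on that
    print(f"generation {generation}: {len(model)} distinct words survive")

# A word that misses one generation's sample is gone for good, so
# diversity only ever shrinks: a crude picture of the mulching effect.
```

Run it and the count of distinct surviving words falls generation after generation: rare words vanish first, which is exactly the flattening the column is worried about.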

Today it is fudge and Harry Potter. It will be news content in the very near future, if it is not already.

No one is sure how big the gray goo risk really is, and there are no surefire plans to prevent it. Media companies must make sure we look beyond our own backyard; if we do not, we could miss the disaster that ends us all.
