Discarded.AI

They Want the Inputs Free. They Want the Outputs Free.

How the AI industry built its foundations on your work, fought to keep it, and then let a Supreme Court ruling confirm they own nothing they produce.

Alan Robertson
Mar 03, 2026

This week the Supreme Court declined to hear a case about whether AI-generated art can be copyrighted. The decision was narrow. One computer scientist, one image, one rejected appeal.

But the story it sits inside is not narrow at all.

To understand what just happened, you have to go back to where this industry actually started. Not the press releases. Not the TED talks about transforming humanity. The actual starting point.

A pirate website.


The Original Sin

In 2021, Anthropic co-founder Ben Mann downloaded 196,640 books from a dataset called Books3. He knew the books had been assembled from unauthorised copies. He downloaded them anyway.

That was just the start.

Over the following year, court documents show Mann downloaded at least five million more books from Library Genesis, known as LibGen, a shadow library of pirated works that has operated for years out of servers in Russia. Then Anthropic downloaded at least two million more from the Pirate Library Mirror.

More than seven million books in total. Downloaded from piracy sites. Knowingly.

Why? Because Anthropic CEO Dario Amodei had characterised the alternative, pursuing proper licensing agreements, as a “legal/practice/business slog.” It was faster and cheaper to steal.

When this came to court, the judge, William Alsup of the Northern District of California, was not gentle. He found that Anthropic’s acquisition of those books was “inherently, irredeemably infringing.” The company had not just trained on pirated material. It had built a permanent central library of all those books and kept it.

Anthropic settled for $1.5 billion. Approximately $3,000 per work. The largest copyright settlement in US history.


They Were Not Alone

Anthropic got caught first. But the court documents made something else clear. This was industry practice.

Meta employees also used LibGen. Internal communications showed staff acknowledged it presented a “medium-high legal risk.” They used it anyway. One internal message explained the reasoning bluntly: “LibGen is essential to meet SOTA numbers.” State of the art. Everyone else was doing it. They could not afford not to.

OpenAI used the same datasets. A former OpenAI employee admitted during the Anthropic case that he had downloaded LibGen data while at OpenAI, assuming it was fair use.

Getty Images sued Stability AI for scraping over twelve million photographs, including their associated metadata and captions, to train its image generation models. The lawsuit included a detail that became famous in creative communities: Stability AI’s outputs were reproducing Getty’s watermarks. The system had ingested so many Getty images that it had learned the watermark as part of the visual language of professional photography.

The New York Times sued OpenAI and Microsoft for using millions of its articles without permission. The Times alleged the companies were building a direct market substitute for its journalism, pulling readers away from paywalled content using material those readers had effectively already paid for.

Music publishers sued Anthropic separately for scraping song lyrics across the web on a massive scale to train Claude, without licensing or consent. Universal Music, Warner Music and Suno were all drawn into litigation over AI music generation trained on copyrighted recordings.

Apple was sued for using Books3, the same pirated dataset, to train its Apple Intelligence models.

By the end of 2025, there were over seventy copyright infringement lawsuits against AI companies in the United States alone.

The defence in almost every case was identical. Fair use.


The Fair Use Argument

Fair use is a doctrine in US copyright law that allows limited use of copyrighted material without permission, typically for purposes like commentary, criticism, education, or transformation of the original work.
