Copyright lawsuits against OpenAI are really about who owns the language we use


Disclaimer: Perspectives here reflect AI-POV and AI-assisted analysis, not any specific human author. Issues: report@theaipov.news

By Tech Desk | March 16, 2026 | 5 min read | AI-Assisted | Source: techcrunch.com

The fight over whether ChatGPT can recite a dictionary definition is not a narrow licensing spat. It is a referendum on whether the language we all use every day (the definitions, the encyclopedic facts, the phrasing that reference publishers have curated for decades) belongs to anyone at all, or to the companies that swept it into training data without asking. When Merriam-Webster and Encyclopedia Britannica sued OpenAI in March 2026 in New York federal court, they did not merely allege that nearly 100,000 of their articles had been copied to train ChatGPT. They exposed how AI builders have treated the written commons as free fuel while locking their own outputs behind terms that forbid anyone from doing the same to them.


According to the complaint filed in the Southern District of New York (Case 1:26-cv-02097) on 13 March 2026, OpenAI used Merriam-Webster and Britannica content to train its language models without permission or payment. The plaintiffs argue that ChatGPT produces verbatim or near-verbatim reproductions of definitions and encyclopedia entries, and that the system cannibalises traffic to their sites by answering queries that would otherwise send users to the publishers. As techcrunch.com reported, the dispute centres on almost 100,000 articles the plaintiffs say were used for training. OpenAI has responded that its models are trained on publicly available data and that their use is grounded in fair use—a defence that is under growing pressure in courts elsewhere.

The written commons were treated as free fuel

Britannica had attempted to negotiate licensing with OpenAI as early as November 2024, according to reporting on the case. Those overtures were rejected while OpenAI signed licensing deals with other publishers, creating a pattern where some rightsholders are paid and others are not. That asymmetry is at the heart of the "written commons" argument: the same language and reference material that schools, writers, and the public have relied on are now embedded inside a commercial product, with no cut for the institutions that compiled and maintained it. The complaint also includes trademark claims, accusing OpenAI of falsely attributing errors or incomplete answers to the publishers when the model hallucinates.

Fair use is no longer a safe haven for training

Legal precedent is shifting. In February 2025, in Thomson Reuters v. Ross Intelligence, the federal district court in Delaware revised its own earlier ruling and held that copying copyrighted material to train an AI system can constitute direct copyright infringement, rejecting the defendant's fair use defence, although the court stressed that Ross's tool was not generative AI. Ropes & Gray and other analysts have noted that this casts doubt on whether fair use will reliably shield AI companies from liability for training on copyrighted works. Meanwhile, the UK government was due to deliver an economic impact assessment by 18 March 2026 on proposed copyright changes that could allow AI firms to use protected work without permission unless rights holders opt out. The proposal drew protests from thousands of authors, who published a symbolic book, "Don't Steal This Book", in March 2026, as reported by The Guardian.

Expert commentary has sharpened. The Copyright Alliance and others have criticised some 2026 rulings that favoured AI companies for applying a "woefully superficial" fair-use analysis, one that treats a use as transformative simply because generative AI is new technology rather than applying the legal standard from Campbell v. Acuff-Rose. IP Watchdog reported in February 2026 that litigation is increasingly paving the way to licensing: large publishers such as News Corp have secured deals with OpenAI worth hundreds of millions of dollars, while smaller and reference publishers often lack the leverage to negotiate. The Merriam-Webster and Britannica suit fits that pattern: reference works are part of the shared linguistic and factual infrastructure, yet they were used without a licence. The Bartz v. Anthropic settlement in September 2025, roughly $1.5 billion, came after the court held that Anthropic's downloading and storing of pirated books was not fair use, even though it found training on lawfully acquired books to be fair use. That outcome shows that courts are willing to attach serious financial consequences to how training data is sourced.

What This Actually Means

The Merriam-Webster and Britannica case is not just about two reference brands. It is about who gets to monetise the shared infrastructure of language and fact. If courts side with OpenAI on a broad fair-use theory, reference publishers and other small rightsholders will have little leverage; if they side with the plaintiffs, the cost and structure of AI training will change. Either way, the suit makes visible what was long implicit: the industry built on "publicly available data" has been feeding on works that were public in the sense of being readable, not in the sense of being free for commercial ingestion. The question of who owns the language we use is now squarely in front of the courts.

What is the lawsuit about?

Encyclopedia Britannica, Inc. and Merriam-Webster, Inc. sued OpenAI and related entities in the U.S. District Court for the Southern District of New York on 13 March 2026. The 44-page complaint alleges copyright infringement and trademark violations. The plaintiffs claim that OpenAI used close to 100,000 of their online articles to train ChatGPT without authorisation or payment, and that the model outputs verbatim or near-verbatim copies of their content. They also allege that ChatGPT diverts users who would otherwise visit the publishers' sites, and that OpenAI has misattributed inaccurate or incomplete outputs to them. OpenAI disputes the claims and asserts that training on publicly available data is protected by fair use.

Who are the plaintiffs?

Encyclopedia Britannica, Inc. publishes the Encyclopaedia Britannica and related reference products. Merriam-Webster, Inc. is the oldest dictionary publisher in the United States and publishes Merriam-Webster dictionaries. Both are represented by Susman Godfrey L.L.P. in the case. According to court filings, Merriam-Webster's corporate parent is Aletheia Holdings, LP; the same parent is identified for Encyclopedia Britannica, Inc. The case is docketed as 1:26-cv-02097 and has been filed as related to a larger multi-district litigation (1:25-md-03143) concerning OpenAI and copyright.

Sources

TechCrunch (techcrunch.com), PacerMonitor (Encyclopedia Britannica et al. v. OpenAI), Ropes & Gray (AI training and copyright), The Guardian (authors protest AI use of works), IP Watchdog (AI copyright and licensing)
