OpenAI is in trouble again: Encyclopædia Britannica is suing OpenAI over ChatGPT, alleging that even retrieving information can constitute copyright infringement.

03-23

This article is machine translated


On March 16, Encyclopædia Britannica and Merriam-Webster sued OpenAI, leveling four major accusations against ChatGPT. Britannica is not simply being harsh: according to the complaint, ChatGPT has fabricated misinformation and falsely attributed it to Britannica. OpenAI has not responded.

OpenAI is being sued again.

On March 16, Encyclopædia Britannica, together with its subsidiary Merriam-Webster, filed a lawsuit against OpenAI.

The complaint alleges that ChatGPT has committed large-scale copyright infringement.

Britannica came better prepared than earlier plaintiffs: it targets training-data scraping, memorized model output, and real-time RAG retrieval, and adds a trademark claim under the Lanham Act.

This is the first time in the history of AI copyright litigation that a plaintiff has attempted to challenge the entire generation chain.

Britannica says GPT-4 can recite its content word for word.

According to TechCrunch, Britannica directly named GPT-4, arguing that it has memorized a large amount of its copyrighted content and can output near-verbatim copies on demand.

The allegation is not that the output is similar or close, but that it is copied almost character for character.

There is a certain technical basis for this. Research teams from Stanford and Yale have conducted experiments to extract the original text of "Harry Potter" from mainstream large models, with the highest extraction rate reaching 96%.

In other words, a significant portion of the training data is effectively stored in the model weights and can be reproduced almost perfectly under specific prompts.
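To make the idea of an "extraction rate" concrete, here is a minimal sketch of one way to score how much of a reference text reappears verbatim in a model's output. It is an illustration only: the function names and the choice of 5-word n-grams are assumptions, and this is not the metric used in the Stanford and Yale experiments mentioned above.

```python
def ngram_set(text, n):
    """Return the set of word n-grams in text (lowercased, whitespace-split)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def verbatim_overlap(reference, output, n=5):
    """Fraction of the reference's n-grams that also appear in the output.

    Hypothetical helper for illustration: 1.0 means every n-word run of the
    reference shows up in the output, 0.0 means none do.
    """
    ref = ngram_set(reference, n)
    if not ref:
        return 0.0  # reference shorter than n words: nothing to compare
    return len(ref & ngram_set(output, n)) / len(ref)


if __name__ == "__main__":
    reference = "the quick brown fox jumps over the lazy dog near the river bank"
    print(verbatim_overlap(reference, reference))
    print(verbatim_overlap(reference, "the quick brown fox jumps over the"))
```

A score near 1.0 on long passages is the kind of evidence a memorization claim relies on; a paraphrase that conveys the same facts in different words would score close to 0.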

Britannica holds a substantial amount of copyrighted content. Its nearly 100,000 online articles, encyclopedia entries, and dictionary definitions cover almost all major fields of knowledge, from science and history to literature.

This content was compiled over decades by professional editors and subject matter experts. Before the rise of Wikipedia, this system was the standard index of human knowledge.

OpenAI has been operating in a gray area.

Is looking up information an infringement?

Previously, the debate centered on whether using someone's content to train a model constituted copyright infringement.

Britannica's accusations go a step further this time. Its copyright claims fall into three layers:

The first layer is the scraping of nearly 100,000 articles without permission for use in training a large model.

The second layer is that ChatGPT outputs complete or partial verbatim copies of Britannica's content when generating answers, which Britannica says constitutes direct infringement.

The third layer, and also the most controversial, is that OpenAI used Britannica's content in ChatGPT's RAG workflow.

RAG (retrieval-augmented generation) is the mechanism by which ChatGPT retrieves content from external sources to obtain up-to-date information.

Britannica argues that even if its content were not included in the training set, its appearance in real-time retrieval would still be an infringement.

This argument is unprecedented; it means that, whether through static training or dynamic retrieval, anyone who uses copyrighted content without authorization would be held responsible.
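The difference between training-time and retrieval-time use is easier to see in code. The sketch below is a toy RAG step under stated assumptions: the corpus, the source labels, and the keyword-overlap scoring are all hypothetical, and nothing here reflects how ChatGPT's retrieval is actually implemented. The point is only that the retrieved text is copied into the prompt at query time, independently of what the model was trained on.

```python
import re


def tokenize(text):
    """Lowercase a string and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query, corpus, k=1):
    """Return the k (source, text) pairs sharing the most words with the query.

    Toy keyword-overlap ranking; real systems typically use embeddings
    or a search index instead.
    """
    q = tokenize(query)
    ranked = sorted(
        corpus.items(),
        key=lambda item: len(q & tokenize(item[1])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, passages):
    """Paste the retrieved passages into the prompt sent to the model."""
    context = "\n".join(f"[{source}] {text}" for source, text in passages)
    return f"Answer using only the context below.\n{context}\nQuestion: {query}"


if __name__ == "__main__":
    # Hypothetical external documents, never seen during training.
    corpus = {
        "encyclopedia-entry": "Photosynthesis is the process by which plants convert light into chemical energy.",
        "dictionary-entry": "A sonnet is a poem of fourteen lines.",
    }
    query = "How do plants convert light into energy?"
    print(build_prompt(query, retrieve(query, corpus)))
```

In this sketch the encyclopedia passage reaches the model verbatim through the prompt, which is the step the third layer of the complaint targets.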

Even more interesting is the fourth charge: trademark infringement related to the Lanham Act.

Britannica alleges that ChatGPT sometimes hallucinates and then attributes the erroneous content to Britannica, creating the impression that Britannica produced the misinformation.

This goes beyond infringement: Britannica's brand reputation ends up paying the price for OpenAI's mistakes.

In Britannica's view, this jeopardizes the public's continued access to high-quality, trustworthy online information.

The same question: a German court finds infringement, a UK court does not.

Whether a model's weights contain a copy of the work is the core of the entire lawsuit and one of the most hotly debated issues in the global legal community.

In GEMA v. OpenAI, the Munich court in Germany ruled that song lyrics were indeed embedded in the model weights of GPT-4 and GPT-4o, that this constituted reproduction of copyrighted works, and that it therefore warranted an injunction and damages.

Model weights are numerical parameters learned by AI during training, which determine the model's output. In the Munich court's view, the ability to reconstruct the work from these parameters is sufficient to constitute infringement.

The UK High Court reached the opposite conclusion in Getty Images v. Stability AI.

It held that AI models are not infringing copies, because their weights neither contain nor reproduce the copyrighted works themselves; they only store learned patterns.

In the United States, Anthropic persuaded federal judge William Alsup in a copyright lawsuit that using the content as training data was sufficiently transformative to qualify as fair use.

However, Alsup also found that Anthropic had illegally downloaded millions of books instead of paying for them, which was not protected, and the case ultimately ended in a $1.5 billion class-action settlement.

Britannica's case was filed in New York and is governed by U.S. federal law.

However, there are currently no established precedents that clearly state whether training an LLM with copyrighted content constitutes infringement. The outcome of each case still largely depends on the specific judge's reasoning.

If the court recognizes that real-time retrieval also constitutes infringement, the impact on the entire AI industry will far exceed any training data dispute.

The lawsuit against Perplexity was a warm-up for the OpenAI case.

This is not the first time Britannica has made a move.

Back in September 2025, Britannica filed a similar copyright and trademark infringement lawsuit against Perplexity, which is still pending.

Perplexity is an AI search company whose core product logic revolves around RAG.

Britannica's decision to target Perplexity first looks like a legal rehearsal: test whether an infringement claim based on RAG is viable, then apply the same logic to OpenAI.

At the same time, the copyright battle within the industry is heating up across the board.

The New York Times, Ziff Davis, and more than ten newspapers in the United States and Canada have filed lawsuits against OpenAI.

The Intercept and US News & World Report have also joined the ranks of plaintiffs.

According to ChatGPT Is Eating The World, a website that specializes in tracking AI copyright lawsuits, this is the 63rd copyright lawsuit against OpenAI.

OpenAI did not respond to TechCrunch's request for comment.

First eclipsed by Wikipedia, then cut off by ChatGPT.

From another angle, some aspects of this case deserve more attention than the amount of damages.

Founded in 1768, Britannica is the longest-running encyclopedia brand in the English-speaking world and a symbol of a centuries-old tradition of organizing human knowledge.

When such an organization appears as the plaintiff in an AI copyright lawsuit, the signal is clear: traditional knowledge authorities are trying to redefine their boundaries within the AI ecosystem through legal means.

Britannica was once the absolute authority in the era of print encyclopedias, but it was overshadowed by Wikipedia to the point of almost disappearing from the public eye.

Later, it transformed into a digital subscription platform and regained its footing by relying on the credibility and professionalism of its content.

Now, the emergence of ChatGPT has once again put it under threat of being replaced, not by a better encyclopedia, but by a model that was trained on its content without paying a penny for it.

The complaint contains the following sentence:

ChatGPT steals traffic from publishers by generating replies that replace publisher content.

This is a direct clash of business models. Whether the RAG claim holds up remains to be seen.

However, if the courts accept this logic, real-time retrieval pipelines across the industry would need to be relicensed.

All companies whose core products are online search and AI-generated content face this problem.

A 250-year-old encyclopedia is attempting to draw a line on the boundaries of AI with a lawsuit.

Where will this line ultimately be drawn? We'll probably have the answer in 2026.

References

https://www.reuters.com/legal/litigation/encyclopedia-britannica-sues-openai-over-ai-training-2026-03-16/

https://techcrunch.com/2026/03/16/merriam-webster-openai-encyclopedia-brittanica-lawsuit/

https://the-decoder.com/encyclopedia-britannica-sues-openai-for-training-on-nearly-100000-articles-without-permission/

https://gizmodo.com/encyclopedia-britannica-sues-openai-over-ai-training-data-2000607770

https://news.bloomberglaw.com/ip-law/britannica-merriam-webster-accuse-openai-of-copying-their-works

https://chatgptiseatingtheworld.com/wp-content/uploads/2026/03/Encyclopedia_Britannica_Inc-v-OpenAI-COMPLAINT-Mar-13-2026.pdf

https://www.aol.com/articles/encyclopedia-britannica-sues-openai-over-141324436.html

This article is from the WeChat official account "New Intelligence", author: Qingqing, and is published with authorization from 36Kr.


Disclaimer: The content above is only the author's opinion which does not represent any position of Followin, and is not intended as, and shall not be understood or construed as, investment advice from Followin.
