Britannica Takes OpenAI to Court Over AI Training Data

Encyclopaedia Britannica

  • Britannica has filed a lawsuit accusing OpenAI of using its reference works without permission.
  • The company claims its encyclopedia and dictionary entries were copied to train ChatGPT.
  • OpenAI maintains that its models rely on publicly available data and fall under fair‑use principles.

A New Legal Challenge for AI Training Practices

Encyclopaedia Britannica and its Merriam‑Webster subsidiary have sued OpenAI in federal court in Manhattan, alleging that OpenAI used their online reference materials without authorization to train its artificial intelligence models. The complaint alleges that OpenAI incorporated nearly 100,000 Britannica articles and dictionary entries into its training datasets. Britannica says this allowed ChatGPT to generate responses that closely mirror its original content. The company contends that these AI‑generated summaries substitute for visits to its own platforms and reduce their traffic.

OpenAI responded that its models are trained on publicly accessible information and that its use of such data is protected by fair use. It did not comment in further detail. Representatives for Britannica did not immediately issue further statements about the lawsuit. The dispute adds to a growing list of legal challenges over how AI developers source and use copyrighted material.

Claims of Copyright and Trademark Infringement

According to the lawsuit, OpenAI's models reproduce "near‑verbatim" versions of Britannica's encyclopedia entries and dictionary definitions. Britannica asserts that this degree of similarity goes beyond acceptable use and amounts to unlawful copying. The company also claims that ChatGPT sometimes attributes content to Britannica when Britannica never published it, an error commonly known as an AI "hallucination." These false attributions, the company says, create the impression that OpenAI has permission to use or reproduce its content.

In addition to its copyright claims, Britannica accuses OpenAI of trademark infringement. The complaint states that the AI system's references to Britannica could mislead users into believing that the two organizations have a formal partnership. Britannica argues that this misrepresentation harms its brand and undermines the value of its intellectual property. The company is seeking monetary damages, though the filing does not specify an amount, and is asking for a court order barring further use of its materials in AI training.

Part of a Broader Legal Landscape

This case is one of several high‑profile lawsuits brought by publishers, authors, and media organizations against AI developers. Many of these plaintiffs argue that their work has been used to train AI systems without consent or compensation. The outcomes of these cases could shape how AI companies gather data and what obligations they have toward content creators. Britannica itself filed a similar lawsuit last year against Perplexity AI, another company developing generative AI tools. That case remains ongoing and raises comparable questions about data sourcing and intellectual property rights.

AI companies, including OpenAI, have consistently argued that training on publicly available content constitutes fair use because the models transform the material into something new. Courts have not yet established clear precedents for how these principles apply to large‑scale AI training. As a result, each new lawsuit contributes to an evolving legal framework that will likely influence future AI development. The Britannica case may become another key reference point as the industry and legal system work to define acceptable practices.

What Comes Next

The lawsuit will likely prompt further scrutiny of how AI companies collect and process training data. It may also encourage other rights holders to examine whether their content has been used in similar ways. If the court rules in Britannica’s favor, AI developers could face stricter requirements for licensing or sourcing data. Conversely, a ruling supporting OpenAI’s fair‑use argument could reinforce the current industry approach. Either outcome will have implications for both AI innovation and the protection of digital intellectual property.

Britannica, founded in 1768, is one of the oldest continuously published reference works in the world. Its transition from print to digital formats over the past two decades has made the protection of its online content increasingly important to its business model. The company’s involvement in multiple AI‑related lawsuits highlights how traditional reference publishers are navigating the challenges posed by modern generative technologies.
