Apple Faces Lawsuit Over Alleged AI Training Practices

  • Authors accuse Apple of using pirated books to train its OpenELM AI models, joining a growing wave of copyright lawsuits in the tech industry.

Legal Action Targets Apple’s AI Development

Apple has been named in a proposed class-action lawsuit filed in the U.S. District Court for the Northern District of California. The complaint, brought by authors Grady Hendrix and Jennifer Roberson, alleges that the company used copyrighted books without permission to train its artificial intelligence systems. The plaintiffs claim Apple neither credited nor compensated them for their work, which they say was included in a dataset of pirated books. According to the filing, the disputed content was used to develop Apple's OpenELM large language models.

The lawsuit adds Apple to a growing list of tech firms facing legal scrutiny over AI training practices. Similar cases have been filed against Microsoft, Meta, and OpenAI, all accused of using protected content without authorization. Hendrix and Roberson assert that their books were part of the Books3 dataset, a collection of over 196,000 pirated titles previously used by other AI developers. Apple has not publicly responded to the allegations, and its legal team had not commented as of publication.

Industry-Wide Copyright Disputes Intensify

The legal action against Apple follows a landmark settlement by AI startup Anthropic, which agreed to pay $1.5 billion to resolve similar claims. That case, involving the use of pirated books to train the Claude chatbot, was described by attorneys as the largest publicly reported copyright recovery to date. Microsoft also faces litigation over its Megatron AI model, which allegedly relied on unauthorized literary content. These cases reflect a broader debate over how generative AI systems acquire and process training data.

At the heart of the dispute is the question of fair use—whether transforming copyrighted material into statistical models constitutes legal reuse. Technology companies argue that AI training does not reproduce original works verbatim, but rather extracts patterns for generating new content. Plaintiffs counter that sourcing from pirated datasets undermines creators’ rights and bypasses licensing frameworks. Courts have yet to establish consistent rulings, leaving the legal boundaries of AI development unresolved.

Dataset Origins and Transparency Concerns

The Books3 dataset, central to the Apple lawsuit, has been linked to multiple copyright controversies. Originally compiled from the Bibliotik shadow library, it was taken down in 2023 following a DMCA request by the Danish anti-piracy group Rights Alliance. Apple's OpenELM model documentation reportedly referenced Books3, prompting concerns about the provenance of its training data. The plaintiffs argue that Apple's use of such material reflects a broader lack of transparency in AI development.

OpenELM was released as an open-source model, and Apple has stated that it does not power consumer-facing features under the Apple Intelligence brand. Nonetheless, the lawsuit seeks damages and an injunction barring further use of pirated content in Apple's AI systems. Legal experts suggest the outcome could shape future standards for dataset sourcing and model training. As AI tools become more integrated into mainstream products, calls for ethical and legal clarity are likely to grow.

Books3’s Role in AI Training

Books3 has emerged as a focal point in multiple lawsuits involving AI companies. The dataset, which included hundreds of thousands of pirated books, was widely used before its takedown in 2023. Its presence in model documentation has prompted legal challenges across the industry, including recent actions against Meta and OpenAI. The Apple case may further highlight the need for vetted, licensed datasets in AI research and development.

