CASE BACKGROUND
Adobe Inc. (Adobe) has allegedly engaged in a massive piracy scheme to develop the Nemotron and SlimLM families of large language models (LLMs). Specifically, Adobe partnered with NVIDIA to develop “next-generation NVIDIA AI foundation models,” including the Nemotron family of models trained using millions of books.
No public information exists showing that Adobe or NVIDIA licensed millions of books to train their AI models. Plaintiff alleges Adobe and NVIDIA were only able to obtain such a large number of books by sourcing them from pirated sources, such as Anna’s Archive, a shadow library that purposefully flouts copyright law and illegally distributes millions of copyrighted books.
Adobe also acquired the notorious Books3 dataset of nearly 200,000 pirated books to develop its SlimLM LLMs. This dataset is a subset of the SlimPajama dataset that Adobe also acquired. Books3 contains the entirety of copyrighted books from the Bibliotik shadow library that were obtained without the authorization or consent of the authors.
Adobe admits that it must compensate rightsholders, obtain their authorization, and respect their copyrights before including their works in training data for AI models. For example, Adobe’s Firefly AI models are trained on licensed images whose creators are paid on a yearly basis for the inclusion of those images in Firefly’s training data. In contrast, book authors have never received compensation for Adobe’s use, storage, or copying of their works, nor have these authors ever provided Adobe with authorization to use, copy, or store their works. Consequently, Adobe’s claims of “respecting” the creative community ring hollow.
Adobe allegedly benefitted commercially from its acts of massive copyright infringement, including by securing contracts with enterprise customers for use of its LLMs, and by incorporating the infringing Nemotron and SlimLM models into Adobe programs and tools, including Adobe Acrobat.
Through the above acts, Adobe has infringed plaintiff’s copyrighted works, and it continues to do so by storing, copying, using, and processing datasets containing copies of plaintiff’s and the proposed Class’s copyrighted books.
