CASE BACKGROUND 

Salesforce, Inc. (Salesforce) provides cloud-based services to its clients, with a particular focus on sales and e-commerce. In June 2023, it released its XGen series of large language models (LLMs): artificial intelligence software designed to emit convincingly naturalistic text outputs in response to user prompts. XGen is trained by copying an enormous quantity of textual works and then feeding these copies into the model. This input material is called the training dataset.

Once the LLM has copied and ingested the textual works in the training dataset, the LLM is able to emit convincing simulations of natural written language in response to user prompts. Whenever an LLM generates text output in response to a user prompt, it is performing a computation with the goal of imitating the protected expression ingested from the training dataset.

Salesforce allegedly pirated hundreds of thousands of copyrighted books to develop its XGen series LLMs. The training dataset for these models consists of the RedPajama and The Pile datasets, which contain copies of these copyrighted books.

Plaintiffs and class members are authors. They own registered copyrights in certain books that were included in the RedPajama and The Pile datasets that Salesforce used to develop the XGen models. Plaintiffs and the class never authorized Salesforce to download, copy, store, or use their copyrighted works. Likewise, Salesforce has never compensated plaintiffs and class members for downloading, copying, storing, or using their copyrighted works.

Salesforce benefitted commercially from its acts of massive copyright infringement, including by securing investments and contracts with customers for use of its LLMs through its Agentforce AI platform. Through the above acts, it has infringed plaintiffs’ copyrighted works, and it continues to do so by continuing to store, copy, use, and process the datasets containing copies of plaintiffs’ and the class’s copyrighted books.

CASE FILED

On October 15, 2025, the Joseph Saveri Law Firm filed a class action complaint on behalf of named authors and a class of authors who own registered copyrights in certain books that were included in the RedPajama and The Pile datasets. The suit, filed in the United States District Court for the Northern District of California, alleges that defendant Salesforce downloaded, copied, stored, and used these copyrighted works without the authors' permission and without compensating them. Plaintiffs request damages, a jury trial, and other relief.

