Meta CEO Mark Zuckerberg personally approved using a dataset of pirated e-books to train the company's Llama AI models, according to new court documents filed in California.
The allegations emerged in an unredacted filing from the ongoing Kadrey v. Meta lawsuit, in which authors including Sarah Silverman and Ta-Nehisi Coates accuse Meta of copyright infringement. The documents reveal that Zuckerberg gave explicit permission to use LibGen, a controversial database containing unauthorized copies of books and academic papers.
Internal communications cited in the filing show Meta employees referred to LibGen as a "data set we know to be pirated." Staff members expressed concerns that using the dataset could damage Meta's position with regulators. However, after "escalation to MZ," the AI team received approval to proceed.
The filing makes additional claims about Meta's attempts to obscure its use of copyrighted materials. A Meta engineer allegedly wrote code to remove copyright information and acknowledgments from the e-books. The company also reportedly used torrenting to download LibGen's contents, despite internal concerns about the legality of this approach.
Meta's head of generative AI, Ahmad Al-Dahle, dismissed these concerns and "cleared the path" for torrenting the database, according to the court documents. The plaintiffs argue that Meta's decision to bypass legitimate acquisition methods and participate in illegal file-sharing networks demonstrates willful copyright infringement.
Judge Vince Chhabria, who is presiding over the case, rejected Meta's request to keep portions of the filing sealed. In his order, he stated that Meta's redaction attempts appeared aimed at avoiding negative publicity rather than protecting sensitive business information.
The lawsuit currently focuses on Meta's earlier Llama models, not recent releases. Meta maintains its actions are protected under the fair use doctrine, and the case is ongoing, with no final decision reached.