Meta Secretly Trained Its AI on a Notorious Piracy Database, Newly Unredacted Court Docs Reveal

Meta has faced setbacks in its legal battle with authors suing it for copyright infringement over its AI training practices. Newly unredacted court documents reveal that Meta allegedly used the shadow library Library Genesis (LibGen) to train its generative AI language models. The case, Kadrey et al. v. Meta Platforms, could set a precedent for how tech companies use creative works for AI training.

Judge Vince Chhabria of the United States District Court for the Northern District of California directed both Meta and the plaintiffs on Wednesday to submit complete versions of several documents after criticising Meta’s method of redacting them as “absurd,” noting that, generally speaking, “there is not a single thing in those briefs that should be kept confidential.”

Chhabria determined that Meta was not seeking to redact the documents to safeguard its business interests but rather to “prevent unfavourable publicity.” Internal communications showed that Meta employees were aware they were using pirated data and that discussions about it were escalated to CEO Mark Zuckerberg.

The plaintiffs, including novelists and comedian Sarah Silverman, argue that Meta used their copyrighted works without permission. Meta claims its actions fall under the “fair use” doctrine and denies any wrongdoing, stating that the plaintiffs were aware of its use of LibGen data before the end of discovery. The outcomes of this case and similar lawsuits will significantly affect the legal landscape for AI training practices.

source: Wired
