Here’s Proof You Can Train an AI Model Without Slurping Copyrighted Content

A French-backed team of researchers has released what is believed to be the largest AI training dataset composed entirely of public-domain text.

Separately, the nonprofit Fairly Trained announced that it has awarded its first certification to a large language model built without copyright infringement. The certification shows that technology like that behind ChatGPT can be built differently from the AI industry’s contentious norm. The model’s developer, Chicago-based legal tech consultancy startup 273 Ventures, created its own training dataset of legal, financial, and regulatory documents. Called the Kelvin Legal DataPack, it includes thousands of legal documents reviewed for compliance with copyright law.

Source: Wired



