Why We Built the Kelvin Legal DataPack

Many of the most popular datasets used to train large language models (LLMs) today include data that was collected under circumstances that could breach websites’ terms of use or service. If organisations rely on these datasets (either directly or indirectly through their use of an LLM trained on them), legal actions, such as injunctions, could significantly impact their operations.

The goal in creating the Kelvin Legal DataPack was to amass high-quality data that was free of use restrictions. It focuses on legal domain data from high-quality legal and financial sources in numerous languages, including English, Spanish, German, and French. Much of the data was produced by attorneys and other legal professionals from the US, the UK, and the EU.

The Kelvin Legal DataPack is a dataset with clean provenance and commercial licensing terms, enabling organisations to capture the power of LLMs without exposure to the risks historically associated with them.

Source: Kelvin legal data OS
