Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes

This article describes a mechanism that uses rationales generated by a large language model (LLM) to train small models within a multi-task framework, while requiring less training data than traditional methods such as fine-tuning or distillation.
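To make the multi-task idea concrete, here is a minimal sketch in plain Python. It assumes the small model is trained on two tasks per example (predicting the label and generating the LLM's rationale) and that the two losses are summed with a weight. The task prefixes `[label]` and `[rationale]`, the `rationale_weight` parameter, and all names below are illustrative assumptions, not details taken from this summary.

```python
from dataclasses import dataclass


@dataclass
class Example:
    question: str
    label: str
    rationale: str  # explanation text produced by the LLM (assumed available)


def build_multitask_pairs(ex: Example) -> list[tuple[str, str]]:
    """Turn one example into two (input, target) pairs, one per task.

    The prefixes are hypothetical markers that tell the small model
    which output is expected.
    """
    return [
        ("[label] " + ex.question, ex.label),
        ("[rationale] " + ex.question, ex.rationale),
    ]


def combined_loss(label_loss: float, rationale_loss: float,
                  rationale_weight: float = 1.0) -> float:
    """Weighted sum of the two task losses (the weight is an assumption)."""
    return label_loss + rationale_weight * rationale_loss
```

In a setup like this, the rationale task would only be needed during training; at inference time the model would be queried with the label prefix alone, so the extra supervision adds no cost at prediction time.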

The authors show that this approach reduces both the model size and the amount of data needed to outperform LLMs: their 770M-parameter T5 model achieves better performance than a 540B-parameter Pathways Language Model (PaLM) on a benchmark task while using only 80% of the available data.

Source: Cornell University


