
Notes on AlpaGasus: Training A Better Alpaca with Fewer Data

This is a summary of an important research paper that provides an 18:1 time savings. It was crafted by humans working with several AIs. The goal is to save time and curate good ideas.


Link to paper: https://arxiv.org/abs/2307.08701

Paper published on: 2023-07-17

Paper's authors: Lichang Chen, Shiyang Li, Jun Yan, Hai Wang, Kalpa Gunaratna, Vikas Yadav, Zheng Tang, Vijay Srinivasan, Tianyi Zhou, Heng Huang, Hongxia Jin

GPT3 API Cost: $0.04

GPT4 API Cost: $0.10

Total Cost To Write This: $0.14

Time Savings: 18:1

The ELI5 TLDR:

ALPAGASUS is a new approach to training AI models that prioritizes data quality over quantity. Instead of training on every available example, it uses a strong language model (ChatGPT) to rate each training example and keeps only the highest-scoring ones. The resulting model outperforms the original Alpaca model on multiple test sets while training in a fraction of the time, and the cost savings grow as model size scales up. The paper also includes a detailed analysis and plans to compare human feedback with language-model ratings in future work. Overall, ALPAGASUS shows that careful data selection is an effective way to train better language models.

The Deeper Dive:

Introduction and Summary of ALPAGASUS

In the ever-evolving field of artificial intelligence, the quality of data used for training models is paramount. The research paper presents a novel approach, ALPAGASUS, for instruction fine-tuning (IFT) of large language models (LLMs). This approach introduces a data selection strategy that filters out low-quality data from the training set. The strategy uses a strong LLM, ChatGPT, as an auto-grader to rate each triplet of (instruction, input, response) and selects only the high-scoring triplets.

To help you understand, imagine you're training a model to answer customer queries. Instead of using all available data, you filter out the data where the model's response is irrelevant or incorrect. This way, the model learns from the best examples and provides better responses when deployed.

The result is a significantly improved performance of the fine-tuned model (ALPAGASUS) compared to the original model (ALPACA) on multiple test sets. ALPAGASUS also achieves faster training time and demonstrates the importance of data quality in IFT.

The Importance of Data Quality and Selection Strategy

The paper emphasizes the importance of data quality in instruction-following models. Existing machine-generated instruction-finetuning datasets often contain low-quality data, which can hinder the model's performance. To address this issue, the paper proposes a data selection strategy based on ratings given by a strong language model (ChatGPT).

The strategy works by using ChatGPT to rate each triplet of (instruction, input, response) in the training data. The triplets with a rating above a certain threshold (in this case, 4.5) are selected for training the model. This approach ensures that the model is trained on high-quality data, leading to improved accuracy and efficiency of instruction-finetuning.
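As a rough sketch of this filtering step (the function names and the stub grader below are illustrative; the paper's actual grader is a prompted call to ChatGPT):

```python
# Sketch of AlpaGasus-style data selection. The 4.5 cutoff matches the paper;
# `rate_fn` stands in for the ChatGPT auto-grader, which we stub out here.

THRESHOLD = 4.5

def filter_triplets(triplets, rate_fn, threshold=THRESHOLD):
    """Keep only (instruction, input, response) triplets scored at or above threshold."""
    return [t for t in triplets if rate_fn(t) >= threshold]

# Stub grader for illustration; the real system prompts ChatGPT for a score.
def stub_rate(triplet):
    instruction, inp, response = triplet
    return 5.0 if response.strip() else 1.0  # penalize empty responses

data = [
    ("Name the capital of France.", "", "Paris."),
    ("Summarize this article.", "Some text.", ""),
]
selected = filter_triplets(data, stub_rate)
```

The key design point is that the grader and the threshold are the only knobs: the rest of the fine-tuning pipeline is unchanged, which is why the strategy transfers to other IFT datasets.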

Evaluation Scheme and Results

The paper introduces a comprehensive evaluation scheme for comparing instruction-following capabilities of models. The evaluation involves prompting an LLM to act as a judge on four test sets: Vicuna, Koala, Self-Instruct, and WizardLM.
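A minimal sketch of that judging loop might look like the following (`judge_fn` is a placeholder for the actual prompted LLM judge, and the tie-counting convention is one common choice, not necessarily the paper's exact formula):

```python
# Sketch of LLM-as-judge pairwise evaluation. `judge_fn` is a placeholder for
# a call that asks a strong LLM to score two candidate responses.

def judge_pair(judge_fn, instruction, response_a, response_b):
    """Return 'A', 'B', or 'tie' based on the judge's two scores."""
    score_a, score_b = judge_fn(instruction, response_a, response_b)
    if score_a > score_b:
        return "A"
    if score_b > score_a:
        return "B"
    return "tie"

def win_rate(outcomes):
    """Fraction of comparisons won by model A, counting ties as half a win."""
    wins = sum(1 for o in outcomes if o == "A")
    ties = sum(1 for o in outcomes if o == "tie")
    return (wins + 0.5 * ties) / len(outcomes)
```

Running `judge_pair` over every prompt in a test set and aggregating with `win_rate` gives a single head-to-head number per test set, which is how the ALPAGASUS-vs-ALPACA comparisons are reported.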

ALPAGASUS significantly outperforms the original ALPACA model on these test sets. Moreover, ALPAGASUS trained on high-quality data performs better than ALPACA models trained on randomly selected data. The performance of ALPAGASUS improves with more high-quality data, with around 6,000 high-quality data samples being sufficient to achieve similar performance to the original ALPACA model.

Cost-Saving Benefits of Data Selection Strategy

The data selection strategy also offers cost-saving benefits as model size scales up. Applying data selection to the ALPACA training data reduces it from 52k to 9k but improves the performance of the resulting ALPAGASUS model. The training time is also significantly shortened from 80 minutes to 14 minutes.

These findings contribute to the field of data-centric AI and the evaluation of language models. The study evaluated the IFT strategy on models of sizes 7B and 13B, but plans to extend the study to larger model sizes in the future.

Detailed Analysis and Future Directions

The paper includes a detailed analysis using a keyword set [Java, java, C++, c++, C#, c#, Python, python] and provides examples of responses rated by ChatGPT with scores of 5.0, 4.5, 4.0, 3.5, 3.0, 2.5, and 2.0. These examples include instructions, inputs, and responses, along with explanations for the given scores.
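A membership check over that keyword set could be sketched as follows (the function name is ours, not the paper's):

```python
# Keyword set from the paper's analysis for flagging coding-related examples.
CODE_KEYWORDS = ["Java", "java", "C++", "c++", "C#", "c#", "Python", "python"]

def is_coding_example(instruction, inp, response):
    """Flag an (instruction, input, response) triplet that mentions a coding keyword."""
    text = " ".join((instruction, inp, response))
    # Simple substring match; note it would also match e.g. "JavaScript".
    return any(kw in text for kw in CODE_KEYWORDS)
```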

ALPAGASUS-13B achieves ≥ 91% of the capacity of its "teacher" model, text-davinci-003, on task categories such as Writing, Roleplay, Toxicity, and Art. However, it still lags behind stronger language models in coding and math.

The study did not rely on human evaluation due to cost, but plans to study the difference between human feedback and LLM ratings/evaluations in the future. The work focused on the IFT dataset for ALPACA and leaves exploration of other IFT datasets for future work.

Conclusion

In conclusion, ALPAGASUS presents a novel approach to instruction fine-tuning of large language models. By focusing on data quality and using a strong LLM as an auto-grader, it achieves improved performance and efficiency. The approach can be applied to other instruction finetuning datasets and LLMs, leading to better models and more efficient training.