Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Low-Confidence Gold: Refining Low-Confidence Samples for Efficient Instruction Tuning

Created by
  • Haebom

Authors

Hongyi Cai, Jie Li, Mohammad Mahdinur Rahman, Wenzhen Dong

Outline

This paper proposes Low-Confidence Gold (LCG), a novel filtering framework for improving the efficiency of instruction tuning in large language models. LCG identifies valuable instruction pairs using centroid-based clustering and confidence-based selection. Through semi-supervised learning with a lightweight classifier, it produces a high-quality subset while preserving data diversity. Experimental results show that a model fine-tuned on 6K samples filtered by LCG outperforms existing methods, achieving significant gains on MT-bench and consistent improvements across comprehensive evaluation metrics. The framework's ability to improve efficiency while maintaining model performance suggests a promising direction for efficient instruction tuning.
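As a rough illustration of the pipeline described above, the sketch below combines centroid-based clustering with confidence-based selection over instruction embeddings. It is not the authors' implementation: the embedding source, the use of k-means and logistic regression as the "lightweight classifier", the cluster count, and the per-cluster selection quota are all assumptions made for this example.

```python
# Minimal LCG-style filtering sketch (assumptions noted above, not the paper's code):
# embed instructions, cluster them, train a lightweight classifier to predict
# cluster membership, then keep the samples the classifier is LEAST confident
# about, balanced per cluster to preserve diversity.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def lcg_filter(embeddings: np.ndarray, n_clusters: int = 32,
               n_select: int = 6000) -> np.ndarray:
    """Return indices of low-confidence samples, balanced across clusters."""
    # 1) Centroid-based clustering of the instruction embeddings.
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=0).fit_predict(embeddings)

    # 2) Lightweight classifier (logistic regression here, as a stand-in)
    #    trained to predict cluster membership.
    clf = LogisticRegression(max_iter=1000).fit(embeddings, labels)
    confidence = clf.predict_proba(embeddings).max(axis=1)

    # 3) Confidence-based selection: within each cluster, keep the samples
    #    with the lowest predicted confidence ("low-confidence gold").
    per_cluster = n_select // n_clusters
    selected = []
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]
        order = idx[np.argsort(confidence[idx])]  # ascending confidence
        selected.extend(order[:per_cluster].tolist())
    return np.array(selected)
```

The returned indices would then be used to subsample the instruction dataset (e.g., down to 6K pairs) before fine-tuning.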

Takeaways, Limitations

Takeaways:
We demonstrate that the LCG framework can improve instruction tuning of large language models using only a small amount of high-quality data.
We propose a method that is more efficient than existing fine-tuning approaches based on bulk data.
We demonstrate the effectiveness of a novel data filtering technique that combines centroid-based clustering with confidence-based selection.
Consistent performance improvements are achieved across various benchmarks, including MT-bench.
Limitations:
LCG's performance may depend on the quality of the lightweight classifier.
The experiments were conducted on a filtered subset of only 6K samples; further research is needed to determine how the approach generalizes to larger datasets.
The method may be biased toward certain types of instructions or datasets.
Further validation of the framework's generalizability is needed.