Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

InfiAlign: A Scalable and Sample-Efficient Framework for Aligning LLMs to Enhance Reasoning Capabilities

Created by
  • Haebom

Authors

Shuo Cai, Su Lu, Qi Zhou, Kejing Yang, Zhijie Sang, Congkai Xie, Hongxia Yang

Outline

This paper presents InfiAlign, an efficient post-training framework for improving the reasoning performance of large language models (LLMs). InfiAlign aligns LLMs by combining supervised fine-tuning (SFT) with Direct Preference Optimization (DPO). At its core is a robust data selection pipeline that automatically curates high-quality alignment data from open-source reasoning datasets using multidimensional quality metrics. Applied to the Qwen2.5-Math-7B-Base model, the framework matches the performance of existing models while using only about 12% of the original training data, and it shows strong generalization across diverse reasoning tasks. In particular, applying DPO yields an average improvement of 3.89% on mathematical reasoning tasks. By combining principled data selection with staged post-training, InfiAlign offers a practical, scalable, and data-efficient solution for aligning large reasoning models. Model checkpoints are available at https://huggingface.co/InfiX-ai/InfiAlign-Qwen-7B-SFT.
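For readers unfamiliar with the DPO stage, the standard DPO objective (Rafailov et al., 2023) trains the policy to prefer the chosen response over the rejected one relative to a frozen reference model (typically the SFT checkpoint). The sketch below is a minimal, generic illustration of that loss for a single preference pair; it is not taken from the InfiAlign codebase, and the inputs (total sequence log-probabilities) and the `beta` value are assumptions.

```python
import math


def dpo_loss(policy_logp_chosen: float, policy_logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair.

    Each argument is the total log-probability of a response under either
    the policy being trained or the frozen reference (e.g. SFT) model.
    """
    # Implicit rewards: how much the policy has moved away from the
    # reference on each response, scaled by beta.
    chosen_reward = beta * (policy_logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (policy_logp_rejected - ref_logp_rejected)

    # -log sigmoid(margin): the loss shrinks as the policy assigns
    # relatively more probability to the chosen response.
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

At initialization the policy equals the reference, so the margin is zero and the loss is log 2; any shift of probability mass toward the chosen response lowers it.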

Takeaways, Limitations

Takeaways:
Presents InfiAlign, a novel framework that effectively addresses the data and computational costs of existing LLM post-training.
Maximizes data efficiency and ensures scalability through an automated data selection pipeline.
Achieves strong performance gains across diverse reasoning tasks by combining SFT and DPO.
Offers a practical, data-efficient method for aligning large reasoning models.
Improves reproducibility and usability by publicly releasing the trained model checkpoints.
Limitations:
InfiAlign's performance gains may be specific to the evaluated model (Qwen2.5-Math-7B-Base) and datasets.
The definition and configuration of the multidimensional quality metrics are not explained in detail.
Generalization to other LLMs and a broader range of reasoning tasks requires further validation.
The biases and limitations of the data selection pipeline are not analyzed.