Daily Arxiv

This page collects papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
The copyright of each paper belongs to its authors and their institutions; when sharing, please cite the source.

InfiR2: A Comprehensive FP8 Training Recipe for Reasoning-Enhanced Language Models

Created by
  • Haebom

Author

Wenjun Wang, Shuo Cai, Congkai Xie, Mingfa Feng, Yiming Zhang, Zhen Li, Kejing Yang, Ming Li, Jiannong Cao, Yuan Xie, Hongxia Yang

Outline

To address the high computational cost of training large language models (LLMs), this paper presents an open training recipe that maximizes the efficiency of FP8 training. The recipe integrates continual pretraining and supervised fine-tuning, and employs a fine-grained hybrid-precision quantization strategy that preserves numerical accuracy while improving computational efficiency. Extensive experiments, including continual pretraining on a 160-billion-token corpus, show that the recipe is robust, incurs minimal loss degradation, and achieves performance comparable to BF16-based models. Efficiency gains include up to a 22% reduction in training time, a 14% reduction in peak memory usage, and a 19% increase in throughput.
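Fine-grained quantization generally means keeping one scale factor per small block of values rather than one per tensor, so outliers in one block do not degrade precision everywhere else. The sketch below illustrates this idea with per-block FP8 (E4M3) quantization in PyTorch; the block size of 128, the E4M3 format, and the function names are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def quantize_fp8_blockwise(x: torch.Tensor, block_size: int = 128):
    """Quantize a tensor to FP8 (E4M3) with one scale per block of values."""
    n = x.numel()
    pad = (-n) % block_size
    blocks = F.pad(x.flatten(), (0, pad)).view(-1, block_size)
    # Per-block scale maps each block's max magnitude onto the E4M3 range (~448).
    fp8_max = torch.finfo(torch.float8_e4m3fn).max
    scales = blocks.abs().amax(dim=1, keepdim=True).clamp(min=1e-12) / fp8_max
    q = (blocks / scales).to(torch.float8_e4m3fn)
    return q, scales, n

def dequantize_fp8_blockwise(q, scales, n):
    """Recover a higher-precision tensor from FP8 blocks and their scales."""
    return (q.to(torch.float32) * scales).flatten()[:n]

if __name__ == "__main__":
    w = torch.randn(1000) * 0.02            # toy weight tensor
    q, s, n = quantize_fp8_blockwise(w)
    w_hat = dequantize_fp8_blockwise(q, s, n)
    print("max abs round-trip error:", (w - w_hat).abs().max().item())
```

Because each block carries its own scale, the round-trip error stays small even when value magnitudes vary widely across the tensor, which is the usual motivation for fine-grained rather than per-tensor scaling.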

Takeaways, Limitations

Takeaways:
Presents a practical recipe for improving the efficiency of large language model training with FP8.
Delivers significant improvements in training time, memory usage, and throughput while maintaining performance comparable to BF16-based models.
Improves accessibility of large-scale model training by releasing the code as open source.
Limitations:
The description of the specific model architecture and training details may be limited.
Further research is needed to determine whether the benefits of FP8 training generalize across models and datasets.
FP8 training depends on hardware and software support, which may limit portability (a quick environment check is sketched below).
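As a rough illustration of that last point, the snippet below checks whether the local PyTorch build and GPU expose common FP8 prerequisites (float8 dtypes ship with PyTorch 2.1+, and FP8 tensor cores require Ada- or Hopper-class GPUs). The function name and report fields are hypothetical, not part of the paper's released code.

```python
import torch

def fp8_support_report() -> dict:
    """Best-effort check for FP8 training prerequisites in the local environment."""
    report = {
        "torch_version": torch.__version__,
        # float8 dtypes are available in PyTorch >= 2.1
        "has_float8_e4m3fn": hasattr(torch, "float8_e4m3fn"),
        "has_float8_e5m2": hasattr(torch, "float8_e5m2"),
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        major, minor = torch.cuda.get_device_capability()
        report["compute_capability"] = f"{major}.{minor}"
        # FP8 tensor cores require compute capability 8.9 (Ada) or 9.0 (Hopper) and newer.
        report["fp8_tensor_cores"] = (major, minor) >= (8, 9)
    return report

if __name__ == "__main__":
    for key, value in fp8_support_report().items():
        print(f"{key}: {value}")
```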