To address the high computational cost of training large-scale language models (LLMs), we present an open training recipe that maximizes the efficiency of FP8 training. The recipe integrates continuous pretraining and supervised fine-tuning, and employs a fine-grained, hybrid-precision quantization strategy that improves computational efficiency while maintaining numerical accuracy. Extensive experiments, including continuous pretraining of the model on a 160-billion-token corpus, demonstrate that the proposed recipe is robust, incurs minimal loss degradation, and achieves performance comparable to BF16-based models. Efficiency gains include up to a 22% reduction in training time, a 14% reduction in peak memory usage, and a 19% increase in throughput.
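To make the fine-grained quantization idea concrete, the sketch below shows block-wise FP8 (E4M3) quantization in plain PyTorch, with one scale factor per block and dequantization back to higher precision. The block size, the E4M3 format choice, and the function names are illustrative assumptions; the paper's actual recipe may differ in granularity and in which tensors remain in BF16.

```python
# Minimal sketch of fine-grained (block-wise) FP8 quantization, assuming
# per-block scaling to the E4M3 format. The block size and helper names are
# illustrative assumptions, not the paper's exact recipe.
import torch

FP8_E4M3_MAX = 448.0  # largest finite value representable in float8_e4m3fn

def quantize_blockwise_fp8(x: torch.Tensor, block_size: int = 128):
    """Quantize a tensor to FP8 with one scale per block of `block_size` values."""
    flat = x.float().flatten()
    pad = (-flat.numel()) % block_size
    if pad:
        flat = torch.nn.functional.pad(flat, (0, pad))
    blocks = flat.reshape(-1, block_size)

    # One scale per block: map each block's max magnitude onto the FP8 range.
    amax = blocks.abs().amax(dim=1, keepdim=True).clamp(min=1e-12)
    scale = FP8_E4M3_MAX / amax
    q = (blocks * scale).clamp(-FP8_E4M3_MAX, FP8_E4M3_MAX).to(torch.float8_e4m3fn)
    return q, scale, x.shape, pad

def dequantize_blockwise_fp8(q, scale, shape, pad):
    """Recover a float32 tensor for accumulation-sensitive operations."""
    flat = (q.float() / scale).flatten()
    if pad:
        flat = flat[:-pad]
    return flat.reshape(shape)

if __name__ == "__main__":
    w = torch.randn(512, 512)
    q, s, shape, pad = quantize_blockwise_fp8(w)
    w_hat = dequantize_blockwise_fp8(q, s, shape, pad)
    print("max abs error:", (w - w_hat).abs().max().item())
```

Per-block scales bound the quantization error by the local dynamic range rather than the whole tensor's, which is why fine-grained scaling is commonly paired with narrow formats such as FP8.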
Takeaways, Limitations
• Takeaways:
◦ Presents a practical method for improving the efficiency of large-scale language model training using FP8.
◦ Achieves significant improvements in training time, memory usage, and throughput while maintaining performance comparable to BF16-based models.
◦ Increases the accessibility of large-scale model training by releasing the code as open source.
• Limitations:
◦ The paper may offer only limited detail on the specific model architecture and training procedure.
◦ Further research is needed to determine whether the benefits of FP8 training generalize to other models and datasets.
◦ FP8 training may depend on specific hardware and software support.
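As a quick illustration of the hardware dependency, the snippet below checks whether the local GPU exposes FP8 tensor cores. The compute-capability threshold (8.9 for Ada, 9.0 for Hopper) is an assumption based on current NVIDIA hardware, not something stated in the paper.

```python
# Minimal sketch of an FP8 hardware check, assuming FP8 tensor cores are
# available on NVIDIA GPUs with compute capability 8.9 (Ada) or 9.0+ (Hopper).
# The threshold is an illustrative assumption, not taken from the paper.
import torch

def fp8_supported() -> bool:
    if not torch.cuda.is_available():
        return False
    major, minor = torch.cuda.get_device_capability()
    return (major, minor) >= (8, 9)

if __name__ == "__main__":
    print("FP8 tensor-core support:", fp8_supported())
```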