Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, please cite the source.

Innovator: Scientific Continued Pretraining with Fine-grained MoE Upcycling

Created by
  • Haebom

Authors

Ning Liao, Xiaoxing Wang, Zehao Lin, Weiyang Guo, Feng Hong, Shixiang Song, Geng Yu, Zihua Zhao, Sitao Xie, Longxuan Wei, Xiangqi Jin, Xiaohan Qin, Jiale Ma, Kai Chen, Jiangchao Yao, Zhouhan Lin, Junchi Yan, Zhiyu Li, Feiyu Xiong, Yanfeng Wang, Linfeng Zhang

Innovator: LLM with both scientific knowledge and general skills

Outline

Innovator is a large language model (LLM) that combines scientific knowledge with general capabilities. It upcycles an existing dense LLM into a fine-grained Mixture-of-Experts (MoE) model whose experts specialize in general tasks and in various scientific fields. Through a four-stage upcycling training pipeline (Scientific Expert Induction, Fine-Grained Expert Splitting, Science-Aware Routing Warmup, and Generalist-Scientist Integration), it acquires scientific-domain knowledge while minimizing negative interference with general-domain knowledge. Upcycled from Qwen2.5-7B, Innovator has 53.3B total parameters (13.3B activated per token), contains 64 scientific experts and 1 general expert, and is trained on 300B tokens. It achieves an average performance improvement of 25% across 30 scientific tasks while retaining 99% of performance on general tasks. Innovator-Reason, further trained from Innovator to strengthen reasoning, improves performance on complex scientific problems by more than 30%.
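To make the architecture described above more concrete, here is a minimal PyTorch sketch of a fine-grained MoE layer with one always-active general expert and a set of routed scientific experts. It is an illustration only, not the authors' code: the class and parameter names (FineGrainedMoELayer, d_expert, top_k), the expert sizes, and the routing details are assumptions and do not reproduce the paper's exact configuration or its four-stage upcycling procedure; only the 64-scientific-expert plus 1-general-expert layout comes from the summary.

```python
# Minimal sketch of a fine-grained MoE layer with a shared general expert.
# Assumed/illustrative: hidden sizes, expert width, top-k, and routing scheme.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FineGrainedMoELayer(nn.Module):
    """Hypothetical sketch: 1 always-active general expert + 64 routed scientific experts."""

    def __init__(self, d_model=3584, d_expert=512, n_sci_experts=64, top_k=8):
        super().__init__()
        # Shared "general" expert: applied to every token, preserving general-domain ability.
        self.general_expert = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.SiLU(), nn.Linear(4 * d_model, d_model)
        )
        # Fine-grained scientific experts: narrow FFNs (widths here are illustrative).
        self.sci_experts = nn.ModuleList(
            [
                nn.Sequential(nn.Linear(d_model, d_expert), nn.SiLU(), nn.Linear(d_expert, d_model))
                for _ in range(n_sci_experts)
            ]
        )
        # Router that decides which scientific experts process each token.
        self.router = nn.Linear(d_model, n_sci_experts)
        self.top_k = top_k

    def forward(self, x):
        # x: (batch, seq, d_model)
        general_out = self.general_expert(x)                    # shared path for all tokens
        gate_probs = F.softmax(self.router(x), dim=-1)          # (batch, seq, n_sci_experts)
        weights, idx = torch.topk(gate_probs, self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)   # renormalize over chosen experts
        sci_out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.sci_experts):
                mask = idx[..., slot] == e                      # tokens routed to expert e
                if mask.any():
                    sci_out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return general_out + sci_out


if __name__ == "__main__":
    # Tiny configuration just to show the forward pass shape-checks.
    layer = FineGrainedMoELayer(d_model=64, d_expert=16, n_sci_experts=8, top_k=2)
    tokens = torch.randn(2, 5, 64)
    print(layer(tokens).shape)  # torch.Size([2, 5, 64])
```

The key design point the sketch tries to convey is that the general expert is outside the router: every token passes through it, so general-domain behavior is preserved, while the router only distributes tokens among the fine-grained scientific experts.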

Takeaways, Limitations

Takeaways:
• Successful development of an LLM that combines scientific knowledge and general skills.
• Efficient knowledge acquisition and management using the Mixture-of-Experts model.
• Mitigation of catastrophic forgetting through the multi-stage upcycling training approach.
• Significant performance improvements across a variety of scientific tasks.
• A further-trained model (Innovator-Reason) developed to improve reasoning capabilities.
Limitations:
• Limited detail on the training data composition beyond the reported model size and token count.
• Possible bias toward specific scientific fields or datasets.
• Few specifics on how performance on general tasks is maintained.
• Complexity of the multi-stage upcycling process.