This page curates AI-related papers published worldwide. All content is summarized using Google Gemini, and the site is operated on a non-profit basis. Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.
SpikingBrain Technical Report: Spiking Brain-inspired Large Models
Created by
Haebom
Author
Yuqi Pan, Yupeng Feng, Jinghao Zhuang, Siyu Ding, Zehao Liu, Bohan Sun, Yuhong Chou, Han Xu, Xuerui Qiu, Anlin Deng, Anjie Hu, Peng Zhou, Man Yao, Jibin Wu, Jian Yang, Guoliang Sun, Bo Xu, Guoqi Li
Outline
This paper proposes the brain-inspired SpikingBrain models to address the efficiency bottlenecks of existing Transformer-based large language models, namely training compute that grows quadratically and inference memory that grows linearly with sequence length. Leveraging a MetaX GPU cluster, the authors develop two models, SpikingBrain-7B (a linear LLM) and SpikingBrain-76B (a hybrid-linear MoE LLM), focusing on three aspects: linear and hybrid-linear attention architectures; an efficient conversion-based training pipeline with a dedicated spike coding framework; and a customized training framework with parallelism strategies adapted to the hardware. These models demonstrate the feasibility of large-scale LLM development on non-NVIDIA platforms and achieve performance comparable to open-source Transformer baselines while using far fewer training tokens (approximately 150B). They substantially improve long-sequence training efficiency and perform inference with (partially) constant memory and event-driven spiking behavior; for example, SpikingBrain-7B achieves a more than 100x speedup in time to first token for a 4M-token sequence. Training remained stable for weeks on hundreds of MetaX C550 GPUs, with the 7B model reaching 23.4% model FLOPs utilization, while the spiking scheme provides 69.15% sparsity, enabling low-power operation.
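Two mechanisms in the summary above are easy to miss in condensed form: linear attention keeps a fixed-size recurrent state during decoding (which is what makes "(partially) constant memory" inference possible), and spike coding turns dense activations into sparse integer events (which underlies the low-power claim). The sketch below is a minimal, generic illustration of both ideas, not the SpikingBrain implementation: the function names, the elu+1 feature map, and the fixed spike threshold are assumptions chosen for clarity.

```python
# Minimal sketch (assumed, not the paper's code): constant-memory decoding with
# linear attention, plus a simple threshold-based spike encoding for sparsity.
import numpy as np

def feature_map(x):
    # A common positive feature map used in linear attention (elu(x) + 1); assumption.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention_decode(queries, keys, values):
    """Decode token by token with an O(d*d) recurrent state instead of an
    O(T*d) key/value cache, so memory stays constant in sequence length."""
    d = queries.shape[-1]
    S = np.zeros((d, d))   # running sum of phi(k) v^T
    z = np.zeros(d)        # running sum of phi(k), used for normalization
    outputs = []
    for q, k, v in zip(queries, keys, values):
        phi_k = feature_map(k)
        S += np.outer(phi_k, v)
        z += phi_k
        phi_q = feature_map(q)
        outputs.append(phi_q @ S / (phi_q @ z + 1e-6))
    return np.stack(outputs)

def spike_encode(activations, threshold=0.5):
    """Illustrative spike coding: positive activations become integer spike
    counts; anything below the threshold stays silent (zero spikes)."""
    spikes = np.floor(np.maximum(activations, 0.0) / threshold).astype(int)
    sparsity = float((spikes == 0).mean())
    return spikes, sparsity

if __name__ == "__main__":
    T, d = 1024, 64
    rng = np.random.default_rng(0)
    q, k, v = (rng.standard_normal((T, d)) for _ in range(3))
    out = linear_attention_decode(q, k, v)        # state is d*d, independent of T
    spikes, sparsity = spike_encode(out)
    print(out.shape, f"sparsity={sparsity:.2%}")  # most entries carry no spikes
```

The point of the sketch is the memory and activity profile: the decoder state is a fixed d x d matrix regardless of sequence length, which is what enables constant-memory inference and large time-to-first-token gains on long inputs, while the thresholded spike counts show how event-driven coding yields the kind of activation sparsity the report quantifies.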
Takeaways, Limitations
•
Takeaways:
◦
Demonstrated feasibility of large-scale LLM development on non-NVIDIA platforms
◦
Improved long-sequence processing efficiency through a brain-inspired model design
◦
Better training and inference efficiency than existing Transformer-based models, especially on long sequences
◦
Low-power operation enabled by high spiking sparsity
◦
Large reduction in time to first token for very long sequences
•
Limitations:
◦
The system is tailored to MetaX GPU clusters, so portability to other platforms still needs to be verified.
◦
Performance comparisons are limited to open-source Transformer baselines; comparative analysis against a wider range of state-of-the-art models is needed.
◦
Further research is needed to determine the generalization performance of the SpikingBrain model and its applicability to various tasks.
◦
The model sizes (7B and 76B) are moderate compared to the largest language models, so developing and evaluating larger-scale models remains necessary.