Daily Arxiv

This page organizes papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
The copyright of each paper belongs to its authors and their institutions; when sharing, please cite the source.

Ban&Pick: Enhancing Performance and Efficiency of MoE-LLMs via Smarter Routing

Created by
  • Haebom

Author

Yuanteng Chen, Peisong Wang, Yuantian Shao, Nanxin Zeng, Chang Xu, Jian Cheng

Outline

Sparse Mixture-of-Experts (MoE) is a key architecture for efficiently scaling large language models (LLMs). This study highlights that routers learned during pre-training are optimized for stability and robustness, which limits model performance and inference efficiency. To address this, the authors propose Ban&Pick, a post-training routing strategy that requires no retraining or architecture changes. Pick identifies and strengthens key experts that significantly impact performance, improving accuracy. Ban dynamically removes redundant experts based on layer and token sensitivity, accelerating inference. Experiments on fine-grained MoE-LLMs such as DeepSeek and Qwen3 demonstrate that Ban&Pick delivers both higher accuracy and faster inference.
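To make the routing idea concrete, below is a minimal, hypothetical sketch of what a Ban&Pick-style decision for a single token at one MoE layer might look like. This is not the authors' implementation: the names and values (`key_experts`, `pick_bonus`, `ban_threshold`, `top_k`) are illustrative assumptions standing in for the paper's offline expert identification and layer/token-sensitive pruning.

```python
# Hypothetical sketch of Ban&Pick-style post-hoc routing (not the paper's code).
# Assumptions: `key_experts` are indices selected offline (Pick), and
# `ban_threshold` stands in for a layer/token-sensitive cutoff (Ban).
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def ban_and_pick_route(router_logits, top_k=8, key_experts=(3, 17),
                       pick_bonus=0.5, ban_threshold=0.05):
    """Return (expert_ids, gate_weights) for one token at one MoE layer."""
    logits = router_logits.copy()
    logits[list(key_experts)] += pick_bonus      # Pick: up-weight key experts

    top_ids = np.argsort(logits)[-top_k:]        # standard top-k routing
    weights = softmax(logits[top_ids])

    keep = weights >= ban_threshold              # Ban: prune low-weight experts
    keep |= np.isin(top_ids, key_experts)        # never ban a picked expert
    top_ids, weights = top_ids[keep], weights[keep]
    return top_ids, weights / weights.sum()      # renormalize gate weights

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=64)                 # e.g. 64 experts in this layer
    ids, w = ban_and_pick_route(logits)
    print("activated experts:", ids, "weights:", np.round(w, 3))
```

In this sketch, Pick raises the gate scores of a small fixed set of experts so they are selected or up-weighted more often, while Ban drops selected experts whose normalized weight falls below a threshold, so fewer experts are actually executed per token.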

Takeaways, Limitations

Takeaways:
We propose Ban&Pick, a post-training optimization strategy for pre-trained MoE models that improves accuracy and accelerates inference.
We strengthen key experts (Pick) and remove redundant experts (Ban) to increase efficiency.
We present a practical way to improve existing MoE models without retraining or architecture changes.
Limitations:
Further research is needed to determine how well the Ban&Pick strategy generalizes to other MoE architectures or model sizes.
Detailed analysis of the optimal parameter settings for Ban&Pick may be lacking.
Although Ban&Pick shows significant performance improvements in certain benchmarks, further validation of its performance in other benchmarks and real-world applications is needed.