Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Efficiently Editing Mixture-of-Experts Models with Compressed Experts

Created by
  • Haebom

Authors

Yifei He, Yang Liu, Chen Liang, Hany Hassan Awadalla

Outline

This paper proposes compressed experts, lightweight modules that make Mixture-of-Experts (MoE) models more efficient to scale. Existing MoE models activate only a subset of experts during training and inference, but not all activated experts contribute equally to performance. The proposed method replaces the less important experts with compressed, lightweight modules, reducing both the number of active parameters and the inference cost. Experiments on the Phi-MoE and OLMoE models show that compressed experts recover over 90% of full-expert performance while cutting active parameters by more than 30% and inference cost by more than 20%. This enables efficient deployment of MoE models in resource-constrained environments and their scaling to larger models. The code is available at https://github.com/yifei-he/Compressed-Experts.
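
To make the idea concrete, here is a minimal sketch (not the authors' implementation) of an MoE layer in which experts deemed unimportant are swapped for lightweight modules. The class names, dimensions, and the choice of a low-rank bottleneck as the compression scheme are illustrative assumptions; the summary above only states that insignificant experts are replaced with compressed, lightweight modules.

```python
# Illustrative sketch: an MoE layer where selected experts stay full-size
# and the rest are replaced by low-rank "compressed" stand-ins (assumption).
import torch
import torch.nn as nn
import torch.nn.functional as F


class FullExpert(nn.Module):
    """Standard feed-forward expert."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x):
        return self.ff(x)


class CompressedExpert(nn.Module):
    """Lightweight replacement for a less important expert (low-rank bottleneck)."""
    def __init__(self, d_model: int, rank: int = 8):
        super().__init__()
        self.down = nn.Linear(d_model, rank, bias=False)
        self.up = nn.Linear(rank, d_model, bias=False)

    def forward(self, x):
        return self.up(self.down(x))


class MoEWithCompressedExperts(nn.Module):
    def __init__(self, d_model, d_ff, n_experts, keep_ids, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)
        # Keep full experts only at the "important" indices; compress the rest.
        self.experts = nn.ModuleList([
            FullExpert(d_model, d_ff) if i in set(keep_ids) else CompressedExpert(d_model)
            for i in range(n_experts)
        ])

    def forward(self, x):  # x: (tokens, d_model)
        gate = F.softmax(self.router(x), dim=-1)       # (tokens, n_experts)
        weights, idx = gate.topk(self.top_k, dim=-1)   # route each token to top-k experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * expert(x[mask])
        return out


# Usage: keep experts {0, 3} at full size, compress the remaining six.
layer = MoEWithCompressedExperts(d_model=64, d_ff=256, n_experts=8, keep_ids=[0, 3])
tokens = torch.randn(16, 64)
print(layer(tokens).shape)  # torch.Size([16, 64])
```

Because the compressed modules carry far fewer parameters than full experts, tokens routed to them contribute much less to the active-parameter count and inference cost, which is the trade-off the paper quantifies (over 90% performance retained for roughly 30% fewer active parameters).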

Takeaways, Limitations

Takeaways:
Presents a new method that significantly improves the efficiency of MoE models.
Enables resource-efficient MoE deployment by reducing active parameters and inference costs.
Improves the scalability of large-scale MoE models.
Limitations:
Further research is needed on the generalization performance of the proposed compression method.
Further experiments with different MoE architectures and downstream tasks are needed.
Quantitative analysis of information loss during compression is needed.