This is a page that curates AI-related papers published worldwide. All content is summarized using Google Gemini, and the site is operated on a non-profit basis. Copyright for each paper belongs to the authors and their institutions; please credit the source when sharing.
Efficiently Editing Mixture-of-Experts Models with Compressed Experts
Created by
Haebom
Author
Yifei He, Yang Liu, Chen Liang, Hany Hassan Awadalla
Outline
This paper proposes the concept of compressed experts to efficiently scale Mixture-of-Experts (MoE) models. Existing MoE models activate only a subset of experts during training and inference, but not all activated experts contribute equally to performance. This study replaces low-contribution experts with compressed, lightweight modules, reducing both the number of active parameters and the inference cost. Experiments on Phi-MoE and OLMoE show that compressed experts recover over 90% of full expert performance while reducing active parameters by more than 30% and inference cost by more than 20%. This enables efficient deployment of MoE models in resource-constrained environments and supports scaling to larger models. The code is available at https://github.com/yifei-he/Compressed-Experts.
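To illustrate the idea, here is a minimal sketch (not the authors' implementation, which is in the linked repository): a MoE layer in which some experts are swapped for lightweight modules. The choice of a low-rank projection as the compressed form, the routing loop, and all class and parameter names are illustrative assumptions.

```python
# Minimal sketch of a MoE layer mixing full and compressed experts.
# The low-rank compressed form and all names here are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FullExpert(nn.Module):
    """Standard feed-forward expert."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x):
        return self.net(x)


class CompressedExpert(nn.Module):
    """Lightweight stand-in for a low-contribution expert (low-rank, as an assumption)."""
    def __init__(self, d_model: int, rank: int):
        super().__init__()
        self.down = nn.Linear(d_model, rank, bias=False)
        self.up = nn.Linear(rank, d_model, bias=False)

    def forward(self, x):
        return self.up(self.down(x))


class MoEWithCompressedExperts(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2,
                 compressed_ids=(5, 6, 7), rank=32):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)
        # Experts identified as low-contribution are replaced by compressed modules.
        self.experts = nn.ModuleList([
            CompressedExpert(d_model, rank) if i in compressed_ids
            else FullExpert(d_model, d_ff)
            for i in range(n_experts)
        ])

    def forward(self, x):  # x: (tokens, d_model)
        gates = F.softmax(self.router(x), dim=-1)       # (tokens, n_experts)
        weights, idx = gates.topk(self.top_k, dim=-1)   # route each token to its top-k experts
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out


if __name__ == "__main__":
    layer = MoEWithCompressedExperts()
    tokens = torch.randn(16, 512)
    print(layer(tokens).shape)  # torch.Size([16, 512])
```

Because the compressed modules have far fewer parameters than the full feed-forward experts, routing tokens to them instead of full experts lowers the active parameter count per token, which is the mechanism behind the reported savings.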