Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized with Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

MultiPL-MoE: Multi-Programming-Lingual Extension of Large Language Models through Hybrid Mixture-of-Experts

Created by
  • Haebom

Authors

Qing Wang, Xue Han, Jiahui Wang, Lehao Xing, Qian Hu, Lianlian Zhang, Chao Deng, Junlan Feng

Outline

This paper addresses the challenge of multilingual code generation by improving the multi-programming-lingual (MultiPL) performance of existing large language models (LLMs) under limited computational resources. Treating MultiPL as a special case of multiple natural languages, the authors propose MultiPL-MoE, a hybrid Mixture-of-Experts (MoE) architecture that couples two MoEs to optimize expert selection at both the token and segment levels. The token-level MoE uses a shared expert together with a gating-weight normalization technique, while the segment-level MoE adopts a sliding-window partitioning and top-k segment selection strategy to better capture the syntactic structure and contextual patterns of programming languages. Experimental results demonstrate the effectiveness of MultiPL-MoE.
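To make the two routing levels concrete, below is a minimal PyTorch sketch of the hybrid idea. This is not the authors' implementation: the module names (TokenLevelMoE, SegmentLevelMoE), the linear experts, the layer sizes, the L2 form of the gating-weight penalty, and the non-overlapping windows are all illustrative assumptions; the paper's actual experts, gating, and sliding-window scheme may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TokenLevelMoE(nn.Module):
    """Token-level MoE: a shared expert processes every token, while each
    token is additionally routed to its top-k experts. The L2 norm of the
    gate weights is returned as an (assumed) regularization term."""

    def __init__(self, d_model=256, n_experts=4, top_k=2):
        super().__init__()
        self.shared = nn.Linear(d_model, d_model)   # shared expert, always active
        self.experts = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(n_experts)])
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        self.top_k = top_k

    def forward(self, x):                            # x: (batch, seq_len, d_model)
        scores = self.gate(x)                        # (B, S, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)         # renormalize over the selected experts
        out = self.shared(x)                         # shared-expert contribution for all tokens
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (idx[..., k] == e).unsqueeze(-1)          # tokens routed to expert e at rank k
                out = out + mask * weights[..., k:k + 1] * expert(x)
        gate_reg = self.gate.weight.norm(p=2)        # hypothetical gating-weight penalty
        return out, gate_reg


class SegmentLevelMoE(nn.Module):
    """Segment-level MoE: split the sequence into fixed windows, score each
    window, and let every expert process only its top-k highest-scoring
    segments (non-overlapping windows are used here for simplicity)."""

    def __init__(self, d_model=256, n_experts=4, window=16, top_k_segments=2):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(n_experts)])
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        self.window, self.top_k_segments = window, top_k_segments

    def forward(self, x):                            # x: (B, S, d_model), S divisible by window
        B, S, D = x.shape
        segs = x.unfold(1, self.window, self.window).permute(0, 1, 3, 2)  # (B, n_seg, window, D)
        seg_scores = self.gate(segs.mean(dim=2))     # pool each segment, then score: (B, n_seg, E)
        out = torch.zeros_like(segs)
        k = min(self.top_k_segments, segs.size(1))
        for e, expert in enumerate(self.experts):
            top = seg_scores[..., e].topk(k, dim=-1).indices      # top segments for expert e
            for b in range(B):
                out[b, top[b]] = expert(segs[b, top[b]])
        return out.reshape(B, -1, D)                 # stitch segments back into a sequence


if __name__ == "__main__":
    x = torch.randn(2, 64, 256)
    token_out, reg = TokenLevelMoE()(x)
    segment_out = SegmentLevelMoE()(x)
    print(token_out.shape, segment_out.shape, reg.item())
```

The sketch only illustrates the division of labor described above: per-token routing with an always-on shared expert and a gate-weight penalty, versus per-segment routing where experts compete for whole windows of code.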

Takeaways, Limitations

Takeaways:
  • Shows that multi-programming-lingual (MultiPL) performance can be improved under limited computational resources.
  • Proposes an efficient hybrid MoE structure that optimizes expert selection at both the token and segment levels.
  • Improves the model's grasp of programming-language structure and context through the sliding-window and top-k segment selection strategy.
  • Experimentally verifies the effectiveness of MultiPL-MoE.
Limitations:
  • The summary lacks detail on the specific experimental setup, datasets, and comparison models.
  • Further research is needed on the generalization of MultiPL-MoE and its applicability to a wider range of programming languages.
  • The working principles of the gating-weight normalization technique and the expert selection strategy are not explained in detail.
  • Insufficient information is provided to ensure reproducibility of the experimental results.