
Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

FLAME: Towards Federated Fine-Tuning Large Language Models Through Adaptive SMoE

Created by
  • Haebom

Author

Khiem Le, Tuan Tran, Ting Hua, Nitesh V. Chawla

Outline

In this paper, we propose FLAME, a novel framework for federated fine-tuning of large language models in resource-constrained client environments. Existing resource-adaptive LoRA federated fine-tuning methods hand clients compressed versions of the global LoRA matrices to accommodate heterogeneous computational budgets, but the compression causes information loss and degrades performance. FLAME instead builds on a sparse mixture-of-experts (SMoE) architecture: it keeps the global LoRA matrices full and uncompressed, and achieves client-side adaptivity by varying the number of experts each client activates. This design raises two challenges, an output mismatch caused by partial expert activation and an imbalance in expert training quality across clients, which FLAME addresses with a lightweight rebalancing mechanism and an activation-aware aggregation scheme, respectively. Experiments across diverse computational environments show that FLAME outperforms existing methods.
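To make the mechanism concrete, below is a minimal PyTorch sketch of an SMoE-style LoRA layer in which each client activates only its top-k experts. All names (SMoELoRALinear, top_k, router) are illustrative, and renormalizing the gate weights over the activated experts is only one plausible reading of the paper's lightweight rebalancing, not the authors' exact formulation.

```python
import torch
import torch.nn as nn

class SMoELoRALinear(nn.Module):
    """A frozen linear layer augmented with a sparse mixture of LoRA experts.
    Every client holds the full, uncompressed expert matrices; a weaker client
    simply sets a smaller top_k to activate fewer experts per token."""

    def __init__(self, base: nn.Linear, num_experts: int = 8, rank: int = 8, top_k: int = 2):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # base weights stay frozen
        self.top_k = top_k
        self.router = nn.Linear(base.in_features, num_experts, bias=False)
        self.lora_A = nn.Parameter(0.01 * torch.randn(num_experts, base.in_features, rank))
        self.lora_B = nn.Parameter(torch.zeros(num_experts, rank, base.out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights, idx = self.router(x).topk(self.top_k, dim=-1)  # route to top-k experts
        # Renormalize gates over the activated experts only, so the output scale
        # stays comparable across clients that activate different numbers of
        # experts (a stand-in for FLAME's lightweight rebalancing).
        gates = torch.softmax(weights, dim=-1)
        out = self.base(x)
        for k in range(self.top_k):
            A = self.lora_A[idx[..., k]]     # (..., d_in, rank)
            B = self.lora_B[idx[..., k]]     # (..., rank, d_out)
            out = out + gates[..., k:k + 1] * torch.einsum("...i,...ir,...ro->...o", x, A, B)
        return out
```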

Takeaways, Limitations

Takeaways:
  • Using the full global LoRA matrices without compression avoids the performance degradation that affects existing LoRA-based federated learning methods.
  • The SMoE architecture allows flexible adaptation to each client's compute resources.
  • A lightweight rebalancing mechanism and an activation-aware aggregation scheme effectively address the challenges inherent to SMoE-based federated learning (a server-side aggregation sketch follows the Limitations list below).
  • FLAME demonstrates superior performance over existing methods across a variety of environments.
Limitations:
  • The SMoE architecture can increase model size and training complexity.
  • The proposed lightweight rebalancing mechanism and activation-aware aggregation scheme may leave room for further optimization.
  • Additional experiments with different data distributions and network environments may be required.
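As a companion to the layer sketch above, here is a hedged sketch of what the server-side activation-aware aggregation could look like: each client reports how many tokens it routed through each expert, and the server weights each client's per-expert update by its share of the total activations, so experts a client rarely trained contribute little to that expert's global update. The function name and the exact normalization are assumptions for illustration, not the paper's published algorithm.

```python
import torch
from typing import List

def activation_aware_aggregate(
    client_updates: List[torch.Tensor],     # each (num_experts, ...) LoRA delta
    activation_counts: List[torch.Tensor],  # each (num_experts,) routing counts
) -> torch.Tensor:
    stacked = torch.stack(client_updates)             # (clients, experts, ...)
    counts = torch.stack(activation_counts).float()   # (clients, experts)
    # Per-expert convex combination: clients that actually trained an expert
    # dominate its aggregated update; clamp guards never-activated experts.
    weights = counts / counts.sum(dim=0, keepdim=True).clamp(min=1.0)
    while weights.dim() < stacked.dim():
        weights = weights.unsqueeze(-1)               # broadcast over LoRA dims
    return (weights * stacked).sum(dim=0)             # (experts, ...)

# Example: 3 clients, 4 experts, LoRA A-matrices of shape (4, 64, 8)
updates = [torch.randn(4, 64, 8) for _ in range(3)]
counts = [torch.tensor([120, 0, 40, 200]),
          torch.tensor([10, 90, 0, 0]),
          torch.tensor([50, 50, 50, 50])]
global_A = activation_aware_aggregate(updates, counts)  # (4, 64, 8)
```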