Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Research on Model Parallelism and Data Parallelism Optimization Methods in Large Language Model-Based Recommendation Systems

Created by
  • Haebom

Authors

Haowei Yang, Yu Tian, Zhongheng Yang, Zhao Wang, Chengrui Zhou, Dannier Li

Outline

In this paper, we systematically study two distributed training optimization methods, model parallelism and data parallelism, to address the computational and communication bottlenecks that arise from the rapid adoption of large language models (LLMs) in recommender systems. For model parallelism, we implement tensor parallelism and pipeline parallelism, and introduce an adaptive load-balancing mechanism to reduce communication overhead between devices. For data parallelism, we compare synchronous and asynchronous modes, and combine gradient compression and sparsification techniques with an efficient collective communication framework to significantly improve bandwidth utilization. Experimental results on real recommendation datasets show that the proposed hybrid parallel approach improves training throughput by more than 30% and resource utilization by about 20% compared to single-mode parallel approaches, while maintaining strong scalability and stability. Finally, we discuss the trade-offs between different parallel strategies in online deployments and suggest future directions, including heterogeneous hardware integration and automatic scheduling techniques.
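To make the data-parallel side of the summary concrete, here is a minimal sketch (not the authors' implementation, which is not published in this summary) of top-k gradient sparsification combined with a collective all-reduce in synchronous data-parallel training. The function names, the compression ratio, and the dense fallback buffer are illustrative assumptions.

```python
# Sketch: top-k gradient sparsification before all-reduce in
# synchronous data-parallel training. Illustrative only; the paper's
# actual compression/communication scheme may differ.
import torch
import torch.distributed as dist


def sparsify_topk(grad: torch.Tensor, ratio: float = 0.01):
    """Keep only the largest-magnitude `ratio` fraction of gradient entries."""
    flat = grad.flatten()
    k = max(1, int(flat.numel() * ratio))
    _, idx = torch.topk(flat.abs(), k)
    return idx, flat[idx], flat.numel()


def allreduce_sparse_gradients(model: torch.nn.Module, ratio: float = 0.01):
    """Exchange only the top-k gradient entries across data-parallel workers."""
    world_size = dist.get_world_size()
    for param in model.parameters():
        if param.grad is None:
            continue
        idx, values, numel = sparsify_topk(param.grad, ratio)
        # Scatter the sparse entries into a dense buffer and all-reduce it.
        # A production system would use sparse collectives and an
        # error-feedback buffer instead of this dense fallback.
        buffer = torch.zeros(numel, device=param.grad.device)
        buffer[idx] = values
        dist.all_reduce(buffer, op=dist.ReduceOp.SUM)
        param.grad.copy_((buffer / world_size).view_as(param.grad))
```

In a training loop, `allreduce_sparse_gradients(model)` would be called after `loss.backward()` and before `optimizer.step()`; the trade-off the paper studies is between the bandwidth saved by sending fewer gradient entries and the extra synchronization and accuracy cost of compression.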

Takeaways, Limitations

Takeaways:
We experimentally demonstrate that the training throughput and resource utilization of LLM-based recommender systems can be significantly improved by a hybrid parallel approach.
We present an efficient distributed training strategy that combines the advantages of model parallelism and data parallelism.
Communication overhead is reduced through adaptive load balancing and an efficient collective communication framework.
Limitations:
Since the experiments were conducted in a simulation environment, performance in an actual production environment requires further verification.
No specific solutions are presented for heterogeneous hardware integration and automatic scheduling techniques.
Lack of in-depth analysis of trade-offs between different parallel strategies.