This is a page that curates AI-related papers published worldwide. All content here is summarized using Google Gemini and operated on a non-profit basis. Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.
CSGO: Generalized Optimization for Cold Start in Wireless Collaborative Edge LLM Systems
Created by
Haebom
Authors
Xuran Liu, Nan Xue, Rui Bao, Yaping Sun, Zhiyong Chen, Meixia Tao, Xiaodong Xu, Shuguang Cui
Outline
This paper proposes a latency-aware scheduling framework that addresses the limited resources and cold-start latency encountered when deploying large language models (LLMs) on edge devices. The framework minimizes total inference latency by overlapping model loading with computation and communication. It dynamically adjusts layer partitioning and device allocation based on device and model parameters, which hides loading time and reduces idle time. The authors formulate the problem as a mixed-integer nonlinear program (MINLP) and design an efficient dynamic programming algorithm to optimize model partitioning and device allocation. Experimental results show that the proposed method significantly reduces cold-start latency compared to existing strategies.
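To make the overlap idea concrete, here is a minimal Python sketch of a dynamic program that splits layers across a chain of devices. It is not the paper's algorithm or cost model. It assumes a deliberately simplified setting: every device starts loading its assigned layers at time zero, a device begins computing only once all of its layers are loaded and the previous device's activations have arrived, and load and compute times scale linearly with the number of layers. The names `Device`, `plan_partition`, and `comm_time` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    # Hypothetical per-device parameters (assumed, not taken from the paper).
    load_per_layer: float     # seconds to load one layer's weights
    compute_per_layer: float  # seconds to run one layer


def plan_partition(num_layers, devices, comm_time):
    """Assign contiguous layer blocks to devices in pipeline order.

    Returns (finish_time, layers_per_device). Loading on every device
    starts at time 0, so it overlaps with upstream compute and transfer.
    """
    inf = float("inf")
    k_max = len(devices)
    # best[k][n]: earliest finish of the first n layers using the first k devices
    best = [[inf] * (num_layers + 1) for _ in range(k_max + 1)]
    choice = [[0] * (num_layers + 1) for _ in range(k_max + 1)]
    best[0][0] = 0.0

    for k in range(1, k_max + 1):
        dev = devices[k - 1]
        for n in range(num_layers + 1):
            for m in range(n + 1):  # m layers handled by earlier devices
                prev = best[k - 1][m]
                if prev == inf:
                    continue
                count = n - m
                if count == 0:
                    finish = prev  # device k is skipped
                else:
                    loaded = dev.load_per_layer * count
                    # Activations are transferred only if an earlier device ran.
                    arrival = prev + (comm_time if m > 0 else 0.0)
                    finish = max(loaded, arrival) + dev.compute_per_layer * count
                if finish < best[k][n]:
                    best[k][n] = finish
                    choice[k][n] = m

    # Backtrack to recover how many layers each device received.
    counts = []
    n = num_layers
    for k in range(k_max, 0, -1):
        m = choice[k][n]
        counts.append(n - m)
        n = m
    counts.reverse()
    return best[k_max][num_layers], counts


if __name__ == "__main__":
    devices = [Device(1.0, 0.1), Device(1.0, 0.1)]
    print(plan_partition(4, devices, comm_time=0.0))
```

With two identical devices and four layers, this toy model picks an even split, because the second device's loading is hidden behind the first device's loading and compute. A single device would instead have to load all four layers before computing anything.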
Takeaways, Limitations
• Takeaways:
◦ Addresses the cold-start delay that arises when deploying large language models on edge devices.
◦ Reduces total inference latency by overlapping model loading with computation and communication.
◦ Optimizes model partitioning and device allocation efficiently with a dynamic programming algorithm.
◦ Experimental results show lower cold-start latency than existing strategies.
• Limitations:
◦ The method's generalization to real edge device environments needs further verification.
◦ Scalability across diverse edge devices and large language models remains to be evaluated.
◦ The complexity of mixed-integer nonlinear programming may incur computational cost that needs to be considered.