Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
It is summarized using Google Gemini and operated on a non-profit basis.
Copyright in each paper belongs to its authors and their institutions; when sharing, please cite the source.

SP-VLA: A Joint Model Scheduling and Token Pruning Approach for VLA Model Acceleration

Created by
  • Haebom

Author

Ye Li, Yuan Meng, Zewen Sun, Kangye Ji, Chen Tang, Jiajun Fan, Xinzhu Ma, Shutao Xia, Zhi Wang, Wenwu Zhu

Outline

To address the high computational cost and low execution frequency of Vision-Language-Action (VLA) models, we propose SP-VLA, a unified framework that accelerates VLA models by jointly applying model scheduling and token pruning. Specifically, action-aware model scheduling reduces temporal redundancy, while spatial-semantic dual-aware token pruning removes visual redundancy. SP-VLA dynamically switches between the full VLA model and a lightweight generator to adjust execution frequency, concentrating computation on important actions and key visual information to achieve effective acceleration while maintaining accuracy. Experiments show lossless acceleration of 1.5x on LIBERO and 2.4x on SimplerEnv, with an average performance improvement of up to 6%; inference frequency and latency also improve by 2.2x on SimplerEnv and 1.4x on LIBERO.
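The scheduling idea above can be sketched in a few lines: route "important" timesteps to the expensive VLA model and delegate the rest to a lightweight generator. This is only an illustrative toy, not the paper's implementation; the importance score, threshold, and both model stand-ins are hypothetical placeholders.

```python
def vla_model(obs):
    # Stand-in for the expensive full VLA policy (placeholder math).
    return obs * 2.0

def lightweight_generator(prev_action):
    # Stand-in for a cheap action generator; here it just reuses
    # the previous action (a crude form of temporal extrapolation).
    return prev_action

def importance(obs, prev_obs):
    # Toy action-importance score: magnitude of observation change.
    # SP-VLA's actual action-aware criterion is more sophisticated.
    return abs(obs - prev_obs)

def scheduled_rollout(observations, threshold=0.5):
    """Dynamically switch between the VLA model and the generator."""
    actions = []
    prev_obs, prev_action = None, 0.0
    for obs in observations:
        if prev_obs is None or importance(obs, prev_obs) > threshold:
            action = vla_model(obs)                      # expensive path
        else:
            action = lightweight_generator(prev_action)  # cheap path
        actions.append(action)
        prev_obs, prev_action = obs, action
    return actions
```

With this kind of scheduler, the fraction of timesteps that invoke the full model (and hence the achievable speedup) is controlled by the importance threshold.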

Takeaways, Limitations

Takeaways:
A new framework for improving the efficiency of VLA models is presented.
It combines model scheduling and token pruning to address both temporal and spatial redundancy.
Experiments demonstrate that it achieves strong acceleration while maintaining accuracy.
It broadens the applicability of VLA models to real-time tasks such as robot control and autonomous navigation.
Limitations:
Performance may depend on the capability and generalization of the lightweight generator.
Further research is needed on optimal parameter settings for model scheduling and token pruning.
Generalization to other VLA models and environments remains to be verified.