Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

AdaRing: Towards Ultra-Light Vision-Language Adaptation via Cross-Layer Tensor Ring Decomposition

Created by
  • Haebom

Author

Ying Huang, Yuanbin Man, Wenqi Jia, Zhengzhong Tu, Junzhou Huang, Miao Yin

Outline

This paper proposes AdaRing, an adapter-based fine-tuning framework for efficiently adapting large-scale pre-trained vision-language models (VLMs) to various downstream tasks. Existing adapter-based methods insert adapters into every layer to increase capacity, but they ignore inter-layer redundancy, which limits achievable compression ratios, and their homogeneous adapters restrict expressive power. AdaRing achieves ultra-lightweight parameter-efficient adaptation by integrating and coordinating multiple adapters through cross-layer Tensor Ring Decomposition (TRD). To remove the high redundancy among adapters across layers, it exploits tensor-level low-rank structure to factorize the adapters into layer-shared tensor cores and layer-specific slices. Furthermore, guided by generalization-aware fine-tuning, diverse adapters collaborate to handle tasks that require different representations. Experiments show that AdaRing achieves state-of-the-art performance while reducing the number of trainable parameters by 90% on average.
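To make the cross-layer sharing idea concrete, here is a minimal PyTorch sketch of one way to hold all per-layer adapter updates in tensor-ring form, with two cores shared across layers and a small layer-specific slice per layer. The module name, shapes, rank choices, and placement of the adapter are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch, assuming PyTorch; names, shapes, and ranks are illustrative
# and not taken from the paper.
import torch
import torch.nn as nn


class CrossLayerTRAdapter(nn.Module):
    """Holds the stacked per-layer adapter updates DeltaW_l (d_in x d_out)
    in tensor-ring form:
        A[l, i, j] = trace( S_l @ G_in[:, i, :] @ G_out[:, j, :] )
    where G_in and G_out are shared by all layers and S_l is a small
    layer-specific slice, so redundancy across layers is factored out."""

    def __init__(self, num_layers, d_in, d_out, ranks=(4, 4, 4)):
        super().__init__()
        r1, r2, r3 = ranks
        # Layer-specific slices: one (r3 x r1) matrix per layer.
        self.layer_slices = nn.Parameter(0.02 * torch.randn(num_layers, r3, r1))
        # Layer-shared cores for the input and output dimensions.
        self.core_in = nn.Parameter(0.02 * torch.randn(r1, d_in, r2))
        self.core_out = nn.Parameter(0.02 * torch.randn(r2, d_out, r3))

    def delta_weight(self, layer_idx):
        """Contract the ring to materialize this layer's (d_in x d_out) update."""
        s = self.layer_slices[layer_idx]  # (r3, r1)
        return torch.einsum('ca,aib,bjc->ij', s, self.core_in, self.core_out)

    def forward(self, x, layer_idx):
        """Residual adapter update applied to a layer's hidden states."""
        return x @ self.delta_weight(layer_idx)
```

Under such a factorization, the trainable parameter count scales as L·r3·r1 + r1·d_in·r2 + r2·d_out·r3 rather than growing with d_in·d_out per layer, which is the kind of cross-layer sharing that makes large parameter reductions possible.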

Takeaways, Limitations

Takeaways:
Presents an ultra-lightweight, parameter-efficient VLM fine-tuning framework that improves the compression ratio by exploiting inter-layer redundancy.
Improves expressiveness on diverse tasks through the collaboration of multiple adapters.
Achieves state-of-the-art performance while reducing trainable parameters by 90% on average.
Limitations:
AdaRing's performance may not generalize beyond the specific VLMs and downstream tasks evaluated.
The complexity of tensor ring decomposition may increase computational cost.
Further analysis is needed on the effectiveness of generalization-aware fine-tuning.