This paper proposes AdaRing, an adapter-based fine-tuning framework for efficiently adapting large-scale pre-trained vision-language models (VLMs) to various downstream tasks. Existing adapter-based fine-tuning methods integrate adapters into all layers to increase adapter capacity, but they ignore inter-layer redundancy, which limits compression ratios and restricts the expressive power of homogeneous adapters. AdaRing achieves ultra-lightweight parameter-efficient adaptation of VLMs by integrating multiple adapters and enabling their collaboration through inter-layer Tensor Ring Decomposition (TRD). To eliminate the high redundancy among adapters across layers, we exploit tensor-level low-rank structure to factorize adapters into layer-shared tensor cores and layer-specific slices. Furthermore, guided by generalization-aware fine-tuning, diverse class-based adapters collaborate to handle tasks that require different representations. Experimental results demonstrate that AdaRing achieves state-of-the-art performance while reducing the average number of trainable parameters by 90%.
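To make the factorization concrete, the sketch below shows one plausible way to parameterize a stack of per-layer adapter matrices with a tensor-ring factorization: a layer-indexed core provides the layer-specific slices, while the input- and output-dimension cores are shared across all layers. This is a minimal illustration under our own assumptions, not the authors' implementation; the class name `TRAdapterStack`, the rank choice, and the core shapes are all hypothetical.

```python
import torch
import torch.nn as nn


class TRAdapterStack(nn.Module):
    """Hypothetical sketch: parameterize L adapters of shape (d_in, d_out)
    with three tensor-ring cores. The layer core is indexed per layer
    (layer-specific slices); the in/out cores are shared across layers."""

    def __init__(self, num_layers: int, d_in: int, d_out: int, rank: int = 4):
        super().__init__()
        # TR cores G1 (layer axis), G2 (input axis), G3 (output axis);
        # adjacent ranks chain and the last closes back to the first,
        # forming the ring: r-r, r-r, r-r.
        self.layer_core = nn.Parameter(torch.randn(rank, num_layers, rank) * 0.02)
        self.in_core = nn.Parameter(torch.randn(rank, d_in, rank) * 0.02)
        self.out_core = nn.Parameter(torch.randn(rank, d_out, rank) * 0.02)

    def adapter(self, layer: int) -> torch.Tensor:
        """Reconstruct one layer's (d_in, d_out) adapter:
        W[i, j] = Tr(G1[:, layer, :] @ G2[:, i, :] @ G3[:, j, :]),
        i.e. contract the layer slice with the shared cores and trace
        over the closing rank index to complete the ring."""
        g1 = self.layer_core[:, layer, :]  # layer-specific (rank, rank) slice
        return torch.einsum("ab,bic,cja->ij", g1, self.in_core, self.out_core)


# Usage: reconstruct the adapter for layer 3 of a 12-layer stack.
stack = TRAdapterStack(num_layers=12, d_in=768, d_out=768, rank=4)
w3 = stack.adapter(3)  # shape (768, 768)
```

Under this parameterization, the trainable parameter count scales as O(L r^2 + (d_in + d_out) r^2) rather than O(L d_in d_out), which is the mechanism by which sharing cores across layers removes inter-layer redundancy.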