
Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

CVPT: Cross Visual Prompt Tuning

Created by
  • Haebom

Authors

Lingyun Huang, Jianxu Mao, Junfei Yi, Ziming Tao, Yaonan Wang

Outline

This paper analyzes the limitations of Visual Prompt Tuning (VPT), a parameter-efficient fine-tuning (PEFT) method for reducing the cost of adapting large-scale models, and proposes an improved method, Cross Visual Prompt Tuning (CVPT). In VPT, the prompt placement strategy distorts the model's self-attention mechanism. CVPT addresses this by introducing a cross-attention module that directly models the interaction between prompts and image tokens. Because the cross-attention module separates the prompts from the input sequence, features can be integrated efficiently while the integrity of the self-attention mechanism is preserved. In addition, a weight-sharing mechanism improves expressiveness without adding parameter overhead. Experiments on 25 datasets show that CVPT significantly outperforms VPT, achieving an average accuracy more than 4% higher on the VTAB-1K benchmark, and is comparable to state-of-the-art adapter-based methods in both performance and efficiency.
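As a rough, hypothetical sketch (not the authors' released code), the PyTorch snippet below shows one way a cross-attention module could let learnable prompts query image tokens outside the self-attention path while reusing an existing attention module's projection weights; the class name `CrossPromptAttention`, the dimensions, and the choice of `nn.MultiheadAttention` as the shared module are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class CrossPromptAttention(nn.Module):
    """Illustrative sketch: learnable prompts attend to image tokens via
    cross-attention, reusing a shared attention module's projection weights."""

    def __init__(self, shared_attn: nn.MultiheadAttention, num_prompts: int, dim: int):
        super().__init__()
        # Prompt tokens live outside the image-token sequence, so the
        # backbone's self-attention over image tokens is left untouched.
        self.prompts = nn.Parameter(torch.zeros(num_prompts, dim))
        nn.init.trunc_normal_(self.prompts, std=0.02)
        # Weight sharing (assumed form): reuse an existing attention module
        # instead of allocating new cross-attention projection weights.
        self.shared_attn = shared_attn

    def forward(self, image_tokens: torch.Tensor) -> torch.Tensor:
        # image_tokens: (batch, num_tokens, dim), batch-first layout assumed
        b = image_tokens.size(0)
        queries = self.prompts.unsqueeze(0).expand(b, -1, -1)
        # Prompts act as queries; image tokens supply keys and values.
        prompt_features, _ = self.shared_attn(queries, image_tokens, image_tokens)
        return prompt_features

# Usage sketch with illustrative ViT-B-like sizes:
attn = nn.MultiheadAttention(embed_dim=768, num_heads=12, batch_first=True)
module = CrossPromptAttention(attn, num_prompts=10, dim=768)
tokens = torch.randn(2, 197, 768)   # batch of patch + [CLS] tokens
out = module(tokens)                # shape: (2, 10, 768)
```

Keeping the prompts out of the image-token sequence is what leaves the backbone's self-attention map untouched; only the small prompt read-out is learned.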

Takeaways, Limitations

Takeaways:
We show that visual prompt tuning-based methods can be competitive with adapter-based methods in both performance and efficiency.
We present a novel PEFT method that effectively models the interaction between prompts and image tokens via a cross-attention module.
We propose an effective way to improve parameter efficiency through a weight-sharing mechanism (a parameter-counting sketch follows this list).
We demonstrate strong performance on a variety of vision datasets.
Limitations:
Further research is needed on the generalization performance of the proposed method.
A more comprehensive comparative analysis with other types of PEFT methods is needed.
Additional analysis is needed to determine whether there is a dependency on a specific dataset or model architecture.
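As a back-of-the-envelope illustration of the parameter-efficiency point above (not the paper's exact accounting), the sketch below compares the trainable parameters added when an existing attention module is reused against those added by a dedicated cross-attention module; all sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

dim, num_prompts = 768, 10   # illustrative ViT-B-like sizes

# Option A (weight sharing, hypothetical): reuse a frozen backbone attention
# module, so the only new trainable weights are the prompt tokens themselves.
prompts = nn.Parameter(torch.zeros(num_prompts, dim))
shared_trainable = prompts.numel()

# Option B: allocate a dedicated cross-attention module, which adds its own
# query/key/value/output projections on top of the prompts.
dedicated_attn = nn.MultiheadAttention(embed_dim=dim, num_heads=12)
dedicated_trainable = prompts.numel() + sum(
    p.numel() for p in dedicated_attn.parameters()
)

print(f"shared weights   : {shared_trainable:,} trainable parameters")
print(f"dedicated module : {dedicated_trainable:,} trainable parameters")
```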