This paper analyzes the limitations of Visual Prompt Tuning (VPT), a parameter-efficient fine-tuning (PEFT) method for reducing the computational cost of adapting large-scale models, and proposes Cross Visual Prompt Tuning (CVPT), a novel method that improves on it. In VPT, the prompt placement strategy distorts the model's self-attention mechanism. CVPT addresses this by introducing a cross-attention module that directly models the interaction between prompts and image tokens. The cross-attention module separates the prompts from the input sequence, enabling efficient feature integration while preserving the integrity of the self-attention mechanism. In addition, a weight-sharing mechanism improves expressiveness without adding parameter overhead. Experimental results on 25 datasets show that CVPT significantly outperforms VPT, improving average accuracy on the VTAB-1K benchmark by more than 4%, and matches state-of-the-art adapter-based methods in both performance and efficiency.
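To make the separation idea concrete, below is a minimal, hypothetical PyTorch sketch (not the authors' released implementation): learnable prompt tokens are kept outside the self-attention sequence and instead interact with image tokens through a cross-attention call that reuses the frozen self-attention projection weights. The class name `CrossPromptBlock`, the attention direction, and the residual combination are illustrative assumptions rather than details stated in the abstract.

```python
# Hypothetical sketch of prompt/cross-attention separation with weight sharing.
# Assumption: cross-attention reuses the frozen self-attention module, so no
# extra attention parameters are introduced; only prompt tokens are trainable.
import torch
import torch.nn as nn


class CrossPromptBlock(nn.Module):
    def __init__(self, dim: int, num_heads: int, num_prompts: int):
        super().__init__()
        # Frozen backbone self-attention applied to image tokens only.
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        for p in self.self_attn.parameters():
            p.requires_grad = False
        self.norm = nn.LayerNorm(dim)
        # Learnable prompts held outside the self-attention input sequence.
        self.prompts = nn.Parameter(torch.zeros(1, num_prompts, dim))
        nn.init.trunc_normal_(self.prompts, std=0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_image_tokens, dim)
        h = self.norm(x)
        # Self-attention sees only image tokens, so its attention map is not
        # altered by prompt tokens (the distortion VPT suffers from).
        sa_out, _ = self.self_attn(h, h, h)
        # Cross-attention: image tokens query the prompts, sharing the same
        # frozen projection weights (the assumed weight-sharing mechanism).
        prompts = self.prompts.expand(x.size(0), -1, -1)
        ca_out, _ = self.self_attn(h, prompts, prompts)
        return x + sa_out + ca_out


if __name__ == "__main__":
    block = CrossPromptBlock(dim=768, num_heads=12, num_prompts=10)
    tokens = torch.randn(2, 197, 768)  # e.g. ViT-B/16 [CLS] + patch tokens
    print(block(tokens).shape)  # torch.Size([2, 197, 768])
```

In this sketch, only the prompt tokens carry trainable parameters, which mirrors the abstract's claim that expressiveness improves without parameter overhead from the attention path itself.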