Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Spotlighter: Revisiting Prompt Tuning from a Representative Mining View

Created by
  • Haebom

Authors

Yutong Gao, Maoyuan Shao, Xinyang Huang, Chuang Zhu, Lijuan Sun, Yu Weng, Xuan Liu, Guoshun Nan

Outline

Building on the success of prompt tuning for CLIP, we propose Spotlighter, a lightweight token selection framework that simultaneously improves accuracy and efficiency by removing redundant or weakly correlated features that incur unnecessary computational cost. Spotlighter evaluates the activation of each visual token at both the sample level and the semantic level, retaining only the top-scoring tokens for downstream prediction. A class-specific semantic memory bank of learned prototypes enhances this selection, ensuring semantic representativeness and compensating for discarded features. Furthermore, we introduce a two-stage ranking mechanism that dynamically weights token-prototype interactions to prioritize informative cues. Across 11 few-shot benchmarks, Spotlighter improves harmonic mean accuracy by up to 11.19% over CLIP and achieves up to a 0.8K FPS speedup with only 21 additional parameters. These results establish Spotlighter as an effective and scalable baseline for prompt tuning. The code is available at https://github.com/greatest-gourmet/Spotlighter .
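The core idea, scoring each visual token against class prototypes and keeping only the top-scoring fraction, can be sketched as follows. This is an illustrative simplification, not the authors' implementation: the function name `select_tokens`, the cosine-similarity scoring, and the `keep_ratio` parameter are assumptions for demonstration, and the paper's sample-level scoring and two-stage ranking are omitted.

```python
import numpy as np

def select_tokens(tokens, prototypes, keep_ratio=0.5):
    """Keep the top-scoring visual tokens (hypothetical sketch).

    tokens:     (N, D) array of visual token features.
    prototypes: (C, D) array of class prototypes (the memory bank).
    Each token is scored by its best cosine similarity to any prototype,
    and only the top `keep_ratio` fraction is retained.
    """
    # L2-normalize rows so dot products become cosine similarities.
    t = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    scores = (t @ p.T).max(axis=1)              # (N,) best prototype match per token
    k = max(1, int(round(len(tokens) * keep_ratio)))
    keep = np.argsort(scores)[::-1][:k]         # indices of the k highest-scoring tokens
    keep = np.sort(keep)                        # preserve original token order
    return tokens[keep], keep
```

Only the retained tokens would then be passed to the downstream prediction head, which is where the computational saving comes from.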

Takeaways, Limitations

Takeaways:
Presents an effective, lightweight token selection framework that simultaneously improves the accuracy and efficiency of prompt tuning.
Reduces unnecessary computational cost and improves accuracy by evaluating token activation at both the sample and semantic levels.
Leverages a class-specific semantic memory bank to ensure semantic representativeness and compensate for discarded features.
A two-stage ranking mechanism prioritizes informative cues.
Outperforms CLIP across 11 few-shot benchmarks.
Limitations:
The generality of the proposed method may require further validation.
Optimization may be needed for specific datasets or tasks.
The size and structure of the memory bank warrant further study.