Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Defending LVLMs Against Vision Attacks through Partial-Perception Supervision

Created by
  • Haebom

Authors

Qi Zhou, Tianlin Li, Qing Guo, Dongxia Wang, Yun Lin, Yang Liu, Jin Song Dong

Outline

This paper addresses the vulnerability of large vision-language models (LVLMs) to maliciously injected or perturbed input images. Existing defenses apply partial-perception modifications to the input (e.g., partial cropping) and vote over the resulting responses, but such modifications produce partial images with distorted semantics, degrading response quality on clean images after voting. Instead of using the partial-image responses directly for voting, this paper uses them to supervise the LVLM's responses to the original image. The proposed method, DPS (Defense through Partial-Perception Supervision), is black-box and training-free: the model is prompted with responses generated by a model that perceives only a partial image. Under attack, DPS lets the model adjust its response based on this partial-image understanding, while for clean inputs it confidently maintains its original response. The results show that a weak model can supervise a strong model: when attacked, the strong model loses confidence and adapts its response based on the weak model's partial understanding, effectively defending against the attack. Experiments on six datasets and three popular models show a 76.3% average reduction in attack success rate.
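
The sketch below illustrates the DPS-style prompting loop described above. It is not the authors' implementation: the `query_lvlm` placeholder, the center-crop ratio, and the prompt wording are hypothetical stand-ins for whatever black-box LVLM client and partial-perception strategy are actually used.

```python
# Minimal sketch of a partial-perception supervision loop (hypothetical, not the paper's code).
from PIL import Image


def center_crop(image: Image.Image, ratio: float = 0.5) -> Image.Image:
    """Return a center crop covering `ratio` of each dimension (one possible partial view)."""
    w, h = image.size
    cw, ch = int(w * ratio), int(h * ratio)
    left, top = (w - cw) // 2, (h - ch) // 2
    return image.crop((left, top, left + cw, top + ch))


def query_lvlm(image: Image.Image, prompt: str) -> str:
    """Placeholder for a black-box LVLM call; wire this to your own model or API client."""
    raise NotImplementedError("Replace with a real LVLM query.")


def dps_answer(image: Image.Image, question: str) -> str:
    # 1. A (possibly weaker) model answers from a partial view of the image only.
    partial_view = center_crop(image)
    partial_answer = query_lvlm(partial_view, question)

    # 2. The target model answers on the full image, with the partial-view answer
    #    supplied as supervision: keep the original answer if confident, otherwise
    #    reconsider it in light of the partial understanding.
    supervised_prompt = (
        f"{question}\n\n"
        f'A model that saw only part of this image answered: "{partial_answer}".\n'
        "If you are confident in your own answer from the full image, keep it; "
        "otherwise, revise it using the partial-view answer as a reference."
    )
    return query_lvlm(image, supervised_prompt)
```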

Takeaways, Limitations

Takeaways:
We present a novel defense technique against adversarial attacks on large vision-language models that leverages partial-image information.
We overcome the limitations of existing voting-based defense methods and present an effective method for defending against attacks without degrading the response quality of clean images.
We present an original approach to supervising a strong model by leveraging a weak model.
The method demonstrates strong defense performance across diverse datasets and models.
Limitations:
Further research is needed to determine whether the proposed DPS method is effective against all types of visual attacks.
Defense performance may be degraded against certain types of partial images or certain attack vectors.
As a black-box approach, it offers limited insight into the model's internal working mechanisms.
Further validation of applicability in real-world environments is required.