Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

IDEATOR: Jailbreaking and Benchmarking Large Vision-Language Models Using Themselves

Created by
  • Haebom

Authors

Ruofan Wang, Juncheng Li, Yixu Wang, Bo Wang, Xiaosen Wang, Yan Teng, Yingchun Wang, Xingjun Ma, Yu-Gang Jiang

Outline

This paper proposes IDEATOR, a novel method for evaluating the robustness of large Vision-Language Models (VLMs) against jailbreak attacks that elicit malicious output, with a view to the safe deployment of VLMs. To overcome the lack of diverse multimodal data, a limitation of prior work, the method leverages a VLM itself to generate targeted jailbreak texts, paired with jailbreak images produced by state-of-the-art diffusion models. IDEATOR achieves a 94% attack success rate (ASR) against MiniGPT-4 and high ASRs against LLaVA, InstructBLIP, and Chameleon, demonstrating its effectiveness and transferability. Building on these attacks, the authors introduce VLJailbreakBench, a safety benchmark comprising 3,654 multimodal jailbreak samples. Evaluating 11 recently released VLMs on this benchmark reveals significant gaps in safety alignment (e.g., a 46.31% ASR on GPT-4o and a 19.65% ASR on Claude-3.5-Sonnet).
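
The pipeline described above can be pictured as an iterative attacker-target loop: an attacker VLM proposes a jailbreak text and an image description, a diffusion model renders the image, and the target VLM's response is judged for success. The sketch below is a minimal illustration of that loop, not the authors' implementation; every object and method name (attacker_vlm.propose, diffusion_model.generate, target_vlm.query, judge.is_jailbroken) is a hypothetical placeholder inferred from the paper's high-level description.

```python
# Hypothetical sketch of a VLM-as-attacker jailbreak loop in the spirit of
# IDEATOR. All model objects are assumed placeholders, not a real API.
from dataclasses import dataclass

@dataclass
class JailbreakSample:
    goal: str        # harmful behavior the attack tries to elicit
    prompt: str      # jailbreak text proposed by the attacker VLM
    image_desc: str  # description handed to the diffusion model
    success: bool

def run_attack(goal, attacker_vlm, diffusion_model, target_vlm, judge,
               max_rounds=5):
    """Iteratively refine one text+image jailbreak pair for a single goal."""
    prompt, image_desc, feedback = "", "", ""
    for _ in range(max_rounds):
        # 1. The attacker VLM proposes a jailbreak prompt and an image
        #    description tailored to the goal (and to prior feedback).
        prompt, image_desc = attacker_vlm.propose(goal, feedback)
        # 2. A text-to-image diffusion model renders the adversarial image.
        image = diffusion_model.generate(image_desc)
        # 3. Query the target VLM with the multimodal pair.
        response = target_vlm.query(prompt, image)
        # 4. A judge decides whether the response fulfills the harmful goal.
        if judge.is_jailbroken(goal, response):
            return JailbreakSample(goal, prompt, image_desc, True)
        feedback = response  # feed the refusal back for the next round
    return JailbreakSample(goal, prompt, image_desc, False)

def attack_success_rate(samples):
    """ASR, as reported above: the fraction of goals that were jailbroken."""
    return sum(s.success for s in samples) / len(samples)
```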

Takeaways, Limitations

Takeaways:
IDEATOR, a new jailbreak attack method that uses a VLM itself as the attacker, is presented, and its high effectiveness and transferability are demonstrated.
VLJailbreakBench, a safety benchmark covering a range of VLMs, is released.
The results expose serious vulnerabilities in current VLM safety mechanisms and highlight the need for stronger defenses.
Limitations:
IDEATOR's performance may depend on the capabilities of the diffusion model and the attacker VLM used.
VLJailbreakBench may be limited in scope; a more diverse and extensive dataset may be required.
IDEATOR may not be equally effective against all VLMs (defense mechanisms could be developed for specific models).