This paper proposes IDEATOR, a novel method for evaluating the robustness of large Vision-Language Models (VLMs) against jailbreak attacks that induce malicious outputs, with the goal of supporting the secure deployment of VLMs. To overcome the lack of diverse multimodal data that limits existing research, we leverage the VLM itself to generate targeted jailbreak texts, paired with jailbreak images produced by state-of-the-art diffusion models. IDEATOR achieves an attack success rate (ASR) of 94% against MiniGPT-4 and high ASRs against LLaVA, InstructBLIP, and Chameleon, demonstrating its effectiveness and transferability. Furthermore, we introduce VLJailbreakBench, a safety benchmark comprising 3,654 multimodal jailbreak samples. Evaluations on 11 recently released VLMs reveal significant gaps in safety alignment (e.g., GPT-4o with a 46.31% ASR and Claude-3.5-Sonnet with a 19.65% ASR).
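
To make the evaluation pipeline implied above concrete, the sketch below outlines how a multimodal jailbreak pair could be generated and scored to produce an ASR. This is an illustrative sketch only, not the authors' implementation; every callable (`attacker_vlm`, `diffusion_model`, `target_vlm`, `judge`) is a hypothetical placeholder for the corresponding component described in the abstract.

```python
from typing import Any, Callable, List, Tuple

def evaluate_asr(
    goals: List[str],
    attacker_vlm: Callable[[str], Tuple[str, str]],  # goal -> (jailbreak text, image description)
    diffusion_model: Callable[[str], Any],           # image description -> jailbreak image
    target_vlm: Callable[[Any, str], str],           # (jailbreak image, jailbreak text) -> response
    judge: Callable[[str, str], bool],               # (goal, response) -> did the jailbreak succeed?
) -> float:
    """Return the attack success rate (ASR) over a list of harmful goals (hypothetical interfaces)."""
    successes = 0
    for goal in goals:
        # The attacker VLM drafts a targeted jailbreak text plus a matching image description.
        jailbreak_text, image_desc = attacker_vlm(goal)
        # A text-to-image diffusion model renders the description into a jailbreak image.
        jailbreak_image = diffusion_model(image_desc)
        # The multimodal pair is sent to the target VLM under evaluation.
        response = target_vlm(jailbreak_image, jailbreak_text)
        # A judge decides whether the response actually fulfills the harmful goal.
        successes += judge(goal, response)
    return successes / len(goals) if goals else 0.0
```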