Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Fuzz-Testing Meets LLM-Based Agents: An Automated and Efficient Framework for Jailbreaking Text-To-Image Generation Models

Created by
  • Haebom

Author

Yingkai Dong, Xiangtao Meng, Ning Yu, Zheng Li, Shanqing Guo

Outline

This paper addresses the vulnerability of text-to-image generation models to 'jailbreak attacks', which bypass their safety mechanisms to generate unsafe content. The authors point out that existing jailbreak attack methods suffer from impractical access requirements, easily detectable unnatural prompts, a limited search space, and high query overhead. To overcome these limitations, they propose JailFuzzer, a novel fuzzing framework built on large language model (LLM) agents. JailFuzzer consists of three components: a seed pool of initial and jailbreak prompts, a guided mutation engine that generates semantically meaningful mutations, and an oracle function that evaluates whether a jailbreak has succeeded; the LLM-based agents provide efficiency and adaptability. Experiments show that JailFuzzer generates more natural and semantically consistent prompts than existing methods, reduces detectability, and achieves a high success rate with minimal query overhead. These results highlight the need for more robust safety mechanisms in generation models and lay a foundation for research on defenses against sophisticated jailbreak attacks. JailFuzzer is open source.
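The seed-pool → guided-mutation → oracle loop described above can be sketched as follows. This is a minimal illustration, not the actual JailFuzzer implementation: `llm_mutate` and `oracle` are hypothetical stand-ins for the paper's LLM-agent components, replaced here with toy logic so the loop is runnable.

```python
import random

def llm_mutate(prompt):
    """Stand-in for the guided mutation engine (an LLM agent in the paper)."""
    return prompt + " (rephrased)"

def oracle(prompt):
    """Stand-in for the oracle that judges jailbreak success (toy condition)."""
    return "(rephrased)" in prompt

def jailfuzz(seed_pool, max_queries=10):
    """Fuzzing loop: sample a seed, mutate it, query the oracle, repeat."""
    queries = 0
    while queries < max_queries:
        seed = random.choice(seed_pool)   # sample from the seed pool
        mutant = llm_mutate(seed)         # guided mutation
        queries += 1
        if oracle(mutant):                # evaluate jailbreak success
            return mutant, queries
        seed_pool.append(mutant)          # retain mutants as new seeds
    return None, queries

result, n = jailfuzz(["initial prompt"])
```

The key design point the paper emphasizes is that both the mutation step and the success check are driven by LLM agents, which is what keeps mutated prompts natural-looking while holding the query count low.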

Takeaways, Limitations

Takeaways:
  • An efficient and natural jailbreak attack method using an LLM-based fuzzing framework is presented.
  • Overcomes the limitations of existing jailbreak attack methods (unnatural prompts, high query requirements, etc.).
  • Emphasizes the need to strengthen the safety mechanisms of text-to-image generation models.
  • Lays the foundation for future research on defenses against sophisticated jailbreak attacks.
  • Promotes research sharing and advancement through open-source release.
Limitations:
  • The reported performance of JailFuzzer is measured on specific text-to-image generation models; its generalizability to other models requires further study.
  • Since it relies on the performance of the LLM agents, limitations and biases of the LLM itself may affect JailFuzzer's performance.
  • As new safety mechanisms are developed, the effectiveness of JailFuzzer may decrease; continuous improvement and adaptation are required.