Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions. When sharing, please cite the source.

SafeGuider: Robust and Practical Content Safety Control for Text-to-Image Models

Created by
  • Haebom

Authors

Peigui Qi, Kunsheng Tang, Wenbo Zhou, Weiming Zhang, Nenghai Yu, Tianwei Zhang, Qing Guo, Jie Zhang

Outline

Text-to-image models can generate high-quality images from natural language descriptions, but they are highly vulnerable to adversarial prompts that bypass safety measures and produce harmful content. In this paper, the authors empirically study the text encoder of the Stable Diffusion (SD) model and find that the [EOS] token acts as a semantic aggregator whose embedding follows distinct distributional patterns for benign and adversarial prompts. Building on this finding, they introduce SafeGuider, a two-step framework for robust safety control that does not compromise generation quality. SafeGuider combines an embedding-level recognition model with a safety-aware feature erasure beam search algorithm, so it maintains high-quality generation for benign prompts while defending against both in-domain and out-of-domain attacks. Across the attack scenarios evaluated, the attack success rate stays at or below 5.48%. Rather than refusing unsafe prompts or returning black images, SafeGuider generates safe, meaningful images for them, which improves practicality. The authors also show that SafeGuider can be applied to other text-to-image models beyond SD, such as the Flux model.
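The two steps described above (an embedding-level check, followed by a beam search that removes unsafe features from the prompt) can be illustrated with a toy sketch. This is not the authors' implementation. The names `toy_unsafe_score`, `TOY_UNSAFE_WEIGHTS`, `erase_unsafe_tokens`, the threshold, and the beam width are all hypothetical. The keyword-weight scorer stands in for a real classifier over the [EOS] embedding, and stopping at the fewest removals is only a crude proxy for preserving the prompt's meaning.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical stand-in for an embedding-level recognition model.
# A real system would score the text encoder's [EOS] embedding instead.
TOY_UNSAFE_WEIGHTS: Dict[str, float] = {"gore": 0.6, "bloody": 0.5, "nude": 0.7}


def toy_unsafe_score(tokens: Sequence[str]) -> float:
    """Return a toy unsafe score in [0, 1] for a tokenized prompt."""
    total = sum(TOY_UNSAFE_WEIGHTS.get(t.lower(), 0.0) for t in tokens)
    return min(1.0, total)


def erase_unsafe_tokens(
    tokens: Sequence[str],
    scorer: Callable[[Sequence[str]], float] = toy_unsafe_score,
    threshold: float = 0.3,
    beam_width: int = 3,
    max_removals: Optional[int] = None,
) -> List[str]:
    """Beam search over token removals until the unsafe score <= threshold."""
    tokens = list(tokens)
    # Step 1: prompts already judged safe pass through unchanged.
    if scorer(tokens) <= threshold:
        return tokens
    if max_removals is None:
        max_removals = len(tokens)

    # Each beam entry is (score, indices of the tokens still kept).
    beam: List[Tuple[float, Tuple[int, ...]]] = [
        (scorer(tokens), tuple(range(len(tokens))))
    ]
    # Step 2: remove one more token per round, keeping the best candidates.
    for _ in range(max_removals):
        candidates: Dict[Tuple[int, ...], float] = {}
        for _, kept in beam:
            for i in range(len(kept)):
                new_kept = kept[:i] + kept[i + 1:]
                if new_kept not in candidates:
                    candidates[new_kept] = scorer([tokens[j] for j in new_kept])
        if not candidates:
            break
        ranked = sorted(candidates.items(), key=lambda kv: kv[1])
        beam = [(score, kept) for kept, score in ranked[:beam_width]]
        # Stop at the first round that reaches the threshold, so as few
        # tokens as possible are removed.
        if beam[0][0] <= threshold:
            break
    return [tokens[j] for j in beam[0][1]]


if __name__ == "__main__":
    print(erase_unsafe_tokens("a bloody gore scene in a forest".split()))
    print(erase_unsafe_tokens("a cat on a sofa".split()))
```

In this sketch the modified prompt would then be passed to the image generator in place of the original, which mirrors the idea of producing a safe image instead of refusing the request.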

Takeaways, Limitations

Takeaways:
SafeGuider provides an effective framework for improving the safety of text-to-image models.
It improves usability by generating safe, meaningful images for unsafe prompts instead of rejecting them.
It is applicable to multiple text-to-image models, including SD and Flux.
Limitations:
The paper does not explicitly discuss its limitations.