Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

PromptGuard: Soft Prompt-Guided Unsafe Content Moderation for Text-to-Image Models

Created by
  • Haebom

Authors

Lingzhi Yuan, Xinfeng Li, Chejian Xu, Guanhong Tao, Xiaojun Jia, Yihao Huang, Wei Dong, Yang Liu, Bo Li

Outline

Despite recent performance gains in text-to-image (T2I) models, this paper raises concerns about their generation of NSFW content, including sexually suggestive, violent, politically sensitive, and offensive images. To address this, the authors present PromptGuard, a novel content-moderation technique. Inspired by the system-prompt mechanism of large language models (LLMs), PromptGuard optimizes a safe soft prompt (P*) that acts as an implicit system prompt within the text-embedding space of the T2I model, enabling safe and realistic image generation without degrading inference efficiency or requiring proxy models. The authors further optimize category-specific soft prompts and combine them into unified safety guidance, improving reliability and usability. Extensive experiments on five datasets show that PromptGuard effectively mitigates NSFW content generation while preserving high-quality benign outputs, achieving a 3.8x speedup over prior methods and an optimal unsafe rate as low as 5.84%, outperforming eight state-of-the-art defenses.
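The core idea of optimizing a soft prompt in embedding space can be illustrated with a toy sketch. Here a small set of learnable embedding vectors (standing in for P*) is tuned by gradient descent so that their pooled representation loses alignment with a hypothetical "unsafe concept" direction, while a regularizer keeps them close to their initialization. Everything below (the unsafe direction, the pooling, and the loss) is an illustrative assumption, not the paper's actual objective, which operates on a T2I model's text encoder and diffusion guidance.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_tokens = 16, 4  # toy embedding dimension / soft-prompt length

# Illustrative stand-in for an "unsafe concept" direction in the text
# embedding space (the real method derives its guidance from the T2I
# model itself; this vector is a hypothetical placeholder).
unsafe_dir = rng.normal(size=d)
unsafe_dir /= np.linalg.norm(unsafe_dir)

P = rng.normal(size=(n_tokens, d))  # trainable soft-prompt tokens ("P*")
P0 = P.copy()                       # regularize toward the initialization

lr, lam = 0.1, 0.01

def pooled(tokens):
    # toy pooling: mean over the soft-prompt tokens
    return tokens.mean(axis=0)

init_align = abs(pooled(P) @ unsafe_dir)
for _ in range(300):
    # toy loss: (pooled . unsafe_dir)^2 + lam * ||P - P0||^2
    a = pooled(P) @ unsafe_dir
    grad = (2.0 * a * unsafe_dir) / n_tokens + 2.0 * lam * (P - P0)
    P -= lr * grad
final_align = abs(pooled(P) @ unsafe_dir)
```

After optimization, `final_align` is far below `init_align`: the soft prompt has been steered away from the unsafe direction while staying close to where it started, which is the intuition behind using P* as an implicit, always-on system prompt.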

Takeaways, Limitations

Takeaways:
Provides an effective and efficient solution to the NSFW content-generation problem in T2I models.
Mitigates NSFW content generation far faster than existing methods (a 3.8x speedup).
Presents a novel approach that transfers the LLM system-prompt mechanism to T2I models.
Achieves balanced performance that considers safety and output quality simultaneously.
Limitations:
The results are reported on specific datasets, so generalizability to other datasets and models still needs verification.
Further research is needed on adaptability to new types of NSFW content.
The safe soft prompt optimization process needs greater transparency and explainability.
The 5.84% unsafe rate is not a perfect solution and requires continued improvement.