This is a page that curates AI-related papers published worldwide. All content here is summarized using Google Gemini, and the site is operated on a non-profit basis. Copyright for each paper belongs to the authors and their institutions; please credit the source when sharing.
AlignGuard: Scalable Safety Alignment for Text-to-Image Generation
Created by
Haebom
Author
Runtao Liu, I Chieh Chen, Jindong Gu, Jipeng Zhang, Renjie Pi, Qifeng Chen, Philip Torr, Ashkan Khakzar, Fabio Pizzati
Outline
In this paper, we present AlignGuard, a method for improving the safety of text-to-image (T2I) models. To overcome the limitation of existing safety measures, which remove only a handful of concepts, AlignGuard applies Direct Preference Optimization (DPO) on CoProV2, a synthetic dataset of paired harmful and safe image-text examples. On this data it trains safety "experts" in the form of low-rank adaptation (LoRA) matrices, each of which steers generation away from a specific class of safety-related concepts; a novel merging strategy then efficiently combines the multiple experts into a single LoRA. As a result, AlignGuard removes 7x more harmful concepts than existing methods and achieves state-of-the-art performance on several benchmarks. The code and data will be made available at https://safetydpo.github.io/ .
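To make the merging step concrete, here is a minimal sketch of how several LoRA safety experts could be combined into a single adapter by averaging their low-rank weight updates and re-factorizing the result. The function names, data layout, and uniform weighting are illustrative assumptions; the summary does not specify AlignGuard's actual merging strategy.

```python
import torch

def merge_lora_experts(experts, weights=None):
    """Hypothetical sketch: merge LoRA safety experts into one update.

    experts: list of dicts {layer_name: (A, B)}, where a layer's weight
             update is B @ A, with A of shape (r, in_dim) and B of shape
             (out_dim, r).
    weights: optional per-expert scalars; defaults to a uniform average.
    Returns {layer_name: delta_W}, the merged full-rank weight updates.
    """
    if weights is None:
        weights = [1.0 / len(experts)] * len(experts)
    merged = {}
    for expert, w in zip(experts, weights):
        for name, (A, B) in expert.items():
            delta = w * (B @ A)  # promote the low-rank update to full rank
            merged[name] = merged.get(name, 0) + delta
    return merged

def refactor_to_lora(delta_W, rank):
    """Re-factorize a merged update into rank-r LoRA factors via truncated
    SVD, so the result can be stored and applied as a single LoRA."""
    U, S, Vh = torch.linalg.svd(delta_W, full_matrices=False)
    B = U[:, :rank] * S[:rank]  # (out_dim, r), singular values folded in
    A = Vh[:rank, :]            # (r, in_dim)
    return A, B
```

A uniform average is the simplest possible choice here; re-factorizing via truncated SVD keeps the merged result in standard LoRA form, so it can be loaded and applied like any single adapter.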