
Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Fine-Tuning Diffusion Generative Models via Rich Preference Optimization

Created by
  • Haebom

Author

Hanyang Zhao, Haoxian Chen, Yucheng Guo, Genta Indra Winata, Tingting Ou, Ziyu Huang, David D. Yao, Wenpin Tang

Outline

Rich Preference Optimization (RPO) is a novel pipeline that leverages rich feedback signals to improve the curation of preference pairs for fine-tuning text-to-image diffusion models. Existing methods such as Diffusion-DPO often rely solely on reward model labels, which are opaque, provide limited insight into the reasons behind preferences, and are prone to reward hacking and overfitting. In contrast, RPO starts by generating detailed critiques of synthetic images and extracting reliable, actionable image editing guidelines. By applying these guidelines, it produces improved synthetic images and information-rich preference pairs that serve as a fine-tuning dataset. RPO is shown to be effective in fine-tuning state-of-the-art diffusion models, and the code is available at https://github.com/Diffusion-RLHF/RPO.
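The data-curation loop described above can be pictured as a minimal Python sketch, given only the pipeline outlined in the abstract: generate an image, critique it, distill the critique into an editing guideline, apply the edit, and pair the edited and original images. The functions generate_image, critique_image, extract_edit_guideline, and apply_edit are hypothetical placeholders (not the authors' API) standing in for the diffusion model, a vision-language critic, and an image-editing model.

```python
# Hypothetical sketch of the RPO preference-pair curation loop.
# All functions below are placeholders; a real pipeline would call a diffusion
# model, a vision-language critic, and an image-editing model.

from dataclasses import dataclass
from typing import List


@dataclass
class PreferencePair:
    prompt: str
    preferred: str  # improved (critique-guided, edited) image
    rejected: str   # original synthetic image


def generate_image(prompt: str) -> str:
    """Placeholder: sample an image from the current diffusion model."""
    return f"image_for::{prompt}"


def critique_image(prompt: str, image: str) -> str:
    """Placeholder: a vision-language critic writes a detailed critique."""
    return f"critique of {image}: composition does not match '{prompt}'"


def extract_edit_guideline(critique: str) -> str:
    """Placeholder: distill the critique into an actionable editing instruction."""
    return f"edit instruction derived from: {critique}"


def apply_edit(image: str, guideline: str) -> str:
    """Placeholder: an image-editing model applies the guideline."""
    return f"{image}::edited"


def build_rpo_pairs(prompts: List[str]) -> List[PreferencePair]:
    """Curate information-rich preference pairs: critique-guided edit vs. original."""
    pairs = []
    for prompt in prompts:
        original = generate_image(prompt)
        critique = critique_image(prompt, original)
        guideline = extract_edit_guideline(critique)
        improved = apply_edit(original, guideline)
        pairs.append(PreferencePair(prompt=prompt, preferred=improved, rejected=original))
    return pairs


if __name__ == "__main__":
    dataset = build_rpo_pairs(["a red cube balanced on a blue sphere"])
    print(dataset[0])
    # The resulting pairs would then feed a Diffusion-DPO-style fine-tuning objective.
```

The point of the sketch is that the preference label is no longer an opaque reward score: each "preferred" image is constructed from an explicit, human-readable critique, which is what makes the pairs information-rich.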

Takeaways, Limitations

Takeaways:
  • Leveraging rich feedback signals (detailed image critiques) addresses the limitations of existing methods: opaque reward-model labels, limited insight into preferences, reward hacking, and overfitting.
  • Reliable, actionable image editing guidelines are extracted to generate higher-quality synthetic preference pairs.
  • Improves the fine-tuning performance of state-of-the-art diffusion models.
  • Open code supports reproducibility and extensibility.
Limitations:
  • The performance of the RPO pipeline depends heavily on the quality of the image critiques; low-quality critiques can lead to poor results.
  • Generating detailed critiques of synthetic images and extracting image editing guidelines can be computationally expensive.
  • Performance may degrade for certain types of images or prompts; additional experiments on diverse datasets are needed.