Daily Arxiv

This page curates AI-related papers published around the world.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference

Created by
  • Haebom

Author

Xiangwei Shen, Zhimin Li, Zhantao Yang, Shiyi Zhang, Yingfang Zhang, Donghao Li, Chunyu Wang, Qinglin Lu, Yansong Tang

Outline

This paper presents a novel approach that addresses two key challenges of existing methods for directly aligning diffusion models with human preferences: high computational cost and the need for continuous offline adaptation of the reward model. Existing methods require gradient computation through multi-step denoising for reward scoring, which is computationally expensive and restricts optimization to only a few diffusion steps; they also require continuous offline reward-model adaptation to achieve desired qualities such as photorealism and accurate lighting. To overcome the limitations of multi-step denoising, the paper proposes Direct-Align, which predefines a noise prior and recovers the original image from any time step in a single step via interpolation between the noisy state and that prior. It further introduces Semantic Relative Preference Optimization (SRPO), which formulates the reward as a text-conditioned signal: the reward is adjusted online through positive and negative prompt augmentation, reducing reliance on offline reward fine-tuning. By fine-tuning the FLUX model with the optimized denoising procedure and online reward adjustment, the method improves human-rated realism and aesthetic quality by more than a factor of three.
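To make the two mechanisms concrete, below is a minimal Python sketch of the ideas as described in this summary. All names (`alpha_t`, `sigma_t`, `reward_model`, the control words) are illustrative assumptions, not the authors' actual API or hyperparameters.

```python
import torch


def direct_align_recover(x_t: torch.Tensor,
                         injected_noise: torch.Tensor,
                         alpha_t: float,
                         sigma_t: float) -> torch.Tensor:
    """Single-step image recovery in the spirit of Direct-Align.

    If the noisy state was formed as x_t = alpha_t * x_0 + sigma_t * noise
    with a *predefined* noise sample, the clean image can be recovered from
    any time step by inverting that interpolation, instead of running
    gradient computation through multi-step denoising.
    """
    return (x_t - sigma_t * injected_noise) / alpha_t


def semantic_relative_reward(reward_model,
                             image: torch.Tensor,
                             prompt: str,
                             positive_word: str = "photorealistic",
                             negative_word: str = "blurry") -> torch.Tensor:
    """Text-conditioned relative reward in the spirit of SRPO.

    The same reward model scores the image under positively and negatively
    augmented prompts; their difference is used as the training signal, so
    the reward can be steered online by editing the control words rather
    than by fine-tuning the reward model offline.
    """
    r_pos = reward_model(image, f"{positive_word} {prompt}")
    r_neg = reward_model(image, f"{negative_word} {prompt}")
    return r_pos - r_neg
```

In this sketch, changing `positive_word` and `negative_word` is what "adjusting the reward online" amounts to; no reward-model weights are updated.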

Takeaways, Limitations

Takeaways:
We present Direct-Align, a method that effectively addresses the computational cost of multi-step denoising.
We propose SRPO, an online reward adjustment method that reduces dependence on offline reward model adaptation.
More than 3x improvement in human-rated realism and aesthetic quality for the FLUX model.
User preferences are effectively reflected through text-based reward adjustment.
Limitations:
The performance of the Direct-Align method may depend on the quality of the predefined noise prior.
The effectiveness of SRPO can be affected by the quality and variety of text prompts.
Further research is needed on the generalization performance of the proposed method.
Only experimental results for a specific model (FLUX) are presented, making generalizability to other models uncertain.