Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference

Created by
  • Haebom

Author

Xiangwei Shen, Zhimin Li, Zhantao Yang, Shiyi Zhang, Yingfang Zhang, Donghao Li, Chunyu Wang, Qinglin Lu, Yansong Tang

Outline

This paper proposes Direct-Align, a novel method addressing two key challenges in directly aligning diffusion models with human preferences: the heavy computational cost of existing methods and their need for continuous offline adaptation of reward models. Direct-Align predefines a noise prior and recovers the original image at any timestep via interpolation, exploiting the fact that diffusion states are interpolations between noise and the target image; this avoids the expensive multi-step denoising process. The paper further introduces Semantic Relative Preference Optimization (SRPO), which formulates rewards as text-conditioned signals, enabling online reward adjustment through positive and negative prompt augmentation and reducing reliance on offline reward fine-tuning. Fine-tuning the FLUX model with these optimized rewards improves human-evaluated realism and aesthetic quality by more than threefold.
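The interpolation-based recovery can be illustrated with a minimal sketch. This assumes a standard diffusion forward process of the form x_t = alpha_t * x0 + sigma_t * noise; the function names and schedule values below are illustrative, not the paper's actual implementation.

```python
import numpy as np

def add_noise(x0, noise, alpha_t, sigma_t):
    """Forward diffusion step: interpolate between the clean image and
    a noise sample drawn from a predefined prior."""
    return alpha_t * x0 + sigma_t * noise

def recover_x0(x_t, noise, alpha_t, sigma_t):
    """Invert the interpolation in closed form. Because the injected noise
    is known, the original image is recovered at any timestep without a
    multi-step denoising loop."""
    return (x_t - sigma_t * noise) / alpha_t

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))       # toy "image"
noise = rng.standard_normal((8, 8))    # sample from the predefined noise prior
alpha_t, sigma_t = 0.6, 0.8            # illustrative schedule values at timestep t

x_t = add_noise(x0, noise, alpha_t, sigma_t)
x0_hat = recover_x0(x_t, noise, alpha_t, sigma_t)
print(np.allclose(x0, x0_hat))  # True: exact single-step recovery
```

The key point is that recovery is a single algebraic inversion rather than an iterative sampling loop, which is where the computational savings come from.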

Takeaways, Limitations

Takeaways:
A new method (Direct-Align) that effectively reduces the computational cost of the multi-step denoising process.
An online reward adjustment method (SRPO) that reduces dependence on offline reward-model fine-tuning.
Substantially improves the human-evaluated realism and aesthetic quality of the FLUX model.
Limitations:
Further research is needed on the generalization performance of the proposed method.
Since results are reported only for a specific model (FLUX), applicability to other diffusion models remains to be verified.
SRPO's strong dependence on text-conditioned signals means its performance may vary with the quality of the text descriptions.
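The semantic-relative-preference idea underlying SRPO can be sketched as follows. This is a hypothetical toy, not the paper's reward model: `toy_reward` stands in for a learned image-text reward, scored here as cosine similarity between embeddings, and the prompt/image vectors are random placeholders.

```python
import numpy as np

def toy_reward(image_emb, prompt_emb):
    """Hypothetical reward model: cosine similarity between an image
    embedding and a prompt embedding."""
    return float(np.dot(image_emb, prompt_emb) /
                 (np.linalg.norm(image_emb) * np.linalg.norm(prompt_emb)))

def relative_reward(image_emb, pos_prompt_emb, neg_prompt_emb):
    """Text-conditioned relative reward: score against a positive prompt
    minus score against a negative prompt. Biases shared by both prompts
    cancel, and the preference can be steered online by editing the
    prompts instead of fine-tuning the reward model offline."""
    return (toy_reward(image_emb, pos_prompt_emb)
            - toy_reward(image_emb, neg_prompt_emb))

rng = np.random.default_rng(1)
pos = rng.standard_normal(16)                    # e.g. "photorealistic, detailed"
neg = rng.standard_normal(16)                    # e.g. "blurry, artifacts"
img = pos + 0.1 * rng.standard_normal(16)        # image aligned with the positive cue

print(relative_reward(img, pos, neg))  # positive for an image matching the positive prompt
```

The design point is that the reward signal becomes a difference of two text-conditioned scores, so adjusting preferences is a prompt-editing operation rather than a reward-model retraining step.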