This is a page that curates AI-related papers published worldwide. All content here is summarized using Google Gemini and operated on a non-profit basis. Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.
This paper proposes Direct-Align, a novel method that addresses two key challenges in aligning diffusion models directly with human preferences: the computational cost of existing approaches and the need for continual offline adaptation of reward models. Direct-Align avoids the expensive multi-step denoising process by predefining a noise prior, from which the original image can be recovered at any timestep via interpolation. Building on this, the paper introduces Semantic Relative Preference Optimization (SRPO), which formulates rewards as text-conditioned signals; the reward can then be adjusted online through positive and negative prompt augmentation, reducing the reliance on offline reward fine-tuning. With this approach, fine-tuning the FLUX model improves human-evaluated realism and aesthetic quality by more than threefold.
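To make the recovery idea concrete, here is a minimal sketch (not the authors' code; the scheduler coefficients `alpha_t`/`sigma_t` and the function name are assumptions) of inverting the standard diffusion forward interpolation when the injected noise is fixed in advance:

```python
import torch

def direct_align_recover(x_t: torch.Tensor,
                         noise_prior: torch.Tensor,
                         alpha_t: float,
                         sigma_t: float) -> torch.Tensor:
    """Recover an estimate of the clean image x0 from a noisy latent x_t.

    Assumes the standard diffusion forward interpolation
        x_t = alpha_t * x0 + sigma_t * eps,
    where eps is a noise sample fixed ("predefined") in advance.
    Because eps is known, x0 can be recovered analytically at any
    timestep instead of running the full multi-step denoising chain.
    """
    return (x_t - sigma_t * noise_prior) / alpha_t
```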
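Likewise, the online reward adjustment in SRPO can be pictured as scoring the same image against a positively and a negatively augmented prompt and using the difference as the training signal. This is a hedged sketch, not the paper's exact formulation; `reward_model` and the augmentation words are placeholders:

```python
def semantic_relative_reward(reward_model, image, prompt: str,
                             pos_word: str = "realistic",
                             neg_word: str = "blurry") -> float:
    """Score one image against positively vs. negatively augmented prompts.

    The relative score rewards the semantics to reinforce and penalizes
    those to suppress, so the reward can be steered online simply by
    editing the prompt words, with no offline reward-model fine-tuning.
    """
    r_pos = reward_model(image, f"{pos_word} photo, {prompt}")
    r_neg = reward_model(image, f"{neg_word} photo, {prompt}")
    return r_pos - r_neg
```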
Takeaways, Limitations
•
Takeaways:
◦
A new method (Direct-Align) is presented that effectively addresses the computational cost of the multi-step denoising process.
◦
An online reward-adjustment method (SRPO) is presented that reduces dependence on offline reward-model fine-tuning.
◦
Significantly improves the realism and aesthetic quality of the FLUX model.
•
Limitations:
◦
Further research is needed on the generalization performance of the proposed method.
◦
Since the results are reported only for a specific model (FLUX), applicability to other diffusion models still needs to be verified.
◦
Because SRPO depends heavily on text-conditioned signals, its performance may vary with the quality of the text prompts.