Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Inference-Time Alignment Control for Diffusion Models with Reinforcement Learning Guidance

Created by
  • Haebom

Author

Luozhijie Jin, Zijie Qiu, Jie Liu, Zijie Diao, Lifeng Qiao, Ning Ding, Alex Lamb, Xipeng Qiu

Outline

Building on the success of denoising-based generative models, particularly diffusion and flow-matching algorithms, this paper addresses the challenge of aligning a generative model's output distribution with complex downstream objectives such as human preference, compositional accuracy, and data compressibility. To overcome the limitations of existing reinforcement learning (RL) fine-tuning methods, the authors reinterpret RL fine-tuning for diffusion models in terms of stochastic differential equations and implicit reward conditioning. They present Reinforcement Learning Guidance (RLG), an inference-time method that combines the outputs of a base model and an RL fine-tuned model via a geometric mean, in the style of classifier-free guidance (CFG). Theoretical analysis shows that RLG's guidance scale is mathematically equivalent to adjusting the KL-regularization coefficient in the standard RL objective, enabling dynamic control of the alignment-quality trade-off without additional training. Extensive experiments demonstrate that RLG consistently improves the performance of RL fine-tuned models across a variety of architectures, RL algorithms, and downstream tasks, including human preference, compositional control, compression, and text rendering. RLG also supports both interpolation and extrapolation, offering unprecedented flexibility in controlling generative alignment. In conclusion, the paper presents a practical and theoretically grounded solution for improving and controlling diffusion model alignment at inference time.
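The CFG-style combination described above can be sketched as follows. A geometric mean of two model distributions corresponds, in log-density (score) space, to a linear combination of their noise/score predictions. The function name and exact parameterization here are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def rlg_guidance(eps_base: np.ndarray, eps_rl: np.ndarray, w: float) -> np.ndarray:
    """Combine the base and RL fine-tuned models' noise predictions.

    A geometric mean of the two model distributions in probability space
    becomes a linear interpolation of their score / noise predictions,
    analogous to classifier-free guidance:

        eps = (1 - w) * eps_base + w * eps_rl

    w = 0 recovers the base model, w = 1 the RL fine-tuned model,
    0 < w < 1 interpolates between them, and w > 1 extrapolates past
    the RL model (stronger alignment, per the paper's analysis a
    smaller effective KL-regularization coefficient).
    """
    return (1.0 - w) * eps_base + w * eps_rl
```

At each denoising step, both models are evaluated on the same noisy latent and the combined prediction is fed to the sampler; varying `w` at inference time trades alignment strength against base-model fidelity without any retraining.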

Takeaways, Limitations

Takeaways:
• RLG is a novel inference-time method for controlling RL fine-tuned diffusion models.
• By combining the outputs of the base model and the RL fine-tuned model via a geometric mean at inference time, RLG enables dynamic control of alignment strength without additional training.
• RLG improves the performance of RL fine-tuned models on various downstream tasks, including human preference, compositional control, compression, and text rendering.
• Support for both interpolation and extrapolation increases flexibility in controlling generative alignment.
• The effectiveness of RLG is established mathematically through theoretical analysis.
• The source code is publicly available.
Limitations:
• The reported results are based on specific datasets and tasks; further research is needed to establish how well RLG generalizes to other datasets and tasks.
• RLG's computational cost may be higher than that of conventional inference, since it evaluates two models at each denoising step.
• The optimal strategy for setting RLG's guidance scale remains an open question for further research.