Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Fine-Tuning Next-Scale Visual Autoregressive Models with Group Relative Policy Optimization

Created by
  • Haebom

Author

Matteo Gallici, Haitz Sáez de Ocáriz Borde

Outline

This paper presents an effective method for fine-tuning pre-trained generative models with reinforcement learning (RL) to align them with complex human preferences. In particular, it focuses on fine-tuning a next-scale visual autoregressive (VAR) model using Group Relative Policy Optimization (GRPO). Experimental results show that aligning the model to complex reward signals derived from an aesthetic predictor and CLIP embeddings significantly improves image quality and provides precise control over generation style. Leveraging CLIP helps the VAR model generalize beyond its initial ImageNet distribution, and RL-driven exploration enables it to generate images tailored to prompts referring to styles that were not present during pre-training. In conclusion, RL-based fine-tuning is efficient and effective for VAR models, which benefit in particular from fast inference that makes online sampling practical, giving them an advantage over diffusion-based alternatives.
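To make the GRPO recipe concrete, below is a minimal sketch of a single group-relative update step. It assumes a hypothetical policy interface with `sample()` and `log_prob()` methods and an external `reward_fn` (e.g., an aesthetic predictor or a CLIP similarity score); the names `policy`, `old_policy`, `reward_fn`, and `group_size` are illustrative and not taken from the paper.

```python
# Minimal GRPO-style update sketch (illustrative, not the authors' code).
import torch

def grpo_step(policy, old_policy, prompts, reward_fn, optimizer,
              group_size=8, clip_eps=0.2):
    """One GRPO update: sample a group per prompt, normalize rewards within
    the group (no learned critic), and apply a clipped policy-gradient loss."""
    policy.train()
    total_loss = 0.0
    for prompt in prompts:
        # 1) Sample a group of candidate generations for the same prompt.
        samples = [old_policy.sample(prompt) for _ in range(group_size)]

        # 2) Score each sample with the frozen reward function
        #    (aesthetic predictor, CLIP similarity, ...).
        rewards = torch.tensor([reward_fn(prompt, s) for s in samples])

        # 3) Group-relative advantage: normalize rewards within the group.
        adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)

        # 4) PPO-style clipped surrogate against the sampling policy,
        #    averaged over the group.
        for a, s in zip(adv, samples):
            logp_new = policy.log_prob(prompt, s)            # differentiable
            logp_old = old_policy.log_prob(prompt, s).detach()
            ratio = torch.exp(logp_new - logp_old)
            unclipped = ratio * a
            clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * a
            total_loss = total_loss - torch.min(unclipped, clipped) / group_size

    optimizer.zero_grad()
    (total_loss / len(prompts)).backward()
    optimizer.step()
    return float(total_loss)
```

Because VAR decoding is fast relative to diffusion sampling, drawing a fresh group of samples per prompt at every step (the online sampling the paper highlights) stays affordable in this loop.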

Takeaways, Limitations

Takeaways:
  • Fine-tuning a VAR model with reinforcement learning (GRPO) can improve image quality and give precise control over generation style.
  • CLIP-based rewards improve generalization beyond the pre-training data distribution.
  • The fast inference speed of the VAR model makes efficient online sampling practical.
Limitations:
  • Results depend on the specific aesthetic predictor and CLIP embeddings used; generalization to other datasets or reward functions remains to be verified.
  • Further analysis of the performance and stability of the GRPO algorithm is needed, and there is no comparative analysis against other RL algorithms.
  • The scale and diversity of the experiments warrant further review, including generalization to different image styles and more complex prompts.