Daily Arxiv

This page collects papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, please cite the source.

Not All Tokens are Guided Equal: Improving Guidance in Visual Autoregressive Models

Created by
  • Haebom

Author

Ky Dan Nguyen, Hoang Lam Tran, Anh-Dung Dinh, Daochang Liu, Weidong Cai, Xiuying Wang, Chang Xu

Outline

Next-scale prediction autoregressive (AR) models are emerging as powerful tools for image generation. However, their progressive resolution expansion causes an information mismatch between patches: the guidance signal is dispersed, drifts away from the conditioning information, and leaves features ambiguous and inaccurate. This paper proposes Information-Grounding Guidance (IGG), a mechanism that anchors guidance to semantically important regions via attention. During sampling, IGG adaptively strengthens informative patches, keeping guidance closely aligned with the generated content. On class-conditional and text-to-image generation tasks, IGG produces sharper, more consistent, and more semantically grounded images, setting a new standard among AR-based methods.
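The core idea, weighting the guidance term per patch by how much attention it receives, can be illustrated with a minimal sketch. This is not the paper's actual implementation; the function name, shapes, and the simple max-normalization of the attention map are all assumptions for illustration, layered on top of standard classifier-free guidance.

```python
import numpy as np

def attention_grounded_guidance(cond_logits, uncond_logits, attn_map, scale=4.0):
    """Illustrative sketch of attention-grounded guidance (not the paper's code).

    Standard classifier-free guidance applies one global scale to every patch:
        guided = uncond + scale * (cond - uncond)
    Here, each patch's guidance term is additionally weighted by its attention
    score, so informative patches receive stronger guidance.

    cond_logits, uncond_logits: (num_patches, vocab_size) token logits
    attn_map: (num_patches,) nonnegative attention scores (assumed given)
    """
    # Normalize attention into per-patch weights in [0, 1] (an assumed scheme).
    w = attn_map / (attn_map.max() + 1e-8)
    # Per-patch guidance: the usual CFG update, modulated by attention weight.
    return uncond_logits + scale * w[:, None] * (cond_logits - uncond_logits)
```

A patch with zero attention weight falls back to the unconditional logits, while the most-attended patch receives the full guidance scale; everything in between is interpolated.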

Takeaways, Limitations

Takeaways:
Proposes a new mechanism (IGG) that addresses the information mismatch arising when AR models generate images.
Improves image quality by using an attention mechanism to better align guidance with content.
Outperforms existing AR-based methods on class-conditional and text-to-image generation tasks.
Limitations:
The paper does not explicitly discuss its limitations.