Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

RandAR: Decoder-only Autoregressive Visual Generation in Random Orders

Created by
  • Haebom

Authors

Ziqi Pang, Tianyuan Zhang, Fujun Luan, Yunze Man, Hao Tan, Kai Zhang, William T. Freeman, Yu-Xiong Wang

Outline

RandAR is a decoder-only visual autoregressive (AR) model that can generate images in arbitrary token orders. Existing decoder-only AR models rely on a predefined generation order; RandAR removes this inductive bias. Its core design inserts a “position-indicating token” before each image token to be predicted, which specifies the spatial location of that token. Although training on randomly permuted token sequences is a harder task than fixed-order generation, RandAR achieves performance comparable to existing raster-order models. More importantly, decoder-only transformers trained in random order acquire new capabilities. To address the efficiency bottleneck of AR models, RandAR uses parallel decoding with a KV cache at inference time, achieving a 2.5x speedup without compromising generation quality. It also supports inpainting, outpainting, and resolution extrapolation in a zero-shot manner.
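The interleaving of position-indicating tokens and image tokens can be illustrated with a small sketch. This is not the authors' implementation: the function names `build_random_order_sequence` and `restore_raster`, and the `("pos", i)` / `("img", t)` tuple encoding, are hypothetical and only show how a randomly permuted sequence could be laid out and mapped back to raster order.

```python
import random


def build_random_order_sequence(image_tokens, seed=None):
    """Interleave position tokens with image tokens in a random order.

    image_tokens: list of token ids in raster order.
    Returns (sequence, order). Each ("pos", i) entry announces the raster
    index i of the ("img", token) entry that follows it.
    The tuple encoding is an illustrative assumption.
    """
    rng = random.Random(seed)
    order = list(range(len(image_tokens)))
    rng.shuffle(order)  # random generation order over spatial positions

    sequence = []
    for idx in order:
        sequence.append(("pos", idx))                # where the next token goes
        sequence.append(("img", image_tokens[idx]))  # the token itself
    return sequence, order


def restore_raster(sequence, num_tokens):
    """Place generated tokens back into raster order using the position tokens."""
    grid = [None] * num_tokens
    for (kind_p, idx), (kind_i, tok) in zip(sequence[0::2], sequence[1::2]):
        assert kind_p == "pos" and kind_i == "img"
        grid[idx] = tok
    return grid


if __name__ == "__main__":
    seq, order = build_random_order_sequence([10, 11, 12, 13], seed=0)
    print(order)
    print(restore_raster(seq, 4))
```

Because every image token is preceded by its own position token, the sequence stays decodable no matter which permutation was drawn.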

Takeaways, Limitations

Takeaways:
Presents a new direction for decoder-only visual generative models.
Removes the fixed-order constraint of existing decoder-only AR models by enabling image generation in any token order.
Speeds up inference by 2.5x through parallel decoding with a KV cache.
Supports inpainting, outpainting, and resolution extrapolation in a zero-shot manner.
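One plausible reading of why random-order training makes zero-shot inpainting possible is that the known tokens can be placed first as context and the missing positions generated afterwards in any order. The sketch below only computes such an ordering; the function name `inpainting_order` and its arguments are hypothetical, and the summary does not confirm that the paper uses exactly this procedure.

```python
import random


def inpainting_order(num_tokens, known_positions, seed=None):
    """Split raster positions into a context prefix and a generation order.

    known_positions: raster indices whose tokens are already given.
    Returns (known, missing): known positions sorted, missing positions
    shuffled. This is an illustrative assumption, not the paper's code.
    """
    known = sorted(set(known_positions))
    if any(p < 0 or p >= num_tokens for p in known):
        raise ValueError("known position out of range")

    known_set = set(known)
    missing = [p for p in range(num_tokens) if p not in known_set]
    random.Random(seed).shuffle(missing)  # any order over the masked region
    return known, missing


if __name__ == "__main__":
    # 3x3 token grid with the middle row masked out.
    known, missing = inpainting_order(9, [0, 1, 2, 6, 7, 8], seed=1)
    print(known)
    print(missing)
```

A fixed raster-order model cannot use this layout directly, since its context must always be the tokens that precede the current one in raster order.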
Limitations:
The paper does not explicitly state its limitations. Additional experiments and analyses are needed to show where RandAR's performance degrades and whether it is vulnerable to specific image types.
The practical advantages of random-order generation over existing fixed-order models should be presented more clearly and analyzed in more detail.