Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Hita: Holistic Tokenizer for Autoregressive Image Generation

Created by
  • Haebom

Authors

Anlin Zheng, Haochen Wang, Yucheng Zhao, Weipeng Deng, Tiancai Wang, Xiangyu Zhang, Xiaojuan Qi

Outline

Hita is a novel image tokenizer designed to overcome the limitations of existing autoregressive image generation models. Because these models generate tokens sequentially and rely on local patch information, they struggle to capture global relationships and make only limited use of global information. Hita addresses this with a global-local tokenization scheme that combines learnable global queries with local patch tokens. It arranges the tokens in a sequence with the global tokens first and the patch tokens after them, applies causal attention so that each token stays aware of the tokens before it, and uses a lightweight fusion module to control information flow and prioritize the global tokens. Hita achieved an FID of 2.59 and an IS of 281.9 on the ImageNet benchmark, outperforming existing tokenizers while also speeding up training. It also proved effective in zero-shot style transfer and image inpainting.
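The ordering described above can be illustrated with a small sketch. This is not the authors' implementation: the helper names `build_sequence` and `causal_mask` and the token ids are hypothetical, chosen only to show why placing global tokens first lets every patch token attend to them under a causal mask. The fusion module is omitted because the summary gives no details about it.

```python
from typing import List


def build_sequence(global_tokens: List[int], patch_tokens: List[int]) -> List[int]:
    """Place global tokens first, then patch tokens (the ordering the summary describes)."""
    return list(global_tokens) + list(patch_tokens)


def causal_mask(length: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j, i.e. j <= i."""
    return [[j <= i for j in range(length)] for i in range(length)]


# Hypothetical token ids, for illustration only.
global_tokens = [901, 902]        # stand-ins for tokens derived from global queries
patch_tokens = [11, 12, 13, 14]   # stand-ins for local patch tokens

sequence = build_sequence(global_tokens, patch_tokens)
mask = causal_mask(len(sequence))
num_global = len(global_tokens)

# Every patch position can attend to all global positions, since they come earlier.
patch_sees_all_globals = all(
    mask[i][j]
    for i in range(num_global, len(sequence))
    for j in range(num_global)
)

# No global position can attend to any patch position, since patches come later.
global_sees_no_patch = not any(
    mask[i][j]
    for i in range(num_global)
    for j in range(num_global, len(sequence))
)

print(sequence)
print(patch_sees_all_globals, global_sees_no_patch)
```

Under this assumed layout, the global context is fully available before the first patch token is generated, which is one plausible reading of how the sequence ordering helps the model use global information.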

Takeaways, Limitations

Takeaways:
Improved autoregressive image generation: SOTA performance on ImageNet (FID 2.59, IS 281.9).
Increased training speed.
Improved ability to capture global image features (texture, material, shape).
Effective in zero-shot style transfer and image inpainting.
A novel approach to designing global-local tokenizers is presented.
Limitations:
The paper does not explicitly discuss Hita's limitations; future research may reveal room for further improvement.
The paper gives no information about dependencies on specific hardware environments or about scalability.