Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, simply cite the source.

CODA: Repurposing Continuous VAEs for Discrete Tokenization

Created by
  • Haebom

Authors

Zeyu Liu, Zanlin Ni, Yeguo Hua, Xin Deng, Xiao Ma, Cheng Zhong, Gao Huang

Outline

CODA (Continuous-to-Discrete Adaptation) is a framework that performs visual tokenization by decoupling image compression from discretization. Rather than learning both jointly as conventional discrete tokenizers do, CODA builds on continuous VAEs that are already optimized for compression, which yields stable training and high codebook utilization. On the ImageNet 256x256 benchmark, CODA achieves superior reconstruction FID (rFID) with a training budget six times smaller than that of standard VQGAN.
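
The separation described above can be pictured as a small discretization module trained on top of a frozen, pretrained continuous VAE. The sketch below is a minimal, generic illustration of that idea using a plain VQ-VAE-style nearest-neighbor codebook; the module, its parameters, and the `vae.encode`/`vae.decode` calls are hypothetical stand-ins and do not reflect CODA's actual discretization mechanism.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiscretizationAdapter(nn.Module):
    """Nearest-neighbor quantizer over frozen continuous-VAE latents.

    Illustrative only: a plain VQ-VAE-style codebook lookup, not CODA's
    actual mechanism; names and hyperparameters are hypothetical.
    """

    def __init__(self, latent_dim: int, codebook_size: int = 16384):
        super().__init__()
        self.codebook = nn.Embedding(codebook_size, latent_dim)

    def forward(self, z: torch.Tensor):
        # z: (B, C, H, W) continuous latents from a frozen VAE encoder
        b, c, h, w = z.shape
        flat = z.permute(0, 2, 3, 1).reshape(-1, c)

        # Assign each latent vector to its nearest codeword (the discrete token)
        dist = torch.cdist(flat, self.codebook.weight)      # (B*H*W, K)
        tokens = dist.argmin(dim=-1)                        # (B*H*W,)
        z_q = self.codebook(tokens).view(b, h, w, c).permute(0, 3, 1, 2)

        # Codebook loss pulls codewords toward the fixed latents; a commitment
        # term is unnecessary here because the encoder is frozen.
        codebook_loss = F.mse_loss(z_q, z.detach())

        # Straight-through estimator: route decoder gradients around the argmin
        z_q_ste = z + (z_q - z).detach()
        return z_q_ste, tokens.view(b, h, w), codebook_loss


# Usage sketch (vae.encode / vae.decode stand in for any pretrained continuous VAE):
# with torch.no_grad():
#     z = vae.encode(images)               # compression, frozen
# z_q, tokens, cb_loss = adapter(z)        # discretization, trained
# recon = vae.decode(z_q)
# loss = F.mse_loss(recon, images) + cb_loss
```

Only the adapter's parameters are updated during training, which is why the compression quality of the underlying VAE is preserved and the training budget stays small.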

Takeaways, Limitations

Takeaways:
  • Separating compression from discretization ensures training stability and increases codebook utilization.
  • Reusing existing continuous VAEs enables efficient training.
  • CODA shows excellent image reconstruction performance on the ImageNet 256x256 benchmark.
Limitations:
  • The paper does not explicitly discuss specific limitations.