CODA (Continuous-to-Discrete Adaptation) is a framework that performs visual tokenization by separating image compression from discretization. Unlike conventional tokenization methods, CODA builds on compression-optimized continuous VAEs, which ensures stable training and high codebook utilization. On the ImageNet 256x256 benchmark, CODA achieves superior reconstruction FID (rFID) with a training budget six times smaller than VQGAN's.
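The summary does not specify how CODA discretizes the continuous latents. As a minimal illustration of the general idea, the sketch below quantizes continuous VAE latents against a codebook via nearest-neighbor lookup (standard vector-quantization style); the shapes, function name, and random data are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def quantize_latents(z, codebook):
    """Assign each continuous latent vector to its nearest codebook entry.

    z:        (N, D) continuous latents, e.g. from a pretrained VAE encoder
    codebook: (K, D) discrete code vectors
    Returns (token ids of shape (N,), quantized latents of shape (N, D)).
    """
    # Squared Euclidean distance from every latent to every code: (N, K)
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)   # discrete token ids
    z_q = codebook[indices]          # quantized (discretized) latents
    return indices, z_q

# Toy usage with random stand-in data (hypothetical shapes)
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 4))          # 8 latent vectors of dimension 4
codebook = rng.normal(size=(16, 4))  # 16 codebook entries
ids, z_q = quantize_latents(z, codebook)
```

Because the VAE is already optimized for compression, only the discretization step needs to be learned, which is one way to interpret the reported training-budget savings.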
Takeaways, Limitations
• Takeaways:
◦ Separating compression from discretization ensures training stability and increases codebook utilization.
◦ Reusing existing pretrained VAEs enables efficient training.
◦ It achieves strong image reconstruction performance on the ImageNet 256x256 benchmark.
• Limitations:
◦ The paper does not present specific limitations.