Daily Arxiv

This page organizes papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, simply cite the source.

Learning Diffusion Models with Flexible Representation Guidance

Created by
  • Haebom

Author

Chenyu Wang, Cai Zhou, Sharut Gupta, Zongyu Lin, Stefanie Jegelka, Stephen Bates, Tommi Jaakkola

Outline

This paper presents a systematic framework for incorporating guidance from effective external representations into diffusion model training. The goal is to improve generation quality by aligning the internal representations of diffusion models with representations from pretrained models.
The paper derives an alternative decomposition of the denoising model and its associated training criteria, which determine when and how auxiliary representations should be incorporated.
Based on this, we propose two novel strategies. First, we train a joint model for multimodal pairs by pairing examples with target representations derived from either their own or other synthetic modalities. Second, we design an optimal training curriculum that balances representation learning and data generation.
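To make the training objective concrete, here is a minimal, hypothetical sketch of a representation-aligned diffusion loss in the spirit of REPA and this paper: a standard denoising term plus an auxiliary term that aligns an internal feature of the denoiser with a frozen pretrained encoder's representation. All function names, the toy linear "denoiser", and the weight `lam` are illustrative, not the paper's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)

def denoiser(xt, t, W):
    """Toy linear 'denoiser' that also exposes an internal feature."""
    hidden = xt @ W
    return hidden, hidden  # (noise prediction, internal representation)

def cosine(a, b):
    """Row-wise cosine similarity."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return (a * b).sum(axis=-1)

def training_loss(x0, noise, t, W, W_proj, encode, lam=0.5):
    # Corrupt clean data x0 at timestep t (simple linear interpolant).
    xt = (1 - t[:, None]) * x0 + t[:, None] * noise
    pred, hidden = denoiser(xt, t, W)
    denoise_loss = np.mean((pred - noise) ** 2)          # generation term
    target = encode(x0)                                  # frozen pretrained representation
    # Alignment term: projected internal feature vs. pretrained target.
    align_loss = -np.mean(cosine(hidden @ W_proj, target))
    # lam balances representation learning against data generation; the
    # paper's training curriculum would schedule this balance over training.
    return denoise_loss + lam * align_loss
```

A curriculum in this setting would simply vary `lam` over training steps, e.g. emphasizing representation alignment early and pure denoising later.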
Experiments on image, protein sequence, and molecule generation demonstrate strong performance and accelerated training. In particular, on the class-conditional ImageNet $256\times 256$ benchmark, the proposed guidance yields training 23.3x faster than the original SiT-XL and over 4x faster than the state-of-the-art REPA method.

Takeaways, Limitations

Takeaways:
Presents a systematic framework for integrating representation guidance into diffusion models
Improved performance across diverse data types (images, protein sequences, molecules)
Substantially faster training (23.3x over SiT-XL, 4x over REPA)
Leverages multiple modalities and an optimal training curriculum
Limitations:
The paper does not specify its own limitations.
Comparisons with recent models other than REPA are limited.
Further analysis of the model's generalization ability is needed.