Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Efficient Generative Model Training via Embedded Representation Warmup

Created by
  • Haebom

Authors

Deyuan Liu, Peng Sun, Xufeng Li, Tao Lin

Outline

Diffusion models excel at generating high-dimensional data, but their training efficiency and representation quality lag behind those of self-supervised learning methods. This paper identifies the absence of high-quality, semantically rich representations during training as a key bottleneck. A systematic analysis locates a crucial representation processing region (the early layers), where semantic and structural pattern learning primarily occurs before the model performs generation. To address this, the authors propose Embedded Representation Warmup (ERW), a plug-and-play framework that initializes the early layers of a diffusion model with high-quality pretrained representations. This warmup reduces the burden of learning representations from scratch, thereby accelerating convergence and improving performance. The effectiveness of ERW hinges on its precise integration into the representation processing region, where the model primarily processes and transforms feature representations for subsequent generation. Experimentally, ERW not only accelerates convergence and improves representation quality but also achieves a 40x training speedup over the existing state-of-the-art method, REPA.
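The core mechanism can be sketched briefly. Below is a minimal, hypothetical PyTorch sketch of the warmup phase, not the authors' released implementation: it assumes a DiT-style model exposing a `patch_embed` tokenizer and a `blocks` module list, a frozen pretrained self-supervised encoder (e.g., a DINOv2-style model) whose token features already match the DiT hidden dimension, and a cosine-similarity alignment objective; all of these names and choices are assumptions for illustration.

```python
# Minimal sketch of the ERW warmup idea (an assumption-laden illustration,
# not the authors' code). Assumes `dit.patch_embed`, `dit.blocks`, and a
# pretrained encoder whose output shape matches the early DiT features;
# otherwise a small projection head would be needed.
import torch
import torch.nn.functional as F

def erw_warmup(dit, pretrained_encoder, dataloader,
               num_early_blocks=4, steps=1000, lr=1e-4, device="cuda"):
    """Phase 1: align the early DiT blocks with frozen pretrained features."""
    pretrained_encoder.eval().requires_grad_(False)
    early_blocks = dit.blocks[:num_early_blocks]
    opt = torch.optim.AdamW(
        (p for blk in early_blocks for p in blk.parameters()), lr=lr)

    for step, (x, _) in zip(range(steps), dataloader):
        x = x.to(device)
        with torch.no_grad():
            target = pretrained_encoder(x)   # (B, N, D) semantic tokens
        h = dit.patch_embed(x)               # tokenize the input
        for blk in early_blocks:             # run only the early region
            h = blk(h)
        # Negative cosine similarity pulls the early features toward the
        # pretrained representation (one plausible alignment objective).
        loss = 1.0 - F.cosine_similarity(h, target, dim=-1).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

    return dit  # Phase 2: continue with standard diffusion training
```

Warming up only the early blocks leaves the later, generation-focused layers to be trained by the ordinary diffusion objective afterwards, which matches the paper's observation that the representation processing region sits before generation.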

Takeaways, Limitations

Takeaways:
Presents the ERW framework, which dramatically improves the training speed of diffusion models (40x speedup).
Improves the representation quality of diffusion models.
Reduces the burden of training early layers by leveraging high-quality pretrained representations.
Elucidates the importance of the representation processing region.
Limitations:
The effectiveness of ERW depends on its precise integration into a specific neural network region (the representation processing region); further research is needed to determine whether the methodology generalizes across architectures.
The generality of the released code and its applicability to a wide range of models require further verification.