Daily Arxiv

This page curates AI-related papers published worldwide.
Summaries are generated with Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Improving Progressive Generation with Decomposable Flow Matching

Created by
  • Haebom

Author

Moayed Haji-Ali, Willi Menapace, Ivan Skorokhodov, Arpit Sahni, Sergey Tulyakov, Vicente Ordonez, Aliaksandr Siarohin

Outline

In this paper, the authors propose Decomposable Flow Matching (DFM), a framework that applies Flow Matching independently at each level of a user-defined multi-scale representation (e.g., a Laplacian pyramid) to reduce the computational cost of generating high-dimensional visual modalities. DFM avoids the complexity of prior multi-stage generative models, which require custom diffusion formulations, decomposition-dependent stage transitions, temporal samplers, or model cascades, and it improves the visual quality of images and videos with a single model. On ImageNet-1k at 512px, DFM improves the FDD score by 35.2% over the base architecture and by 26.4% over the state-of-the-art baseline. When applied to fine-tuning large-scale models such as FLUX, it also converges faster on the training distribution, all with minimal modifications to the existing training pipeline.
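To make the core idea concrete, here is a minimal NumPy sketch of the two ingredients the Outline describes: decomposing a sample into a Laplacian pyramid, and forming standard Flow Matching training targets per level. Function names, the average-pool/nearest-upsample choices, and the linear interpolation path are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def laplacian_pyramid(x, levels=3):
    """Decompose a 2D array into `levels` bands (illustrative decomposition)."""
    pyr, cur = [], x
    for _ in range(levels - 1):
        h, w = cur.shape
        # 2x2 average-pool downsample, then nearest-neighbor upsample back.
        down = cur.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        up = np.repeat(np.repeat(down, 2, axis=0), 2, axis=1)
        pyr.append(cur - up)  # high-frequency residual at this scale
        cur = down
    pyr.append(cur)           # coarsest (low-frequency) level
    return pyr

def flow_matching_targets(level, t, rng):
    """Per-level Flow Matching inputs/targets under a linear path (assumed)."""
    noise = rng.standard_normal(level.shape)
    x_t = (1.0 - t) * noise + t * level  # point on the noise->data path
    v_target = level - noise             # constant velocity to regress
    return x_t, v_target
```

In a DFM-style setup, a single model would be trained to predict `v_target` for every pyramid level; with this pooling/upsampling pair the decomposition is exactly invertible, so summing the upsampled coarse level with the residuals recovers the original sample.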

Takeaways, Limitations

Takeaways:
Presents a method that effectively reduces the computational cost of generating high-dimensional visual modalities.
Improves image and video quality with a single model.
Avoids the complexity of existing multi-stage generative models.
Achieves faster convergence when fine-tuning large models.
Requires only minimal modifications to existing training pipelines.
Limitations:
The generalization performance of the proposed method requires further study.
Applicability and performance across diverse multi-scale representations remain to be evaluated.
Results are reported for specific datasets and models; performance under other conditions needs further validation.