Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers

Created by
  • Haebom

Authors

Jiacheng Liu, Chang Zou, Yuanhuiyi Lyu, Junjie Chen, Linfeng Zhang

Outline

This paper proposes TaylorSeer to reduce the high computational cost of Diffusion Transformers (DiT), which excel at high-resolution image and video synthesis. Existing feature caching methods reuse features from earlier timesteps, so their error grows as feature similarity drops over large timestep intervals. TaylorSeer addresses this limitation by forecasting features at future timesteps from the features of previous timesteps rather than reusing them directly. Because features change slowly and continuously across timesteps, it approximates their higher-order derivatives with finite differences and predicts future features through a Taylor series expansion. Experiments show high speed-up ratios in image and video synthesis: 4.99x on FLUX and 5.00x on HunyuanVideo, with virtually no loss in quality. On DiT, it achieves 4.53x acceleration while lowering FID by 3.41 compared with the previous state of the art.
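The forecasting idea in the outline can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`backward_differences`, `taylor_forecast`), the list-of-floats feature representation, and the increasing step index are all assumptions made for illustration. The sketch assumes features are fully computed every `interval` steps, estimates derivatives from backward differences of those cached features, and extrapolates `k` steps ahead with a truncated Taylor series. Setting `order=0` reduces to plain cache reuse.

```python
from math import factorial


def backward_differences(history, order):
    """Return [D^0 F, D^1 F, ..., D^order F] at the most recent cached step.

    `history` holds feature vectors (lists of floats) from fully computed
    steps, evenly spaced and ordered oldest first (an illustrative assumption).
    """
    level = [list(f) for f in history[-(order + 1):]]
    diffs = [level[-1]]
    for _ in range(order):
        # Difference each cached vector with its predecessor.
        level = [[b - a for a, b in zip(prev, cur)]
                 for prev, cur in zip(level, level[1:])]
        diffs.append(level[-1])
    return diffs


def taylor_forecast(history, interval, k, order):
    """Forecast the feature k steps after the last fully computed step.

    Uses D^i F / interval^i as a finite-difference estimate of the i-th
    derivative and sums the Taylor terms up to `order`.
    """
    order = min(order, len(history) - 1)  # cannot exceed available history
    diffs = backward_differences(history, order)
    pred = [0.0] * len(diffs[0])
    for i, d in enumerate(diffs):
        coeff = (k / interval) ** i / factorial(i)
        pred = [p + coeff * x for p, x in zip(pred, d)]
    return pred


# Toy feature that grows linearly with the step index: f(s) = s.
cached = [[0.0], [4.0]]  # features computed at steps 0 and 4
reuse = taylor_forecast(cached, interval=4, k=2, order=0)     # plain reuse
forecast = taylor_forecast(cached, interval=4, k=2, order=1)  # first-order
```

In this toy case the true feature at step 6 is 6.0: reuse returns the stale value 4.0, while the first-order forecast recovers 6.0. For features that are not polynomial in the step index the forecast is only approximate, and the truncation error grows with `k`.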

Takeaways, Limitations

Takeaways:
Presents a novel feature forecasting method that effectively reduces the computational cost of DiT.
Demonstrates that Taylor series expansion can predict future features efficiently and accurately.
Achieves high acceleration ratios while maintaining generation quality in image and video synthesis.
Near-lossless acceleration broadens the possibilities for real-time applications.
Limitations:
Further research is needed on how well the proposed method generalizes.
Its applicability and performance need to be evaluated on a wider range of diffusion models.
Feature prediction accuracy may degrade at very large timestep intervals.