Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Forecasting When to Forecast: Accelerating Diffusion Models with Confidence-Gated Taylor

Created by
  • Haebom

Author

Xiaoliu Guan, Lielin Jiang, Hanqi Chen, Xu Zhang, Jiaxing Yan, Guanzhong Wang, Yi Liu, Zetao Zhang, Yu Wu

Outline

This paper proposes a novel method to accelerate inference in Diffusion Transformers (DiTs). The existing TaylorSeer caches intermediate features of every transformer block and predicts future features via Taylor expansion, but it incurs significant memory and computational overhead and does not account for prediction accuracy. The proposed method reduces the number of cached features by shifting the Taylor prediction target to the last block, and adds a dynamic caching mechanism gated by the prediction error of the first block. This improves the speed-quality trade-off, achieving inference speedups of 3.17x on FLUX, 2.36x on DiT, and 4.14x on Wan Video.
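As a rough illustration of the confidence-gated idea (a minimal sketch, not the authors' implementation), the snippet below assumes hypothetical helpers and names — first_block, remaining_blocks, a feature cache, and a tol threshold — and uses simple finite differences over cached features as the Taylor derivative estimates; the gate decides per denoising step whether to reuse the predicted last-block feature or run the full forward pass.

```python
import torch  # features are assumed to be torch.Tensor outputs of DiT blocks

def taylor_extrapolate(history):
    """Predict the next-step feature from cached features via a second-order
    Taylor expansion, using finite differences as derivative estimates."""
    if len(history) < 3:
        return history[-1] if history else None
    f0, f1, f2 = history[-3], history[-2], history[-1]
    d1 = f2 - f1                    # ~ first derivative w.r.t. timestep
    d2 = (f2 - f1) - (f1 - f0)      # ~ second derivative
    return f2 + d1 + 0.5 * d2

def gated_step(x_t, first_block, remaining_blocks, cache, tol=0.05):
    """Run the first block exactly, gauge the Taylor prediction error there,
    and either reuse the predicted last-block feature (cheap) or fall back to
    a full pass through the remaining blocks (exact)."""
    h1 = first_block(x_t)                         # exact first-block feature
    h1_pred = taylor_extrapolate(cache["first"])  # its Taylor prediction
    cache["first"].append(h1)

    if h1_pred is None:
        rel_err = float("inf")                    # not enough history yet
    else:
        rel_err = ((h1 - h1_pred).norm() / (h1.norm() + 1e-8)).item()

    if rel_err < tol and len(cache["last"]) >= 3:
        out = taylor_extrapolate(cache["last"])   # trust the cheap prediction
    else:
        out = h1
        for blk in remaining_blocks:              # full (expensive) computation
            out = blk(out)
    cache["last"].append(out)
    return out
```

Usage would initialize cache = {"first": [], "last": []} and call gated_step once per denoising step; the tol value and the second-order expansion are illustrative choices, not the paper's reported settings.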

Takeaways, Limitations

Takeaways:
Effectively addresses the high memory and computational overhead of the existing TaylorSeer.
Flexibly adjusts inference speed based on prediction accuracy through the dynamic caching mechanism.
Achieves both speed improvements and quality preservation across various DiT-based models.
Limitations:
The method's effectiveness depends heavily on the prediction error of the first block, so performance can be affected by how accurately that error is estimated.
Only experimental results for specific models (FLUX, DiT, Wan Video) are presented, so generalizability to other models requires further validation.
Detailed guidance on tuning the dynamic caching mechanism's parameters (e.g., the error tolerance) is lacking.