This paper proposes FastCache, a framework for reducing the computational cost of Diffusion Transformers (DiTs). FastCache accelerates inference with a dual strategy that exploits redundancy in the model's internal representations. First, a spatially aware token selection mechanism adaptively filters out redundant tokens based on hidden-state saliency. Second, a transformer-level cache reuses latent activations across timesteps when the change between steps is statistically insignificant, replacing the full computation with a learnable linear approximation that preserves generation fidelity. A theoretical analysis shows that FastCache keeps the approximation error bounded under its hypothesis-testing-based decision rule. Experiments on multiple DiT variants show substantial reductions in latency and memory usage while achieving the best generation quality among competing caching methods, as measured by FID and t-FID. The FastCache code is available on GitHub (https://github.com/NoakLiu/FastCache-xDiT).
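To make the caching step concrete, here is a minimal PyTorch sketch of a cross-timestep activation cache guarded by a significance check. It is an illustration under assumptions, not the authors' implementation: the `CachedBlock` wrapper, the relative-change statistic, and the fixed `threshold` constant are hypothetical stand-ins for the paper's hypothesis-testing decision rule and learnable linear approximation.

```python
import torch
import torch.nn as nn

class CachedBlock(nn.Module):
    """Wraps a transformer block with a cross-timestep activation cache.

    When the hidden state has changed only insignificantly since the
    previous diffusion step, the full block is skipped and the cached
    output is updated with a cheap learnable linear approximation.
    """

    def __init__(self, block: nn.Module, dim: int, threshold: float = 0.05):
        super().__init__()
        self.block = block
        # Learnable affine map used instead of the full block on cache hits.
        self.approx = nn.Linear(dim, dim)
        # Decision cutoff; the paper derives it from a hypothesis test,
        # a fixed relative-change threshold stands in here.
        self.threshold = threshold
        self.prev_input = None
        self.prev_output = None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.prev_input is not None and self.prev_input.shape == x.shape:
            # Test statistic: relative change of the hidden state across steps.
            num = (x - self.prev_input).pow(2).mean()
            den = self.prev_input.pow(2).mean().clamp_min(1e-8)
            if (num / den) < self.threshold:
                # Change deemed insignificant: reuse the cached output,
                # refined by the learnable linear approximation.
                out = self.approx(self.prev_output)
                self.prev_input, self.prev_output = x.detach(), out.detach()
                return out
        out = self.block(x)  # cache miss: run the full block
        self.prev_input, self.prev_output = x.detach(), out.detach()
        return out
```

In this sketch each DiT block would be wrapped once, e.g. `CachedBlock(block, dim=hidden_dim)`, with the cached tensors reset between generations.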
Takeaways and Limitations
• Takeaways:
◦ FastCache is a novel caching and compression framework that effectively reduces the computational cost of DiTs.
◦ Efficiency is improved through a dual strategy of spatially aware token selection and transformer-level caching (see the sketch after this list).
◦ Generation quality is maintained through a learnable linear approximation.
◦ Superior performance over other caching methods is demonstrated on FID and t-FID metrics.
◦ The public GitHub code supports reproducibility and extension.
• Limitations:
◦ The effectiveness of the proposed method may depend on the specific DiT variant and dataset.
◦ The performance of the hypothesis-testing-based decision rule depends on the validity of its statistical assumptions.
◦ Further experiments with more diverse DiT variants and larger datasets are needed.
◦ Hyperparameter optimization for FastCache may require further research.
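As referenced in the takeaways above, the spatially aware token selection can be sketched as follows. This is a hedged illustration: scoring tokens by hidden-state norm and keeping a fixed `keep_ratio` are assumptions standing in for the paper's adaptive, importance-based filtering.

```python
import torch

def select_salient_tokens(hidden: torch.Tensor, keep_ratio: float = 0.5):
    """Keep only the most salient tokens of a (batch, tokens, dim) tensor.

    Tokens are scored by hidden-state norm (a stand-in for the paper's
    importance measure); the rest are treated as redundant and skipped.
    Returns the kept tokens and the indices needed to scatter them back.
    """
    scores = hidden.norm(dim=-1)                        # (batch, tokens)
    k = max(1, int(hidden.size(1) * keep_ratio))
    idx = scores.topk(k, dim=1).indices                 # k most salient tokens
    idx = idx.unsqueeze(-1).expand(-1, -1, hidden.size(-1))
    return hidden.gather(1, idx), idx

# Usage: run the expensive block only on the kept tokens, then scatter
# the results back; redundant tokens simply keep their previous values.
x = torch.randn(2, 16, 64)
kept, idx = select_salient_tokens(x, keep_ratio=0.25)
processed = kept * 2.0          # placeholder for a transformer block
out = x.scatter(1, idx, processed)
```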