Daily Arxiv

This page curates papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, please cite the source.

ToMA: Token Merge with Attention for Diffusion Models

Created by
  • Haebom

Author

Wenbo Lu, Shaoyi Zheng, Yuxuan Xia, Shengjie Wang

Outline

To address the scalability limits that quadratic attention complexity imposes on Transformers in diffusion models, this paper proposes Token Merge with Attention (ToMA), a GPU-friendly token-reduction technique. ToMA reformulates token merging as a submodular optimization problem so that a diverse set of representative tokens is selected, implements merging and un-merging (splitting) as attention-like linear transformations built from GPU-friendly matrix operations, and minimizes overhead by exploiting latent locality and sequential redundancy. Experiments show that ToMA reduces the generation latency of SDXL and Flux by 24% and 23%, respectively, with minimal degradation in image quality. By removing the GPU-inefficient computation that limits the speedups of existing token-reduction methods, ToMA narrows the gap between theoretical and practical efficiency.
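The idea of merging and splitting as attention-like linear maps can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the greedy farthest-point selection below merely stands in for the paper's submodular selection, and the names (`select_diverse_tokens`, `merge_matrix`, `toma_attention_block`, `num_dst`) are illustrative assumptions. The point is that once a merge matrix M is formed, merging is `M @ x`, un-merging is `M.T @ out`, and both are dense matrix multiplications that map well onto GPU hardware.

```python
# Minimal sketch (not the authors' implementation) of the idea behind ToMA:
# 1) pick a diverse subset of "destination" tokens (greedy farthest-point
#    selection here, standing in for the paper's submodular selection),
# 2) express merging as a single linear map M applied to the token matrix,
# 3) run attention on the reduced token sequence,
# 4) express un-merging (splitting) as the transposed map back to all tokens.
import torch
import torch.nn.functional as F


def select_diverse_tokens(x: torch.Tensor, num_dst: int) -> torch.Tensor:
    """Greedily pick `num_dst` mutually distant destination tokens from x: (N, d)."""
    chosen = [0]                                   # start from an arbitrary token
    dist = torch.cdist(x, x[chosen])               # (N, 1) distances to the chosen set
    for _ in range(num_dst - 1):
        idx = dist.min(dim=1).values.argmax().item()   # farthest token from current set
        chosen.append(idx)
        dist = torch.minimum(dist.min(dim=1, keepdim=True).values,
                             torch.cdist(x, x[idx:idx + 1]))
    return torch.tensor(chosen, device=x.device)


def merge_matrix(x: torch.Tensor, dst_idx: torch.Tensor) -> torch.Tensor:
    """Build a (num_dst, N) assignment matrix M so that merging is simply `M @ x`.

    Each source token is assigned to its most similar destination token;
    rows are normalized so each merged token is the mean of its group.
    """
    sim = F.normalize(x, dim=-1) @ F.normalize(x[dst_idx], dim=-1).T       # (N, num_dst)
    assign = F.one_hot(sim.argmax(dim=1), num_classes=dst_idx.numel()).float()
    m = assign.T                                                           # (num_dst, N)
    return m / m.sum(dim=1, keepdim=True).clamp(min=1)


def toma_attention_block(x: torch.Tensor, attn, num_dst: int) -> torch.Tensor:
    """Run `attn` (any (L, d) -> (L, d) attention callable) on merged tokens.

    Attention cost drops from O(N^2) to O(num_dst^2); merge and un-merge are
    plain matrix multiplications, which is what makes the scheme GPU-friendly.
    """
    dst_idx = select_diverse_tokens(x, num_dst)
    m = merge_matrix(x, dst_idx)          # (num_dst, N)
    merged = m @ x                        # (num_dst, d): reduced token sequence
    out = attn(merged)                    # attention on the reduced sequence
    return m.T @ out                      # un-merge: broadcast results back to N tokens


# Example usage with a toy self-attention stand-in:
# x = torch.randn(4096, 64)
# out = toma_attention_block(
#     x,
#     attn=lambda t: F.scaled_dot_product_attention(t[None], t[None], t[None])[0],
#     num_dst=1024,
# )
```

In a real diffusion pipeline the selection and merge matrices would additionally be cached and reused across nearby denoising steps, which is how the paper amortizes the selection overhead.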

Takeaways, Limitations

Takeaways:
• Presents a GPU-friendly token-reduction technique that overcomes a key scalability limitation of transformer-based diffusion models.
• Resolves the GPU-inefficiency issues of existing token-reduction methods and achieves substantial real-world speedups.
• Validates the performance improvements experimentally on the SDXL and Flux models.
• Points to a new direction for improving the efficiency of transformer-based diffusion models.
Limitations:
• The reported gains are based on experiments with specific models (SDXL, Flux); generalization to other models requires further study.
• The solution method and parameter settings for the submodular optimization problem may need additional analysis.
• Performance has not been evaluated across diverse hardware environments.