Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity

Created by
  • Haebom

Author

Kwanyoung Kim, Byeongsu Sim

Outline

In this paper, we propose PLADIS, a novel and efficient method that addresses the problem that existing diffusion models require additional training or extra neural function evaluations (NFEs) when using guidance techniques (e.g., Classifier-Free Guidance) to generate high-quality conditional samples. PLADIS boosts pre-trained models (U-Net or Transformer backbones) by extrapolating query-key correlations between softmax attention and its sparse counterpart in the cross-attention layers at inference time. By exploiting the noise robustness of sparse attention, it improves text alignment and human preference without any additional training or NFEs, and it integrates seamlessly with guidance techniques, including guidance-distilled models.
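To make the idea concrete, below is a minimal, hypothetical PyTorch sketch of the mechanism described above: compute the standard softmax cross-attention weights and a sparse counterpart, then extrapolate between them at inference time. The function names, the choice of sparsemax as the sparse attention, and the extrapolation scale `lam` are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def sparsemax(logits, dim=-1):
    """Sparsemax (Martins & Astudillo, 2016): Euclidean projection onto the simplex,
    used here as one example of a sparse softmax replacement."""
    z, _ = torch.sort(logits, dim=dim, descending=True)
    cum = z.cumsum(dim) - 1.0
    k = torch.arange(1, logits.size(dim) + 1, device=logits.device, dtype=logits.dtype)
    view = [1] * logits.dim()
    view[dim] = -1
    k = k.view(view)
    support = (k * z) > cum                              # entries kept in the support
    k_max = support.sum(dim=dim, keepdim=True).clamp(min=1)
    tau = cum.gather(dim, k_max - 1) / k_max.to(logits.dtype)
    return torch.clamp(logits - tau, min=0.0)

def pladis_like_cross_attention(q, k, v, lam=2.0):
    """Hypothetical sketch: blend dense softmax attention with its sparse counterpart
    by extrapolation at inference time (no extra training, no extra NFEs).
    lam is an assumed extrapolation scale; lam=1 recovers pure sparse attention."""
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
    dense = F.softmax(scores, dim=-1)        # standard cross-attention weights
    sparse = sparsemax(scores, dim=-1)       # noise-robust sparse weights
    weights = dense + lam * (sparse - dense) # extrapolate toward the sparse weights
    return weights @ v
```

Because this only replaces the weight computation inside existing cross-attention layers, it can in principle be dropped into a pre-trained U-Net or Transformer at inference without retraining; the actual scale and sparse operator used in the paper may differ.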

Takeaways, Limitations

Takeaways:
We present an efficient method to improve the performance of pre-trained text-to-image diffusion models without additional training or NFEs.
Leverages sparse attention while integrating seamlessly with guidance techniques.
Shows notable performance improvements in text alignment and human preference.
Provides a general solution applicable to both U-Net and Transformer backbones.
Limitations:
The paper makes little explicit reference to the Limitations of the proposed method.
Additional experimental results on a wider range of models and datasets are needed.
The specific hyperparameter settings and tuning of the sparse attention are not explained in detail.