Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Attention of a Kiss: Exploring Attention Maps in Video Diffusion for XAIxArts

Created by
  • Haebom

Authors

Adam Cole, Mick Grierson

Outline

This paper presents an artistic and technical investigation of the attention mechanism in video diffusion transformers. Inspired by early video artists who manipulated analog video signals to create new visual aesthetics, the study proposes a method for extracting and visualizing cross-attention maps from generative video models. Built on the open-source Wan model, the resulting tool provides an interpretable window into the temporal and spatial behavior of attention in text-to-video generation. Through exploratory research and artistic case studies, the authors examine the potential of attention maps as both an analytical tool and raw artistic material. The work contributes to the growing field of Explainable AI for the Arts (XAIxArts), inviting artists to reclaim the inner workings of AI as a creative medium.
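To make the idea of a cross-attention map concrete, here is a minimal, dependency-free sketch of how one spatio-temporal map per text token could be computed with standard scaled dot-product attention. It is not the authors' tool and does not use the Wan codebase: the function names, the tensor layout, and the random vectors standing in for real model activations are all assumptions made for illustration. A real extraction would also have to decide how to combine attention heads, layers, and denoising steps, which this sketch leaves out.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_maps(queries, keys, frames, height, width):
    """Return maps[token][frame][y][x] from scaled dot-product attention.

    queries: one vector per video-latent position, in (frame, y, x) order
    keys:    one vector per text token
    (Hypothetical layout; a real model defines its own.)
    """
    assert len(queries) == frames * height * width
    scale = 1.0 / math.sqrt(len(keys[0]))
    # For every latent position, a probability distribution over text tokens.
    weights = [
        softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
        for q in queries
    ]
    # Regroup the weights so each token gets its own (frame, y, x) volume.
    return [
        [
            [
                [weights[(f * height + y) * width + x][t] for x in range(width)]
                for y in range(height)
            ]
            for f in range(frames)
        ]
        for t in range(len(keys))
    ]


# Random stand-ins for real activations (assumption, for illustration only).
random.seed(0)
F, H, W, T, D = 2, 3, 4, 5, 8
queries = [[random.gauss(0, 1) for _ in range(D)] for _ in range(F * H * W)]
keys = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]
maps = cross_attention_maps(queries, keys, F, H, W)
```

Each `maps[t]` can then be rendered frame by frame as a grayscale image, showing where and when the video latents attend to text token `t`. Because the softmax runs over tokens, the values at any single position sum to 1 across all tokens.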

Takeaways, Limitations

Takeaways:
  • Presents a novel method for visualizing and analyzing the attention mechanism of video diffusion transformers.
  • Improves understanding of the text-to-video generation process.
  • Suggests that attention maps can serve as raw material for artistic creation.
  • Contributes to the field of XAIxArts.
Limitations:
  • Because the study is built on the Wan model, its findings may not generalize to other models.
  • Interpretation of attention maps is subjective.
  • The artistic case studies are limited in scope.