Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions. When sharing, please cite the source.

RelayFormer: A Unified Local-Global Attention Framework for Scalable Image and Video Manipulation Localization

Created by
  • Haebom

Authors

Wen Huang, Jiarui Yang, Tao Dai, Jiawei Li, Shaoxiong Zhan, Bin Wang, Shu-Tao Xia

Outline

RelayFormer is a unified framework for Visual Manipulation Localization (VML), the task of identifying manipulated regions in images and videos. To handle diverse resolutions and the gap between the image and video modalities, RelayFormer splits the input into fixed-size sub-images and introduces Global-Local Relay (GLR) tokens together with a global-local relay attention (GLRA) mechanism, which lets the sub-images exchange context efficiently. As a result, RelayFormer adapts to arbitrary resolutions and video sequences, providing a single model for both images and videos.
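The idea of fixed-size sub-images plus relay tokens can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (split_into_tiles, relay_step), the tile size, the zero-padding at the edges, and the three-step local, global, local exchange are all assumptions made for illustration, and the attention here has no learned projections.

```python
import numpy as np


def split_into_tiles(image: np.ndarray, tile: int) -> np.ndarray:
    """Split an (H, W, C) image into (N, tile, tile, C) sub-images.

    Only the bottom/right edges are zero-padded up to the next multiple of
    `tile` (an assumed policy); the image is never resized or interpolated.
    """
    h, w, c = image.shape
    pad_h = (-h) % tile
    pad_w = (-w) % tile
    padded = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)))
    rows, cols = padded.shape[0] // tile, padded.shape[1] // tile
    tiles = padded.reshape(rows, tile, cols, tile, c).swapaxes(1, 2)
    return tiles.reshape(rows * cols, tile, tile, c)


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Plain scaled dot-product attention over the last two axes."""
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


def relay_step(local_tokens: np.ndarray, relay_tokens: np.ndarray):
    """One toy local -> global -> local exchange (hypothetical, illustrative).

    local_tokens: (N, T, D) tokens of N sub-images
    relay_tokens: (N, R, D) relay tokens attached to each sub-image
    """
    n, r, d = relay_tokens.shape
    # 1) Local: each sub-image's relay tokens read from that sub-image only.
    relay = attention(relay_tokens, local_tokens, local_tokens)
    # 2) Global: relay tokens from all sub-images attend to each other.
    flat = relay.reshape(1, n * r, d)
    relay = attention(flat, flat, flat).reshape(n, r, d)
    # 3) Local: each sub-image's tokens read back the globally mixed relays.
    context = np.concatenate([local_tokens, relay], axis=1)
    return attention(local_tokens, context, context), relay


# Usage: a 300 x 500 image becomes 3 x 4 = 12 sub-images of size 128.
image = np.random.default_rng(0).random((300, 500, 3))
tiles = split_into_tiles(image, tile=128)
```

In this sketch, full attention is only ever computed inside one sub-image or among the small set of relay tokens, which is one plausible reading of how context exchange stays cheap as resolution grows. Under the same assumption, the frames of a video could be tiled the same way and their sub-images stacked along the first axis.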

Takeaways, Limitations

Resolution adaptability: Adapts to different resolutions without interpolation or excessive padding, improving processing efficiency.
Unified modeling: Uses a single model for both images and videos.
Performance and efficiency balance: Achieves state-of-the-art (SOTA) performance while balancing accuracy and computational cost.
Limitations: The paper does not explicitly state any specific limitations.