Daily Arxiv

This page curates AI-related papers published worldwide.
All content here is summarized by Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Frequency-Dynamic Attention Modulation for Dense Prediction

Created by
  • Haebom

Authors

Linwei Chen, Lin Gu, Ying Fu

Outline

In this paper, we propose Frequency-Dynamic Attention Modulation (FDAM) to address the frequency loss problem, a major limitation of vision transformers (ViTs). The standard attention mechanism in ViTs acts as a low-pass filter, causing the loss of fine detail and texture. FDAM directly modulates the frequency response of ViTs through two techniques: attention inversion (AttInv), which generates a complementary high-pass filter by inverting the attention matrix, and frequency dynamic scaling (FreqScale), which reweights individual frequency components. The method yields consistent performance improvements across models such as SegFormer, DeiT, and MaskDINO on tasks including semantic segmentation, object detection, and instance segmentation, and in particular achieves state-of-the-art performance in remote sensing detection.
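
The two mechanisms can be illustrated with a short sketch. The PyTorch code below is a minimal, conceptual rendering of the ideas described above, assuming a row-stochastic attention matrix for AttInv and a standard 2D FFT for FreqScale; the class and parameter names (AttInvSketch, FreqScaleSketch, w_low, w_high, scale) are illustrative and do not reproduce the authors' implementation.

```python
import torch
import torch.nn as nn

class AttInvSketch(nn.Module):
    """Conceptual sketch of attention inversion (AttInv).

    If a row-stochastic attention matrix A acts as a low-pass filter on
    token features, its complement (I - A) passes exactly what A smooths
    away, i.e. high-frequency content. Mixing the two branches with
    learnable weights lets the layer retain detail that plain attention
    would lose. Names are illustrative, not the paper's.
    """

    def __init__(self):
        super().__init__()
        # Learnable mixing weights for the low- and high-pass branches.
        self.w_low = nn.Parameter(torch.ones(1))
        self.w_high = nn.Parameter(torch.zeros(1))

    def forward(self, attn: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
        # attn: (B, heads, N, N) row-stochastic attention; v: (B, heads, N, d)
        low = attn @ v                                    # standard (low-pass) path
        eye = torch.eye(attn.size(-1), device=attn.device)
        high = (eye - attn) @ v                           # inverted (high-pass) path
        return self.w_low * low + self.w_high * high


class FreqScaleSketch(nn.Module):
    """Conceptual sketch of frequency dynamic scaling (FreqScale).

    Reweights the frequency components of a spatial feature map:
    transform to the frequency domain with a 2D FFT, multiply by a
    learnable per-frequency scale, and transform back. Again a sketch,
    not the authors' implementation.
    """

    def __init__(self, h: int, w: int, channels: int):
        super().__init__()
        # One learnable scale per (channel, frequency) bin; rfft2 keeps
        # only w // 2 + 1 bins along the last dimension.
        self.scale = nn.Parameter(torch.ones(channels, h, w // 2 + 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map
        spec = torch.fft.rfft2(x, norm="ortho")           # to frequency domain
        spec = spec * self.scale                          # reweight each frequency
        return torch.fft.irfft2(spec, s=x.shape[-2:], norm="ortho")
```

In the full method such branches would sit inside each attention block; they are shown standalone here only to make the low-pass/high-pass intuition concrete.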

Takeaways, Limitations

Takeaways:
  • Presents FDAM, a new technique that effectively addresses the frequency loss problem of ViTs.
  • Enables precise control of the frequency response of ViTs through attention inversion (AttInv) and frequency dynamic scaling (FreqScale).
  • Delivers consistent performance improvements across a variety of vision transformer models and tasks.
  • Achieves state-of-the-art performance in remote sensing detection.
  • Ensures reproducibility through publicly released code.
Limitations:
  • The effects of FDAM may be biased toward certain models or tasks; further experiments and analysis are needed.
  • Potential increase in computational cost; research on efficient implementations is needed.
  • Additional experiments on different datasets and hyperparameters are needed.