
Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All summaries here are generated with Google Gemini, and the site is run on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

FOCUS: Fused Observation of Channels for Unveiling Spectra

Created by
  • Haebom

Authors

Xi Xiao, Aristeidis Tsaris, Anika Tabassum, John Lagergren, Larry M. York, Tianyang Wang, Xiao Wang

Outline

This paper presents FOCUS, a novel framework for making Vision Transformers (ViTs) interpretable on high-resolution hyperspectral imagery (HSI). Existing ViT interpretation methods struggle to capture meaningful spectral information and are computationally expensive. FOCUS addresses both problems by introducing class-specific spectral prompts and noise-absorbing SINK tokens. With these, it generates robust, interpretable 3D saliency maps and spectral importance curves in a single forward pass, without modifying the underlying model. FOCUS improves band-level IoU by 15%, reduces attention collapse by more than 40%, and produces saliency results that are consistent with expert annotations. Because it adds only a small number of parameters (<1%), it makes high-resolution ViT interpretation practical and narrows the gap between black-box modeling and trustworthy HSI decision-making.
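To make the single-forward-pass idea more concrete, here is a minimal NumPy sketch. It shows how one class prompt's attention could be turned into a 3D saliency volume and a spectral importance curve. It also shows one plausible way to score band-level IoU against expert-marked bands. This is not the authors' implementation. Several details are assumptions made for illustration:

- the token layout: class prompts, then one SINK token, then patch tokens ordered row-major over height, width, and spectral group;
- averaging attention over heads;
- renormalizing after discarding the SINK token's share;
- the function names `prompt_attention_readout` and `band_level_iou`.

The paper's exact definition of band-level IoU may also differ.

```python
import numpy as np


def prompt_attention_readout(attn, class_idx, num_classes, grid_shape):
    """Turn one class prompt's attention into a 3D saliency volume and a spectral curve.

    Hypothetical token layout (an assumption, not the paper's specification):
      [num_classes class prompts] [1 SINK token] [H*W*G patch tokens, row-major over (H, W, G)]

    attn: array of shape (heads, T, T) whose rows are softmax-normalized.
    grid_shape: (H, W, G) = spatial height, spatial width, number of spectral groups.
    """
    H, W, G = grid_shape
    first_patch = num_classes + 1  # skip the class prompts and the SINK token
    # Average the chosen prompt's attention row over all heads.
    row = np.asarray(attn)[:, class_idx, :].mean(axis=0)
    patch_attn = row[first_patch:]
    if patch_attn.size != H * W * G:
        raise ValueError("number of patch tokens does not match grid_shape")
    total = patch_attn.sum()
    if total <= 0:
        raise ValueError("prompt attends only to non-patch tokens")
    # Mass sent to the SINK token (and to other prompts) is dropped;
    # the remaining patch attention is renormalized to sum to 1.
    saliency = (patch_attn / total).reshape(H, W, G)
    # Spectral importance curve: total saliency per spectral group.
    spectral_curve = saliency.sum(axis=(0, 1))
    return saliency, spectral_curve


def band_level_iou(spectral_curve, expert_bands, k):
    """IoU between the k highest-scoring spectral groups and an expert-marked set."""
    top_k = set(np.argsort(spectral_curve)[::-1][:k].tolist())
    expert = set(expert_bands)
    union = top_k | expert
    return len(top_k & expert) / len(union) if union else 1.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_classes, grid = 3, (2, 2, 4)
    T = num_classes + 1 + int(np.prod(grid))
    logits = rng.normal(size=(2, T, T))
    # Row-wise softmax so each token's attention sums to 1.
    attn = np.exp(logits - logits.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)
    sal, curve = prompt_attention_readout(attn, class_idx=1, num_classes=num_classes, grid_shape=grid)
    print(sal.shape, curve.shape)  # (2, 2, 4) (4,)
    print(band_level_iou(curve, expert_bands=[0, 3], k=2))
```

In this sketch, attention that the model parks on the SINK token never reaches the saliency volume. Only attention to patch tokens shapes the map and the spectral curve. A real system would read attention from whichever layers and heads the method actually uses.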

Takeaways, Limitations

Takeaways:
Presents an efficient method that substantially improves ViT interpretability on high-resolution HSI data.
Introduces a distinctive design built on class-specific spectral prompts and SINK tokens.
Improves interpretability without modifying the underlying model.
Improves band-level IoU and reduces attention collapse, yielding more accurate and reliable explanations.
Offers a practical method for real-world hyperspectral applications.
Limitations:
Further research is needed to confirm whether the method performs equally well across all types of HSI data and ViT architectures.
The design and training process of the SINK token need a more detailed explanation and analysis.
Comparisons with other interpretable ViT models are lacking.