Daily Arxiv

This page collects papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright in each paper belongs to its authors and their institutions; when sharing, please cite the source.

SiNGER: A Clearer Voice Distills Vision Transformers Further

Created by
  • Haebom

Authors

Geunhyeok Yu, Sunjae Jeong, Yoonyoung Choi, Jaeseung Kim, Hyoseok Hwang

SiNGER (Singular Nullspace-Guided Energy Reallocation) is a distillation framework that addresses the high-norm artifact problem in Vision Transformer-based models.

Outline

This paper addresses the high-norm artifacts that arise in the features of Vision Transformer-based models and proposes SiNGER (Singular Nullspace-Guided Energy Reallocation), a novel distillation framework designed to mitigate them. Although widely used in vision, Vision Transformers produce high-norm artifact tokens that degrade representation quality. During knowledge distillation, these artifacts propagate to the student model, causing it to overfit to artifacts rather than to the useful signal. SiNGER refines the teacher's features to suppress artifacts while preserving that signal: it applies a nullspace-guided perturbation so that informative directions are left intact, and it is implemented efficiently as a LoRA-based adapter. Extensive experiments show that SiNGER consistently improves student models, achieves state-of-the-art performance on multiple downstream tasks, and yields clearer, more interpretable representations.
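To make the two named ingredients concrete, here is a minimal PyTorch sketch of one plausible reading of the summary: a perturbation constrained to the low-energy (approximate nullspace) directions of the teacher features, produced by a LoRA-style low-rank adapter. All names (low_energy_projector, LoRARefiner), the rank, the energy_keep threshold, and the toy shapes are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

def low_energy_projector(feats: torch.Tensor, energy_keep: float = 0.95) -> torch.Tensor:
    """Projector onto the low-energy right-singular directions of the
    teacher features (an approximate nullspace). Directions carrying the
    top `energy_keep` fraction of singular energy are treated as signal
    and excluded. `energy_keep` is an illustrative knob, not a value
    from the paper."""
    X = feats.reshape(-1, feats.shape[-1])           # (batch*tokens, dim)
    _, S, Vh = torch.linalg.svd(X, full_matrices=False)
    energy = torch.cumsum(S**2, dim=0) / S.pow(2).sum()
    k = int((energy < energy_keep).sum().item()) + 1  # signal directions to keep
    V_null = Vh[k:]                                   # low-energy directions
    return V_null.T @ V_null                          # (dim, dim) projector

class LoRARefiner(nn.Module):
    """Hypothetical LoRA-style adapter producing a low-rank perturbation
    of the teacher features; the perturbation is later projected onto the
    approximate nullspace so the dominant signal subspace stays intact."""
    def __init__(self, dim: int, rank: int = 8):
        super().__init__()
        self.down = nn.Linear(dim, rank, bias=False)
        self.up = nn.Linear(rank, dim, bias=False)
        nn.init.zeros_(self.up.weight)  # identity map at init, as in standard LoRA

    def forward(self, feats: torch.Tensor, P_null: torch.Tensor) -> torch.Tensor:
        delta = self.up(self.down(feats))  # (batch, tokens, dim) low-rank perturbation
        # Restricting delta to the nullspace suppresses artifact energy
        # without disturbing informative feature directions.
        return feats + delta @ P_null

# Toy usage: refine teacher features before computing a distillation loss.
teacher_feats = torch.randn(4, 197, 768)  # (batch, tokens, dim), ViT-B-like shapes
refiner = LoRARefiner(dim=768, rank=8)
P = low_energy_projector(teacher_feats)
refined = refiner(teacher_feats, P)
# distill_loss = F.mse_loss(student_feats, refined)  # hypothetical student side
```

In a real pipeline the projector would presumably be re-estimated per batch or amortized across training, and the refined features would replace the raw teacher features in whatever distillation loss the method uses; the paper should be consulted for the actual formulation.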

Takeaways, Limitations

Takeaways:
  • Improves the knowledge-distillation efficiency of Vision Transformer-based models by mitigating the high-norm artifact problem.
  • Presents a novel framework that resolves the trade-off between artifact suppression and information preservation.
  • Can be implemented efficiently via a LoRA-based adapter.
  • Achieves state-of-the-art performance on multiple downstream tasks.
  • Yields clearer, more interpretable representations.
Limitations:
  • The paper does not explicitly discuss its limitations.