Daily Arxiv

This page collects papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; please cite the source when sharing.

DOTA: Distributional Test-Time Adaptation of Vision-Language Models

Created by
  • Haebom

Authors

Zongbo Han, Jialong Yang, Guangyu Wang, Junfan Li, Qianli Xu, Mike Zheng Shou, Changqing Zhang

Outline

This paper proposes DistributiOnal Test-time Adaptation (DOTA) to address the reliability degradation caused by distribution shifts between training and test data when deploying vision-language models (VLMs) such as CLIP. Existing cache-based test-time adaptation methods suffer from severe forgetting when samples are evicted from their limited-capacity caches; DOTA instead continually estimates the underlying distribution of the test data stream rather than memorizing individual samples. Using Bayes' theorem, it computes test-time posterior probabilities from the estimated distributions and adapts dynamically. This distribution-centric approach allows the model to continually learn about and adapt to its deployment environment. Extensive experiments show that DOTA significantly mitigates the forgetting problem and outperforms existing methods.
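To make the idea concrete, below is a minimal sketch (not the authors' code) of how a distribution-centric test-time adapter could work. It assumes class-conditional Gaussians with a shared diagonal covariance over normalized image features, updates the class means online using zero-shot pseudo-label weights, and applies Bayes' theorem to turn the estimated distributions into a posterior that is fused with the zero-shot CLIP prediction. All class and function names are illustrative.

```python
# Hypothetical sketch of distributional test-time adaptation (not the paper's code).
# Assumptions: per-class Gaussian N(mu_k, sigma^2 I) over L2-normalized features,
# online mean updates weighted by zero-shot pseudo-labels, Bayes-rule posterior.
import numpy as np

class DistributionalAdapter:
    def __init__(self, text_features, dim, alpha=0.5):
        self.text_features = text_features          # (K, dim) class text embeddings
        self.K = text_features.shape[0]
        self.mu = np.zeros((self.K, dim))           # running class means
        self.var = np.ones(dim)                     # shared diagonal variance
        self.count = np.zeros(self.K)               # soft sample counts per class
        self.alpha = alpha                          # fusion weight with zero-shot probs

    def zero_shot_probs(self, feat, temp=0.01):
        """Zero-shot CLIP-style prediction from image-text similarity."""
        logits = feat @ self.text_features.T / temp
        logits -= logits.max()
        p = np.exp(logits)
        return p / p.sum()

    def update(self, feat):
        """Online update of the estimated test distribution with one sample."""
        w = self.zero_shot_probs(feat)              # soft pseudo-label as weight
        self.count += w
        # incremental, weighted update of each class mean
        self.mu += (w[:, None] * (feat[None, :] - self.mu)) / np.maximum(
            self.count[:, None], 1e-6)

    def posterior(self, feat):
        """Bayes-rule posterior p(y|x) from Gaussian likelihoods and class priors."""
        prior = (self.count + 1.0) / (self.count.sum() + self.K)
        log_lik = -0.5 * (((feat[None, :] - self.mu) ** 2) / self.var).sum(axis=1)
        log_post = np.log(prior) + log_lik
        log_post -= log_post.max()
        post = np.exp(log_post)
        post /= post.sum()
        # fuse with the zero-shot prediction for stability early in the stream
        return self.alpha * post + (1 - self.alpha) * self.zero_shot_probs(feat)

# usage on a simulated test stream
rng = np.random.default_rng(0)
dim, K = 8, 3
text_feats = rng.normal(size=(K, dim))
text_feats /= np.linalg.norm(text_feats, axis=1, keepdims=True)
adapter = DistributionalAdapter(text_feats, dim)
for _ in range(100):
    x = rng.normal(size=dim)
    x /= np.linalg.norm(x)
    adapter.update(x)               # estimate the test distribution continuously
    pred = adapter.posterior(x).argmax()
```

Because the distribution estimates aggregate every test sample rather than a bounded cache, nothing is evicted, which is the mechanism the paper credits for avoiding the forgetting problem of cache-based adapters.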

Takeaways, Limitations

Takeaways:
  • Presents an effective solution to the test-time adaptation problem of VLMs.
  • Overcomes the forgetting problem that limits existing cache-based methods.
  • The distribution-centric approach enables continuous learning and adaptation.
  • Achieves state-of-the-art (SOTA) performance.
Limitations:
  • Further analysis of DOTA's computational complexity and efficiency is needed.
  • Generalization to various types of distribution shift remains to be verified.
  • Further research is needed on scalability and stability in real-world applications.