Daily Arxiv

This page collects papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, please cite the source.

To Trust Or Not To Trust Your Vision-Language Model's Prediction

Created by
  • Haebom

Authors

Hao Dong, Moru Liu, Jian Liang, Eleni Chatzi, Olga Fink

Outline

This paper proposes TrustVLM, a training-free framework for estimating when a Vision-Language Model's (VLM) prediction can be trusted. While VLMs perform well across a wide range of applications, they are prone to making incorrect predictions with high confidence. Motivated by the modality gap and the observation that certain concepts are represented more distinctly in the image embedding space, TrustVLM introduces a novel confidence score function that re-evaluates predictions in that space. Evaluated on 17 diverse datasets with four architectures and two VLMs, the method improves AURC by up to 51.87%, AUROC by up to 9.14%, and FPR95 by up to 32.42% compared to existing baselines. Because these gains require no retraining, TrustVLM enables safer deployment of VLMs in real-world applications. The code is available at https://github.com/EPFL-IMOS/TrustVLM .
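To make the intuition concrete, below is a minimal sketch (not the authors' implementation) of re-checking a zero-shot CLIP prediction in the image embedding space. The helper `predict_with_confidence` and the `class_prototypes` tensor (one image-space prototype embedding per class, e.g. the mean embedding of a few reference images built offline) are illustrative assumptions; TrustVLM's actual confidence score function is defined in the linked repository.

```python
# Illustrative sketch of the intuition behind TrustVLM (not the authors' code):
# score a zero-shot CLIP prediction by also checking how close the image
# embedding lies to an image-space prototype of the predicted class.
import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def predict_with_confidence(image_path, class_names, class_prototypes):
    """Return (predicted class, text-space probability, image-space confidence).

    class_prototypes: (num_classes, dim) L2-normalized image embeddings,
    one per class, assumed to be built offline from reference images.
    """
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    text = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)

    with torch.no_grad():
        img_emb = model.encode_image(image)
        txt_emb = model.encode_text(text)

    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)

    # Standard zero-shot prediction from image-text similarity.
    probs = (100.0 * img_emb @ txt_emb.T).softmax(dim=-1).squeeze(0)
    pred = int(probs.argmax())

    # Re-check in the image embedding space: cosine similarity between the
    # image and the prototype of the predicted class.
    img_space_conf = float(img_emb.squeeze(0) @ class_prototypes[pred].to(device))

    return class_names[pred], float(probs[pred]), img_space_conf
```

A high text-space probability paired with a low image-space confidence flags a prediction that should not be trusted; the actual TrustVLM score combines the two signals differently than this sketch.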

Takeaways, Limitations

Takeaways:
  • Presents an effective, training-free framework for addressing the reliability issues of VLMs.
  • Validates the effectiveness of a confidence score function that exploits the image embedding space.
  • Achieves significant performance improvements (in AURC, AUROC, and FPR95) across a variety of datasets and architectures.
  • Demonstrates the potential for safe deployment of VLMs in real-world applications.
Limitations:
  • Further research is needed on the generalization of the proposed method.
  • More extensive experiments across different types of VLMs and datasets are needed.
  • The score may be biased toward certain types of errors.
  • The interpretability and transparency of the confidence score function need improvement.