This page collects papers related to artificial intelligence published around the world. Summaries are generated with Google Gemini, and the page is operated on a non-profit basis. Copyright of each paper belongs to its authors and their institutions; when sharing, please cite the source.
To Trust Or Not To Trust Your Vision-Language Model's Prediction
Created by
Haebom
Author
Hao Dong, Moru Liu, Jian Liang, Eleni Chatzi, Olga Fink
Outline
This paper proposes TrustVLM, a training-free framework for estimating the confidence of Vision-Language Model (VLM) predictions. While VLMs perform well across a wide range of applications, they are prone to making incorrect predictions with high confidence. TrustVLM introduces a novel confidence-scoring function that exploits the modality gap in VLMs and the observation that certain concepts are represented more distinctly in the image embedding space. Evaluated on 17 diverse datasets with four architectures and two VLMs, TrustVLM improves AURC by up to 51.87%, AUROC by up to 9.14%, and FPR95 by up to 32.42% over existing baselines. Because it improves confidence estimation without any retraining, TrustVLM enables safer deployment of VLMs in real-world applications. The code is available at https://github.com/EPFL-IMOS/TrustVLM .
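The core idea above can be illustrated with a toy sketch: combine the standard text-similarity confidence (maximum softmax probability over image-text cosine similarities) with how strongly the image embedding agrees with an image-space prototype of the predicted class. Note this is a simplified illustration under assumed inputs, not the paper's exact score; the mixing weight `alpha`, the prototype construction, and the logit scale of 100 are all hypothetical choices, not taken from TrustVLM.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Normalize vectors to unit length along the given axis."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def trust_score(img_emb, text_embs, proto_embs, alpha=0.5):
    """Toy two-modality confidence score (illustrative, not TrustVLM's exact formula).

    img_emb:    (d,)   image embedding of the test sample
    text_embs:  (C, d) class text embeddings (e.g., "a photo of a <class>")
    proto_embs: (C, d) per-class prototypes in the image embedding space
    alpha:      hypothetical mixing weight between the two modalities
    Returns (predicted_class, confidence).
    """
    img = l2_normalize(img_emb)
    txt = l2_normalize(text_embs)
    proto = l2_normalize(proto_embs)

    # Cross-modal cosine similarities drive the VLM's zero-shot prediction.
    text_sims = txt @ img                      # shape (C,)
    pred = int(np.argmax(text_sims))

    # Maximum softmax probability over scaled similarities (a common baseline).
    logits = 100.0 * text_sims
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    msp = float(probs[pred])

    # Image-modality agreement: similarity to the predicted class's prototype.
    img_sim = float(proto[pred] @ img)

    # Confidence is high only when both modalities support the prediction.
    return pred, alpha * msp + (1.0 - alpha) * img_sim

# Example: an image embedding aligned with class 0 in both modalities
# yields a high confidence score.
pred, score = trust_score(np.array([1.0, 0.0, 0.0, 0.0]),
                          np.eye(4)[:2], np.eye(4)[:2])
```

The intuition matches the paper's premise: when the image embedding sits far from its predicted class's region in image space, the combined score drops even if the text-side softmax is confident, flagging a likely misclassification.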