This page curates AI-related papers published worldwide. All content is summarized with Google Gemini, and the site is operated on a non-profit basis. Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.
Prompt4Trust: A Reinforcement Learning Prompt Augmentation Framework for Clinically-Aligned Confidence Calibration in Multimodal Large Language Models
Created by
Haebom
Author
Anita Kriz, Elizabeth Laura Janes, Xing Shen, Tal Arbel
Outline
This paper focuses on two major limitations of multimodal large language models (MLLMs) that hold high potential for use in healthcare: (i) their sensitivity to prompt design, and (ii) their tendency to produce incorrect responses with high confidence. Since healthcare professionals may rely on the confidence a model expresses to judge its reliability, it is especially important that the model maintains high accuracy when it expresses high confidence. Therefore, in this paper, we present Prompt4Trust, the first reinforcement learning (RL) framework for prompt augmentation targeting confidence calibration in MLLMs. We train a lightweight LLM to generate context-aware auxiliary prompts that guide a downstream task MLLM to produce responses whose expressed confidence more accurately reflects their predicted accuracy. Unlike existing calibration techniques, Prompt4Trust prioritizes the aspects of calibration that matter most for safe and reliable clinical decision-making.

Beyond this clinically motivated calibration objective, the proposed method also improves task accuracy, achieving state-of-the-art medical visual question answering (VQA) performance on the PMC-VQA benchmark, which consists of multiple-choice questions spanning diverse medical imaging modalities. Furthermore, in experiments, the framework trained with a small downstream task MLLM showed promising zero-shot generalization to larger MLLMs, suggesting the potential for scalable calibration without the associated computational cost. This work demonstrates the potential of automated yet human-aligned prompt engineering to improve the reliability of MLLMs in safety-critical settings. The codebase can be found at https://github.com/xingbpshen/prompt4trust.
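To make the mechanism concrete, here is a minimal sketch (not the authors' code) of the core training signal: a policy LLM emits an auxiliary prompt, the downstream MLLM answers with a stated confidence, and a calibration-aware reward favors confidence that matches correctness. All function names (`policy_generate_prompt`, `mllm_answer_with_confidence`) and the exact reward weighting are hypothetical placeholders standing in for the real models; Prompt4Trust's actual reward and RL algorithm may differ.

```python
import random

# --- Hypothetical stand-ins for the real models (NOT the authors' API) ---

def policy_generate_prompt(question: str) -> str:
    """Lightweight policy LLM: emits a context-aware auxiliary prompt.
    Stubbed here with canned strings for illustration."""
    candidates = [
        "Answer step by step, then state a confidence between 0 and 1.",
        "If the image is ambiguous, lower your stated confidence.",
    ]
    return random.choice(candidates)

def mllm_answer_with_confidence(question: str, aux_prompt: str):
    """Downstream task MLLM: returns (is_correct, stated_confidence).
    Simulated here; a real system would parse the MLLM's response."""
    is_correct = random.random() < 0.7
    confidence = random.uniform(0.3, 1.0)
    return is_correct, confidence

def calibration_reward(is_correct: bool, confidence: float) -> float:
    """Reward confidence only when the answer is correct, and penalize
    confidently wrong answers more than cautious ones -- one way to encode
    the clinically motivated asymmetry the paper describes."""
    if is_correct:
        return confidence              # reward warranted confidence
    return -2.0 * confidence           # overconfident errors cost extra

# --- One REINFORCE-style episode (policy-gradient update omitted) ---
question = "Which modality is shown: CT, MRI, or X-ray?"
aux = policy_generate_prompt(question)
correct, conf = mllm_answer_with_confidence(question, aux)
reward = calibration_reward(correct, conf)
print(f"aux prompt: {aux!r}\nreward: {reward:.2f}")
# In training, this reward would weight the log-probability of the generated
# prompt tokens in a policy-gradient update of the lightweight LLM.
```

The asymmetric penalty on confidently wrong answers is an illustrative choice: it pushes the prompt generator toward prompts that make the downstream MLLM's stated confidence track its actual accuracy, which is the calibration behavior the framework targets.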