Daily Arxiv

This page collects papers related to artificial intelligence published around the world.
Summaries are generated using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper remains with its authors and their institutions; when sharing, please cite the source.

GLSim: Detecting Object Hallucinations in LVLMs via Global-Local Similarity

Created by
  • Haebom

Authors

Seongheon Park, Sharon Li

Outline

Object hallucination in large vision-language models (LVLMs) poses a significant challenge for safe deployment in real-world applications. Recent studies have proposed object-level hallucination scores to estimate how likely a generated object is hallucinated, but these methods typically take only a global or a local perspective in isolation, which can limit detection reliability. This paper introduces GLSim, a novel, training-free object hallucination detection framework that leverages complementary global and local embedding-similarity cues between the image and text modalities, enabling more accurate and reliable hallucination detection across diverse scenarios. In a comprehensive benchmark against existing object hallucination detection methods, GLSim achieves significantly better detection performance than competitive baselines.
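To make the global-local idea concrete, below is a minimal, hypothetical sketch in Python. It is not the paper's implementation: the scoring function, the weighting parameter `alpha`, and the random placeholder embeddings are all assumptions for illustration. It combines a global image-text cosine similarity with the strongest patch-level (local) similarity for a candidate object mentioned in the generated text; an object scoring low under both views would be flagged as likely hallucinated.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def global_local_score(global_img_emb, patch_embs, object_text_emb, alpha=0.5):
    """Hypothetical global-local similarity score for one candidate object.

    Combines a global image-text similarity with the best local
    (patch-level) similarity. Lower scores suggest the object is
    more likely hallucinated. `alpha` is an assumed mixing weight,
    not a value from the paper.
    """
    s_global = cosine(global_img_emb, object_text_emb)
    s_local = max(cosine(p, object_text_emb) for p in patch_embs)
    return alpha * s_global + (1 - alpha) * s_local

# Toy usage with random placeholder embeddings; in practice these
# would come from the LVLM's vision encoder (global + patch features)
# and the text embedding of each object in the generated caption.
rng = np.random.default_rng(0)
img_global = rng.normal(size=512)
img_patches = rng.normal(size=(49, 512))  # e.g., a 7x7 patch grid
text_object = rng.normal(size=512)        # embedding of a mentioned object

score = global_local_score(img_global, img_patches, text_object)
print(f"similarity score (lower = more likely hallucinated): {score:.3f}")
```

In this sketch, a threshold on the combined score would decide whether a mentioned object is flagged; the paper's actual scoring and thresholding may differ.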

Takeaways, Limitations

Takeaways:
GLSim is a training-free object hallucination detection framework that combines global and local embedding-similarity signals, detecting hallucinations more accurately and reliably than existing methods.
It shows strong detection performance across a variety of scenarios.
Limitations:
The paper itself does not explicitly discuss its limitations.