Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

MESH -- Understanding Videos Like Humans: Measuring Hallucinations in Large Video Models

Created by
  • Haebom

Author

Garry Yang, Zizhe Chen, Man Hon Wong, Haoyu Lei, Yongqiang Chen, Zhenguo Li, Kaiwen Zhou, James Cheng

Outline

This paper proposes MESH, a novel benchmark for systematically evaluating hallucinations in large video models (LVMs). To overcome the limitations of existing benchmarks, MESH uses a question-answering approach that evaluates basic objects, detailed features, and subject-action pairs in a multi-layered manner. This design mimics the human video comprehension process and aims to identify the causes of hallucinations in LVMs more precisely. Experimental results show that while LVMs recognize basic objects and features well, their hallucination rate rises sharply in scenes containing fine-grained details or complex actions involving multiple subjects.
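The layered question-answering evaluation can be illustrated with a minimal sketch. Everything below is a hypothetical illustration of the general idea, not the released MESH code: the function name, answer format, level names, and numbers are all assumptions.

```python
# Hypothetical sketch of a binary QA-style hallucination evaluation.
# Each probe pairs an expected answer with the model's prediction;
# "no"-expected probes ask about content absent from the video, so a
# wrong answer there counts as a hallucination. Data is illustrative.

def hallucination_rate(results):
    """Fraction of hallucination probes (expected 'no') answered wrongly."""
    probes = [(exp, pred) for exp, pred in results if exp == "no"]
    if not probes:
        return 0.0
    wrong = sum(1 for exp, pred in probes if pred != exp)
    return wrong / len(probes)

# Illustrative per-level results as (expected, predicted) pairs,
# mirroring the benchmark's three layers of granularity.
levels = {
    "basic_objects": [("yes", "yes"), ("no", "no"), ("no", "no")],
    "detailed_features": [("yes", "yes"), ("no", "yes"), ("no", "no")],
    "subject_action_pairs": [("no", "yes"), ("no", "yes"), ("yes", "yes")],
}

for level, res in levels.items():
    print(f"{level}: hallucination rate = {hallucination_rate(res):.2f}")
```

With these made-up numbers, the rate grows from the basic-object level to the subject-action level, matching the trend the paper reports.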

Takeaways, Limitations

Takeaways:
It overcomes the limitations of existing video-hallucination evaluation methods that rely on manual classification, presenting new evaluation criteria that reflect the human perception process.
The MESH benchmark enables comprehensive analysis of LVMs' hallucination problems and more accurate identification of their causes.
It clearly presents the strengths and weaknesses of LVMs, suggesting directions for future model development.
Limitations:
Further validation of the MESH benchmark's generalization performance is needed.
Evaluation results for a wider variety of LVMs are not presented.
The benchmark may not fully reflect the complexity of real-world video data.