Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

ELV-Halluc: Benchmarking Semantic Aggregation Hallucinations in Long Video Understanding

Created by
  • Haebom

Authors

Hao Lu, Jiahao Wang, Yaolun Zhang, Ruohui Wang, Xuanyu Zheng, Yepeng Tang, Dahua Lin, Lewei Lu

Outline

This paper addresses hallucination in video multimodal large language models (Video-MLLMs), specifically Semantic Aggregation Hallucination (SAH), which arises in long videos. Whereas previous studies focused on short videos and attributed hallucination to a few simplified causes, this paper defines SAH, which arises during complex semantic processing in long videos, and presents ELV-Halluc, a new benchmark dedicated to it. Using ELV-Halluc, the authors confirm that SAH exists, analyze its correlation with semantic complexity and rapid semantic change, and experimentally verify that the positional encoding strategy and Direct Preference Optimization (DPO) are effective for SAH mitigation. Using 8,000 adversarial data pairs, they improve model performance and reduce the SAH rate by 27.7%.
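As a rough illustration of how preference training on adversarial pairs works, the sketch below computes the standard Direct Preference Optimization loss for a single pair. It assumes that DPO in the summary above refers to Direct Preference Optimization, the usual meaning of the acronym in preference tuning. The function name `dpo_loss`, the `beta` default, and the example log-probabilities are illustrative assumptions, not values or code from the paper.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response under
    the trainable policy or the frozen reference model. `beta` scales how
    strongly the policy is pushed away from the reference (the default
    here is an assumption, not a value from the paper).
    """
    # How much more the policy likes each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    x = beta * (chosen_margin - rejected_margin)
    # loss = -log(sigmoid(x)), written in a numerically stable form.
    if x > 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


# Hypothetical pair: "chosen" is an answer faithful to the video,
# "rejected" is an adversarial answer containing a hallucination.
loss = dpo_loss(policy_chosen_logp=-12.0, policy_rejected_logp=-15.0,
                ref_chosen_logp=-13.0, ref_rejected_logp=-13.0)
print(f"DPO loss for the example pair: {loss:.4f}")
```

The loss falls below log 2 when the policy favors the chosen answer more than the reference model does, and rises above it when the policy favors the rejected answer, which is what pushes the model to separate faithful answers from hallucinated ones.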

Takeaways, Limitations

Takeaways:
Defines SAH, a new type of hallucination in long videos, and presents ELV-Halluc, a new benchmark for it.
Analyzes the causes and characteristics of SAH and clarifies its correlation with semantic complexity and the rate of semantic change.
Presents effective strategies for SAH mitigation (positional encoding strategy, DPO strategy) and verifies the resulting performance improvement.
Contributes to improving the performance of Video-MLLMs on long video understanding.
Limitations:
At 8,000 pairs, the adversarial dataset used for SAH mitigation may be relatively small.
The generalization performance of the proposed SAH mitigation strategies needs further validation.
SAH incidence and the applicability of the mitigation strategies may not have been sufficiently evaluated across different types of Video-MLLMs.