Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

MovieCORE: COgnitive REasoning in Movies

Created by
  • Haebom

Authors

Gueter Josmy Faure, Min-Hung Chen, Jia-Fong Yeh, Ying Cheng, Hung-Ting Su, Yung-Hao Tang, Shang-Hong Lai, Winston H. Hsu

Overview

MovieCORE is a novel video question-answering (VQA) dataset designed to probe deeper cognitive understanding of movie content. Unlike existing datasets that focus on surface-level comprehension, MovieCORE emphasizes questions that engage System 2 thinking while remaining specific to the video material. The authors present an agentic brainstorming approach that uses multiple large language models (LLMs) as thinking agents to generate and refine high-quality question-answer pairs. To evaluate dataset quality, they develop a set of cognitive tests that assess depth, thought-provoking potential, and syntactic complexity. They also propose a comprehensive evaluation framework for assessing VQA model performance on deeper cognitive tasks. To address the limitations of existing video-language models (VLMs), they introduce Agentic Choice Enhancement (ACE), an agentic enhancement module that improves model reasoning capabilities post-training by up to 25%. This work contributes to advancing movie understanding in AI systems and offers insight into the capabilities and limitations of current VQA models when they face more challenging, nuanced questions about movie content. The project page, dataset, and code can be found at https://joslefaure.github.io/assets/html/moviecore.html.
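As a rough illustration of the generate-and-refine idea behind the agentic brainstorming approach described above, the sketch below passes a draft question-answer pair through several agents in turn. It is not the authors' pipeline: the `QAPair` type, the `brainstorm` function, the fixed number of rounds, and the two stub agents are all hypothetical names introduced here, and the stubs are plain Python functions standing in for LLM calls.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class QAPair:
    question: str
    answer: str


# Hypothetical agent type: takes the video context and the current draft,
# returns a (possibly revised) draft. In practice this would wrap an LLM call.
Agent = Callable[[str, QAPair], QAPair]


def brainstorm(context: str, seed: QAPair, agents: List[Agent], rounds: int = 2) -> QAPair:
    """Pass a draft QA pair through every agent, for a fixed number of rounds."""
    draft = seed
    for _ in range(rounds):
        for agent in agents:
            draft = agent(context, draft)
    return draft


# Stub agents (assumptions for illustration only, not real LLM behavior).
def deepen_question(context: str, qa: QAPair) -> QAPair:
    """Push the question toward 'why'-style reasoning if it is not there yet."""
    if qa.question.startswith("Why"):
        return qa
    return QAPair("Why does this matter: " + qa.question, qa.answer)


def ground_answer(context: str, qa: QAPair) -> QAPair:
    """Tie the answer back to the video context if it is not already grounded."""
    if context in qa.answer:
        return qa
    return QAPair(qa.question, f"{qa.answer} (grounded in: {context})")


if __name__ == "__main__":
    seed = QAPair("What does the character do?", "She leaves the house.")
    final = brainstorm("scene 12", seed, [deepen_question, ground_answer])
    print(final.question)
    print(final.answer)
```

Modeling agents as plain callables keeps the loop independent of any particular LLM client; a real system would also need a stopping criterion and a quality check, which the summary does not describe.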

Takeaways, Limitations

Takeaways:
Presents MovieCORE, a new VQA dataset that assesses deep cognitive understanding of movie content.
Generates high-quality question-answer pairs through an agentic brainstorming approach that uses multiple LLMs.
Proposes a comprehensive evaluation framework for assessing VQA model performance on deeper cognitive tasks.
Develops the ACE module to enhance the reasoning capability of VLMs.
Contributes to advancing movie understanding in AI systems.
Limitations:
The size and diversity of the MovieCORE dataset are not specifically stated.
The performance gains from the ACE module may be limited to specific datasets and models.
The objectivity and reliability of the proposed cognitive tests need further validation.