Daily Arxiv

This page collects AI-related papers published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please cite the source when sharing.

HumanVideo-MME: Benchmarking MLLMs for Human-Centric Video Understanding

Created by
  • Haebom

Authors

Yuxuan Cai, Jiangning Zhang, Zhenye Gan, Qingdong He, Xiaobin Hu, Junwei Zhu, Yabiao Wang, Chengjie Wang, Zhucun Xue, Chaoyou Fu, Xinwei He, Xiang Bai

Outline

This paper proposes HV-MMBench, a new benchmark for evaluating Multimodal Large Language Models (MLLMs) on human-centric video understanding. To overcome the limitations of existing benchmarks, HV-MMBench is designed to assess MLLM capabilities more comprehensively, spanning multiple evaluation dimensions, diverse data types, multi-domain video coverage, and a range of temporal scopes.
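To make the multi-dimensional setup concrete, below is a minimal sketch of what a task-wise evaluation loop over such a benchmark might look like. Everything here (the task names, the sample fields, and the model interface) is a hypothetical illustration, not the actual HV-MMBench API or data format.

```python
# Hypothetical sketch of a task-wise evaluation loop for a
# human-centric video benchmark. Task names, sample fields, and the
# model interface are illustrative assumptions, not the HV-MMBench release.
from collections import defaultdict

# Example task names, spanning basic attribute recognition through
# higher-level reasoning (the paper describes 13 tasks in total).
TASKS = ["attribute_recognition", "action_recognition", "causal_reasoning"]

def evaluate(model, samples):
    """Score a model per task; each sample is assumed to be a dict
    with 'task', 'video', 'question', and ground-truth 'answer'."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for s in samples:
        # `model` is assumed to be a callable returning a text answer.
        pred = model(video=s["video"], question=s["question"])
        total[s["task"]] += 1
        if pred.strip().lower() == s["answer"].strip().lower():
            correct[s["task"]] += 1
    # Report per-task accuracy so strengths and weaknesses across
    # evaluation dimensions stay visible, rather than one averaged score.
    return {task: correct[task] / total[task] for task in total}
```

Reporting per-task scores, rather than a single aggregate, is what lets a benchmark like this expose where a model handles basic attribute recognition but fails at cognitive reasoning.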

Takeaways and Limitations

Takeaways:
  • Presents a new benchmark for evaluating MLLMs on human-centric video understanding, broadening the scope of model performance evaluation.
  • Addresses the limitations of existing benchmarks by covering diverse evaluation dimensions, data types, scenarios, and temporal ranges.
  • Includes 13 tasks assessing a range of abilities, from basic attribute recognition to advanced cognitive reasoning.
Limitations:
  • The paper does not explicitly discuss its limitations. (Given the nature of the work, potential limitations include the benchmark's early stage of development, the difficulty of collecting human-centric video data, and possible bias toward certain domains.)