MULTI: Multimodal Understanding Leaderboard with Text and Images
Created by
Haebom
Author
Zichen Zhu, Yang Xu, Lu Chen, Jingkai Yang, Yichuan Ma, Yiming Sun, Hailin Wen, Jiaqi Liu, Jinyu Cai, Yingzi Ma, Situo Zhang, Zihan Zhao, Liangtai Sun, Kai Yu
Outline
This paper introduces MULTI, a Chinese multimodal benchmark built from real-world exam questions to evaluate multimodal large language models (MLLMs). MULTI contains over 18,000 carefully curated questions assessing image-text understanding, complex reasoning, and knowledge recall. The authors also present MULTI-Elite, a subset of particularly challenging questions, and MULTI-Extend, which evaluates in-context learning. Among the 25 models evaluated, Qwen2-VL-72B achieved the highest scores, with 76.9% accuracy on MULTI and 53.1% on MULTI-Elite, yet it still fell short of human expert performance.
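Because the results reported here are multiple-choice accuracies, the core evaluation loop reduces to comparing a model's predicted option label with the gold label for each question. The Python sketch below illustrates such a loop; the field names (question, choices, answer, image_path) and the predict callback are illustrative assumptions, not the paper's official evaluation toolkit.

```python
# Minimal sketch of scoring a model on MULTI-style multiple-choice items.
# Field names and the predict() interface are assumptions for illustration.
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class MultiItem:
    question: str
    choices: List[str]                 # e.g. ["A. ...", "B. ...", "C. ...", "D. ..."]
    answer: str                        # gold option label such as "B"
    image_path: Optional[str] = None   # None for text-only questions


def evaluate(items: List[MultiItem], predict: Callable[[MultiItem], str]) -> float:
    """Return overall accuracy of `predict` over the given items."""
    if not items:
        return 0.0
    correct = sum(1 for item in items if predict(item).strip().upper() == item.answer)
    return correct / len(items)


if __name__ == "__main__":
    # Toy example with a trivial baseline that always answers "A".
    sample = [
        MultiItem("1 + 1 = ?", ["A. 1", "B. 2", "C. 3", "D. 4"], "B"),
        MultiItem("Capital of China?", ["A. Beijing", "B. Shanghai"], "A"),
    ]
    always_a = lambda item: "A"
    print(f"accuracy = {evaluate(sample, always_a):.3f}")  # 0.500
```

In practice, predict would wrap an MLLM call that receives the question text, the answer choices, and the image (when present), and returns a single option label; subset-level scores (e.g., on MULTI-Elite) follow by filtering the item list before calling evaluate.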
Takeaways, Limitations
•
Takeaways:
◦
Provides a realistic platform for evaluating MLLM performance using real exam questions.
◦
Supports varied evaluation settings through MULTI, MULTI-Elite, and MULTI-Extend.
◦
Confirms the development potential of MLLMs.
◦
Contributes to the development of expert-level AI.
•
Limitations:
◦
Even the best-performing model, Qwen2-VL-72B, fell short of human expert performance.
◦
There is room for improvement in model performance.
◦
The language of the dataset is limited to Chinese.