Daily Arxiv

This page curates AI-related papers published worldwide.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Mind the (Language) Gap: Towards Probing Numerical and Cross-Lingual Limits of LVLMs

Created by
  • Haebom

Authors

Somraj Gautam, Abhirama Subramanyam Penamakuri, Abhishek Bhandari, Gaurav Harit

Outline

MMCRICBENCH-3K is a visual question answering (VQA) benchmark built on cricket scorecards, designed to evaluate complex numerical and cross-lingual reasoning over semi-structured tabular images. It consists of 1,463 synthetic scorecard images in ODI, T20, and Test formats and 1,500 English QA pairs. It is divided into two subsets: MMCRICBENCH-E-1.5K, which contains English scorecards, and MMCRICBENCH-H-1.5K, which contains visually similar Hindi scorecards. All questions and answers are kept in English, which enables controlled cross-script evaluation. The task requires reasoning over structured numerical data, multi-image context, and implicit domain knowledge. Experimental results show that even state-of-the-art large vision-language models (LVLMs), such as GPT-4o and Qwen2.5VL, struggle on the English subset and perform even worse on the Hindi subset. This highlights key limitations in structure-aware visual text understanding, numerical reasoning, and cross-lingual generalization. The dataset is publicly available on Hugging Face (https://huggingface.co/datasets/DIALab/MMCricBench).
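To make the cross-script comparison concrete, here is a minimal sketch of how per-subset accuracy and the English-to-Hindi gap could be computed. It assumes exact-match scoring on normalized answer strings, which is an illustrative choice and not necessarily the metric the authors use. The record fields (subset, answer, prediction) and the toy records are hypothetical and are not drawn from the benchmark.

```python
from collections import defaultdict


def normalize(answer: str) -> str:
    """Trim, lowercase, and drop thousands separators so '1,234' matches '1234'."""
    return answer.strip().lower().replace(",", "")


def accuracy_by_subset(records):
    """Return {subset: exact-match accuracy} for an iterable of record dicts."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        total[r["subset"]] += 1
        if normalize(r["prediction"]) == normalize(r["answer"]):
            correct[r["subset"]] += 1
    return {s: correct[s] / total[s] for s in total}


# Hypothetical toy records (assumption): "E" = English scorecards, "H" = Hindi scorecards.
records = [
    {"subset": "E", "answer": "45", "prediction": "45"},
    {"subset": "E", "answer": "120", "prediction": "102"},
    {"subset": "H", "answer": "45", "prediction": "54"},
    {"subset": "H", "answer": "120", "prediction": "102"},
]

scores = accuracy_by_subset(records)
gap = scores["E"] - scores["H"]  # positive gap: the model does worse on Hindi scorecards
print(scores, gap)
```

Because the questions and answers stay in English for both subsets, a gap computed this way would reflect the change of scorecard script and not a change in question language.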

Takeaways, Limitations

Takeaways: We present MMCRICBENCH-3K, a new benchmark for evaluating numerical and cross-lingual reasoning on semi-structured tabular images. We reveal the limitations of state-of-the-art LVLMs in structure-aware visual text understanding, numerical reasoning, and cross-lingual generalization. The dataset is publicly available, which facilitates follow-up research.
Limitations: The scorecard images are synthetic. The benchmark currently covers only two languages, English and Hindi. It is limited to the specific domain of cricket scorecards, so further research is needed to determine how well the findings generalize.