Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind

Created by
  • Haebom

Authors

Qingmei Li, Yang Zhang, Zurong Mai, Yuhang Chen, Shuohong Lou, Henglian Huang, Jiarui Zhang, Zhiwei Zhang, Yibin Wen, Weijia Li, Haohuan Fu, Jianxi Huang, Juepeng Zheng

Outline

This paper presents AgroMind, a comprehensive benchmark for evaluating large multimodal models (LMMs) in agricultural remote sensing (RS). Existing benchmarks often lack dataset diversity and rely on oversimplified task designs. To overcome these limitations, AgroMind covers four task dimensions (spatial perception, object understanding, scene understanding, and scene inference) with a total of 13 task types. By integrating eight public datasets and one private farmland dataset, the authors constructed a high-quality evaluation set of 27,247 QA pairs and 19,615 images. An evaluation of 20 open-source LMMs and four closed-source models on AgroMind reveals significant performance differences, particularly in spatial reasoning and fine-grained recognition, with some top-performing LMMs surpassing human performance. AgroMind provides a standardized evaluation framework for agricultural RS, exposing the domain-specific limitations of LMMs and highlighting important challenges for future research. Data and code are available at https://rssysu.github.io/AgroMind/ .
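As a rough illustration of how results on a QA benchmark of this kind can be broken down by task dimension, the sketch below computes exact-match accuracy per dimension. It is not taken from the AgroMind code: the record keys (`dimension`, `answer`, `prediction`), the exact-match scoring rule, and the toy records are all assumptions made for this example.

```python
from collections import defaultdict


def accuracy_by_dimension(records):
    """Return exact-match accuracy per task dimension.

    `records` is an iterable of dicts with the hypothetical keys
    'dimension', 'answer', and 'prediction' (not the benchmark's real schema).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        dim = record["dimension"]
        total[dim] += 1
        # Assumed scoring rule: case-insensitive match after trimming whitespace.
        if record["prediction"].strip().lower() == record["answer"].strip().lower():
            correct[dim] += 1
    return {dim: correct[dim] / total[dim] for dim in total}


# Toy records invented for illustration only.
sample = [
    {"dimension": "spatial perception", "answer": "B", "prediction": "b"},
    {"dimension": "spatial perception", "answer": "A", "prediction": "C"},
    {"dimension": "scene understanding", "answer": "D", "prediction": "D "},
]
scores = accuracy_by_dimension(sample)
```

Reporting one score per dimension, rather than a single overall number, is what makes gaps such as weak spatial reasoning visible.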

Takeaways, Limitations

Takeaways:
AgroMind presents a comprehensive and standardized benchmark for agricultural remote sensing.
Provides 13 task types across four dimensions (spatial perception, object understanding, scene understanding, scene inference) for evaluating LMM performance.
Exposes the domain-specific limitations of LMMs and points to directions for future research.
Some LMMs produce results that surpass human performance.
Limitations:
Potential bias in the datasets included in the benchmark (8 public datasets + 1 private dataset)
Findings may depend on the specific LMMs selected for evaluation.
Further analysis is needed to understand the reasons for the poor performance of LMMs in spatial reasoning and fine-grained recognition.