Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Warehouse Spatial Question Answering with LLM Agent

Created by
  • Haebom

Authors

Hsiang-Wei Huang, Jen-Hao Cheng, Kuang-Ming Chen, Cheng-Yen Yang, Bahaa Alattar, Yi-Ru Lin, Pyongkun Kim, Sangwon Kim, Kwangju Kim, Chung-I Huang, Jenq-Neng Hwang

Outline

This paper presents a data-efficient approach to improving the spatial understanding of existing multimodal large language models (MLLMs). The authors propose an LLM agent system with strong spatial reasoning capabilities that can solve challenging spatial question-answering tasks in complex indoor warehouse environments. The system integrates multiple tools, which the LLM agent calls through API interactions to perform spatial reasoning and answer complex spatial questions. Extensive evaluation on the 2025 AI City Challenge Physical AI Spatial Intelligence Warehouse dataset shows that the proposed system achieves high accuracy and efficiency on tasks such as object search, counting, and distance estimation. The source code is available at https://github.com/hsiangwei0903/SpatialAgent.
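To make the tool-calling idea concrete, here is a minimal sketch of how an agent might route a structured tool call to a spatial function. It is not the authors' implementation: the `SceneObject` record, the `count_objects` and `distance_between` tools, the `TOOLS` registry, the `run_tool_call` dispatcher, and the sample scene are all hypothetical names and data invented for illustration. The sketch also assumes the LLM has already turned the question into a tool call, so no model is invoked.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class SceneObject:
    """Hypothetical object record: a name, a category, and a 3D position in meters."""
    name: str
    category: str
    x: float
    y: float
    z: float


def count_objects(scene: List[SceneObject], category: str) -> int:
    """Count the objects in the scene that belong to the given category."""
    return sum(1 for obj in scene if obj.category == category)


def distance_between(scene: List[SceneObject], name_a: str, name_b: str) -> float:
    """Return the Euclidean distance between two named objects."""
    by_name = {obj.name: obj for obj in scene}
    a, b = by_name[name_a], by_name[name_b]
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


# Hypothetical tool registry: the agent picks a tool by name.
TOOLS: Dict[str, Callable[..., Any]] = {
    "count": count_objects,
    "distance": distance_between,
}


def run_tool_call(scene: List[SceneObject], call: Dict[str, Any]) -> Any:
    """Dispatch a structured call of the form {"tool": ..., "args": {...}}."""
    tool = TOOLS[call["tool"]]
    return tool(scene, **call["args"])


# Made-up scene used only to exercise the sketch.
SCENE = [
    SceneObject("pallet_1", "pallet", 0.0, 0.0, 0.0),
    SceneObject("pallet_2", "pallet", 3.0, 4.0, 0.0),
    SceneObject("forklift_1", "forklift", 1.0, 1.0, 0.0),
]

pallet_count = run_tool_call(SCENE, {"tool": "count", "args": {"category": "pallet"}})
pallet_gap = run_tool_call(
    SCENE,
    {"tool": "distance", "args": {"name_a": "pallet_1", "name_b": "pallet_2"}},
)
```

In this sketch the language model's only job would be to choose the tool and its arguments. The numeric work is done by ordinary functions, which is one plausible reason a tool-based design can be data-efficient.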

Takeaways, Limitations

Takeaways:
Presents a novel, data-efficient approach to improving the spatial understanding of MLLMs.
Achieves high accuracy and efficiency on spatial question-answering tasks in complex indoor environments.
Combines spatial reasoning with calls to multiple API tools through an LLM agent system.
Verifies performance experimentally on the 2025 AI City Challenge dataset.
Limitations:
The system's generalization performance needs further evaluation, for example on other environments or datasets.
How strongly the results depend on the specific API tools and dataset used has not been analyzed.
The complexity and scalability of the system need further study.
Additional validation is needed before the system can be applied in real commercial environments.