This is a page that curates AI-related papers published worldwide. All content here is summarized using Google Gemini and operated on a non-profit basis. Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.
Warehouse Spatial Question Answering with LLM Agent
Created by
Haebom
Authors
Hsiang-Wei Huang, Jen-Hao Cheng, Kuang-Ming Chen, Cheng-Yen Yang, Bahaa Alattar, Yi-Ru Lin, Pyongkun Kim, Sangwon Kim, Kwangju Kim, Chung-I Huang, Jenq-Neng Hwang
Outline
This paper presents a data-efficient approach to improving the spatial understanding of existing multimodal large language models (MLLMs), in contrast to approaches that rely on large-scale MLLM fine-tuning. The authors propose an LLM agent system with strong spatial reasoning ability that can solve challenging spatial question-answering tasks in complex indoor warehouse environments. The system integrates multiple tools, so the LLM agent can combine its own spatial reasoning with API tool calls to answer complex spatial questions. Extensive evaluation on the 2025 AI City Challenge Physical AI Spatial Intelligence Warehouse dataset shows that the proposed system achieves high accuracy and efficiency in tasks such as object retrieval, counting, and distance estimation. The source code is available at https://github.com/hsiangwei0903/SpatialAgent.
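To make the idea of answering spatial questions through API tool calls more concrete, here is a minimal sketch of a tool dispatcher. It is not the authors' implementation: the `SceneObject` schema, the tool names `distance` and `count`, the call format, and the sample scene are all assumptions made for illustration. In a real agent, the call dictionaries would be produced by the LLM rather than written by hand.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class SceneObject:
    """A detected object with a 3D position in metres (hypothetical schema)."""
    name: str
    category: str
    position: tuple[float, float, float]


def tool_distance(scene: dict[str, SceneObject], a: str, b: str) -> float:
    """Euclidean distance between two named objects."""
    return dist(scene[a].position, scene[b].position)


def tool_count(scene: dict[str, SceneObject], category: str) -> int:
    """Number of objects in the scene that belong to a category."""
    return sum(1 for obj in scene.values() if obj.category == category)


# Hypothetical tool registry; the names are illustrative only.
TOOLS = {"distance": tool_distance, "count": tool_count}


def run_tool_call(scene: dict[str, SceneObject], call: dict):
    """Execute one tool call of the form {"tool": name, "args": {...}}."""
    name = call["tool"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](scene, **call["args"])


# Made-up sample scene, not data from the challenge dataset.
SCENE = {
    obj.name: obj
    for obj in [
        SceneObject("pallet_1", "pallet", (0.0, 0.0, 0.0)),
        SceneObject("pallet_2", "pallet", (3.0, 4.0, 0.0)),
        SceneObject("shelf_1", "shelf", (10.0, 0.0, 0.0)),
    ]
}

# Calls that an LLM agent might emit for "How far apart are the two pallets?"
# and "How many pallets are there?".
distance_call = {"tool": "distance", "args": {"a": "pallet_1", "b": "pallet_2"}}
count_call = {"tool": "count", "args": {"category": "pallet"}}

print(run_tool_call(SCENE, distance_call))  # 5.0
print(run_tool_call(SCENE, count_call))     # 2
```

Keeping the tools as plain functions behind a registry means the language model only has to choose a tool name and its arguments, while the geometric computation itself stays deterministic and easy to test.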
• Takeaways:
◦ Presents a data-efficient approach to improving the spatial understanding of MLLMs.
◦ Achieves high accuracy and efficiency on spatial question-answering tasks in complex indoor warehouse environments.
◦ Integrates spatial reasoning with calls to multiple API tools through an LLM agent system.
◦ Validates performance experimentally on the 2025 AI City Challenge Physical AI Spatial Intelligence Warehouse dataset.
• Limitations:
◦ The system's generalization performance needs further evaluation (possibly because it has not been tested on other environments or datasets).
◦ How strongly the results depend on the specific API tools and dataset used still needs to be analyzed.
◦ Further research is needed on the complexity and scalability of the system.
◦ Additional validation is needed before the system is applied in real commercial environments.