Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Exploring Spatial Representation to Enhance LLM Reasoning in Aerial Vision-Language Navigation

Created by
  • Haebom

Authors

Yunpeng Gao, Zhigang Wang, Pengfei Han, Linglin Jing, Dong Wang, Bin Zhao

Outline

This paper addresses aerial vision-language navigation (VLN), a novel task in which unmanned aerial vehicles (UAVs) navigate outdoor environments using natural language instructions and visual cues. Because reasoning about spatial relationships in complex aerial scenes remains difficult, the paper proposes a training-free, zero-shot framework that uses a large language model (LLM) as the action-prediction agent. Specifically, the authors develop a Semantic-Topo-Metric Representation (STMR) to enhance the LLM's spatial reasoning. Instruction-related semantic masks of landmarks are extracted and projected onto a top-down map, which carries spatial and topological information about surrounding landmarks and grows as navigation proceeds. At each step, a local map centered on the UAV is cropped from this growing top-down map and converted into a matrix representation with distance metrics, which serves as the text prompt from which the LLM predicts the next action for the given instruction. Experiments in real and simulated environments demonstrate the method's effectiveness and robustness, with absolute success-rate improvements of 26.8% and 5.8% over state-of-the-art methods on simple and complex navigation tasks, respectively. The dataset and code will be released soon.
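The map-to-prompt step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the grid encoding, the label ids in LABELS, the cell size, and the function names extract_local_map and to_prompt are all assumptions made for illustration, and the paper's actual matrix format is not given in this summary.

```python
import math
from typing import Dict, List, Tuple

Grid = List[List[int]]

# Hypothetical label ids; the paper's real landmark categories are not listed here.
LABELS: Dict[int, str] = {0: "empty", 1: "building", 2: "road"}


def extract_local_map(global_map: Grid, center: Tuple[int, int],
                      radius: int, fill: int = 0) -> Grid:
    """Crop a (2*radius+1)-wide square centered on the UAV cell.

    Cells that fall outside the global map are padded with `fill`.
    """
    rows, cols = len(global_map), len(global_map[0])
    cr, cc = center
    local: Grid = []
    for r in range(cr - radius, cr + radius + 1):
        row = []
        for c in range(cc - radius, cc + radius + 1):
            inside = 0 <= r < rows and 0 <= c < cols
            row.append(global_map[r][c] if inside else fill)
        local.append(row)
    return local


def to_prompt(local_map: Grid, cell_size_m: float) -> str:
    """Serialize the local map as a text matrix plus nearest-landmark distances."""
    radius = len(local_map) // 2
    lines = [f"Local map, UAV at center (U), each cell is {cell_size_m:g} m wide:"]
    nearest: Dict[int, float] = {}
    for r, row in enumerate(local_map):
        cells = []
        for c, label in enumerate(row):
            at_uav = r == radius and c == radius
            cells.append("U" if at_uav else str(label))
            if label != 0 and not at_uav:
                # Euclidean distance from the UAV cell, converted to meters.
                dist = math.hypot(r - radius, c - radius) * cell_size_m
                nearest[label] = min(dist, nearest.get(label, dist))
        lines.append(" ".join(cells))
    for label in sorted(nearest):
        name = LABELS.get(label, "unknown")
        lines.append(f"{name} ({label}): nearest cell {nearest[label]:.1f} m away")
    return "\n".join(lines)


if __name__ == "__main__":
    top_down = [
        [0, 0, 1, 1],
        [0, 0, 0, 1],
        [2, 2, 0, 0],
        [0, 0, 0, 0],
    ]
    print(to_prompt(extract_local_map(top_down, (1, 1), 1), cell_size_m=5.0))
```

In a full pipeline the resulting string would be combined with the navigation instruction and sent to the LLM; that call is omitted here because the summary does not specify the model or prompt template.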

Takeaways, Limitations

Takeaways:
Proposes a zero-shot framework for aerial VLN that requires no training.
Enhances the LLM's spatial reasoning through STMR, which improves aerial VLN performance.
Achieves state-of-the-art performance in real and simulated environments.
The dataset and code are to be made public for future research.
Limitations:
The dataset and code are not yet public.
Further validation of generalization performance in real-world environments is needed.
Further research is needed on robustness in complex environments and unexpected situations.