Daily Arxiv

This page collects artificial intelligence papers published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions. When sharing, please cite the source.

DORAEMON: Decentralized Ontology-aware Reliable Agent with Enhanced Memory Oriented Navigation

Created by
  • Haebom

Authors

Tianjun Gu, Linfeng Li, Xuhong Wang, Chenghua Gong, Jingyu Gong, Zhizhong Zhang, Yuan Xie, Lizhuang Ma, Xin Tan

Outline

DORAEMON is a cognitive framework designed to overcome the limitations of zero-shot autonomous navigation based on Vision-Language Models (VLMs). It consists of a Ventral Stream and a Dorsal Stream, modeled on human navigational abilities. It integrates hierarchical semantic-spatial fusion, a topological map, RAG-VLM, and Policy-VLM to address three problems: spatiotemporal discontinuities, unstructured memory representations, and insufficient task understanding. In addition, a Nav-Assurance component is intended to keep navigation safe and efficient. The authors report state-of-the-art performance on the HM3D, MP3D, and GOAT datasets and introduce a new evaluation metric, AORI, to better assess navigational intelligence.

Takeaways, Limitations

Takeaways:
Achieves state-of-the-art performance in zero-shot autonomous navigation without prior map building or pre-training.
Outperforms existing methods on the success rate (SR) and success weighted by path length (SPL) metrics on the HM3D, MP3D, and GOAT datasets.
Introduces a new evaluation metric, AORI, to better assess navigational intelligence.
Presents a novel framework that mimics human navigational abilities through ventral and dorsal streams.
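
For readers unfamiliar with the SR and SPL metrics mentioned in the takeaways, the following is a minimal sketch of how they are commonly computed in embodied navigation benchmarks: SR is the fraction of successful episodes, and SPL averages, over all episodes, the success indicator times the ratio of the shortest-path length to the larger of the agent's path length and the shortest-path length. The `Episode` class and the function names are illustrative assumptions and are not taken from the DORAEMON paper or its code.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Episode:
    """One navigation episode (hypothetical container for illustration)."""
    success: bool         # whether the agent reached the goal
    shortest_path: float  # shortest-path length from start to goal
    agent_path: float     # length of the path the agent actually travelled


def success_rate(episodes: Sequence[Episode]) -> float:
    """SR: fraction of episodes in which the agent succeeded."""
    if not episodes:
        raise ValueError("need at least one episode")
    return sum(1 for e in episodes if e.success) / len(episodes)


def spl(episodes: Sequence[Episode]) -> float:
    """SPL: mean over episodes of S_i * l_i / max(p_i, l_i)."""
    if not episodes:
        raise ValueError("need at least one episode")
    total = 0.0
    for e in episodes:
        if e.shortest_path <= 0:
            raise ValueError("shortest_path must be positive")
        if e.success:
            # Failed episodes contribute 0; successful ones are
            # discounted by how much longer the agent's path was.
            total += e.shortest_path / max(e.agent_path, e.shortest_path)
    return total / len(episodes)


if __name__ == "__main__":
    # Made-up episodes purely to exercise the functions.
    demo = [
        Episode(success=True, shortest_path=10.0, agent_path=10.0),
        Episode(success=True, shortest_path=10.0, agent_path=20.0),
        Episode(success=False, shortest_path=10.0, agent_path=5.0),
    ]
    print(f"SR  = {success_rate(demo):.3f}")
    print(f"SPL = {spl(demo):.3f}")
```

Because SPL multiplies by the success indicator, it can never exceed SR; an agent that succeeds often but wanders will show a large gap between the two numbers.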
Limitations:
The paper does not explicitly discuss its limitations.