Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Schema-Guided Scene-Graph Reasoning based on Multi-Agent Large Language Model System

Created by
  • Haebom

Author

Yiye Chen, Harpreet Sawhney, Nicholas Gydé, Yanan Jian, Jack Saunders, Patricio Vela, Ben Lundell

Outline

This paper treats the scene graph as a structured, serializable representation of the environment for spatial reasoning with large language models (LLMs). The authors propose SG², an iterative, schema-guided scene-graph reasoning framework built on multi-agent LLMs. The agents are organized into two modules: a reasoner module (Reasoner), which plans the task abstractly and generates queries for graph information, and a retrieval module (Retriever), which extracts the relevant graph information by writing code in response to those queries. The two modules collaborate iteratively, which enables sequential reasoning and adaptive attention to graph information. A scene graph schema given to both modules streamlines reasoning and retrieval and guides their collaboration. Because the LLM never needs to see the full graph data, the risk of hallucination caused by irrelevant information is reduced. Experiments in several simulated environments show that the framework outperforms existing LLM-based approaches and a baseline single-agent, tool-based reason-while-retrieve strategy on numerical question-answering and planning tasks.
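
The Reasoner/Retriever loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: both modules are replaced by hand-written stand-ins rather than LLM calls, and the toy graph, the schema layout, the query format, and every function name (`reasoner`, `retriever`, `run_sg_loop`) are assumptions made for this example.

```python
# Toy scene graph (hypothetical data): typed nodes plus typed edges.
SCENE_GRAPH = {
    "nodes": [
        {"id": "kitchen", "type": "room"},
        {"id": "office", "type": "room"},
        {"id": "mug_1", "type": "object", "category": "mug"},
        {"id": "mug_2", "type": "object", "category": "mug"},
        {"id": "laptop_1", "type": "object", "category": "laptop"},
    ],
    "edges": [
        {"src": "kitchen", "dst": "mug_1", "relation": "contains"},
        {"src": "kitchen", "dst": "mug_2", "relation": "contains"},
        {"src": "office", "dst": "laptop_1", "relation": "contains"},
    ],
}

# Schema (hypothetical layout): describes structure only, never instance data.
SCHEMA = {
    "node_types": {"room": ["id"], "object": ["id", "category"]},
    "edge_relations": {"contains": ("room", "object")},
}


def retriever(query, graph, schema):
    """Stand-in for the Retriever.

    In the paper this module writes code from the query; here a fixed
    lookup plays that role. Only this function touches the graph data.
    """
    if query["relation"] not in schema["edge_relations"]:
        raise ValueError(f"relation not in schema: {query['relation']!r}")
    targets = {
        e["dst"]
        for e in graph["edges"]
        if e["relation"] == query["relation"] and e["src"] == query["src"]
    }
    return [n for n in graph["nodes"] if n["id"] in targets]


def reasoner(question, schema, facts):
    """Stand-in for the Reasoner.

    It sees the question, the schema, and facts retrieved so far, but not
    the graph. The rule is hard-coded for one example question; an LLM
    would produce the query or the answer instead.
    """
    if "kitchen_items" not in facts:
        return "query", {"relation": "contains", "src": "kitchen",
                         "store_as": "kitchen_items"}
    mugs = [n for n in facts["kitchen_items"] if n.get("category") == "mug"]
    return "answer", len(mugs)


def run_sg_loop(question, graph, schema, max_rounds=5):
    """Alternate Reasoner and Retriever until the Reasoner answers."""
    facts = {}
    for _ in range(max_rounds):
        kind, payload = reasoner(question, schema, facts)
        if kind == "answer":
            return payload
        facts[payload["store_as"]] = retriever(payload, graph, schema)
    raise RuntimeError("no answer within max_rounds")


answer = run_sg_loop("How many mugs are in the kitchen?", SCENE_GRAPH, SCHEMA)
print(answer)  # 2
```

In this sketch the Reasoner stand-in only ever receives the schema and the retrieved subset of nodes, which mirrors the summary's point that the full graph does not have to be presented to the LLM.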

Takeaways, Limitations

Takeaways:
Improves the spatial reasoning performance of LLMs through an iterative reasoning framework built on multi-agent LLMs.
Reduces hallucination and makes reasoning more efficient by using a scene graph schema instead of the full graph.
Outperforms existing single-agent methods across multiple simulated environments.
Performs well on both numerical question-answering and planning tasks.
Limitations:
Further research is needed to determine the generalizability of the proposed framework.
Applicability to other types of scene graphs and to more complex environments still needs to be verified.
Performance and applicability in real-world environments have yet to be evaluated.
Further research is needed on efficient collaboration strategies among the LLM agents.