Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

JARVIS: A Neuro-Symbolic Commonsense Reasoning Framework for Conversational Embodied Agents

Created by
  • Haebom

Authors

Kaizhi Zheng, Kaiwen Zhou, Jing Gu, Yue Fan, Jialu Wang, Zonglin Di, Xuehai He, Xin Eric Wang

Outline

This paper proposes JARVIS, a neuro-symbolic commonsense reasoning framework for building conversational embodied agents that perform real-world tasks. To overcome the limitations of purely symbolic methods and end-to-end deep learning models, JARVIS uses a large language model (LLM) to obtain symbolic representations for language understanding and subgoal planning, and builds semantic maps from visual observations. A symbolic module then performs subgoal planning and action generation based on task-level and action-level commonsense. Experiments on the TEACh dataset show that JARVIS achieves state-of-the-art performance on three dialogue-based embodied tasks (EDH, TfD, and TATC), significantly improving the success rate on the EDH task from 6.1% to 15.8%. The authors further provide a systematic analysis of the key factors affecting task performance, demonstrate strong performance even in few-shot settings, and report winning first place in the Alexa Prize SimBot Public Benchmark Challenge.
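To make the described pipeline concrete, here is a minimal, hypothetical sketch of the neuro-symbolic loop: an LLM turns the dialogue into symbolic subgoals, visual observations populate a semantic map, and a symbolic planner expands each subgoal into actions using simple commonsense rules. All names here (SemanticMap, llm_parse_dialogue, symbolic_plan, run_episode) are illustrative placeholders, not the authors' actual implementation.

```python
# Hypothetical sketch of a JARVIS-style neuro-symbolic loop.
# Names are illustrative placeholders, not the paper's actual code or API.

from dataclasses import dataclass, field


@dataclass
class SemanticMap:
    """Spatial memory built from visual observations: object name -> estimated position."""
    objects: dict = field(default_factory=dict)

    def update(self, observation: dict) -> None:
        # Record detected objects and their positions from the current egocentric view.
        for obj, pos in observation.get("detections", {}).items():
            self.objects[obj] = pos


def llm_parse_dialogue(dialogue: str) -> list[str]:
    """Neural side: prompt an LLM to convert the dialogue history into symbolic subgoals.
    Stubbed here with a fixed plan; in practice this would be an LLM call."""
    return ["Navigate(Mug)", "PickUp(Mug)", "Place(Mug, CoffeeMachine)"]


def symbolic_plan(subgoal: str, semantic_map: SemanticMap) -> list[str]:
    """Symbolic side: expand a subgoal into low-level actions using task- and
    action-level commonsense rules (e.g., locate an object before interacting with it)."""
    predicate, arg = subgoal.rstrip(")").split("(", 1)
    target = arg.split(",")[0].strip()
    actions = []
    if target not in semantic_map.objects:
        actions.append(f"Explore(find={target})")  # commonsense: search before acting
    actions.append(f"GotoLocation({target})")
    if predicate != "Navigate":
        actions.append(f"{predicate}({arg})")
    return actions


def run_episode(dialogue: str, observations: list[dict]) -> list[str]:
    """Full loop: build the semantic map, get subgoals from the LLM, plan actions symbolically."""
    semantic_map = SemanticMap()
    for obs in observations:
        semantic_map.update(obs)
    plan = []
    for subgoal in llm_parse_dialogue(dialogue):
        plan.extend(symbolic_plan(subgoal, semantic_map))
    return plan


if __name__ == "__main__":
    obs = [{"detections": {"Mug": (3, 1), "CoffeeMachine": (5, 2)}}]
    print(run_episode("Please make me a coffee.", obs))
```

The split of labor mirrors the paper's framing: the LLM handles open-ended language understanding, while the symbolic planner keeps action generation interpretable and rule-governed.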

Takeaways and Limitations

Takeaways:
Demonstrates that combining large language models with symbolic reasoning can improve both the performance and the interpretability of conversational embodied agents.
Shows practical applicability by achieving state-of-the-art performance on the TEACh dataset and winning first place in the Alexa Prize SimBot Public Benchmark Challenge.
Performs well even in few-shot learning settings.
Provides a systematic analysis of the factors affecting task performance.
Limitations:
LLM dependency: JARVIS relies on the capabilities of the underlying LLM, so limitations of the LLM can degrade its performance.
Data dependency: Although the model performs well on the TEACh dataset, its generalization to other datasets requires further study.
Limited commonsense coverage: The types and representations of commonsense knowledge used may be limited; integrating richer and more diverse commonsense remains an open need.
Real-world applicability: Further research is needed to fully address the complexity and uncertainty of real-world environments.