Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

EfficientEQA: An Efficient Approach to Open-Vocabulary Embodied Question Answering

Created by
  • Haebom

Authors

Kai Cheng, Zhengyuan Li, Xingpeng Sun, Byung-Cheol Min, Amrit Singh Bedi, Aniket Bera

Outline

This paper addresses Embodied Question Answering (EQA), a critical yet challenging task for robotic assistants. Existing approaches either treat EQA as static video question answering or limit answers to closed-ended choices, which hinders practical application. To overcome these limitations, the authors present EfficientEQA, a novel framework that combines efficient exploration with free-form answer generation. EfficientEQA features three key innovations: (1) efficient exploration via Semantic-Value-Weighted Frontier Exploration (SFE), using Verbalized Confidence (VC) from a black-box VLM; (2) a BLIP-based mechanism that adaptively stops exploration by flagging highly relevant observations as outliers; and (3) a Retrieval-Augmented Generation (RAG) method that answers questions accurately from relevant images in the agent's observation history, without relying on predefined choices. Experimental results show that EfficientEQA achieves over 15% higher accuracy than state-of-the-art methods while requiring over 20% fewer exploration steps.
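The three components described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the confidence-minus-distance scoring rule, the z-score outlier test, and all parameter values (`distance_weight`, `z_threshold`, `min_history`, `k`) are assumptions chosen for illustration. The confidence and relevance scores are plain floats here, standing in for the outputs a VLM and a BLIP-style image-text model would produce.

```python
import math
from statistics import mean, stdev


def select_frontier(frontiers, agent_xy, distance_weight=0.1):
    """Pick the next frontier to explore (hypothetical scoring rule).

    frontiers: list of (x, y, confidence), where confidence in [0, 1]
    stands in for a VLM's verbalized confidence that exploring this
    frontier helps answer the question.
    """
    def score(frontier):
        x, y, confidence = frontier
        distance = math.hypot(x - agent_xy[0], y - agent_xy[1])
        # Assumed trade-off: semantic value minus a travel-cost penalty.
        return confidence - distance_weight * distance

    return max(frontiers, key=score)


def should_stop(relevance_history, new_score, z_threshold=2.0, min_history=5):
    """Stop exploring when a new observation is an upward outlier.

    relevance_history: earlier image-question relevance scores.
    A simple z-score test is assumed here; the paper's exact outlier
    criterion may differ.
    """
    if len(relevance_history) < min_history:
        return False  # too few observations to estimate a baseline
    mu = mean(relevance_history)
    sigma = stdev(relevance_history)
    if sigma == 0:
        return False  # no spread, so no meaningful outlier test
    return (new_score - mu) / sigma > z_threshold


def retrieve_top_k(scored_observations, k=3):
    """Return the k most relevant (image_id, relevance) pairs.

    These would be passed to the VLM as context for a free-form answer.
    """
    ranked = sorted(scored_observations, key=lambda obs: obs[1], reverse=True)
    return ranked[:k]
```

In this sketch, the agent would loop over `select_frontier`, record a relevance score for each new observation, and call `should_stop` after every step. Once exploration ends, `retrieve_top_k` selects the images that accompany the question in the final VLM prompt.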

Takeaways, Limitations

Takeaways:
We present EfficientEQA, a novel EQA framework that combines efficient exploration and free-form answer generation.
Achieves over 15% higher accuracy than state-of-the-art methods while requiring over 20% fewer exploration steps.
Introduces three techniques: Semantic-Value-Weighted Frontier Exploration (SFE), BLIP-based adaptive stopping of exploration, and Retrieval-Augmented Generation (RAG) for answering.
Improves the practical applicability of robotic assistants in real-world settings.
Limitations:
Heavy reliance on black-box VLMs: EfficientEQA's performance can be significantly affected by the performance of the underlying VLM.
It has so far been tested only in specific environments, so its generalization to more varied environments still needs to be verified.
Further research is needed on the interactions and optimization between the SFE, BLIP, and RAG modules.