Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

ViGiL3D: A Linguistically Diverse Dataset for 3D Visual Grounding

Created by
  • Haebom

Authors

Austin T. Wang, ZeMing Gong, Angel X. Chang

Outline

This paper addresses 3D visual grounding (3DVG), the task of localizing objects in a 3D scene that are referenced by natural-language text. Recent work has focused on scaling up 3DVG datasets with LLMs, but these datasets do not capture the full range of language that English prompts can express. The paper therefore proposes a framework for linguistically analyzing 3DVG prompts and introduces ViGiL3D, a diagnostic dataset for evaluating visual grounding methods against a diverse set of language patterns. Evaluating existing open-vocabulary 3DVG methods shows that they still struggle to understand and identify targets in the harder, out-of-distribution prompts that practical applications require.
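To make the diagnostic-evaluation idea concrete, below is a minimal sketch of how per-pattern grounding accuracy could be tallied over such a dataset. The `ground_object` method, the record fields, and the pattern tags are hypothetical assumptions for illustration, not the paper's actual interface or code.

```python
# Minimal sketch (not the authors' code): scoring a 3DVG model per language pattern.
# `ground_object`, the record fields, and the tag names are illustrative assumptions.
from collections import defaultdict

def evaluate_by_pattern(model, dataset):
    """Return grounding accuracy broken down by linguistic-pattern tag."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for record in dataset:
        # Each record pairs a 3D scene with a natural-language prompt, the ID of the
        # target object, and tags for the language phenomena the prompt exercises
        # (e.g. negation, spatial relation, attribute comparison).
        predicted_id = model.ground_object(record["scene"], record["prompt"])
        correct = predicted_id == record["target_id"]
        for tag in record["pattern_tags"]:
            totals[tag] += 1
            hits[tag] += int(correct)
    return {tag: hits[tag] / totals[tag] for tag in totals}

# Usage: accuracies = evaluate_by_pattern(my_model, vigil3d_records)
# Low scores on specific tags indicate which language phenomena a method misses.
```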

Takeaways, Limitations

Takeaways: ViGiL3D, a 3DVG dataset covering diverse language patterns, exposes the limitations of existing methods and points to directions for future research. It also establishes evaluation criteria for 3DVG models on the more varied and difficult prompts found in real-world applications.
Limitations: The ViGiL3D dataset may not cover every possible language pattern, and further study is needed on how well the proposed framework and dataset generalize. While the paper clearly demonstrates the gap between current 3DVG methods and real-world applicability, it offers few concrete suggestions for closing it.