Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
The copyright of each paper belongs to its authors and their institutions. When sharing, please cite the source.

PlaceIt3D: Language-Guided Object Placement in Real 3D Scenes

Created by
  • Haebom

Author

Ahmed Abdelreheem, Filippo Aleotti, Jamie Watson, Zawar Qureshi, Abdelrahman Eldesokey, Peter Wonka, Gabriel Brostow, Sara Vicente, Guillermo Garcia-Hernando

Place objects in real 3D scenes based on language instructions

Outline

This paper introduces the novel task of placing objects in real-world 3D scenes based on language instructions. The model takes as input a point cloud of a 3D scene, a 3D asset, and a text prompt that broadly describes where the asset should be placed; the task is to find a valid placement of the asset that adheres to the prompt. Compared to other language-guided localization tasks in 3D scenes (e.g., grounding), this task poses unique challenges: it is inherently ambiguous, with multiple valid solutions, and it requires reasoning about 3D geometric relationships and empty space. The paper formalizes the task by presenting a new benchmark and evaluation protocol. It also presents a new dataset for training a 3D Large Language Model (LLM) on this task, along with the first method that serves as a meaningful baseline. The authors argue that this challenging task and new benchmark can also be used to evaluate and compare general-purpose 3D LLMs.
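To make the task's input/output structure concrete, here is a minimal Python sketch of the interface described above: a scene point cloud, an asset, and a text prompt go in, and a rigid placement (position plus upright rotation) comes out. All names and types here are illustrative assumptions, not the paper's actual API.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class PlacementQuery:
    """Inputs to the placement task (illustrative, not the paper's API)."""
    scene_points: np.ndarray   # (N, 3) point cloud of the 3D scene
    asset_points: np.ndarray   # (M, 3) point cloud of the asset to place
    prompt: str                # e.g. "place the vase on the table near the window"


@dataclass
class Placement:
    """One valid solution: a rigid transform of the asset into the scene."""
    translation: np.ndarray    # (3,) position for the asset's origin
    rotation_z: float          # yaw in radians (asset kept upright)


def apply_placement(asset_points: np.ndarray, placement: Placement) -> np.ndarray:
    """Transform the asset's points into scene coordinates."""
    c, s = np.cos(placement.rotation_z), np.sin(placement.rotation_z)
    rot = np.array([[c, -s, 0.0],
                    [s,  c, 0.0],
                    [0.0, 0.0, 1.0]])
    return asset_points @ rot.T + placement.translation
```

Because the task is ambiguous, a model would generally score or rank many candidate `Placement` values against the prompt rather than predict a single one.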

Takeaways, Limitations

Definition and introduction of a new 3D object placement task
Development of a new benchmark and evaluation protocol
Construction of a new dataset for training 3D LLMs
Presentation of the first method that serves as a meaningful baseline
Limitations: the task's inherent ambiguity and the difficulty of reasoning about 3D geometric relationships
Potential as a benchmark for evaluating general-purpose 3D LLMs