This paper introduces the novel task of language-guided object placement in real-world 3D scenes. The model takes as input a point cloud of a 3D scene, a 3D asset, and a text prompt that broadly describes where the 3D asset should be placed. The task is to find a valid placement of the 3D asset that adheres to the prompt. Compared with other language-guided localization tasks in 3D scenes (e.g., grounding), this task presents unique challenges: it is ambiguous, with multiple valid solutions, and it requires reasoning about 3D geometric relationships and free space. We inaugurate this task by presenting a novel benchmark and evaluation protocol. We also introduce a new dataset for training 3D large language models (LLMs) on this task, along with the first method to serve as a meaningful baseline. We believe that this challenging task and our new benchmark could become part of the suite used to evaluate and compare general-purpose 3D LLMs.