We propose SORT3D, a method for interpreting object-referential language and grounding objects in 3D environments using spatial relations and attributes, for robots operating alongside humans. Unlike existing methods, which struggle with diverse scenes, large numbers of fine-grained objects, and free-form language references, SORT3D leverages the rich object attributes available from 2D data and combines a heuristics-based spatial reasoning toolbox with the sequential reasoning capabilities of large language models (LLMs). It requires no text-to-3D training data and can be applied zero-shot to new environments. On two benchmarks, SORT3D achieves state-of-the-art zero-shot performance on complex view-dependent grounding tasks, and by implementing the pipeline to run in real time on two autonomous vehicles, we demonstrate that it can be used for object-goal exploration in previously unseen real-world environments. The source code is publicly available.
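To make the toolbox-plus-LLM combination concrete, the sketch below is a minimal illustration, not the authors' implementation: a hand-written spatial predicate (`closest_to`) resolves a reference like "the chair closest to the window" over object centroids, standing in for the kind of heuristic that an LLM could invoke step by step. All names (`Obj`, `closest_to`, the example scene) are hypothetical.

```python
# Illustrative sketch (assumed, not SORT3D's actual code) of a
# heuristics-based spatial toolbox queried over 3D object centroids.
from dataclasses import dataclass
import math


@dataclass
class Obj:
    name: str                        # open-vocabulary label, e.g. from a 2D detector
    center: tuple[float, float, float]  # (x, y, z) centroid in the map frame


def closest_to(candidates: list[Obj], anchor_name: str, scene: list[Obj]) -> Obj | None:
    """Heuristic spatial predicate: the candidate nearest to any anchor object."""
    anchors = [o for o in scene if o.name == anchor_name]
    if not anchors or not candidates:
        return None
    return min(
        candidates,
        key=lambda c: min(math.dist(c.center, a.center) for a in anchors),
    )


# An LLM (not shown) would parse "the chair closest to the window"
# into a sequence of such toolbox calls:
scene = [
    Obj("chair", (1.0, 2.0, 0.0)),
    Obj("chair", (4.0, 0.5, 0.0)),
    Obj("window", (4.5, 0.0, 1.2)),
]
chairs = [o for o in scene if o.name == "chair"]
print(closest_to(chairs, "window", scene))
# Obj(name='chair', center=(4.0, 0.5, 0.0))
```

Because the spatial predicates are plain functions rather than learned modules, this style of pipeline needs no text-to-3D training data, which is what enables the zero-shot transfer to new environments described above.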