In this paper, we propose SPORT, a method for general-purpose object placement that follows "something something" commands. SPORT consists of three stages: object localization, target position imagination, and robot control. It leverages a large-scale pre-trained vision model to perform extensive semantic inference about objects, and it trains a diffusion-based estimator to predict object poses in physically realistic 3D space. Because the first two stages exchange only whether each object can move, SPORT makes full use of the pre-trained model's open-vocabulary object recognition and localization capabilities while enabling effective target pose estimation without large-scale training. The target pose estimator is trained on data collected and annotated using GPT-4 in a simulation environment, and experimental results show that the approach is effective in both simulated and real-world environments.
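To make the stage decoupling concrete, the sketch below shows one way such an interface could look. It is a hypothetical illustration, not the authors' implementation. All names (`LocalizedObject`, `PoseQuery`, `to_pose_queries`, `toy_target_pose`) are assumptions. The localization stage may produce rich open-vocabulary labels, but only 3D positions and a single movable/static flag cross into the target-pose stage. `toy_target_pose` is a trivial placeholder standing in for the diffusion-based estimator, not a model of it.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class LocalizedObject:
    """Hypothetical output of the localization stage."""
    label: str      # open-vocabulary label from the pre-trained vision model
    position: Vec3  # estimated 3D position
    movable: bool   # the only semantic bit forwarded downstream


@dataclass(frozen=True)
class PoseQuery:
    """Hypothetical input to the target-pose stage: geometry plus movability, no labels."""
    position: Vec3
    movable: bool


def to_pose_queries(objects: List[LocalizedObject]) -> List[PoseQuery]:
    # Drop labels: only the movable/static distinction crosses the stage boundary.
    return [PoseQuery(o.position, o.movable) for o in objects]


def toy_target_pose(queries: List[PoseQuery], lift: float = 0.1) -> Vec3:
    # Toy placeholder for the diffusion-based estimator (NOT the paper's model):
    # return a target position for the movable object just above the first static object.
    static = next(q for q in queries if not q.movable)
    x, y, z = static.position
    return (x, y, z + lift)


if __name__ == "__main__":
    objects = [
        LocalizedObject("cup", (0.2, 0.1, 0.0), True),
        LocalizedObject("tray", (0.5, -0.1, 0.0), False),
    ]
    queries = to_pose_queries(objects)
    print(toy_target_pose(queries))  # (0.5, -0.1, 0.1)
```

Keeping `PoseQuery` free of labels mirrors the stated design choice. The pose stage can be trained on modest simulated data because it never has to interpret open-vocabulary semantics. That burden stays with the pre-trained vision model.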