In this paper, we propose Pixels-to-Graph (Pix2G), a lightweight method for generating structured scene graphs in real time from image pixels and LiDAR maps, enabling autonomous exploration of unknown environments on resource-constrained robotic platforms. We leverage the 3D scene graph to bridge the gap between the 2D Building Information Models (BIMs) used by human operators and the 3D maps built by robots; to satisfy onboard compute constraints, the entire pipeline runs in real time using only the CPU. The output is a denoised 2D top-down environment map and a structured 3D point cloud, seamlessly connected by a multi-layer graph that abstracts information from the object level up to the building level. Through real-world experiments with the NASA JPL NeBula-Spot robot, we quantitatively and qualitatively evaluate the method's real-time autonomous exploration and mapping performance in complex environments such as a garage and an urban office.