This paper presents a novel framework for reconstructing dynamic human-object interactions from monocular video. Existing 3D reconstruction methods assume complete visibility of static or dynamic objects and degrade sharply when this assumption is violated, especially under mutual occlusion between the human and the object. To address this, our framework leverages amodal completion to infer the complete structure of partially occluded regions. Unlike existing approaches that operate on individual frames, our method incorporates temporal context to enforce consistency across the video sequence and to progressively refine and stabilize the reconstruction. This template-free strategy significantly improves the recovery of complex details in dynamic scenes, adapting to diverse objects and motions without relying on predefined shape models. We validate our approach with 3D Gaussian Splatting on challenging monocular videos, demonstrating more accurate occlusion handling and stronger temporal stability than existing techniques.
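As a conceptual illustration of the temporal-stabilization idea only (not the learned amodal-completion model or the full reconstruction pipeline described above), the following minimal sketch shows how per-frame visible masks could be fused with an exponentially weighted memory of past estimates, so that regions observed in earlier frames persist through later occlusions. All names and parameters here (`temporally_stabilized_amodal`, `momentum`, `thresh`) are hypothetical placeholders, not the paper's implementation.

```python
import numpy as np

def temporally_stabilized_amodal(visible_masks, momentum=0.8, thresh=0.5):
    """Fuse per-frame visible masks into temporally consistent amodal masks.

    Each frame's amodal estimate is the union of the current visible
    region with an exponentially weighted memory of past estimates, so
    regions seen earlier persist through later occlusions. This is a toy
    stand-in for the temporal-context mechanism, not the actual method.
    """
    memory = np.zeros_like(visible_masks[0], dtype=float)
    amodal = []
    for vis in visible_masks:
        # Decay previous evidence, then inject the current visible region.
        memory = momentum * memory + (1.0 - momentum) * vis.astype(float)
        # Currently visible pixels are certain, so clamp them to full weight.
        memory = np.maximum(memory, vis.astype(float))
        amodal.append(memory > thresh)
    return amodal

# Toy 1-D example: an "object" occupying pixels 2..6, with pixel 4
# occluded in the second frame; the amodal mask still covers it.
f0 = np.array([0, 0, 1, 1, 1, 1, 1, 0], dtype=bool)
f1 = np.array([0, 0, 1, 1, 0, 1, 1, 0], dtype=bool)  # pixel 4 occluded
masks = temporally_stabilized_amodal([f0, f1])
print(masks[1].astype(int))  # -> [0 0 1 1 1 1 1 0]
```

In this sketch, the occluded pixel retains weight 0.8 from the previous frame and so remains inside the amodal mask; a learned completion model would additionally hallucinate plausible structure for regions never observed, which simple mask propagation cannot do.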