Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model

Created by
  • Haebom

Authors

Fangfu Liu, Wenqiang Sun, Hanyang Wang, Yikai Wang, Haowen Sun, Junliang Ye, Jun Zhang, Yueqi Duan

Outline

This paper proposes ReconX, a method for reconstructing detailed 3D scenes from a small number of input images. Existing 3D scene reconstruction methods produce artifacts and distortions when viewpoint information is insufficient. ReconX addresses this by reframing sparse-view reconstruction as a temporal generation task that leverages the strong generative priors of pre-trained video diffusion models. It first builds a global point cloud from the input views and encodes it, together with context information, into a 3D structure condition that guides the video diffusion model, which then synthesizes video frames with high 3D consistency while preserving detail. Finally, it recovers the 3D scene from the generated video with a confidence-based 3D Gaussian Splatting optimization. Experiments show that ReconX outperforms existing state-of-the-art methods in reconstruction quality and generalization.
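To make the idea behind a confidence-based optimization more concrete, here is a minimal sketch in plain Python. It is not the loss used in the paper: the function name `confidence_weighted_l1`, the flattened pixel lists, and the normalization by total weight are all assumptions made for illustration. It only shows the general principle that pixels of a generated frame with low confidence should contribute less to the reconstruction error.

```python
from typing import Sequence


def confidence_weighted_l1(
    rendered: Sequence[float],
    generated: Sequence[float],
    confidence: Sequence[float],
) -> float:
    """Confidence-weighted mean absolute error over flattened pixel values.

    Hypothetical helper, not the paper's loss: each pixel error is scaled
    by its confidence, so uncertain regions of a generated frame pull less
    on the reconstruction.
    """
    if not (len(rendered) == len(generated) == len(confidence)):
        raise ValueError("all inputs must have the same length")
    total_weight = sum(confidence)
    if total_weight <= 0:
        raise ValueError("confidence weights must sum to a positive value")
    weighted_error = sum(
        c * abs(r - g) for r, g, c in zip(rendered, generated, confidence)
    )
    return weighted_error / total_weight


# Toy data: the third pixel disagrees strongly between the rendered view
# and the generated frame.
rendered = [0.2, 0.5, 0.9]
generated = [0.2, 0.5, 0.1]

# With uniform confidence the outlier dominates the loss.
uniform = confidence_weighted_l1(rendered, generated, [1.0, 1.0, 1.0])
# Assigning the outlier a low confidence reduces its influence.
weighted = confidence_weighted_l1(rendered, generated, [1.0, 1.0, 0.1])
print(weighted < uniform)
```

In a real 3D Gaussian Splatting pipeline the same weighting would be applied per pixel to rendered images inside a differentiable training loop; the plain lists here only stand in for those tensors.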

Takeaways, Limitations

Takeaways:
Shows that high-quality 3D scene reconstruction is possible from limited viewpoint information.
Presents a novel approach to 3D reconstruction that leverages a pre-trained video diffusion model.
Improves 3D consistency through confidence-based 3D Gaussian Splatting optimization.
Demonstrates superior performance over existing methods on various real-world datasets.
Limitations:
High reliance on pre-trained video diffusion models.
May be computationally expensive.
Generalization performance for certain types of scenes may be limited.
Performance may depend on the quality of the input images.