Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

GTR: Improving Large 3D Reconstruction Models through Geometry and Texture Refinement

Created by
  • Haebom

Authors

Peiye Zhuang, Songfang Han, Chaoyang Wang, Aliaksandr Siarohin, Jiaxu Zou, Michael Vasilkovsky, Vladislav Shakhrai, Sergey Korolev, Sergey Tulyakov, Hsin-Ying Lee

Outline

This paper proposes a novel approach for 3D mesh reconstruction from multi-view images. It takes inspiration from large reconstruction models such as LRM, which use a transformer-based triplane generator and a Neural Radiance Field (NeRF) model trained on multi-view images. The authors analyze the shortcomings of the existing LRM architecture and modify it to improve multi-view image representations and make training more computationally efficient. To improve geometric reconstruction and enable supervision at full image resolution, they extract meshes from the NeRF field in a differentiable manner and fine-tune the NeRF model through mesh rendering. The resulting model achieves state-of-the-art performance, including a PSNR of 28.67 on the Google Scanned Objects (GSO) dataset, but it still struggles to reconstruct complex textures (e.g., text, portraits). To address this, the authors introduce a lightweight, instance-specific texture refinement procedure that fine-tunes the triplane representation and the NeRF color estimation model in just 4 seconds, improving the PSNR to 29.79 and accurately reconstructing complex textures. The approach also enables various downstream applications, such as 3D generation from text or images.
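To put the reported PSNR figures in context, here is a minimal sketch of how PSNR is conventionally computed and what a gain from 28.67 to 29.79 implies in terms of mean squared error. This is not the paper's evaluation code: the function names `psnr` and `mse_ratio` are hypothetical, pixel values are assumed to lie in [0, 1], and the conversion to an MSE ratio is exact only for a single image pair (for scores averaged over many images it is an approximation).

```python
import math


def psnr(reference, rendered, max_value=1.0):
    """PSNR in dB between two equally sized flat sequences of pixel values.

    Assumes values lie in [0, max_value]; hypothetical helper for illustration.
    """
    if len(reference) != len(rendered) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio(psnr_before, psnr_after):
    """Ratio MSE_after / MSE_before implied by two PSNR values (same max_value)."""
    return 10.0 ** ((psnr_before - psnr_after) / 10.0)


if __name__ == "__main__":
    # Toy two-pixel example, not data from the paper.
    print(round(psnr([0.0, 1.0], [0.1, 0.9]), 2))
    # PSNR values quoted in the summary above.
    print(round(mse_ratio(28.67, 29.79), 3))
```

Under this definition, the 1.12 dB gain from texture refinement corresponds to roughly 23% lower mean squared error, which illustrates why a seemingly small PSNR difference can reflect a visible change in texture fidelity.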

Takeaways, Limitations

Takeaways:
Achieves state-of-the-art performance in 3D mesh reconstruction from multi-view images (PSNR of 28.67 on GSO, rising to 29.79 with texture refinement).
More computationally efficient training and better multi-view image representations through modifications to the LRM architecture.
Improved geometric reconstruction through differentiable mesh extraction and fine-tuning of the NeRF model via mesh rendering.
Accurate reconstruction of complex textures through a lightweight, instance-specific texture refinement procedure.
Enables various downstream applications, such as 3D generation from text or images.
Limitations:
Without the instance-specific refinement step, the feed-forward model still struggles to reconstruct complex textures (text, portraits, etc.).
Further research is needed on the generalization performance of the proposed method.