TextSplat: Text-Guided Semantic Fusion for Generalizable Gaussian Splatting
Created by
Haebom
Author
Zhicong Wu, Hongbin Xu, Gang Xu, Ping Nie, Zhixin Yan, Jinkai Zheng, Liangqiong Qu, Ming Li, Liqiang Nie
Outline
In this paper, we build on recent advances in generalizable Gaussian Splatting, which enable robust 3D reconstruction from sparse input views, and propose TextSplat, a framework that leverages text-based guidance to accurately reconstruct fine details of complex scenes. Unlike existing methods that concentrate on geometric consistency, TextSplat strengthens semantic understanding through text-based guidance. It obtains complementary representations from three parallel modules: a diffusion prior depth estimator for accurate depth information, a semantic-aware segmentation network for detailed semantic information, and a multi-view interaction network for refined cross-view features. These representations are then integrated through a text-guided, attention-based feature aggregation mechanism to produce enhanced 3D Gaussian parameters rich in detailed semantic cues. Experiments on various benchmark datasets demonstrate improved performance over existing methods across multiple evaluation metrics. The code will be publicly available.
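To make the fusion step concrete, below is a minimal, illustrative PyTorch sketch of a text-guided, attention-based feature aggregator in the spirit of the description above. The class name, tensor shapes, and the 14-value Gaussian parameterization are our assumptions for illustration only, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class TextGuidedFusion(nn.Module):
    """Illustrative sketch (not the paper's code): per-pixel features from
    the three parallel modules are merged, then attend to text embeddings
    via cross-attention before a linear head predicts Gaussian parameters."""

    def __init__(self, dim: int = 256, n_heads: int = 8, g_dim: int = 14):
        super().__init__()
        # Fuse the depth / semantic / multi-view feature streams channelwise.
        self.merge = nn.Linear(3 * dim, dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        # Assumed ~14 values per Gaussian: 3 position offset, 3 scale,
        # 4 rotation quaternion, 1 opacity, 3 color (parameterizations vary).
        self.head = nn.Linear(dim, g_dim)

    def forward(self, depth_f, sem_f, mv_f, text_emb):
        # depth_f, sem_f, mv_f: (B, N, dim) pixel-aligned feature tokens;
        # text_emb: (B, T, dim) embeddings from a text encoder (assumed frozen).
        x = self.merge(torch.cat([depth_f, sem_f, mv_f], dim=-1))
        # Pixel tokens query the text tokens, injecting semantic cues.
        ctx, _ = self.attn(query=x, key=text_emb, value=text_emb)
        return self.head(x + ctx)  # (B, N, g_dim) Gaussian parameters

# Example usage with random tensors (shapes are illustrative):
# fuser = TextGuidedFusion()
# g = fuser(torch.randn(2, 1024, 256), torch.randn(2, 1024, 256),
#           torch.randn(2, 1024, 256), torch.randn(2, 16, 256))
```

In this sketch the pixel-aligned tokens act as attention queries over the text embeddings, so each predicted Gaussian can pull in the semantic cues relevant to its location; the paper's actual aggregation design may differ.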
Takeaways, Limitations
•
Takeaways:
◦
Presents the first framework to improve generalizable Gaussian Splatting through text-based guidance.
◦
Achieves high-fidelity 3D reconstruction by better aligning geometric and semantic information.
◦
Obtains complementary representations from three parallel modules and integrates them effectively via a text-guided attention mechanism.
◦
Demonstrates superior performance over existing methods on several benchmark datasets.
◦
Supports reproducibility and extensibility of the research through the planned public code release.
•
Limitations:
◦
No detailed analysis of the proposed method's computational cost and runtime is provided.
◦
Robustness and generalization across diverse text inputs require further validation.
◦
There may be a bias towards certain types of scenes or objects.
◦
Further research is needed to evaluate performance and applicability in real-world environments.