Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

WinT3R: Window-Based Streaming Reconstruction with Camera Token Pool

Created by
  • Haebom

Authors

Zizun Li, Jianjun Zhou, Yifan Wang, Haoyu Guo, Wenzheng Chang, Yang Zhou, Haoyi Zhu, Junyi Chen, Chunhua Shen, Tong He

Outline

WinT3R is a feed-forward reconstruction model capable of estimating accurate camera poses and high-quality point maps in real time. Existing methods suffer from a tradeoff between reconstruction quality and real-time performance. WinT3R introduces a sliding window mechanism to ensure sufficient information exchange between frames within a window, thereby improving the quality of geometric prediction without significant computational overhead. Furthermore, it leverages a compact camera representation and maintains a global camera token pool to enhance the reliability of camera pose estimation without sacrificing efficiency. Through extensive experiments on various datasets, WinT3R demonstrates state-of-the-art performance in online reconstruction quality, camera pose estimation, and reconstruction speed. The code and model are publicly available at https://github.com/LiZizun/WinT3R .
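To make the windowing idea concrete, the following is a minimal sketch of how streaming frames could be grouped into overlapping windows while a global pool of per-frame camera tokens is kept for later windows to read. It is an illustration under stated assumptions, not the WinT3R implementation: the names `stream_windows`, `run_stream`, and `predict`, the window size and stride, and the rule that a later window overwrites the token of an overlapping frame are all hypothetical choices made here.

```python
from typing import Callable, Dict, Hashable, Iterable, Iterator, List, Sequence


def stream_windows(
    frames: Iterable[Hashable], window_size: int = 4, stride: int = 2
) -> Iterator[List[Hashable]]:
    """Group a stream of frame ids into overlapping windows.

    window_size and stride are illustrative values, not taken from the paper.
    """
    if not 0 < stride <= window_size:
        raise ValueError("stride must be in the range 1..window_size")
    buffer: List[Hashable] = []
    pending = 0  # frames received since the last emitted window
    for frame in frames:
        buffer.append(frame)
        pending += 1
        if len(buffer) == window_size:
            yield list(buffer)
            buffer = buffer[stride:]  # keep the overlap for the next window
            pending = 0
    if pending:
        yield list(buffer)  # flush a final, possibly shorter, window


def run_stream(
    frames: Iterable[Hashable],
    predict: Callable[[Sequence[Hashable], Dict[Hashable, object]], Sequence[object]],
    window_size: int = 4,
    stride: int = 2,
) -> Dict[Hashable, object]:
    """Process windows in order while maintaining a global camera token pool.

    `predict` is a hypothetical stand-in for a model call: it receives the
    current window and a copy of the pool, and returns one token per frame.
    """
    pool: Dict[Hashable, object] = {}
    for window in stream_windows(frames, window_size, stride):
        tokens = predict(window, dict(pool))
        # Assumption: the most recent token for a frame replaces the old one.
        pool.update(zip(window, tokens))
    return pool


def dummy_predict(window, pool):
    """Toy predictor: each token records how many pooled tokens were visible."""
    return [len(pool)] * len(window)


if __name__ == "__main__":
    for w in stream_windows(range(7)):
        print(w)
    print(run_stream(range(7), dummy_predict))
```

In this sketch the per-window cost depends on the window size rather than on the full sequence length, while the pool grows by only one small entry per frame, which is the general intuition behind pairing windowed processing with a compact global camera memory.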

Takeaways, Limitations

Takeaways:
WinT3R shows that real-time, high-quality 3D reconstruction is achievable by combining a sliding window mechanism, a compact camera representation, and a global camera token pool.
It addresses the trade-off between reconstruction quality and real-time performance found in existing methods.
It achieves state-of-the-art performance in online reconstruction quality, camera pose estimation, and reconstruction speed.
The code and model are publicly released, which supports reproducibility and follow-up research.
Limitations:
The paper does not explicitly discuss its limitations. Further experiments and analysis would be needed to identify them; possible examples include sensitivity to particular scene types or sensor noise, and constraints on compute and memory usage.