Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

HOSt3R: Keypoint-free Hand-Object 3D Reconstruction from RGB images

Created by
  • Haebom

Author

Anilkumar Swamy, Vincent Leroy, Philippe Weinzaepfel, Jean-Sébastien Franco, Grégory Rogez

Outline

This paper addresses hand-object 3D reconstruction, a growing topic in applications such as human-robot interaction and immersive AR/VR experiences. Conventional approaches for object-agnostic hand-object reconstruction from RGB sequences follow a two-stage pipeline: hand-object 3D tracking followed by multi-view 3D reconstruction. However, existing methods rely on keypoint detection techniques such as SfM and hand keypoint optimization, which struggle with diverse object geometries, weak textures, and mutual hand-object occlusion, limiting scalability and generalizability. To enable general, smooth, and non-intrusive applicability, this study proposes a robust, keypoint-detector-free approach for estimating hand-object 3D transformations from monocular motion video/images. By integrating this approach with a multi-view reconstruction pipeline, the hand-object 3D shape is accurately recovered. The method, named HOSt3R, is unconstrained and does not rely on pre-scanned object templates or camera intrinsics. It achieves state-of-the-art performance on the SHOWMe benchmark for object-agnostic hand-object 3D transformation and shape estimation. Generalization to unseen object categories is also demonstrated on sequences from the HO3D dataset.
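To make the two-stage structure concrete, the sketch below illustrates the data flow only: keypoint-free relative transform estimation between frames, followed by multi-view shape reconstruction. It is a hypothetical, minimal illustration and not the authors' implementation; `estimate_relative_transform` and `fuse_views` are placeholder names standing in for HOSt3R's learned modules.

```python
# Hypothetical sketch (not the authors' code) of a keypoint-free two-stage
# hand-object reconstruction loop. The two helper functions below are
# placeholders for learned neural modules; here they only show the data flow.
import numpy as np

def estimate_relative_transform(frame_a: np.ndarray, frame_b: np.ndarray) -> np.ndarray:
    """Placeholder for a learned, keypoint-free regressor that returns a
    4x4 rigid transform relating the hand-object pose of frame_a to frame_b."""
    return np.eye(4)  # identity as a stand-in

def fuse_views(frames: list[np.ndarray], poses: list[np.ndarray]) -> np.ndarray:
    """Placeholder for multi-view 3D reconstruction given per-frame poses;
    returns an (N, 3) point cloud of the hand-object surface."""
    return np.zeros((0, 3))

def reconstruct(frames: list[np.ndarray]) -> np.ndarray:
    # Stage 1: chain pairwise transforms into per-frame global poses,
    # without keypoint detection and without known camera intrinsics.
    poses = [np.eye(4)]
    for prev, curr in zip(frames[:-1], frames[1:]):
        poses.append(poses[-1] @ estimate_relative_transform(prev, curr))
    # Stage 2: multi-view reconstruction of the hand-object 3D shape.
    return fuse_views(frames, poses)
```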

Takeaways, Limitations

Takeaways:
Presents a robust hand-object 3D reconstruction method that does not rely on keypoint detectors.
Works without pre-scanned object templates or camera intrinsics.
Achieves state-of-the-art performance on the SHOWMe benchmark.
Demonstrates generalization to unseen object categories (HO3D sequences).
Limitations:
The paper does not explicitly discuss specific limitations. Further experiments and analysis are needed to identify them; for example, potential performance degradation under particular lighting conditions or with highly complex objects remains to be examined.