Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All summaries here are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

GeoSAM2: Unleashing the Power of SAM2 for 3D Part Segmentation

Created by
  • Haebom

Authors

Ken Deng, Yunhan Yang, Jingxiang Sun, Xihui Liu, Yebin Liu, Ding Liang, Yan-Pei Cao

Overview

GeoSAM2 is a prompt-controllable framework for part segmentation of textureless 3D objects. It renders normal and point maps from predefined viewpoints and accepts simple 2D prompts (clicks or boxes) to guide part selection. The prompts are processed by a shared SAM2 backbone augmented with LoRA and residual geometry fusion, which enables view-specific reasoning while preserving the pretrained priors. The predicted masks are then back-projected onto the object and aggregated across views. The method provides fine-grained, part-specific control without text prompts, per-shape optimization, or full 3D labels. Unlike global clustering or scale-based methods, its prompts are explicit, spatially grounded, and interpretable. GeoSAM2 achieves state-of-the-art class-agnostic performance on PartObjaverse-Tiny and PartNetE, outperforming both slow optimization-based pipelines and fast but coarse feed-forward approaches. The results point to a new paradigm for 3D segmentation, aligned with that of SAM2, in which interactive 2D inputs increase controllability and precision in object-level part understanding.
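The back-projection and cross-view aggregation step can be illustrated with a minimal sketch. The summary does not say how GeoSAM2 combines views, so the simple majority vote below is an assumption, and the function name `aggregate_part_labels` and its input layout (precomputed pixel coordinates and visibility per 3D point) are hypothetical, chosen only for illustration.

```python
import numpy as np


def aggregate_part_labels(point_pixels, point_visible, view_masks, num_parts):
    """Fuse per-view 2D part masks into per-point 3D labels by majority vote.

    Hypothetical helper, not the GeoSAM2 implementation.

    point_pixels:  (V, N, 2) int array, (row, col) of each 3D point in each view
    point_visible: (V, N) bool array, True where the point is visible in the view
    view_masks:    (V, H, W) int array, part id per pixel, -1 where no part
    num_parts:     number of distinct part ids (0 .. num_parts - 1)
    Returns an (N,) int array of part ids, -1 for points that received no vote.
    """
    num_views, num_points = point_visible.shape
    votes = np.zeros((num_points, num_parts), dtype=np.int64)

    for v in range(num_views):
        idx = np.nonzero(point_visible[v])[0]      # points seen in this view
        rows = point_pixels[v, idx, 0]
        cols = point_pixels[v, idx, 1]
        labels = view_masks[v, rows, cols]         # back-project: pixel label -> point
        keep = labels >= 0                         # skip pixels outside every mask
        np.add.at(votes, (idx[keep], labels[keep]), 1)

    result = votes.argmax(axis=1)
    result[votes.sum(axis=1) == 0] = -1            # no view labelled this point
    return result


# Toy example: 2 views, 3 points, 2x2 masks, 2 parts.
view_masks = np.array([[[0, 1], [-1, 1]],
                       [[0, 0], [1, -1]]])
point_pixels = np.array([[[0, 0], [0, 1], [1, 0]],
                         [[0, 1], [1, 0], [1, 1]]])
point_visible = np.ones((2, 3), dtype=bool)
labels = aggregate_part_labels(point_pixels, point_visible, view_masks, num_parts=2)
```

In this toy case the first two points get consistent votes from both views, while the third point only ever lands on unlabelled pixels and stays at -1. A real pipeline would also need to render the maps and compute visibility, which this sketch takes as given.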

Takeaways, Limitations

Takeaways:
Provides a precise, controllable framework for segmenting 3D objects into parts without text prompts.
Uses 2D prompts (clicks or boxes) to guide part selection in an intuitive, interpretable way.
Runs faster than optimization-based pipelines and is more accurate than coarse feed-forward approaches.
Achieves state-of-the-art class-agnostic performance on the PartObjaverse-Tiny and PartNetE datasets.
Presents a new paradigm for 3D segmentation built on interactive 2D inputs.
Limitations:
Performance has so far been evaluated only on textureless objects; how well the method works on textured objects requires further study.
Prompt types are limited to clicks and boxes; supporting a wider variety of prompts may require extensions.
Because the method relies on the SAM2 backbone, limitations of that backbone may carry over to GeoSAM2.