GeoSAM2: Unleashing the Power of SAM2 for 3D Part Segmentation

arXiv 2025
Ken Deng¹,⁴* Yunhan Yang²* Jingxiang Sun³* Xihui Liu² Yebin Liu³ Ding Liang¹ Yan-Pei Cao¹
* Equal Contribution
¹ VAST, ² The University of Hong Kong, ³ Tsinghua University, ⁴ Sun Yat-sen University

Abstract

We introduce GeoSAM2, a prompt-controllable framework for 3D part segmentation that casts the task as multi-view 2D mask prediction. Given a textureless mesh, we render normal and point maps from predefined viewpoints and accept simple 2D prompts—clicks or boxes—to guide part selection. These prompts are processed by a shared SAM2 backbone augmented with LoRA and residual geometry fusion, enabling view-specific reasoning while preserving pretrained priors. The predicted masks are back-projected to the mesh, aggregated across views, and refined via k-NN voting. Our method enables fine-grained, part-specific control without requiring text prompts, per-shape optimization, or full 3D labels. In contrast to global clustering or scale-based methods, prompts are explicit, spatially grounded, and interpretable. We achieve state-of-the-art class-agnostic performance on PartObjaverse-Tiny and PartNetE, outperforming both slow optimization-based pipelines and fast but coarse feedforward approaches. Our results highlight a new paradigm: aligning interactive 2D inputs with 3D segmentation unlocks controllability and precision in mesh-level part understanding.
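The k-NN voting refinement mentioned above can be illustrated with a short sketch. The snippet below is a minimal illustration rather than the paper's implementation: it assumes per-point labels have already been obtained from the aggregated multi-view masks, and the function name `knn_label_refine`, the neighborhood size `k`, and the `unlabeled` sentinel are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree

def knn_label_refine(points, labels, k=8, unlabeled=-1):
    """Refine per-point part labels by majority vote over k nearest neighbors.

    points : (N, 3) float array of 3D positions (e.g. mesh face centroids).
    labels : (N,) int array of initial labels; `unlabeled` marks points that
             received no mask from any view and are filled from neighbors.
    """
    tree = cKDTree(points)
    # Query k+1 neighbors so each point can ignore itself.
    _, idx = tree.query(points, k=k + 1)
    refined = labels.copy()
    for i, nbrs in enumerate(idx):
        nbr_labels = labels[nbrs[1:]]                      # drop self
        nbr_labels = nbr_labels[nbr_labels != unlabeled]   # ignore unlabeled neighbors
        if nbr_labels.size == 0:
            continue
        # Majority vote among labeled neighbors.
        vals, counts = np.unique(nbr_labels, return_counts=True)
        refined[i] = vals[np.argmax(counts)]
    return refined
```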

Method Overview

Our pipeline renders 12-view normal and point maps of an object, arranged counterclockwise into a video sequence. Users can annotate any frame with 2D prompts (clicks or boxes); that frame then serves as the video's starting frame. GeoSAM2 processes these inputs by encoding each frame's normal and point maps with frozen pretrained image encoders adapted via LoRA, fusing their features, and decoding masks. The per-frame 2D masks are then projected onto a 3D point cloud using the known camera poses, with visibility-aware voting assigning consistent labels across views; a minimal sketch of this aggregation step follows.
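As a rough sketch of the back-projection and visibility-aware voting step, the snippet below projects each 3D point into every view, keeps only points whose projected depth agrees with the rendered depth buffer, and takes a per-point majority vote over the predicted part ids. The camera and mask formats (the `world_to_pixel` matrices, integer part-id masks, depth maps, and the tolerance `depth_tol`) are assumptions for illustration, not the paper's actual data structures.

```python
import numpy as np

def aggregate_view_masks(points, view_masks, cameras, depth_maps, depth_tol=5e-3):
    """Back-project per-view 2D part masks onto a 3D point cloud via
    visibility-aware voting (illustrative sketch; formats are assumptions).

    points     : (N, 3) object-space points (e.g. sampled from the mesh).
    view_masks : list of (H, W) int arrays, part id per pixel, -1 = background.
    cameras    : list of dicts with a 3x4 'world_to_pixel' projection matrix
                 mapping homogeneous 3D points to (u*d, v*d, d).
    depth_maps : list of (H, W) rendered depth maps used for visibility tests.
    """
    n_points = points.shape[0]
    votes = {}  # (point index, part id) -> vote count
    homo = np.concatenate([points, np.ones((n_points, 1))], axis=1)  # (N, 4)

    for mask, cam, depth in zip(view_masks, cameras, depth_maps):
        h, w = mask.shape
        proj = homo @ cam["world_to_pixel"].T            # (N, 3)
        d = proj[:, 2]
        front = d > 1e-9                                 # points in front of the camera
        u = np.zeros(n_points, dtype=int)
        v = np.zeros(n_points, dtype=int)
        u[front] = np.round(proj[front, 0] / d[front]).astype(int)
        v[front] = np.round(proj[front, 1] / d[front]).astype(int)
        in_img = front & (u >= 0) & (u < w) & (v >= 0) & (v < h)

        for i in np.nonzero(in_img)[0]:
            # A point only votes in views where it is actually visible,
            # i.e. its projected depth matches the rendered depth buffer.
            if abs(d[i] - depth[v[i], u[i]]) > depth_tol:
                continue
            part = mask[v[i], u[i]]
            if part >= 0:
                votes[(i, part)] = votes.get((i, part), 0) + 1

    # Keep, for each point, the part id with the most cross-view votes.
    labels = np.full(n_points, -1, dtype=int)
    best = np.zeros(n_points, dtype=int)
    for (i, part), count in votes.items():
        if count > best[i]:
            best[i], labels[i] = count, part
    return labels
```

The resulting per-point labels can then be smoothed with the k-NN voting refinement sketched earlier.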

Hierarchical Segmentation

Segmentation Results

Qualitative comparisons across multiple shapes: Find3D, SAMPart3D, SAMesh, PartField, Ours, and Ground Truth (figure gallery).

BibTeX

@misc{deng2025geosam2unleashingpowersam2,
    title={GeoSAM2: Unleashing the Power of SAM2 for 3D Part Segmentation}, 
    author={Ken Deng and Yunhan Yang and Jingxiang Sun and Xihui Liu and Yebin Liu and Ding Liang and Yan-Pei Cao},
    year={2025},
    eprint={2508.14036},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/abs/2508.14036}, 
}