Probabilistic Interactive 3D Segmentation with Hierarchical Neural Processes

¹University of Amsterdam   ²Singapore Management University   ³Netherlands Cancer Institute

In this work, we propose NPISeg3D, a novel probabilistic framework that builds upon Neural Processes (NPs) for interactive 3D segmentation. NPISeg3D addresses two critical yet underexplored challenges in interactive 3D segmentation: (1) few-shot generalization from sparse user clicks and (2) predictive uncertainty quantification. To the best of our knowledge, NPISeg3D is the first probabilistic framework to enable uncertainty estimation for interactive 3D segmentation.

More videos and applications are coming soon!

Abstract

Interactive 3D segmentation has emerged as a promising solution for generating accurate object masks in complex 3D scenes by incorporating user-provided clicks. However, two critical challenges remain underexplored: (1) effectively generalizing from sparse user clicks to produce accurate segmentations and (2) quantifying predictive uncertainty to help users identify unreliable regions.

In this work, we propose NPISeg3D, a novel probabilistic framework that builds upon Neural Processes (NPs) to address these challenges. Specifically, NPISeg3D introduces a hierarchical latent variable structure with scene-specific and object-specific latent variables to enhance few-shot generalization by capturing both global context and object-specific characteristics. Additionally, we design a probabilistic prototype modulator that adaptively modulates click prototypes with object-specific latent variables, improving the model's ability to capture object-aware context and quantify predictive uncertainty.

Experiments on four 3D point cloud datasets demonstrate that NPISeg3D achieves superior segmentation performance with fewer clicks while providing reliable uncertainty estimations.
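For context, the sketch below shows how such a hierarchical Neural Process predictive distribution is commonly written, with a scene-level latent variable \(\mathbf{z}_s\), object-level latent variables \(\mathbf{z}_o\), target scene points \(\mathbf{X}_T\) with labels \(\mathbf{Y}_T\), and user clicks \(\mathcal{C}\) (here \(\mathcal{C}_o\) denotes the clicks on object \(o\) and \(\mathbf{Y}_T^{o}\) its mask). The factorization is an illustrative assumption, not necessarily the paper's exact formulation:

\[
p(\mathbf{Y}_T \mid \mathbf{X}_T, \mathcal{C}) = \int p(\mathbf{z}_s \mid \mathcal{C}) \prod_{o} \left[ \int p(\mathbf{z}_o \mid \mathbf{z}_s, \mathcal{C}_o)\, p(\mathbf{Y}_T^{o} \mid \mathbf{X}_T, \mathbf{z}_o)\, d\mathbf{z}_o \right] d\mathbf{z}_s.
\]

Training such a model typically maximizes a variational lower bound in which posteriors inferred from the target set \((\mathbf{X}_T, \mathbf{Y}_T)\) regularize the context-conditioned priors through KL terms.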

Method

Method overview

Framework of NPISeg3D. We formulate interactive 3D segmentation as a probabilistic modeling problem with neural processes. Given a 3D scene \(S\) and a user click set \(\mathcal{C}\), a point encoder encodes them into click prototypes \(\mathbf{X}_C\) (context data) and scene features \(\mathbf{X}_T\) (target data). We then introduce two hierarchical latent variables, a scene-level latent variable \(\mathbf{z}_s\) and object-level latent variables \(\mathbf{z}_o\), to enable probabilistic modeling and to capture contextual information at both levels of the hierarchy. In the probabilistic prototype modulator, each object-level latent variable generates object-specific weights \((\gamma, \beta)\) that modulate its corresponding click prototypes, thereby enhancing few-shot generalization and providing reliable uncertainty estimation. The posterior distributions of the latent variables are inferred from the target set \((\mathbf{X}_T, \mathbf{Y}_T)\) and supervise the context-conditioned priors during training.
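To make the modulation step above concrete, here is a minimal PyTorch sketch of a FiLM-style probabilistic prototype modulator written from the description in this section. The module names, feature dimension, Gaussian parameterization, and the max-over-prototypes readout in the usage example are illustrative assumptions, not the paper's implementation.

import torch
import torch.nn as nn


class ProbabilisticPrototypeModulator(nn.Module):
    """Sketch: modulate click prototypes with samples of an object-level latent z_o."""

    def __init__(self, d_model: int = 128):
        super().__init__()
        # Amortized Gaussian over z_o, conditioned on the pooled click prototypes
        # of one object (assumed parameterization).
        self.to_mu = nn.Linear(d_model, d_model)
        self.to_logvar = nn.Linear(d_model, d_model)
        # z_o -> FiLM-style scale/shift (gamma, beta) applied to the prototypes.
        self.to_gamma = nn.Linear(d_model, d_model)
        self.to_beta = nn.Linear(d_model, d_model)

    def forward(self, prototypes: torch.Tensor, n_samples: int = 8) -> torch.Tensor:
        # prototypes: (n_clicks, d_model) click prototypes of a single object.
        pooled = prototypes.mean(dim=0)                         # object-level summary
        mu, logvar = self.to_mu(pooled), self.to_logvar(pooled)
        std = torch.exp(0.5 * logvar)
        # Reparameterized samples of z_o; their disagreement downstream
        # is what yields a predictive uncertainty estimate.
        z_o = mu + std * torch.randn(n_samples, mu.shape[-1])
        gamma, beta = self.to_gamma(z_o), self.to_beta(z_o)
        # Broadcast to (n_samples, n_clicks, d_model).
        return (1.0 + gamma).unsqueeze(1) * prototypes + beta.unsqueeze(1)


# Toy usage: per-point foreground probabilities and a sample-variance uncertainty map.
modulator = ProbabilisticPrototypeModulator(d_model=128)
protos = torch.randn(3, 128)          # prototypes from 3 user clicks on one object
feats = torch.randn(10000, 128)       # per-point scene features X_T
mod = modulator(protos)               # (8, 3, 128)
logits = torch.einsum("nd,skd->snk", feats, mod).max(dim=-1).values   # (8, 10000)
probs = torch.sigmoid(logits)
prediction, uncertainty = probs.mean(dim=0), probs.var(dim=0)

Averaging the per-sample predictions gives the segmentation estimate, while the variance across samples of \(\mathbf{z}_o\) serves as a per-point uncertainty map of the kind discussed above.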

Qualitative Results

Below we present visualizations of interactive 3D segmentation results. As shown, NPISeg3D achieves strong few-shot generalization, producing high-quality segmentations from fewer clicks, and provides reliable uncertainty estimates across datasets such as ScanNet, S3DIS, and KITTI-360.

BibTeX

@inproceedings{jie2025probabilistic,
  author    = {Liu, Jie and Zhou, Pan and Xiao, Zehao and Shen, Jiayi and Yin, Wenzhe and Sonke, Jan-Jakob and Gavves, Efstratios},
  title     = {Probabilistic Interactive 3D Segmentation with Hierarchical Neural Processes},
  booktitle = {International Conference on Machine Learning (ICML)},
  year      = {2025},
}