Open-set 3D segmentation represents a major point of interest for multiple downstream robotics and augmented/virtual reality applications. Recent advances introduce 3D Gaussian Splatting as a computationally efficient representation of the underlying scene. They enable the rendering of novel views while achieving real-time display rates and matching the quality of computationally far more expensive methods. We present a decoupled 3D segmentation pipeline to ensure modularity and adaptability to novel 3D representations and semantic segmentation foundation models. The pipeline proposes class-agnostic masks based on a 3D reconstruction of the scene. Given the resulting class-agnostic masks, we use a class-aware 2D foundation model to add class annotations to the 3D masks. We test this pipeline with 3D Gaussian Splatting and different 2D segmentation models and achieve better performance than more tailored approaches while also significantly increasing the modularity.
开放集三维分割是机器人和增强/虚拟现实等多个下游应用中的重要研究方向。最近的研究引入了三维高斯点云(3D Gaussian Splatting)作为一种计算高效的场景表示方法,不仅能够渲染新视图,还能实现实时显示速率,同时在质量上可与计算成本更高的方法相媲美。为提升适应性和模块化,我们提出了一种解耦的三维分割流程,以适配新型三维表示和语义分割基础模型。 该流程基于场景的三维重建生成与类别无关的掩码,然后利用一个类别感知的二维基础模型为三维掩码添加类别注释。我们在三维高斯点云以及不同的二维分割模型上测试了这一流程,与更为定制化的方法相比,不仅取得了更优的性能,还显著提升了流程的模块化程度。