Injecting semantics into 3D Gaussian Splatting (3DGS) has recently attracted significant attention. Current approaches typically distill 3D semantic features from 2D foundation models (e.g., CLIP and SAM) to facilitate novel-view segmentation and semantic understanding. However, their heavy reliance on 2D supervision can undermine cross-view semantic consistency and requires complex data preparation, which hinders view-consistent scene understanding. In this work, we present FreeGS, an unsupervised semantic-embedded 3DGS framework that achieves view-consistent 3D scene understanding without 2D labels. Instead of directly learning semantic features, we introduce the IDentity-coupled Semantic Field (IDSF) into 3DGS, which captures both a semantic representation and a view-consistent instance index for each Gaussian. We optimize IDSF with a two-step alternating strategy: semantics help extract coherent instances in 3D space, while the resulting instances regularize the injection of stable semantics from 2D space. Additionally, we adopt a 2D-3D joint contrastive loss to enhance the complementarity between view-consistent 3D geometry and rich semantics during the bootstrapping process, enabling FreeGS to handle tasks such as novel-view semantic segmentation, object selection, and 3D object detection in a unified manner. Extensive experiments on the LERF-Mask, 3D-OVS, and ScanNet datasets demonstrate that FreeGS performs comparably to state-of-the-art methods while avoiding complex data preprocessing.
These results support the efficiency and practicality of FreeGS and point to a new direction for semantics-injected 3D scene understanding.
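To make the idea of coupling per-Gaussian semantics with instance indices more concrete, the following is a minimal sketch of a generic supervised-contrastive objective: features of Gaussians that share an instance index are pulled together, and all others are pushed apart. This is not the FreeGS 2D-3D joint contrastive loss, whose exact form the abstract does not give. The function name `instance_contrastive_loss`, the `(N, D)` feature layout, and the `temperature` value are all assumptions made for illustration.

```python
import numpy as np

def instance_contrastive_loss(features, instance_ids, temperature=0.1):
    """Generic supervised-contrastive loss (illustrative only, not the FreeGS loss).

    features:     (N, D) array, one semantic feature per Gaussian (assumed layout).
    instance_ids: (N,) integer array, one instance index per Gaussian.
    Requires N >= 2 and at least one pair of Gaussians sharing an index.
    """
    # L2-normalise so dot products are cosine similarities.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    logits = f @ f.T / temperature
    n = len(instance_ids)
    not_self = ~np.eye(n, dtype=bool)
    # Positives: other Gaussians carrying the same instance index.
    positives = (instance_ids[:, None] == instance_ids[None, :]) & not_self
    # Log-softmax over all other Gaussians (self excluded), computed stably.
    logits = np.where(not_self, logits, -np.inf)
    row_max = logits.max(axis=1, keepdims=True)
    log_norm = np.log(np.exp(logits - row_max).sum(axis=1, keepdims=True))
    log_prob = logits - row_max - log_norm
    # Mean negative log-probability of the positives, per anchor that has any.
    has_pos = positives.any(axis=1)
    pos_log_prob = np.where(positives, log_prob, 0.0).sum(axis=1)
    per_anchor = -pos_log_prob[has_pos] / positives.sum(axis=1)[has_pos]
    return per_anchor.mean()

# Toy check: six 2D features forming two tight clusters.
rng = np.random.default_rng(0)
ids = np.array([0, 0, 0, 1, 1, 1])
centers = np.array([[1.0, 0.0], [0.0, 1.0]])
feats = centers[ids] + 0.05 * rng.normal(size=(6, 2))

good = instance_contrastive_loss(feats, ids)                           # ids match clusters
bad = instance_contrastive_loss(feats, np.array([0, 1, 0, 1, 0, 1]))   # ids ignore clusters
print(good < bad)
```

In this toy setting the loss is lower when instance indices agree with the feature clusters than when they cut across them, which is the kind of signal an alternating scheme could use to keep instances and semantics mutually consistent.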