3DGS-DET: Empower 3D Gaussian Splatting with Boundary Guidance and Box-Focused Sampling for 3D Object Detection
Neural Radiance Fields (NeRF) are widely used for novel-view synthesis and have been adapted for 3D Object Detection (3DOD), offering a promising approach to 3DOD through view-synthesis representation. However, NeRF faces inherent limitations: (i) limited representational capacity for 3DOD due to its implicit nature, and (ii) slow rendering speeds. Recently, 3D Gaussian Splatting (3DGS) has emerged as an explicit 3D representation that addresses these limitations. Inspired by these advantages, this paper introduces 3DGS into 3DOD for the first time and identifies two main challenges: (i) Ambiguous spatial distribution of Gaussian blobs: 3DGS relies primarily on 2D pixel-level supervision, which leaves the 3D spatial distribution of Gaussian blobs unclear and objects poorly differentiated from the background, hindering 3DOD; (ii) Excessive background blobs: 2D images often contain many background pixels, so the densely reconstructed 3DGS includes many noisy Gaussian blobs representing the background, which degrades detection. To tackle challenge (i), we leverage the fact that 3DGS reconstruction is derived from 2D images and propose an elegant and efficient solution: incorporating 2D Boundary Guidance to significantly enhance the spatial distribution of Gaussian blobs, resulting in clearer differentiation between objects and their background. To address challenge (ii), we propose a Box-Focused Sampling strategy that uses 2D boxes to generate an object probability distribution in 3D space, allowing effective probabilistic sampling in 3D that retains more object blobs and reduces noisy background blobs. Benefiting from these designs, 3DGS-DET significantly outperforms the SOTA NeRF-based method, NeRF-Det, achieving improvements of +6.6 on mAP@0.25 and +8.1 on mAP@0.5 for the ScanNet dataset, and an impressive +31.5 on mAP@0.25 for the ARKITScenes dataset.
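To make the Box-Focused Sampling idea more concrete, the sketch below shows one way 2D boxes could induce a keep-probability over 3D Gaussian centers. The abstract does not specify how the probability distribution is built, so everything here is an assumption made for illustration: the `View` class, the identity-rotation pinhole camera, the "fraction of views whose box contains the projection" score, and the `floor` parameter are hypothetical and should not be read as the authors' implementation.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point3 = Tuple[float, float, float]
Box2D = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


@dataclass
class View:
    """Hypothetical pinhole camera with its 2D object boxes (illustrative only)."""
    fx: float
    fy: float
    cx: float
    cy: float
    position: Point3    # camera centre in world coordinates; rotation assumed identity
    boxes: List[Box2D]  # 2D boxes available for this view


def project(view: View, p: Point3) -> Optional[Tuple[float, float]]:
    """Project a world point to pixel coordinates; None if it is behind the camera."""
    x = p[0] - view.position[0]
    y = p[1] - view.position[1]
    z = p[2] - view.position[2]
    if z <= 1e-6:
        return None
    return (view.fx * x / z + view.cx, view.fy * y / z + view.cy)


def object_probability(center: Point3, views: Sequence[View], floor: float = 0.05) -> float:
    """Assumed score: share of views (with the point in front of the camera)
    whose 2D boxes contain the projection, lifted by a small floor so that
    background blobs keep a non-zero chance of being sampled."""
    in_front = 0
    hits = 0
    for view in views:
        uv = project(view, center)
        if uv is None:
            continue
        in_front += 1
        u, v = uv
        if any(x0 <= u <= x1 and y0 <= v <= y1 for x0, y0, x1, y1 in view.boxes):
            hits += 1
    score = hits / in_front if in_front else 0.0
    return floor + (1.0 - floor) * score


def box_focused_sample(centers: Sequence[Point3], views: Sequence[View],
                       floor: float = 0.05,
                       rng: Optional[random.Random] = None) -> List[int]:
    """Keep each Gaussian independently with its object probability; return kept indices."""
    rng = rng or random.Random(0)
    return [i for i, c in enumerate(centers)
            if rng.random() < object_probability(c, views, floor)]


# Toy usage: one camera at the origin, one box around the image centre.
views = [View(fx=100.0, fy=100.0, cx=50.0, cy=50.0,
              position=(0.0, 0.0, 0.0), boxes=[(40.0, 40.0, 60.0, 60.0)])]
centers = [(0.0, 0.0, 2.0),   # projects to (50, 50): inside the box
           (1.0, 0.0, 2.0),   # projects to (100, 50): outside the box
           (0.0, 0.0, -1.0)]  # behind the camera
kept = box_focused_sample(centers, views, floor=0.0)
```

In this toy setup with `floor=0.0`, only the blob whose projection lands inside the box can survive sampling; raising `floor` lets some background blobs through, which is one simple way to trade background suppression against scene context.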