Simultaneous localization and mapping (SLAM) has achieved impressive performance in static environments. However, SLAM in dynamic environments remains an open problem. Many methods directly filter out dynamic objects, resulting in incomplete scene reconstruction and limited camera localization accuracy. Other works represent dynamic objects with point clouds, sparse joints, or coarse meshes, which fail to provide a photo-realistic representation. To overcome these limitations, we propose a photo-realistic and geometry-aware RGB-D SLAM method by extending Gaussian splatting. Our method is composed of three main modules that 1) map the dynamic foreground, including non-rigid humans and rigid items, 2) reconstruct the static background, and 3) localize the camera. To map the foreground, we focus on modeling the deformations and/or motions. We consider the shape priors of humans and exploit geometric and appearance constraints of humans and items. For background mapping, we design an optimization strategy between neighboring local maps by integrating an appearance constraint into geometric alignment. For camera localization, we leverage both the static background and the dynamic foreground to increase the number of observations for noise compensation. We explore the geometric and appearance constraints by associating 3D Gaussians with 2D optical flows and pixel patches. Experiments on various real-world datasets demonstrate that our method outperforms state-of-the-art approaches in terms of camera localization and scene representation.
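The geometric constraint mentioned above can be illustrated with a minimal sketch: project 3D Gaussian centers into two consecutive frames and compare the induced 2D displacement with an observed optical flow. All specifics here (intrinsics, poses, flow values) are toy assumptions for illustration, not the paper's actual formulation.

```python
import numpy as np

# Hypothetical pinhole intrinsics (TUM-RGBD-style values, assumed)
K = np.array([[525.0, 0.0, 319.5],
              [0.0, 525.0, 239.5],
              [0.0, 0.0, 1.0]])

def project(points_cam, K):
    """Project 3D points in camera coordinates to 2D pixel coordinates."""
    uvw = (K @ points_cam.T).T
    return uvw[:, :2] / uvw[:, 2:3]

# Toy 3D Gaussian centers in world coordinates
centers = np.array([[0.1, 0.2, 2.0],
                    [-0.3, 0.0, 3.0]])

# Two camera poses: identity rotation, camera 2 translated 5 cm along x
t1 = np.zeros(3)
t2 = np.array([0.05, 0.0, 0.0])

uv1 = project(centers - t1, K)
uv2 = project(centers - t2, K)
induced_flow = uv2 - uv1          # 2D displacement predicted by geometry

# Pretend measurement: the "observed" optical flow with synthetic noise
observed_flow = induced_flow + 0.5

# Per-Gaussian geometric residual; in a full system this term would be
# minimized jointly with a photometric (pixel-patch) appearance term
# during camera pose and map optimization.
residual = np.linalg.norm(induced_flow - observed_flow, axis=1)
print(residual)
```

Minimizing such flow residuals over many Gaussians ties the 3D representation to 2D motion cues, which is the kind of geometric-plus-appearance coupling the abstract describes.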