Building accurate representations of the environment is critical for intelligent robots to make decisions during deployment. Advances in photorealistic environment models have enabled robots to build hyper-realistic reconstructions, which can be rendered into images that are intuitive for humans to inspect. In particular, the recently introduced 3D Gaussian Splatting (3DGS), which describes a scene with up to millions of primitive ellipsoids, can be rendered in real time and has rapidly gained prominence. However, a critical unsolved problem persists: how can we fuse multiple 3DGS models into a single coherent model? Solving this problem will enable robot teams to jointly build 3DGS models of their surroundings. A key insight of this work is to leverage the duality between photorealistic reconstructions, which render realistic 2D images from 3D structure, and 3D foundation models, which predict 3D structure from image pairs. To this end, we develop PhotoReg, a framework that registers multiple photorealistic 3DGS models using 3D foundation models. Because 3DGS models are generally built from monocular camera images, each has an arbitrary scale. To resolve this, PhotoReg actively enforces scale consistency among the different 3DGS models by considering depth estimates within these models. The alignment is then iteratively refined with fine-grained photometric losses to produce high-quality fused 3DGS models. We rigorously evaluate PhotoReg on both standard benchmark datasets and our custom-collected datasets, including datasets collected with two quadruped robots.
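To make the scale-consistency step concrete, the following is a minimal sketch of one way a relative scale between two monocular reconstructions could be estimated from depth. It is not PhotoReg's actual procedure: it assumes that two depth maps of the same view are available (for example, one rendered from a 3DGS model and one predicted by a 3D foundation model), and the function names `estimate_relative_scale` and `rescale_gaussians` are hypothetical. The median of per-pixel depth ratios is used here simply because it tolerates outliers.

```python
import numpy as np


def estimate_relative_scale(depth_a, depth_b, min_depth=1e-6):
    """Return s such that depth_a is approximately s * depth_b.

    Assumption: both depth maps observe the same view, pixel for pixel.
    Invalid pixels (non-finite or non-positive depth) are ignored.
    """
    depth_a = np.asarray(depth_a, dtype=np.float64)
    depth_b = np.asarray(depth_b, dtype=np.float64)
    if depth_a.shape != depth_b.shape:
        raise ValueError("depth maps must have the same shape")

    valid = (
        np.isfinite(depth_a)
        & np.isfinite(depth_b)
        & (depth_a > min_depth)
        & (depth_b > min_depth)
    )
    if not valid.any():
        raise ValueError("no valid depth pixels to compare")

    # The median ratio is robust to a moderate fraction of bad depth estimates.
    return float(np.median(depth_a[valid] / depth_b[valid]))


def rescale_gaussians(means, scales, s):
    """Scale Gaussian centers and per-axis extents by s about the origin.

    Assumption: `means` and `scales` are (N, 3) arrays holding each
    ellipsoid's center and axis lengths; rotations are unaffected by a
    uniform scale, so they are not touched here.
    """
    means = np.asarray(means, dtype=np.float64)
    scales = np.asarray(scales, dtype=np.float64)
    return means * s, scales * s
```

Once the two models share a scale, the remaining unknown is a rigid transform, which is the quantity a photometric refinement stage would then optimize.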