Recent progress in scene synthesis makes standalone SLAM systems purely based on optimizing hyperprimitives with a Rendering objective possible. However, the tracking performance still lacks behind traditional and end-to-end SLAM systems. An optimal trade-off between robustness, speed and accuracy has not yet been reached, especially for monocular video. In this paper, we introduce a SLAM system based on an end-to-end Tracker and extend it with a Renderer based on recent 3D Gaussian Splatting techniques. Our framework DroidSplat achieves both SotA tracking and rendering results on common SLAM benchmarks. We implemented multiple building blocks of modern SLAM systems to run in parallel, allowing for fast inference on common consumer GPU's. Recent progress in monocular depth prediction and camera calibration allows our system to achieve strong results even on in-the-wild data without known camera intrinsics.
近期在场景合成方面的进展使得基于渲染目标优化超基元的独立 SLAM 系统成为可能。然而,其跟踪性能仍落后于传统和端到端的 SLAM 系统。尤其是在单目视频中,鲁棒性、速度和精度之间的最佳平衡尚未实现。 本文提出了一种基于端到端跟踪器的 SLAM 系统,并结合了基于最新 3D 高斯投影(3D Gaussian Splatting)技术的渲染器。我们的框架 DroidSplat 在常见 SLAM 基准测试中同时实现了最先进(SotA)的跟踪和渲染性能。 我们实现了多个现代 SLAM 系统的关键模块,使其能够并行运行,从而在常见消费级 GPU 上实现快速推理。得益于单目深度预测和相机校准领域的最新进展,我们的系统即使在未知相机内参的自然场景数据上也能获得优异的结果。