DrivingForward: Feed-forward 3D Gaussian Splatting for Driving Scene Reconstruction from Flexible Surround-view Input
We propose DrivingForward, a feed-forward Gaussian Splatting model that reconstructs driving scenes from flexible surround-view input. Images from vehicle-mounted cameras are typically sparse and have limited overlap, and the vehicle's motion further complicates the acquisition of camera extrinsics. To tackle these challenges and achieve real-time reconstruction, we jointly train a pose network, a depth network, and a Gaussian network to predict the Gaussian primitives that represent the driving scene. The pose and depth networks determine the positions of the Gaussian primitives in a self-supervised manner, requiring neither ground-truth depth nor camera extrinsics during training. The Gaussian network predicts primitive parameters, including covariance, opacity, and spherical harmonics coefficients, independently from each input image. At inference, the model performs feed-forward reconstruction from flexible multi-frame surround-view input. Experiments on the nuScenes dataset show that our model outperforms existing state-of-the-art feed-forward and scene-optimized reconstruction methods in reconstruction performance.
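The abstract states that predicted depth and pose determine the positions of the Gaussian primitives but does not give the formula. One common way to do this is to back-project every pixel through a pinhole camera and move the result into a shared frame. The sketch below illustrates that idea only and is not the authors' implementation. It assumes a skew-free intrinsic matrix `K`, a depth map storing z-depth along the optical axis, and a 4x4 camera-to-world matrix `cam_to_world`; the function names are hypothetical.

```python
def unproject_pixel(u, v, depth, K, cam_to_world):
    """Back-project pixel (u, v) with z-depth `depth` into world coordinates.

    K is a 3x3 pinhole intrinsic matrix (assumed skew-free) and
    cam_to_world is a 4x4 rigid transform; both are nested lists.
    """
    fx, fy = K[0][0], K[1][1]
    cx, cy = K[0][2], K[1][2]
    # Point in the camera frame, in homogeneous coordinates.
    p_cam = ((u - cx) / fx * depth, (v - cy) / fy * depth, depth, 1.0)
    # Apply the first three rows of the camera-to-world transform.
    return tuple(
        sum(cam_to_world[r][c] * p_cam[c] for c in range(4)) for r in range(3)
    )


def gaussian_centers(depth_map, K, cam_to_world):
    """Return one candidate Gaussian center per pixel of a depth map.

    depth_map is a list of rows; the row index is v and the column index is u.
    """
    return [
        unproject_pixel(u, v, d, K, cam_to_world)
        for v, row in enumerate(depth_map)
        for u, d in enumerate(row)
    ]
```

In this sketch the depth map would come from the depth network and `cam_to_world` from the pose network, so each input image contributes one center per pixel, while the remaining parameters (covariance, opacity, spherical harmonics coefficients) would come from the Gaussian network.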