We propose SelfSplat, a novel 3D Gaussian Splatting model designed to perform pose-free and 3D prior-free generalizable 3D reconstruction from unposed multi-view images. This setting is inherently ill-posed. There is no ground-truth data or learned geometric information to rely on, and accurate 3D reconstruction must still be achieved without finetuning, which makes it difficult for conventional methods to produce high-quality results. Our model addresses these challenges by effectively integrating explicit 3D representations with self-supervised depth and pose estimation techniques, so that pose accuracy and 3D reconstruction quality improve each other. Furthermore, we incorporate a matching-aware pose estimation network and a depth refinement module to enhance geometric consistency across views, yielding more accurate and stable 3D reconstructions. To evaluate our method, we conduct experiments on large-scale real-world datasets, including RealEstate10K, ACID, and DL3DV. SelfSplat outperforms previous state-of-the-art methods in both appearance and geometry quality and also demonstrates strong cross-dataset generalization. Extensive ablation studies and analyses further validate the effectiveness of the proposed components. Code and pretrained models are available at this https URL
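The abstract does not spell out SelfSplat's training losses. A common building block of self-supervised depth and pose estimation in general is view synthesis. Each target pixel is back-projected with its predicted depth, moved into a source camera with the predicted relative pose, and re-projected. The resulting photometric error then supervises both depth and pose without ground truth. The sketch below illustrates only that generic idea under stated assumptions: a pinhole camera, intrinsics `K` shared by both views, nearest-neighbour sampling, and a plain L1 error. The functions `reproject` and `photometric_l1` are hypothetical names, and this is not the SelfSplat implementation.

```python
import numpy as np


def reproject(depth, K, R, t):
    """Map every target-view pixel into a source view (generic sketch, not SelfSplat code).

    depth: (H, W) target-view depth; K: (3, 3) pinhole intrinsics assumed shared by both views;
    R, t: (3, 3) rotation and (3,) translation taking target-camera coordinates
    to source-camera coordinates. Returns (H, W, 2) source-view pixel coordinates (u, v).
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W, dtype=np.float64), np.arange(H, dtype=np.float64))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)  # (H, W, 3) homogeneous pixel coordinates
    rays = pix @ np.linalg.inv(K).T                   # back-project to rays at unit depth
    pts = rays * depth[..., None]                     # 3D points in the target camera frame
    pts_src = pts @ R.T + t                           # rigid transform into the source camera frame
    proj = pts_src @ K.T                              # project with the intrinsics
    return proj[..., :2] / proj[..., 2:3]             # perspective divide -> (u, v)


def photometric_l1(target, source, coords):
    """Warp `source` into the target view by nearest-neighbour lookup and return the mean L1 error.

    Pixels whose reprojection falls outside the source image are ignored (assumed masking rule).
    """
    H, W = target.shape[:2]
    u = np.rint(coords[..., 0]).astype(int)
    v = np.rint(coords[..., 1]).astype(int)
    valid = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    warped = source[v[valid], u[valid]]
    return float(np.abs(warped - target[valid]).mean())
```

For example, with focal length 10, constant depth 5, and a 1-unit sideways translation, every pixel shifts by 10 × 1 / 5 = 2 columns. A source image that is the target shifted by the same amount then gives zero photometric error. In practice, differentiable bilinear sampling and a structural (SSIM-style) term usually replace the nearest-neighbour L1 used here for brevity.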