OmniSplat: Taming Feed-Forward 3D Gaussian Splatting for Omnidirectional Images with Editable Capabilities
Feed-forward 3D Gaussian Splatting (3DGS) models have gained significant popularity because they generate scenes immediately, without per-scene optimization. Although omnidirectional images are becoming more popular since they capture a holistic scene without the computation of stitching multiple perspective images, existing feed-forward models are designed only for perspective inputs. The distinct optical properties of omnidirectional images make it difficult for feature encoders to interpret image context correctly and cause the resulting Gaussians to be distributed non-uniformly in space, degrading the quality of images synthesized from novel views. We propose OmniSplat, a pioneering framework for fast feed-forward 3DGS generation from a few omnidirectional images. We introduce the Yin-Yang grid and decompose images according to it to reduce the domain gap between omnidirectional and perspective images. The Yin-Yang grid can reuse existing CNN structures as they are, and its quasi-uniform characteristic makes each decomposed image resemble a perspective image, allowing the model to exploit the strong priors of feed-forward networks trained on perspective data. OmniSplat achieves higher reconstruction accuracy than existing feed-forward networks trained on perspective images. Furthermore, we improve segmentation consistency across omnidirectional views by leveraging attention from the OmniSplat encoder, enabling fast and clean 3DGS editing.
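To make the Yin-Yang decomposition concrete, the sketch below shows one way an equirectangular panorama could be resampled onto the two quasi-uniform Yin and Yang patches. This is a minimal illustration, not the authors' released code: the patch extents (|lat| ≤ 45°, |lon| ≤ 135°) and the inter-grid rotation follow the standard Yin-Yang grid construction, while the patch resolutions, function names, and the bilinear sampler are illustrative assumptions.

```python
# Minimal sketch (assumed implementation, not OmniSplat's code) of decomposing an
# equirectangular panorama into two quasi-uniform Yin-Yang patches.
import numpy as np

def sample_equirect(img, lat, lon):
    """Bilinearly sample an (H, W, C) equirectangular image at lat/lon (radians)."""
    H, W = img.shape[:2]
    col = (lon / (2 * np.pi) + 0.5) * W - 0.5                 # wraps in longitude
    row = np.clip((0.5 - lat / np.pi) * H - 0.5, 0, H - 1)    # clamps at the poles
    c0 = np.floor(col).astype(int)
    r0 = np.floor(row).astype(int)
    c1, r1 = c0 + 1, np.minimum(r0 + 1, H - 1)
    wc, wr = (col - c0)[..., None], (row - r0)[..., None]
    c0 %= W
    c1 %= W
    top = img[r0, c0] * (1 - wc) + img[r0, c1] * wc
    bot = img[r1, c0] * (1 - wc) + img[r1, c1] * wc
    return top * (1 - wr) + bot * wr

def yin_yang_decompose(pano, patch_h=256, patch_w=768):
    """Resample an (H, W, C) equirectangular panorama onto Yin and Yang patches."""
    lat = np.linspace(-np.pi / 4, np.pi / 4, patch_h)          # |lat| <= 45 deg
    lon = np.linspace(-3 * np.pi / 4, 3 * np.pi / 4, patch_w)  # |lon| <= 135 deg
    lon_g, lat_g = np.meshgrid(lon, lat)

    # Yin patch: the low-latitude band of the original sphere.
    yin = sample_equirect(pano, lat_g, lon_g)

    # Yang patch: the same grid in a rotated frame, (x, y, z)_yang = (-x, z, y)_yin,
    # which covers the polar regions left out by the Yin patch.
    x = np.cos(lat_g) * np.cos(lon_g)
    y = np.cos(lat_g) * np.sin(lon_g)
    z = np.sin(lat_g)
    xr, yr, zr = -x, z, y
    yang = sample_equirect(pano, np.arcsin(np.clip(zr, -1.0, 1.0)), np.arctan2(yr, xr))
    return yin, yang

if __name__ == "__main__":
    pano = np.random.rand(512, 1024, 3).astype(np.float32)  # stand-in ERP image
    yin, yang = yin_yang_decompose(pano)
    print(yin.shape, yang.shape)  # (256, 768, 3) each
```

Because each patch spans only ±45° of latitude in its own frame, the extreme stretching of equirectangular poles never appears in either patch, which is why a CNN trained on perspective images can process the two patches with little modification.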