2412.09648.md

DSplats: 3D Generation by Denoising Splats-Based Multiview Diffusion Models

Generating high-quality 3D content requires models capable of learning robust distributions of complex scenes and the real-world objects within them. Recent Gaussian-based 3D reconstruction techniques have achieved impressive results in recovering high-fidelity 3D assets from sparse input images by predicting 3D Gaussians in a feed-forward manner. However, these techniques often lack the extensive priors and expressiveness offered by Diffusion Models. On the other hand, 2D Diffusion Models, which have been successfully applied to denoise multiview images, show potential for generating a wide range of photorealistic 3D outputs but still fall short on explicit 3D priors and consistency. In this work, we aim to bridge these two approaches by introducing DSplats, a novel method that directly denoises multiview images using Gaussian Splat-based Reconstructors to produce a diverse array of realistic 3D assets. To harness the extensive priors of 2D Diffusion Models, we incorporate a pretrained Latent Diffusion Model into the reconstructor backbone to predict a set of 3D Gaussians. Additionally, the explicit 3D representation embedded in the denoising network provides a strong inductive bias, ensuring geometrically consistent novel view generation. Our qualitative and quantitative experiments demonstrate that DSplats not only produces high-quality, spatially consistent outputs, but also sets a new standard in single-image to 3D reconstruction. When evaluated on the Google Scanned Objects dataset, DSplats achieves a PSNR of 20.38, an SSIM of 0.842, and an LPIPS of 0.109.

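Generating high-quality 3D content requires models that can learn the distributions of complex scenes and the real-world objects within them. Recent Gaussian-based 3D reconstruction techniques predict 3D Gaussians in a feed-forward manner and have achieved excellent results in recovering high-fidelity 3D assets from sparse input images. However, these techniques usually lack the extensive priors and expressiveness that diffusion models provide. 2D diffusion models, on the other hand, have been applied successfully to denoising multiview images and show potential for generating many kinds of photorealistic 3D outputs, yet they still fall short on explicit 3D priors and consistency. In this work, we propose a new method named DSplats, which directly denoises multiview images using Gaussian Splat-based reconstructors to generate diverse, realistic 3D assets. To exploit the rich priors of 2D diffusion models, we introduce a pretrained Latent Diffusion Model into the reconstructor backbone to predict a set of 3D Gaussians. In addition, the explicit 3D representation embedded in the denoising network provides a strong inductive bias, which ensures geometrically consistent novel view generation. Our qualitative and quantitative experiments show that DSplats not only produces high-quality, spatially consistent outputs but also sets a new benchmark for single-image to 3D reconstruction. In an evaluation on the Google Scanned Objects dataset, DSplats achieves a PSNR of 20.38, an SSIM of 0.842, and an LPIPS of 0.109.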
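To give a rough sense of scale for the reported PSNR, the sketch below computes PSNR from the standard definition, 10·log10(MAX² / MSE), and inverts it to recover a root-mean-square pixel error. It is illustrative only and not the authors' evaluation code: the function names `psnr` and `rmse_from_psnr`, the flat-list image representation, and the [0, 1] intensity range are all assumptions made for the example.

```python
import math


def psnr(reference, rendered, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Both images are flat sequences of pixel intensities in [0, max_value]
    (an assumed representation, chosen to keep the example dependency-free).
    """
    if len(reference) != len(rendered):
        raise ValueError("images must have the same number of pixels")
    if not reference:
        raise ValueError("images must not be empty")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images have unbounded PSNR
    return 10.0 * math.log10(max_value ** 2 / mse)


def rmse_from_psnr(psnr_db, max_value=1.0):
    """Invert the PSNR formula to get the root-mean-square pixel error."""
    return max_value * 10 ** (-psnr_db / 20.0)


if __name__ == "__main__":
    # A uniform error of 0.1 on a [0, 1] scale gives a PSNR of about 20 dB.
    print(psnr([0.0, 0.0, 0.0, 0.0], [0.1, 0.1, 0.1, 0.1]))
    # RMSE implied by a single-image PSNR of 20.38 dB on a [0, 1] scale.
    print(rmse_from_psnr(20.38))
```

Under these assumptions, a PSNR of 20.38 dB for a single image pair corresponds to an RMSE of roughly 0.096 on a [0, 1] intensity scale. Benchmark PSNR values are typically averaged over many views and objects, so this conversion is only indicative of the reported number.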