Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation

Existing feed-forward image-to-3D methods mainly rely on 2D multi-view diffusion models that cannot guarantee 3D consistency. These methods easily collapse when the prompt view direction changes and mainly handle object-centric prompt images. In this paper, we propose DiffusionGS, a novel single-stage 3D diffusion model for object and scene generation from a single view. DiffusionGS directly outputs 3D Gaussian point clouds at each timestep to enforce view consistency, allowing the model to generate robustly from prompt views of any direction, beyond object-centric inputs. In addition, to improve the capability and generalization of DiffusionGS, we scale up the 3D training data by developing a scene-object mixed training strategy. Experiments show that our method achieves better generation quality (2.20 dB higher PSNR and 23.25 lower FID) and over 5x faster speed (~6 s on an A100 GPU) than SOTA methods. The user study and text-to-3D applications also reveal the practical value of our method.
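The core idea, a denoiser that predicts 3D Gaussian parameters at every timestep rather than multi-view images, can be illustrated with a minimal PyTorch sketch. Everything below is a hypothetical simplification for illustration and not the paper's actual architecture: the `GaussianHead` module, its 14-channel parameter layout, the placeholder `denoiser` backbone, and `feat_dim` are all assumptions, and the rasterization of the predicted Gaussians back to target views (which supplies the training signal) is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GaussianHead(nn.Module):
    """Maps per-pixel features to 3D Gaussian parameters (hypothetical layout)."""

    def __init__(self, feat_dim: int):
        super().__init__()
        # 3 (xyz) + 3 (scale) + 4 (rotation quaternion) + 1 (opacity) + 3 (RGB) = 14 channels
        self.proj = nn.Conv2d(feat_dim, 14, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> dict:
        out = self.proj(feats)                 # (B, 14, H, W)
        out = out.flatten(2).transpose(1, 2)   # (B, H*W, 14): one Gaussian per pixel
        xyz, scale, rot, opacity, rgb = out.split([3, 3, 4, 1, 3], dim=-1)
        return {
            "xyz": xyz,                              # 3D means
            "scale": torch.exp(scale),               # positive scales
            "rotation": F.normalize(rot, dim=-1),    # unit quaternions
            "opacity": torch.sigmoid(opacity),
            "rgb": torch.sigmoid(rgb),
        }


@torch.no_grad()
def denoise_step(denoiser, gaussian_head, noisy_views, prompt_view, t):
    """One single-stage denoising step: the network consumes noisy target views plus
    the clean prompt view and directly predicts a 3D Gaussian point cloud. Rendering
    those Gaussians back to the target viewpoints (not shown) is what couples the
    views and enforces 3D consistency at every timestep."""
    feats = denoiser(noisy_views, prompt_view, t)   # (B, feat_dim, H, W) per-pixel features
    return gaussian_head(feats)


# Usage with toy stand-ins (shapes only, no pretrained weights).
feat_dim = 32
gaussian_head = GaussianHead(feat_dim)
denoiser = lambda noisy, prompt, t: torch.randn(1, feat_dim, 64, 64)  # placeholder backbone
gaussians = denoise_step(denoiser, gaussian_head,
                         torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64), t=500)
print({k: v.shape for k, v in gaussians.items()})
```

Because every denoising step produces an explicit 3D representation that can be rendered to any target view, consistency is built into the representation itself rather than hoped for across independently generated 2D views.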

Existing feed-forward image-to-3D methods mainly rely on 2D multi-view diffusion models, which struggle to guarantee 3D consistency. These methods easily collapse when the view direction changes and mainly apply to object-centric prompt images. To address these issues, this paper proposes DiffusionGS, a novel single-stage 3D diffusion model for generating objects and scenes from a single view. At each timestep, DiffusionGS directly outputs a 3D Gaussian point cloud, which enforces view consistency and enables the model to generate robustly from prompt views in any direction, not just object-centric inputs. In addition, to improve the generation and generalization ability of DiffusionGS, we develop a scene-object mixed training strategy that scales up the 3D training data. Experiments show that, compared with state-of-the-art methods, DiffusionGS achieves better generation quality (2.20 dB higher PSNR and 23.25 lower FID) and over 5x faster speed (about 6 s on an A100 GPU). A user study and text-to-3D applications further demonstrate the practical value of our method.