Generating and inserting new objects into 3D content is a compelling approach for achieving versatile scene recreation. Existing methods, which rely on SDS optimization or single-view inpainting, often struggle to produce high-quality results. To address this, we propose a novel method for object insertion in 3D content represented by Gaussian Splatting. Our approach introduces a multi-view diffusion model, dubbed MVInpainter, which is built upon a pre-trained stable video diffusion model to facilitate view-consistent object inpainting. Within MVInpainter, we incorporate a ControlNet-based conditional injection module to enable controllable and more predictable multi-view generation. After generating the multi-view inpainted results, we further propose a mask-aware 3D reconstruction technique to refine the Gaussian Splatting reconstruction from these sparse inpainted views. By leveraging these techniques, our approach yields diverse generation results, ensures view-consistent and harmonious insertions, and produces higher-quality objects. Extensive experiments demonstrate that our approach outperforms existing methods.
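To give a rough sense of the mask-aware reconstruction idea, the sketch below shows a mask-weighted photometric loss in PyTorch that emphasizes the inpainted object region while only loosely constraining the preserved background. This is a minimal illustration under our own assumptions, not the paper's exact formulation; the function name and weighting values are hypothetical.

```python
import torch

def mask_aware_photometric_loss(render: torch.Tensor,
                                target: torch.Tensor,
                                mask: torch.Tensor,
                                w_inpaint: float = 1.0,
                                w_background: float = 0.1) -> torch.Tensor:
    """Mask-weighted L1 between a rendered view and an inpainted target view.

    render, target: (B, 3, H, W) images in [0, 1].
    mask:           (B, 1, H, W), 1 inside the inpainted (object) region, 0 elsewhere.
    The background is down-weighted because its appearance is already fixed by the
    original scene, while the newly inpainted object region is emphasized.
    (Illustrative sketch only; weights and normalization are assumptions.)
    """
    err = (render - target).abs()                                  # per-pixel L1 error
    weights = mask * w_inpaint + (1.0 - mask) * w_background       # broadcast over channels
    channels = err.shape[1]
    return (weights * err).sum() / (weights.sum() * channels).clamp(min=1e-8)
```

In practice such a loss would be summed over the sparse inpainted views and combined with the standard Gaussian Splatting rendering objective during the refinement stage.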