2407.13976.md


PlacidDreamer: Advancing Harmony in Text-to-3D Generation

Text-to-3D generation has recently attracted significant attention, and its performance has improved notably. Previous methods use end-to-end 3D generation models to initialize 3D Gaussians, multi-view diffusion models to enforce multi-view consistency, and text-to-image diffusion models to refine details through score distillation. However, these methods have two limitations. First, their generation directions conflict, because the different models each aim to produce a different 3D asset. Second, over-saturation in score distillation has not been thoroughly investigated or solved. To address these limitations, we propose PlacidDreamer, a text-to-3D framework that harmonizes initialization, multi-view generation, and text-conditioned generation with a single multi-view diffusion model, while employing a novel score distillation algorithm to achieve balanced saturation. To unify the generation direction, we introduce the Latent-Plane module, a training-friendly plug-in extension that enables multi-view diffusion models to provide fast geometry reconstruction for initialization and enhanced multi-view images for personalizing the text-to-image diffusion model. To address over-saturation, we treat score distillation as a multi-objective optimization problem and introduce the Balanced Score Distillation algorithm, which offers a Pareto-optimal solution that achieves both rich details and balanced saturation. Extensive experiments validate the capabilities of PlacidDreamer.
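The abstract frames score distillation as a multi-objective optimization problem but does not state the update rule. As a rough illustration of what a Pareto-oriented update can look like in general, the sketch below combines two competing gradients with the standard two-objective min-norm rule (the closed form used in MGDA-style methods). This is a generic technique and is not claimed to be the Balanced Score Distillation algorithm; the function name and the `g_detail` / `g_saturation` vectors are hypothetical placeholders.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def min_norm_direction(
    g1: Sequence[float], g2: Sequence[float]
) -> Tuple[List[float], float]:
    """Return the minimum-norm point d = gamma*g1 + (1-gamma)*g2, gamma in [0, 1].

    For two objectives, this point of the segment between g1 and g2 satisfies
    dot(d, g1) >= |d|^2 and dot(d, g2) >= |d|^2, so stepping along -d does not
    increase either objective to first order. d == 0 means the current point
    is Pareto-stationary.
    NOTE: generic MGDA-style rule, assumed here only for illustration.
    """
    diff = [x - y for x, y in zip(g1, g2)]
    denom = dot(diff, diff)
    if denom == 0.0:
        # Identical gradients: any convex combination equals g1.
        return list(g1), 1.0
    # Unconstrained minimizer of |gamma*g1 + (1-gamma)*g2|^2, then clip to [0, 1].
    gamma = dot([y - x for x, y in zip(g1, g2)], g2) / denom
    gamma = min(1.0, max(0.0, gamma))
    d = [gamma * x + (1.0 - gamma) * y for x, y in zip(g1, g2)]
    return d, gamma


# Hypothetical gradients for a "detail" objective and a "saturation" objective.
g_detail = [1.0, 0.0]
g_saturation = [0.0, 1.0]
direction, weight = min_norm_direction(g_detail, g_saturation)
```

When the two gradients point in exactly opposite directions with equal magnitude, the returned direction is the zero vector, which signals that no step can improve one objective without hurting the other.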
