Score distillation sampling (SDS), which distills the score of a pretrained 2D diffusion model into a 3D representation, has recently driven significant advances in text-to-3D generation. However, the approach still suffers from critical geometric inconsistency problems such as the Janus problem. Starting from the hypothesis that these problems may be induced by multiview inconsistencies between the 2D scores predicted from different viewpoints, we introduce GSD, a simple and general plug-and-play framework that incorporates 3D consistency, and therefore geometry awareness, into the SDS process. Our method has three components: 3D consistent noising, which produces 3D consistent noise maps that perfectly follow the standard Gaussian distribution; geometry-based gradient warping, which identifies correspondences between the gradients predicted at different viewpoints; and a novel gradient consistency loss, which optimizes the scene geometry toward producing more consistent gradients. We demonstrate that our method significantly improves performance, successfully addressing geometric inconsistency problems in text-to-3D generation at minimal computational cost while remaining compatible with existing score distillation-based models.
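To make two of the three components concrete, the following is a minimal toy sketch and not the GSD implementation. It assumes the scene is a set of 3D points, that a hypothetical visibility table (`pixel_to_points`) lists which points project to each pixel in each view, and that a hypothetical correspondence list (`a_to_b`) already encodes the result of geometry-based warping. Noise is drawn once per 3D point and shared across views; summing n independent standard Gaussian samples and dividing by sqrt(n) keeps each pixel's noise at unit variance. The consistency loss here is a plain mean squared difference, chosen only for illustration.

```python
import math
import random


def consistent_noise_maps(pixel_to_points, num_points, seed=0):
    """Build per-view noise maps from one shared set of per-point samples.

    pixel_to_points: dict mapping a view name to a list with one entry per
        pixel; each entry is the list of 3D point indices seen by that pixel
        (hypothetical input, e.g. from a rasterizer).
    """
    rng = random.Random(seed)
    # One standard Gaussian sample per 3D point, reused by every view.
    point_noise = [rng.gauss(0.0, 1.0) for _ in range(num_points)]
    maps = {}
    for view, pixels in pixel_to_points.items():
        row = []
        for idx in pixels:
            if idx:
                # Sum of n i.i.d. N(0, 1) samples scaled by 1/sqrt(n) is N(0, 1).
                row.append(sum(point_noise[i] for i in idx) / math.sqrt(len(idx)))
            else:
                # Pixels that see no point (background) get independent noise.
                row.append(rng.gauss(0.0, 1.0))
        maps[view] = row
    return maps


def gradient_consistency_loss(grad_a, grad_b, a_to_b):
    """Mean squared difference between corresponding gradients of two views.

    a_to_b[i] is the pixel index in view B matching pixel i of view A,
    or None when the pixel is occluded or has no match.
    """
    diffs = [(grad_a[i] - grad_b[j]) ** 2
             for i, j in enumerate(a_to_b) if j is not None]
    return sum(diffs) / len(diffs) if diffs else 0.0


if __name__ == "__main__":
    # Two views of four points; the same points land on different pixels.
    visibility = {"front": [[0], [1, 2], [3]], "side": [[3], [0], [1, 2]]}
    noise = consistent_noise_maps(visibility, num_points=4)
    print(noise["front"][0] == noise["side"][1])  # same point, same noise
    print(gradient_consistency_loss([1.0, 2.0, 3.0], [1.0, 2.0, 5.0], [0, None, 2]))
```

In this toy setup, pixels that observe the same 3D points receive identical noise in every view, which is the property that 3D consistent noising is meant to provide. A real pipeline would operate on image tensors and rendered depth rather than Python lists.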