Latent diffusion models have proven effective for developing novel 3D generation techniques. A key challenge in harnessing them is designing a high-fidelity, efficient representation that links the latent space to 3D space. In this paper, we introduce Atlas Gaussians, a novel representation for feed-forward native 3D generation. Atlas Gaussians represent a shape as the union of local patches, each of which can decode 3D Gaussians. We parameterize a patch as a sequence of feature vectors and design a learnable function that decodes 3D Gaussians from these feature vectors. In this process, we incorporate UV-based sampling, which enables the generation of a sufficiently large, and theoretically infinite, number of 3D Gaussian points. This large number of 3D Gaussians allows the generated results to capture high-quality details. Moreover, owing to the locality of the representation, the transformer-based decoding procedure operates at the patch level, which keeps it efficient. We train a variational autoencoder to learn the Atlas Gaussians representation and then apply a latent diffusion model to its latent space to learn 3D generation. Experiments show that our approach outperforms prior art in feed-forward native 3D generation.
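To make the UV-based decoding idea concrete, here is a minimal Python sketch under loudly stated assumptions: each patch is reduced to four corner feature vectors that are bilinearly interpolated at sampled (u, v) coordinates, and the learnable decoder is replaced by a single linear layer with random weights. The names `bilerp`, `PatchDecoder`, `decode_patch`, and `decode_shape`, the 14-value Gaussian parameter layout, and the corner-based interpolation are all hypothetical illustrations, not the paper's architecture, which uses a transformer-based decoder over patch-level feature sequences. What the sketch does show is that the number of Gaussians per patch is simply the number of UV samples, so it can be chosen freely at decode time.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vector = List[float]


@dataclass
class Gaussian3D:
    center: Tuple[float, ...]    # 3 values
    scale: Tuple[float, ...]     # 3 positive values
    rotation: Tuple[float, ...]  # unit quaternion (w, x, y, z)
    opacity: float               # in (0, 1)
    color: Tuple[float, ...]     # RGB in (0, 1)


def bilerp(corners: Sequence[Vector], u: float, v: float) -> Vector:
    """Bilinearly interpolate four corner features at (u, v) in [0, 1]^2.
    Hypothetical corner order: (0,0), (1,0), (0,1), (1,1)."""
    f00, f10, f01, f11 = corners
    w = ((1 - u) * (1 - v), u * (1 - v), (1 - u) * v, u * v)
    return [w[0] * a + w[1] * b + w[2] * c + w[3] * d
            for a, b, c, d in zip(f00, f10, f01, f11)]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class PatchDecoder:
    """Hypothetical stand-in for the learnable decoder: one random linear layer."""
    OUT = 14  # 3 center + 3 log-scale + 4 rotation + 1 opacity + 3 color

    def __init__(self, feat_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.weights = [[rng.gauss(0.0, 0.1) for _ in range(feat_dim)]
                        for _ in range(self.OUT)]

    def __call__(self, feat: Vector) -> Gaussian3D:
        y = [sum(w * f for w, f in zip(row, feat)) for row in self.weights]
        q = y[6:10]
        norm = math.sqrt(sum(c * c for c in q))
        # Normalize to a unit quaternion; fall back to identity if degenerate.
        rotation = tuple(c / norm for c in q) if norm > 1e-12 else (1.0, 0.0, 0.0, 0.0)
        return Gaussian3D(
            center=tuple(y[0:3]),
            scale=tuple(math.exp(s) for s in y[3:6]),  # exp keeps scales positive
            rotation=rotation,
            opacity=sigmoid(y[10]),
            color=tuple(sigmoid(c) for c in y[11:14]),
        )


def decode_patch(corners: Sequence[Vector], decoder: PatchDecoder,
                 n_samples: int, seed: int = 0) -> List[Gaussian3D]:
    """Sample n_samples random UV points on one patch and decode one Gaussian each."""
    rng = random.Random(seed)
    return [decoder(bilerp(corners, rng.random(), rng.random()))
            for _ in range(n_samples)]


def decode_shape(patches: Sequence[Sequence[Vector]], decoder: PatchDecoder,
                 samples_per_patch: int) -> List[Gaussian3D]:
    """The shape is the union of all patches' Gaussians; each patch decodes independently."""
    out: List[Gaussian3D] = []
    for i, corners in enumerate(patches):
        out.extend(decode_patch(corners, decoder, samples_per_patch, seed=i))
    return out


# Usage: the same patch yields as many Gaussians as we ask for.
feat_dim = 8
rng = random.Random(42)
corners = [[rng.uniform(-1.0, 1.0) for _ in range(feat_dim)] for _ in range(4)]
decoder = PatchDecoder(feat_dim)
few = decode_patch(corners, decoder, 16)
many = decode_patch(corners, decoder, 4096)
print(len(few), len(many))  # 16 4096
```

In this sketch each UV sample reads only its own patch's features, so decoding cost grows linearly with the number of samples per patch and patches can be processed independently, which is the kind of locality the abstract relies on for efficiency.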