We present FaceLift, a feed-forward approach for rapid, high-quality, 360-degree head reconstruction from a single image. Our pipeline begins with a multi-view latent diffusion model that generates consistent side and back views of the head from a single facial input. These generated views then serve as input to a GS-LRM reconstructor, which produces a comprehensive 3D representation using Gaussian splats. To train our system, we develop a dataset of multi-view renderings from synthetic 3D human head assets. The diffusion-based multi-view generator is trained exclusively on synthetic head images, while the GS-LRM reconstructor is first trained on Objaverse and then fine-tuned on synthetic head data. FaceLift excels at preserving identity and maintaining consistency across views. Despite being trained solely on synthetic data, FaceLift generalizes remarkably well to real-world images. Through extensive qualitative and quantitative evaluations, we show that FaceLift outperforms state-of-the-art methods in 3D head reconstruction, highlighting its practical applicability and robust performance on real-world images. Beyond single-image reconstruction, FaceLift supports video inputs for 4D novel view synthesis and integrates seamlessly with 2D reanimation techniques to enable 3D facial animation.