GS-VTON: Controllable 3D Virtual Try-on with Gaussian Splatting

Diffusion-based 2D virtual try-on (VTON) techniques have recently demonstrated strong performance, whereas 3D VTON has largely lagged behind. Despite recent advances in text-guided 3D scene editing, integrating 2D VTON into these pipelines to achieve vivid 3D VTON remains challenging, for two reasons. First, text prompts cannot describe clothing in sufficient detail. Second, 2D VTON results generated from different viewpoints of the same 3D scene lack coherence and spatial relationships, frequently leading to appearance inconsistencies and geometric distortions. To resolve these problems, we introduce an image-prompted 3D VTON method (dubbed GS-VTON) which, by leveraging 3D Gaussian Splatting (3DGS) as the 3D representation, transfers pre-trained knowledge from 2D VTON models to 3D while improving cross-view consistency. Specifically: (1) we propose a personalized diffusion model that uses low-rank adaptation (LoRA) fine-tuning to incorporate personalized information into pre-trained 2D VTON models; to achieve effective LoRA training, we introduce a reference-driven image editing approach that edits multi-view images simultaneously while ensuring consistency. (2) We further propose a persona-aware 3DGS editing framework that enables effective editing while maintaining consistent cross-view appearance and high-quality 3D geometry. (3) We also establish a new 3D VTON benchmark, 3D-VTONBench, which supports comprehensive qualitative and quantitative 3D VTON evaluation. Through extensive experiments and comparisons with existing methods, GS-VTON demonstrates superior fidelity and advanced editing capabilities, affirming its effectiveness for 3D VTON.
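The low-rank adaptation (LoRA) fine-tuning mentioned in the abstract replaces full weight updates with a small trainable low-rank correction added to each frozen pre-trained weight matrix. The sketch below illustrates that core idea in plain Python with toy matrices; it is not the authors' code, and the function names (`matmul`, `lora_forward`) and the tiny dimensions are illustrative assumptions, not part of GS-VTON.

```python
# Minimal sketch of the LoRA idea: output = x @ (W + alpha * A @ B),
# where W is the frozen pre-trained weight and A (d x r), B (r x k)
# form a trainable low-rank update with r << min(d, k).
# Pure-Python toy example; names and sizes are illustrative only.

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_forward(x, W, A, B, alpha=1.0):
    """Frozen path x @ W plus scaled low-rank path x @ A @ B."""
    base = matmul(x, W)                     # frozen pre-trained branch
    update = matmul(matmul(x, A), B)        # trainable low-rank branch
    return [[b + alpha * u for b, u in zip(brow, urow)]
            for brow, urow in zip(base, update)]

# Toy setup: d = k = 2, rank r = 1.
W = [[1.0, 0.0], [0.0, 1.0]]   # frozen weight (identity, for clarity)
A = [[1.0], [0.0]]             # down-projection, 2 x 1
B = [[0.5, 0.5]]               # up-projection, 1 x 2
x = [[2.0, 3.0]]               # one input row vector

y = lora_forward(x, W, A, B)
print(y)  # [[3.0, 4.0]]: base [2, 3] plus low-rank update [1, 1]
```

Because only `A` and `B` (2r(d + k) parameters rather than d x k) are trained, the personalized information can be injected into the 2D VTON diffusion model without disturbing its frozen pre-trained weights.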
