Creating 3D human avatars from multi-view videos is an important yet challenging task. Recent advances, including 3D Gaussian Splatting (3DGS), have driven substantial progress in this area. However, existing techniques require high-quality, sharp images, which are often impractical to obtain in real-world settings because the speed and intensity of human motion vary. In this study, we explore recovering sharp intrinsic 3D human Gaussian avatars from blurry video footage in an end-to-end manner. Our approach combines a 3D-aware, physics-oriented model of the blur formation caused by human movement with a 3D human motion model that resolves the ambiguities in motion-blurred images. This formulation allows the avatar model parameters to be learned jointly with the sub-frame motion parameters, which are refined from a coarse initialization. We establish benchmarks for this task with a synthetic dataset derived from existing multi-view captures and a real dataset captured with a 360-degree synchronous hybrid-exposure camera system. Comprehensive evaluations show that our model surpasses existing baselines.
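The abstract does not spell out the blur formation model, so the following is only a minimal sketch of one common way such a model is discretized: a blurry frame is approximated as the average of sharp images rendered at several sub-frame instants within the exposure window. Everything named here is hypothetical and chosen for illustration: `render_sharp` is a toy 1-D stand-in for an avatar renderer, the scalar `position` stands in for sub-frame motion parameters, and linear interpolation between a start and end state is an assumption rather than the motion model used in this work.

```python
def render_sharp(position, width=8):
    """Hypothetical stand-in for a renderer: a 1-D image with one bright pixel."""
    image = [0.0] * width
    image[int(round(position)) % width] = 1.0
    return image


def interpolate_positions(start, end, num_subframes):
    """Linearly interpolate sub-frame positions across the exposure window.

    Linear interpolation is an assumption made for this sketch only.
    """
    if num_subframes == 1:
        return [start]
    step = (end - start) / (num_subframes - 1)
    return [start + i * step for i in range(num_subframes)]


def synthesize_blur(start, end, num_subframes=4, width=8):
    """Average the sharp sub-frame renders to approximate a motion-blurred image."""
    positions = interpolate_positions(start, end, num_subframes)
    frames = [render_sharp(p, width) for p in positions]
    # Per-pixel mean over the sub-frames: a discrete form of integrating
    # the sharp image over the exposure time.
    return [sum(pixel_values) / num_subframes for pixel_values in zip(*frames)]


blurry = synthesize_blur(start=1.0, end=4.0, num_subframes=4)
print(blurry)  # [0.0, 0.25, 0.25, 0.25, 0.25, 0.0, 0.0, 0.0]
```

In this toy setting the moving point smears its intensity evenly over the four pixels it crosses, while a static point (`start == end`) reproduces the sharp render. In an end-to-end setup of the kind the abstract describes, the synthesized blurry image would be compared against the observed blurry frame, so that the avatar parameters and the sub-frame motion parameters can be optimized together.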