2412.04469.md

File metadata and controls

8 lines (6 loc) · 2.94 KB

QUEEN: QUantized Efficient ENcoding of Dynamic Gaussians for Streaming Free-viewpoint Videos

Online free-viewpoint video (FVV) streaming is a challenging and relatively under-explored problem. It requires incremental on-the-fly updates to a volumetric representation, fast training and rendering to satisfy real-time constraints, and a small memory footprint for efficient transmission. If achieved, it can enhance user experience by enabling novel applications, e.g., 3D video conferencing and live volumetric video broadcast. In this work, we propose QUEEN, a novel framework for QUantized and Efficient ENcoding of streaming FVV using 3D Gaussian Splatting (3D-GS). QUEEN directly learns Gaussian attribute residuals between consecutive frames at each time-step without imposing any structural constraints on them, allowing for high-quality reconstruction and generalizability. To store the residuals efficiently, we further propose a quantization-sparsity framework, which contains a learned latent decoder for effectively quantizing attribute residuals other than Gaussian positions, and a learned gating module to sparsify position residuals. We propose to use the Gaussian viewspace gradient difference vector as a signal to separate the static and dynamic content of the scene: it guides effective sparsity learning and speeds up training. On diverse FVV benchmarks, QUEEN outperforms state-of-the-art online FVV methods on all metrics. Notably, for several highly dynamic scenes, it reduces the model size to just 0.7 MB per frame while training in under 5 seconds and rendering at 350 FPS.
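The per-frame pipeline described above (attribute residuals between consecutive frames, a gate that zeroes position residuals for static Gaussians, and quantization of the remaining attribute residuals) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the hard threshold stands in for the learned gating module, uniform quantization stands in for the learned latent decoder, and `encode_frame_residuals`, `tau`, and `n_bits` are hypothetical names chosen for the sketch.

```python
import numpy as np

def encode_frame_residuals(prev_attrs, curr_attrs, grad_diff, tau=0.5, n_bits=8):
    """Hypothetical sketch of QUEEN-style residual encoding.

    prev_attrs / curr_attrs: dicts of per-Gaussian arrays, e.g.
        {'position': (N, 3), 'color': (N, 3), 'opacity': (N, 1)}
    grad_diff: (N,) magnitude of the viewspace gradient difference,
        used here to flag dynamic Gaussians.
    """
    # Attribute residuals between consecutive frames.
    residuals = {k: curr_attrs[k] - prev_attrs[k] for k in curr_attrs}

    # Sparsify position residuals: static Gaussians (small gradient
    # difference) transmit zeros; only dynamic ones keep their motion.
    dynamic = grad_diff > tau
    residuals['position'] = residuals['position'] * dynamic[:, None]

    # Quantize the non-position attribute residuals (uniform quantizer
    # standing in for the learned latent decoder).
    quantized = {}
    for name, r in residuals.items():
        if name == 'position':
            quantized[name] = r  # positions are sparsified, not quantized, here
            continue
        scale = float(np.abs(r).max()) or 1.0
        levels = 2 ** (n_bits - 1) - 1
        quantized[name] = np.round(r / scale * levels) / levels * scale
    return quantized, dynamic
```

In practice only the nonzero (dynamic) position residuals and the quantized attribute codes would need to be transmitted per frame, which is where the small per-frame footprint comes from.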
