ChatSplat: 3D Conversational Gaussian Splatting

Humans naturally interact with their 3D surroundings through language, and modeling 3D language fields for scene understanding and interaction has attracted growing interest. This paper introduces ChatSplat, a system that constructs a 3D language field to enable rich chat-based interaction within 3D space. Unlike existing methods, which primarily use CLIP-derived language features and focus solely on segmentation, ChatSplat supports interaction at three levels: objects, views, and the entire 3D scene. For view-level interaction, we design an encoder that converts the rendered feature map of each view into tokens, which a large language model (LLM) then processes for conversation. At the scene level, ChatSplat combines multi-view tokens, enabling interactions that consider the entire scene. For object-level interaction, ChatSplat uses a patch-wise language embedding. Whereas LangSplat's pixel-wise language embedding implicitly entangles the mask and the embedding, we explicitly decouple the language embedding into separate mask and feature map representations, allowing more flexible object-level interaction. The language embeddings used in the LLM follow a complex and diverse distribution, which makes the 3D Gaussians difficult to learn; to address this, we introduce a learnable normalization technique that standardizes these embeddings and facilitates effective learning. Extensive experimental results demonstrate that ChatSplat supports multi-level interaction (object, view, and scene) within 3D space, enhancing both understanding and engagement.
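
The abstract does not specify the form of the learnable normalization, so the following is only a minimal sketch of one plausible reading: a per-channel affine map whose shift and scale would be trained jointly with the 3D Gaussians, with an inverse map to recover LLM-space embeddings after rendering. The class name `AffineNorm`, its methods, the initialization from data statistics, and the toy embeddings are all hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class AffineNorm:
    """Hypothetical per-channel normalizer: y = (x - shift) / scale.

    In a real system, shift and scale would be learnable parameters updated
    by gradient descent; here they are only initialized from data statistics.
    """
    shift: List[float]
    scale: List[float]

    @classmethod
    def from_data(cls, embeddings: List[List[float]], eps: float = 1e-6) -> "AffineNorm":
        n, dim = len(embeddings), len(embeddings[0])
        # Per-channel mean and standard deviation as the starting point.
        shift = [sum(e[d] for e in embeddings) / n for d in range(dim)]
        scale = [
            math.sqrt(sum((e[d] - shift[d]) ** 2 for e in embeddings) / n) + eps
            for d in range(dim)
        ]
        return cls(shift, scale)

    def normalize(self, x: List[float]) -> List[float]:
        # Map an LLM-space embedding to the standardized space the Gaussians would store.
        return [(v - m) / s for v, m, s in zip(x, self.shift, self.scale)]

    def denormalize(self, y: List[float]) -> List[float]:
        # Invert the map so rendered features can be passed back toward the LLM.
        return [v * s + m for v, m, s in zip(y, self.shift, self.scale)]


# Toy embeddings whose channels differ in magnitude by several orders.
embeddings = [[100.0, 0.01], [120.0, 0.03], [80.0, 0.02]]
norm = AffineNorm.from_data(embeddings)
normalized = [norm.normalize(e) for e in embeddings]
restored = [norm.denormalize(y) for y in normalized]
```

After normalization, both channels are centered with comparable spread, which is the property the abstract credits with making the 3D Gaussians easier to learn; `denormalize` returns the original values up to floating-point error.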

