SparseGrasp: Robotic Grasping via 3D Semantic Gaussian Splatting from Sparse Multi-View RGB Images

Language-guided robotic grasping is a rapidly advancing field in which robots are instructed in human language to grasp specific objects. However, existing methods often depend on dense camera views and struggle to update scenes quickly, which limits their effectiveness in changing environments. In contrast, we propose SparseGrasp, a novel open-vocabulary robotic grasping system that operates efficiently with sparse-view RGB images and handles scene updates quickly. Our system builds upon and significantly enhances existing computer vision modules in robotic learning. Specifically, SparseGrasp utilizes DUSt3R to generate a dense point cloud as the initialization for 3D Gaussian Splatting (3DGS), maintaining high fidelity even under sparse supervision. Importantly, SparseGrasp incorporates semantic awareness from recent vision foundation models. To further improve processing efficiency, we repurpose Principal Component Analysis (PCA) to compress features from 2D models. Additionally, we introduce a novel render-and-compare strategy that ensures rapid scene updates, enabling multi-turn grasping in changing environments. Experimental results show that SparseGrasp significantly outperforms state-of-the-art methods in terms of both speed and adaptability, providing a robust solution for multi-turn grasping in changing environments.
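The PCA compression step mentioned above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the helper names (`fit_pca`, `compress`, `decompress`), the feature-map size, and the number of retained components are all assumptions chosen for illustration, and the feature map here is synthetic rather than the output of a real vision foundation model.

```python
import numpy as np


def fit_pca(features, n_components):
    """Fit PCA on an (N, D) feature matrix and return (mean, components)."""
    mean = features.mean(axis=0)
    centered = features - mean
    # SVD of the centered data: the rows of vt are the principal directions,
    # ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]


def compress(features, mean, components):
    """Project (N, D) features onto the retained components -> (N, K)."""
    return (features - mean) @ components.T


def decompress(codes, mean, components):
    """Map (N, K) codes back to the original D-dimensional space."""
    return codes @ components + mean


# Hypothetical setup: an 8x8 feature map with 64 channels that actually
# lies in a 3-dimensional subspace, so 3 components reconstruct it exactly.
rng = np.random.default_rng(0)
h, w, d, k = 8, 8, 64, 3
latent = rng.normal(size=(h * w, k))
basis = rng.normal(size=(k, d))
feature_map = (latent @ basis).reshape(h, w, d)

flat = feature_map.reshape(-1, d)          # one row per pixel
mean, components = fit_pca(flat, k)
codes = compress(flat, mean, components)   # shape (h * w, k)
recon = decompress(codes, mean, components)
```

Storing `k` values per pixel instead of `d` is where the efficiency gain comes from; with real features the reconstruction is only approximate, and the number of components trades memory against fidelity.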

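Language-guided robotic grasping is a rapidly developing field in which human language instructions direct a robot to grasp specific objects. However, existing methods usually rely on dense camera views and perform poorly when scenes must be updated quickly, which limits their effectiveness in changing environments. In contrast, we propose SparseGrasp, a novel open-vocabulary robotic grasping system that efficiently processes sparse-view RGB images and responds quickly to scene updates. SparseGrasp builds on existing computer vision modules and significantly enhances robotic learning capability. Specifically, SparseGrasp uses DUSt3R to generate a dense point cloud as the initialization for 3D Gaussian Splatting (3DGS), maintaining high fidelity even under sparse supervision. In addition, SparseGrasp incorporates the semantic awareness of recent vision foundation models. To further improve processing efficiency, we repurpose Principal Component Analysis (PCA) to compress features from 2D models. We also introduce a novel render-and-compare strategy that ensures rapid scene updates and thereby supports multi-turn grasping tasks. Experimental results show that SparseGrasp significantly outperforms state-of-the-art methods in speed and adaptability, providing a robust solution for multi-turn grasping in changing environments.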
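The abstract does not spell out how the render-and-compare strategy works, so the following is only a rough sketch of one plausible reading: render the current scene model from a known camera, compare that rendering with a newly captured image, and update the scene only where the two disagree. The function names (`changed_mask`, `needs_update`) and both thresholds are hypothetical, and the images are synthetic arrays rather than real 3DGS renderings.

```python
import numpy as np


def changed_mask(rendered, observed, threshold=0.1):
    """Return an (H, W) boolean mask of pixels whose mean absolute
    per-channel difference exceeds `threshold` (images in [0, 1])."""
    diff = np.abs(rendered.astype(np.float64) - observed.astype(np.float64))
    return diff.mean(axis=-1) > threshold


def needs_update(mask, min_fraction=0.01):
    """Decide whether enough pixels changed to justify a scene update."""
    return bool(mask.mean() > min_fraction)


# Hypothetical example: the rendering shows an empty black scene, while the
# new observation contains an 8x8 white patch where an object has appeared.
rendered = np.zeros((32, 32, 3))
observed = rendered.copy()
observed[8:16, 8:16] = 1.0

mask = changed_mask(rendered, observed)
update_required = needs_update(mask)
unchanged_required = needs_update(changed_mask(rendered, rendered))
```

Restricting the update to the masked region, rather than rebuilding the whole scene, is the kind of saving that would make multi-turn grasping practical; how SparseGrasp actually localizes and applies the update is not stated in the abstract.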