
SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM

ECCV 2024

Mingrui Li*, Shuhong Liu*, Heng Zhou, Guohao Zhu, Na Cheng, Tianchen Deng, Hongyu Wang

(*equal contribution)

Table of Contents

Overview
  1. Abstract
  2. Installation
  3. Download Dataset
  4. Usage
  5. Saving and Visualization
  6. Logging
  7. Acknowledgement
  8. Citation

Abstract

We present SGS-SLAM, the first semantic visual SLAM system based on 3D Gaussian Splatting. It incorporates appearance, geometry, and semantic features through multi-channel optimization, addressing the oversmoothing limitations of neural implicit SLAM systems in high-quality rendering, scene understanding, and object-level geometry. We introduce a unique semantic feature loss that effectively compensates for the shortcomings of traditional depth and color losses in object optimization. Through a semantic-guided keyframe selection strategy, we prevent erroneous reconstructions caused by cumulative errors. Extensive experiments demonstrate that SGS-SLAM delivers state-of-the-art performance in camera pose estimation, map reconstruction, precise semantic segmentation, and object-level geometric accuracy, while ensuring real-time rendering capabilities.


Installation

conda create -n sgs-slam python=3.9
conda activate sgs-slam
conda install -c "nvidia/label/cuda-11.8.0" cuda-toolkit
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 cudatoolkit=11.8 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt
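
After installation, an optional sanity check (not part of the original setup steps) can confirm that the pinned PyTorch build sees the CUDA 11.8 runtime:

# Optional environment sanity check: verifies the PyTorch / CUDA pairing installed above.
import torch

print("PyTorch:", torch.__version__)          # expected 2.0.1
print("CUDA available:", torch.cuda.is_available())
print("CUDA runtime:", torch.version.cuda)    # expected 11.8
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))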

Download Dataset

DATAROOT is ./data by default. Please change the input_folder path in the scene-specific config files if datasets are stored elsewhere.

Replica

You can download the Replica dataset with ground-truth semantic masks from this link. The original Replica scenes, created by the Meta research team, are accessible through their official repository. By accessing the dataset via the provided link, you consent to the terms of the license. The ground-truth semantic masks were generated by us following the preprocessing procedure of Semantic-NeRF. Additionally, the camera trajectories in the dataset were captured using iMAP.

ScanNet

Please follow the data downloading procedure on the ScanNet website, and extract color/depth/semantic frames from the .sens file using the following preprocessing step:

python preprocess/scannet/run.py --input_folder [input path] --output_folder [output path] --export_depth_images --export_color_images --export_poses --export_intrinsics --export_seg --label_map_file preprocess/scannet/scannetv2-labels.combined.tsv
Directory structure of ScanNet:
  DATAROOT
  └── scannet
        └── scene0000_00
            └── frames
                ├── color
                │   ├── 0.jpg
                │   ├── 1.jpg
                │   ├── ...
                │   └── ...
                ├── depth
                │   ├── 0.png
                │   ├── 1.png
                │   ├── ...
                │   └── ...
                ├── intrinsic
                └── pose
                    ├── 0.txt
                    ├── 1.txt
                    ├── ...
                    └── ...

Following previous studies, we use the sequences below (a batch-preprocessing sketch follows the list):

scene0000_00
scene0059_00
scene0106_00
scene0181_00
scene0207_00
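
A small driver script along the following lines can wrap the preprocessing command above for all five sequences. Only the flags and scene IDs come from this README; the input and output roots are placeholders to adjust for your setup.

# Hypothetical batch driver for the ScanNet preprocessing step above.
import subprocess

SCANNET_RAW = "/path/to/scannet/scans"   # placeholder: raw ScanNet download
DATAROOT = "./data/scannet"              # placeholder: output root
SCENES = ["scene0000_00", "scene0059_00", "scene0106_00",
          "scene0181_00", "scene0207_00"]

for scene in SCENES:
    subprocess.run([
        "python", "preprocess/scannet/run.py",
        "--input_folder", f"{SCANNET_RAW}/{scene}",
        "--output_folder", f"{DATAROOT}/{scene}",
        "--export_depth_images", "--export_color_images",
        "--export_poses", "--export_intrinsics", "--export_seg",
        "--label_map_file", "preprocess/scannet/scannetv2-labels.combined.tsv",
    ], check=True)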

ScanNet++

Please follow the data downloading and image undistortion procedure on the ScanNet++ website.

Following SplaTAM, we use the following sequences:

8b5caf3398
b20a261fdf

To extract the ground-truth semantic masks, please follow the guidelines in its official repository.

Usage


We use the Replica dataset as an example; the same steps apply to the other datasets.

Run the SLAM system:

python scripts/slam.py configs/replica/slam.py

Run the post-optimization after the SLAM system finishes:

python scripts/post_slam_opt.py configs/replica/post_slam_opt.py

Visualize the reconstruction online:

python viz_scripts/online_recon.py configs/replica/slam.py

Visualize and manipulate the final reconstructed scene:

python viz_scripts/tk_recon.py configs/replica/slam.py

Saving and Visualization

By default, the system stores the reconstructed scenes in .npz format, which includes both appearance and semantic features. Additionally, we save the final RGB and semantic maps in .ply format for easier visualization. You can view the scenes using any 3DGS viewer, such as SuperSplat. For interactive rendering, we adopt the Open3D viewer from SplaTAM.
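
For programmatic access, the saved .npz archive can be opened with NumPy. The array names inside the archive are not documented here, so the sketch below only enumerates the file's contents rather than assuming specific keys; the path is a placeholder.

# Inspect a reconstructed scene saved by the system (.npz archive).
import numpy as np

scene = np.load("experiments/replica/room0/params.npz")  # placeholder path
for key in scene.files:
    print(key, scene[key].shape, scene[key].dtype)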

Logging

We use Weights & Biases for logging. To enable it, set the wandb flag to True in the configuration file and specify the wandb_folder path. Make sure to adjust the entity setting to match your account. Each scene has a corresponding config folder where you must define the input_folder and output paths. Set wandb=False to disable online logging.
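
As a rough illustration, the relevant entries in a scene config might look like the sketch below. The field names follow the ones mentioned above (input_folder, output, wandb, wandb_folder, entity), but the exact layout of configs/replica/slam.py may differ.

# Hypothetical excerpt of a scene config file; adapt to the actual structure.
config = dict(
    input_folder="./data/Replica/room0",   # where the scene data lives
    output="./experiments/replica/room0",  # where results are written
    wandb=True,                            # set to False to disable online logging
    wandb_folder="./wandb",                # local W&B log directory
    entity="your-wandb-username",          # your Weights & Biases account/team
)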

Acknowledgement

Our work is based on SplaTAM; by using or further modifying this work, you agree to adhere to their terms of use and to include the license file. We extend our sincere gratitude for their outstanding contributions. We would also like to thank the authors of the other open-source repositories that this work builds upon.

Citation

If you find our work useful, please kindly cite us:

@article{li2024sgs,
  title={{SGS-SLAM}: Semantic Gaussian Splatting for Neural Dense SLAM},
  author={Li, Mingrui and Liu, Shuhong and Zhou, Heng and Zhu, Guohao and Cheng, Na and Deng, Tianchen and Wang, Hongyu},
  journal={arXiv preprint arXiv:2402.03246},
  year={2024}
}
