HI-SLAM2: Geometry-Aware Gaussian SLAM for Fast Monocular Scene Reconstruction

Willyzw/HI-SLAM2

Wei Zhang · Qing Cheng · David Skuddis · Niclas Zeller · Daniel Cremers · Norbert Haala

HI-SLAM2 constructs a 3DGS map (a) from monocular input, achieving accurate mesh reconstructions (b) and high-quality renderings (c). It surpasses existing monocular SLAM methods in both geometric accuracy and rendering quality while achieving faster runtime.

Table of Contents
  1. Getting Started
  2. Data Preparation
  3. Run Demo
  4. Run Evaluation
  5. Acknowledgement
  6. Citation

Getting Started

  1. Clone the repo with submodules.
git clone --recursive https://github.com/Willyzw/HI-SLAM2
  2. Create a new Conda environment and activate it. Note that the environment.yaml file uses a PyTorch build compiled with CUDA 11.8.
conda env create -f environment.yaml
conda activate hislam2
  3. Compile the CUDA kernel extensions (takes about 10 minutes). Note that this step assumes you have CUDA 11 installed, not CUDA 12. To check the installed CUDA version, run nvcc --version in the terminal.
python setup.py install
  4. Download the pretrained weights of the Omnidata models used to generate depth and normal priors.
wget https://zenodo.org/records/10447888/files/omnidata_dpt_normal_v2.ckpt -P pretrained_models
wget https://zenodo.org/records/10447888/files/omnidata_dpt_depth_v2.ckpt -P pretrained_models
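
The CUDA version check in the compile step can also be scripted. The following is a minimal sketch, not part of the repository: it parses the "release X.Y" text that nvcc --version prints and reports whether the major version is 11. The helper names parse_nvcc_release and installed_cuda_release are illustrative assumptions.

```python
import re
import subprocess


def parse_nvcc_release(nvcc_output: str):
    """Return (major, minor) from `nvcc --version` text, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_cuda_release():
    """Run `nvcc --version` and parse it; returns None if nvcc cannot be run."""
    try:
        result = subprocess.run(
            ["nvcc", "--version"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    release = installed_cuda_release()
    if release is None:
        print("nvcc not found; is the CUDA toolkit on PATH?")
    elif release[0] != 11:
        print(f"Found CUDA {release[0]}.{release[1]}; the build step expects CUDA 11.")
    else:
        print(f"Found CUDA {release[0]}.{release[1]}; OK for the build step.")
```

Running this before python setup.py install can save a failed ten-minute build, but it only inspects the toolkit found on PATH, which may differ from the one the build actually picks up.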

Data Preparation

Replica

Download and prepare the Replica dataset by running

bash scripts/download_replica.sh
python scripts/preprocess_replica.py

The data is converted to the expected format and placed in the data/Replica folder.

ScanNet

Please follow the ScanNet instructions to download the data, then place the color, pose, and intrinsic data extracted from the .sens files in the data/ScanNet folder as follows:

  scene0000_00
  ├── color
  │   ├── 000000.jpg
  │   └── ...
  ├── intrinsic
  │   └── intrinsic_color.txt
  └── pose
      ├── 000000.txt
      └── ...
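
Before running the preprocessing script, it can help to confirm that an extracted scene matches the layout above. The following is a minimal sketch, not part of the repository. The function name missing_scannet_items is hypothetical, and the rule that every color frame has a pose file with the same stem is an assumption read off the tree, not something the repository states.

```python
from pathlib import Path


def missing_scannet_items(scene_dir):
    """List the expected entries that are absent from one extracted scene."""
    scene = Path(scene_dir)
    problems = []
    for sub in ("color", "intrinsic", "pose"):
        if not (scene / sub).is_dir():
            problems.append(f"missing folder: {sub}")
    if not (scene / "intrinsic" / "intrinsic_color.txt").is_file():
        problems.append("missing file: intrinsic/intrinsic_color.txt")

    # Assumption: each color frame (000000.jpg, ...) has a pose file
    # with the same stem (000000.txt, ...).
    color_dir, pose_dir = scene / "color", scene / "pose"
    if color_dir.is_dir() and pose_dir.is_dir():
        color_stems = {p.stem for p in color_dir.glob("*.jpg")}
        pose_stems = {p.stem for p in pose_dir.glob("*.txt")}
        for stem in sorted(color_stems - pose_stems):
            problems.append(f"no pose for frame {stem}")
    return problems
```

For example, missing_scannet_items("data/ScanNet/scene0000_00") returns an empty list when the scene is complete; the data/ScanNet/scene0000_00 path is inferred from the description above.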

Then run the following script to convert the data to the expected input format

python scripts/preprocess_scannet.py

We take the following sequences for evaluation: scene0000_00, scene0054_00, scene0059_00, scene0106_00, scene0169_00, scene0181_00, scene0207_00, scene0233_00.

Run Demo

After preparing the Replica dataset, you can run the HI-SLAM2 demo. The demo takes about 2 minutes on an Nvidia RTX 4090 GPU. The results are saved in the outputs/room0 folder and include the estimated camera poses, the Gaussian map, and the renderings. To visualize the construction of the Gaussian map, use the --gsvis flag. To visualize intermediate results such as the estimated depth and point cloud, use the --droidvis flag.

python demo.py \
--imagedir data/Replica/room0/colors \
--calib calib/replica.txt \
--config config/replica_config.yaml \
--output outputs/room0 \
[--gsvis] # Optional: Enable Gaussian map display
[--droidvis] # Optional: Enable point cloud display

To generate the TSDF mesh from the reconstructed Gaussian map, you can run

python tsdf_integrate.py --result outputs/room0 --voxel_size 0.01 --weight 2

Run Evaluation

Replica

Run the following script to automate the evaluation process on all sequences of the Replica dataset. It will evaluate the tracking error, rendering quality, and reconstruction accuracy.

python scripts/run_replica.py

ScanNet

Run the following script to automate the evaluation process on the selected 8 sequences of the ScanNet dataset. It will evaluate the tracking error and rendering quality.

python scripts/run_scannet.py

Run your own data

HI-SLAM2 supports casual video recordings from a smartphone or camera (demo above with iPhone 15). To use your own video data, we provide a preprocessing script that extracts individual frames from your video and runs COLMAP to estimate the camera intrinsics automatically. Run the preprocessing with:

python scripts/preprocess_owndata.py PATH_TO_YOUR_VIDEO PATH_TO_OUTPUT_DIR

Once the intrinsics are obtained, you can run HI-SLAM2 with the following command:

python demo.py \
--imagedir PATH_TO_OUTPUT_DIR/images \
--calib PATH_TO_OUTPUT_DIR/calib.txt \
--config config/owndata_config.yaml \
--output outputs/owndata \
--undistort --droidvis --gsvis

The following command line arguments are also available:

  • --undistort: undistort the images if distortion parameters are provided in the calib file
  • --droidvis: visualize the point cloud map and the intermediate results
  • --gsvis: visualize the Gaussian map
  • --buffer: maximum number of keyframes to pre-allocate memory for (default: 10% of total frames). Increase this if you encounter the error: IndexError: index X is out of bounds for dimension 0 with size X.
  • --start: start frame index (default: the first frame)
  • --length: number of frames to process (default: all frames)
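
When launching several runs with different options, the command can be assembled programmatically. The following is a minimal sketch, not part of the repository. It uses only the flags listed above; the function name build_demo_command is hypothetical, the values 500, 100, and 1000 are arbitrary placeholders, and it is an assumption that --buffer, --start, and --length each take a single integer.

```python
import shlex


def build_demo_command(imagedir, calib, config, output,
                       undistort=False, droidvis=False, gsvis=False,
                       buffer=None, start=None, length=None):
    """Assemble the demo.py argument list from the options listed above."""
    cmd = ["python", "demo.py",
           "--imagedir", str(imagedir),
           "--calib", str(calib),
           "--config", str(config),
           "--output", str(output)]
    # Boolean switches are appended only when enabled.
    for flag, enabled in (("--undistort", undistort),
                          ("--droidvis", droidvis),
                          ("--gsvis", gsvis)):
        if enabled:
            cmd.append(flag)
    # Valued options are appended only when set, so demo.py keeps its defaults.
    for flag, value in (("--buffer", buffer),
                        ("--start", start),
                        ("--length", length)):
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


cmd = build_demo_command(
    "PATH_TO_OUTPUT_DIR/images", "PATH_TO_OUTPUT_DIR/calib.txt",
    "config/owndata_config.yaml", "outputs/owndata",
    undistort=True, buffer=500, start=100, length=1000,  # placeholder values
)
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

The resulting list can be passed to subprocess.run(cmd, check=True) from the repository root. Keeping the arguments as a list avoids shell quoting problems when paths contain spaces.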

Acknowledgement

This project builds on DROID-SLAM, MonoGS, RaDe-GS and 3DGS. The reconstruction evaluation is based on evaluate_3d_reconstruction_lib. We thank the authors for their great work and hope this open-source code is useful for your research.

Citation

Our paper is available on arXiv. If you find this code useful in your research, please cite our paper.

@article{zhang2024hi2,
  title={HI-SLAM2: Geometry-Aware Gaussian SLAM for Fast Monocular Scene Reconstruction},
  author={Zhang, Wei and Cheng, Qing and Skuddis, David and Zeller, Niclas and Cremers, Daniel and Haala, Norbert},
  journal={arXiv preprint arXiv:2411.17982},
  year={2024}
}