This is the reference PyTorch implementation for training and testing our implicit depth estimation system:
Virtual Occlusions Through Implicit Depth
Jamie Watson, Mohamed Sayed, Zawar Qureshi, Gabriel J Brostow, Sara Vicente, Oisin Mac Aodha and Michael Firman
(Teaser video: implicit_depth_teaser.mp4)
This code is for non-commercial use; please see the license file for terms. If you find any part of this codebase helpful, please cite our paper using the BibTeX below and link to this repo. Thanks!
Assuming a fresh Anaconda distribution, you can install dependencies with:
conda env create -f binarydepth_env.yml
conda activate binarydepth
To download and prepare the ScanNetv2 dataset, please follow the instructions on the SimpleRecon repo.
You will need to update the data configs to point to the location of your ScanNetv2 data. You can do this by setting dataset_path: <YOUR_DATA_LOCATION> in the six configs/data/scannet_*.yaml files.
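For example, a one-liner such as the following updates all six files in place (a sketch that assumes dataset_path is a top-level key in each config and that GNU sed is available; on macOS, use sed -i ''):
# Point the six ScanNet data configs at your data location (replace the example path with yours)
sed -i 's|^dataset_path:.*|dataset_path: /mnt/data/scannetv2|' configs/data/scannet_*.yaml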
To download the Hypersim dataset, please follow the instructions in the Hypersim Dataset repo. Once the dataset has been downloaded and extracted, please update the dataset_path argument in the Hypersim data configs in configs/data/ to point to the extracted dataset.
Note that the depth maps provided as part of the dataset are not planar depths and need to be planarised. We have provided helper functions to planarise the depth maps (see the _get_prependicular_depths method in datasets/hypersim_dataset.py). The planarised depth maps can be generated with the data_scripts/generate_hypersim_planar_depths.py script:
# train
python ./data_scripts/generate_hypersim_planar_depths.py \
--data_config configs/data/hypersim_default_train.yaml \
--num_workers 8
# val
python ./data_scripts/generate_hypersim_planar_depths.py \
--data_config configs/data/hypersim_default_val.yaml \
--num_workers 8
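For reference, the conversion from distance-along-ray depth to planar (z) depth uses the pinhole intrinsics. The snippet below is a minimal numpy sketch of the idea only; the actual implementation is the _get_prependicular_depths method mentioned above, and the function and variable names here are illustrative.
import numpy as np

def ray_depth_to_planar(ray_depth, fx, fy, cx, cy):
    # Convert distance-along-ray depth to planar (z) depth.
    h, w = ray_depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Length of the ray direction ((u - cx)/fx, (v - cy)/fy, 1) through each pixel.
    ray_norm = np.sqrt(((u - cx) / fx) ** 2 + ((v - cy) / fy) ** 2 + 1.0)
    return ray_depth / ray_norm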
Next, we need to generate the frame tuples, similarly to the ScanNetv2 dataset:
# train
python ./data_scripts/generate_train_tuples.py \
--data_config configs/data/hypersim_default_train.yaml \
--num_workers 8
# val
python ./data_scripts/generate_val_tuples.py \
--data_config configs/data/hypersim_default_val.yaml \
--num_workers 8
After the tuple generation, you should be ready to train on Hypersim using the provided configs!
We provide the train and val splits we used for our experiments (see data_splits/hypersim/bd_split/train_files_bd.json and data_splits/hypersim/bd_split/val_files_bd.json).
We provide the following pretrained models for you to try out; we suggest using the Hypersim-trained model to obtain the best qualitative results.
| Model Type | Training Data | Temporal Smoothing |
|---|---|---|
| Implicit Depth (Ours - best qualitative results) | Hypersim | Yes |
| Implicit Depth (Ours) | ScanNet | Yes |
| Implicit Depth (Ours) | ScanNet | No |
| Regression | ScanNet | No |
Download these to the weights/ folder.
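For example (the checkpoint file names here follow the test commands below; adjust them to match whatever you actually downloaded, and wherever your browser saved it):
mkdir -p weights
mv ~/Downloads/implicit_depth.ckpt ~/Downloads/regression.ckpt weights/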
After downloading the models, you can run our occlusion evaluation as described in the paper with:
CUDA_VISIBLE_DEVICES=0 python test_bd.py --name implicit_depth \
--output_base_path outputs \
--config_file configs/models/implicit_depth.yaml \
--load_weights_from_checkpoint weights/implicit_depth.ckpt \
--data_config configs/data/scannet_default_test.yaml \
--num_workers 8 \
--batch_size 4;
To run depth evaluation, add the flag --binary_eval_depth.
To instead run our temporal evaluation, use:
CUDA_VISIBLE_DEVICES=0 python test_bd.py --name implicit_depth \
--output_base_path outputs \
--config_file configs/models/implicit_depth.yaml \
--load_weights_from_checkpoint weights/implicit_depth.ckpt \
--data_config configs/data/scannet_default_test.yaml \
--num_workers 8 \
--batch_size 4 \
--temporal_evaluation \
--mv_tuple_file_suffix _eight_view_deepvmvs_dense.txt;
We can also evaluate regression models using:
CUDA_VISIBLE_DEVICES=0 python test_reg.py --name regression \
--output_base_path outputs \
--config_file configs/models/regression.yaml \
--load_weights_from_checkpoint weights/regression.ckpt \
--data_config configs/data/scannet_default_test.yaml \
--num_workers 8 \
--batch_size 4 \
--regression_plane_eval;
First, download our example data from here, and extract it in the current folder. This should create an example_data folder here.
Then run the following steps to prepare the dataset.
# Create a config and txt file specifically for this single sequence
python -m inference.make_config_and_txt_file \
--input-sequence-dir example_data/scans/garden_chair \
--save-dir example_data/scans/config
# Find keyframes (test tuples) for this sequence, and save to disk
python ./data_scripts/generate_test_tuples.py \
--data_config example_data/scans/config/config.yaml \
--num_workers 16
Next, we can run inference on each frame in the sequence with the following command.
Our implicit depth model requires a depth map as input for each frame, for example as rendered from an AR asset in the scene. These are loaded from the folder at --rendered_depth_map_load_dir. This flag is optional; if it isn't specified, then a fixed plane at 2m from the camera is used as the depth input for each inference frame.
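If you want to supply your own rendered depths, each frame's depth is a .npy file named after the corresponding RGB frame (see the AR visualisation section below). As a purely illustrative sketch, a constant 2m depth input for a single frame could be written like this; the frame name, resolution, and metre units are assumptions, so match them to your own sequence:
import numpy as np

# Illustrative only: write a constant 2m depth map for one frame.
height, width = 480, 640                                  # assumed resolution
depth = np.full((height, width), 2.0, dtype=np.float32)   # depth values, assumed to be in metres
np.save("example_data/renders/frame_000000.npy", depth)   # assumed frame naming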
You should also download our Hypersim model from here.
CUDA_VISIBLE_DEVICES=0 python -m inference.inference \
--config configs/models/sr_bd_high_res_sup_pretrained_7525.yaml \
--load_weights_from_checkpoint weights/implicit_depth_temporal_hypersim.ckpt \
--data_config example_data/scans/config/config.yaml \
--rendered_depth_map_load_dir example_data/renders \
--use_prior \
--output_base_path example_data/predictions/ \
--dataset_path example_data/scans
To make AR visualisations such as the above video, you need three things:
- A folder containing:
  - RGB images of a real scene, and
  - A json file with camera intrinsics and extrinsics (see example file below for expected format)
- A folder of RGB and depth renderings of a virtual object for the scene.
  - These are assumed to be .png and .npy files respectively, named according to the same convention as the real RGB images.
- Either (a minimal sketch of how each option is used follows this list):
  - A folder of compositing masks, e.g. as predicted by our inference code, or
  - A folder of depth maps, e.g. as predicted by a regression baseline.
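Conceptually, the two options differ only in how a per-pixel visibility mask for the virtual object is obtained: a compositing mask is used to blend directly, while a predicted depth map is turned into a hard mask by a depth test against the rendered depth. The numpy sketch below illustrates the idea; it is not the actual inference.composite implementation, and the array names are ours.
import numpy as np

def composite_with_mask(real_rgb, render_rgb, mask):
    # Blend with a soft per-pixel mask in [0, 1], where 1 means the virtual object is visible.
    return mask[..., None] * render_rgb + (1.0 - mask[..., None]) * real_rgb

def composite_with_depth(real_rgb, render_rgb, predicted_depth, render_depth):
    # Depth-test baseline: show the virtual object wherever it is closer than the predicted scene depth.
    # (In practice, pixels where nothing was rendered must also be masked out.)
    mask = (render_depth < predicted_depth).astype(np.float32)
    return composite_with_mask(real_rgb, render_rgb, mask)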
You can download our example data here.
Here are some example commands to run compositing. These should save mp4 files into the --out-dir folders.
SEQUENCE_DIR=example_data/scans/garden_chair
RENDERS_DIR=example_data/renders
# Composite with predicted *masks* (e.g. from our implicit depth method)
python -m inference.composite \
--predicted-masks-dir example_data/reference_predictions/implicit_depth \
--renders-dir $RENDERS_DIR \
--vdr-dir $SEQUENCE_DIR \
--out-dir example_data/composited/implicit_depth
# Composite with predicted *depths* (e.g. from a depth regression baseline)
python -m inference.composite \
--predicted-depths-dir example_data/reference_predictions/simplerecon_regression \
--renders-dir $RENDERS_DIR \
--vdr-dir $SEQUENCE_DIR \
--out-dir example_data/composited/regression_baseline
# Composite with *lidar* (from an Apple iPhone)
python -m inference.composite \
--renders-dir $RENDERS_DIR \
--vdr-dir $SEQUENCE_DIR \
--out-dir example_data/composited/lidar_baseline
The commands above use reference depth and mask predictions given in our example data download. You can instead use the predictions you made in the 'Inference' step above, by using:
--predicted-masks-dir example_data/predictions/implicit_depth/render/garden_chair/ \
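For example, the first compositing command above would then become (the output folder name here is just a suggestion):
python -m inference.composite \
--predicted-masks-dir example_data/predictions/implicit_depth/render/garden_chair/ \
--renders-dir $RENDERS_DIR \
--vdr-dir $SEQUENCE_DIR \
--out-dir example_data/composited/my_implicit_depth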
By default, models and tensorboard event files are saved to ~/tmp/tensorboard/<model_name>. This can be changed with the --log_dir flag.
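To monitor training, you can point TensorBoard at that directory, e.g. for the default location and the model name used below:
tensorboard --logdir ~/tmp/tensorboard/implicit_depth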
We train with a batch size of 12 and 16-bit precision on two A100s on the default ScanNetv2 split.
Example command to train with two GPUs:
CUDA_VISIBLE_DEVICES=0,1 python train_bd.py --name implicit_depth \
--log_dir logs \
--config_file configs/models/implicit_depth.yaml \
--data_config configs/data/scannet_default_train.yaml \
--lazy_load_weights_from_checkpoint weights/regression.ckpt \
--gpus 2 \
--batch_size 12;
Note that we initialise our implicit depth models using a trained regression network, so you will need to download those weights first (see above).
Alternatively, you could train your own regression network from scratch using:
CUDA_VISIBLE_DEVICES=0,1 python train.py --name regression \
--log_dir logs \
--config_file configs/models/regression.yaml \
--data_config configs/data/scannet_default_train.yaml \
--gpus 2 \
--batch_size 16;
The code supports any number of GPUs for training.
You can specify which GPUs to use with the CUDA_VISIBLE_DEVICES environment variable.
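For example, to train the regression network on four GPUs (whether --batch_size is per GPU or total depends on your config, so treat the value here as illustrative):
CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --name regression \
--log_dir logs \
--config_file configs/models/regression.yaml \
--data_config configs/data/scannet_default_train.yaml \
--gpus 4 \
--batch_size 16;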
See options.py for the full range of other training and testing options, such as learning rates and ablation settings.
Many thanks to Daniyar Turmukhambetov, Jamie Wynn, Clement Godard, and Filippo Aleotti for their valuable help and suggestions. We'd also like to thank Niantic's infrastructure team for quick actions when we needed them. Thanks folks!
If you find our work useful in your research please consider citing our paper:
@inproceedings{watson2023implict,
title={Virtual Occlusions Through Implicit Depth},
author={Watson, Jamie and Sayed, Mohamed and Qureshi, Zawar and Brostow, Gabriel J and Vicente, Sara and Mac Aodha, Oisin and Firman, Michael},
booktitle={Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2023},
}
Copyright © Niantic, Inc. 2023. Patent Pending. All rights reserved. Please see the license file for terms.