Implement NGP encoding into NeRF-in-the-wild (NeRF-W).

nerf_pl

Unofficial implementation of NeRF-W (NeRF in the Wild) using PyTorch (pytorch-lightning). I try to reproduce (some of) the results on the lego dataset (Section D of the paper). Training on real Phototourism images (the main subject of the paper) also works. Please read the following sections for the results.

The code is largely based on the NeRF implementation (see the master or dev branch); the main differences are the model structure and the rendering process, which can be found in the two files under models/.

💻 Installation

Hardware

  • OS: Ubuntu 18.04
  • NVIDIA GPU with CUDA>=10.2 (tested with 1 RTX2080Ti)

Software

  • Clone this repo by git clone https://github.com/chatzisceid/nerf-w_NGP.git
  • Python>=3.6 (installation via anaconda is recommended, use conda create -n nerf_pl python=3.6 to create a conda environment and activate it by conda activate nerf_pl)
  • Python libraries
    • Install core requirements by pip install -r requirements.txt

🔑 Training

Update: there is a difference from the paper: I didn't add the appearance embedding to the coarse model, while the paper does. Please change this line to self.encode_appearance = encode_appearance to align with the paper.
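For orientation, here is a minimal sketch of where that flag lives. The class and argument names follow the upstream nerf_pl model, so treat this as illustrative rather than an exact copy of models/nerf.py:

from torch import nn

class NeRF(nn.Module):
    """Illustrative skeleton only; see models/nerf.py for the real model."""
    def __init__(self, typ='coarse', encode_appearance=False, in_channels_a=48):
        super().__init__()
        self.typ = typ
        # Current behaviour: the coarse model ignores the appearance embedding
        # no matter what is passed in.
        self.encode_appearance = False if typ == 'coarse' else encode_appearance
        # To align with the paper, use the embedding in the coarse model too:
        # self.encode_appearance = encode_appearance
        self.in_channels_a = in_channels_a if self.encode_appearance else 0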

Blender

Steps

Data download

Download nerf_synthetic.zip from here

Data perturbations

All random seeds are fixed to reproduce the same perturbations every time. For the detailed implementation, see blender.py; a rough sketch of the occluder step follows the examples below.

  • Color perturbations: uses the same parameters as in the paper.

(example image: color perturbation)

  • Occlusions: the square has size 200x200 (which should match the paper), its position is randomly sampled inside the central 400x400 area, and the 10 colors are random.

(example image: occlusion perturbation)

  • Combined: first perturb the color, then add the square.

(example image: combined perturbation)
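As mentioned above, here is a rough sketch of the occlusion step, based only on the description in this section. The real, seed-fixed implementation is in blender.py; the function name and the exact sampling scheme below are my guesses:

import numpy as np

def add_random_occluder(img, seed, square=200, central=400, n_colors=10):
    """Paste a randomly coloured square inside the central region of an image.

    Hypothetical re-implementation of the occluder perturbation described
    above; the per-image seed keeps the perturbation reproducible.
    """
    rng = np.random.RandomState(seed)
    img = np.asarray(img).copy()
    h, w = img.shape[:2]
    # One of n_colors random RGB colours.
    palette = rng.randint(0, 256, size=(n_colors, 3), dtype=np.uint8)
    color = palette[rng.randint(n_colors)]
    # Top-left corner sampled so the square stays inside the central region.
    top = rng.randint(h // 2 - central // 2, h // 2 + central // 2 - square)
    left = rng.randint(w // 2 - central // 2, w // 2 + central // 2 - square)
    img[top:top + square, left:left + square, :3] = color
    return img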

Training model

Base:

python train.py \
   --dataset_name blender \
   --root_dir $BLENDER_DIR \
   --N_importance 64 --img_wh 400 400 --noise_std 0 \
   --num_epochs 20 --batch_size 1024 \
   --optimizer adam --lr 5e-4 --lr_scheduler cosine \
   --exp_name exp

Add --encode_a for appearance embedding, --encode_t for transient embedding.

Add --data_perturb color occ to perturb the dataset.

Example:

python train.py \
   --dataset_name blender \
   --root_dir $BLENDER_DIR \
   --N_importance 64 --img_wh 400 400 --noise_std 0 \
   --num_epochs 20 --batch_size 1024 \
   --optimizer adam --lr 5e-4 --lr_scheduler cosine \
   --exp_name exp \
   --data_perturb occ \
   --encode_t --beta_min 0.1

This example trains NeRF-U on occluders (Table 3, bottom left).

See opt.py for all configurations.

You can monitor the training process by running tensorboard --logdir logs/ and going to localhost:6006 in your browser.

Example training loss evolution (NeRF-U on occluders):

(example image: training loss curves)

Phototourism dataset

Steps

Data download

Download the scenes you want from here. Train/test splits are only provided for "Brandenburg Gate", "Sacre Coeur" and "Trevi Fountain"; if you want to train on other scenes, you need to clean the data (Section C) and split it yourself.

Download the train/test split from the "Additional links" here and put it under each scene's folder (at the same level as the "dense" folder).

(Optional but highly recommended) If you want to run multiple experiments or train on multiple GPUs, run python prepare_phototourism.py --root_dir $ROOT_DIR --img_downscale {an integer, e.g. 2 means half the image size} to prepare the training data and save it to disk first. This greatly shortens the data preparation step before training.
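For example, to build the cache for the Brandenburg Gate scene at 1/8 resolution (matching the training command further below):

python prepare_phototourism.py --root_dir /home/ubuntu/data/IMC-PT/brandenburg_gate/ --img_downscale 8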

Data visualization (Optional)

Take a look at phototourism_visualization.ipynb, a quick visualization of the data: scene geometry, camera poses, rays and bounds, to verify that my data conversion works correctly.

Training model

Run (example)

python train.py \
  --root_dir /home/ubuntu/data/IMC-PT/brandenburg_gate/ --dataset_name phototourism \
  --img_downscale 8 --use_cache --N_importance 64 --N_samples 64 \
  --encode_a --encode_t --beta_min 0.03 --N_vocab 1500 \
  --num_epochs 20 --batch_size 1024 \
  --optimizer adam --lr 5e-4 --lr_scheduler cosine \
  --exp_name brandenburg_scale8_nerfw

The --encode_a and --encode_t options are both required to maximize NeRF-W performance.

--N_vocab should be set to an integer larger than the number of images (dependent on different scenes). For example, "brandenburg_gate" has in total 1363 images (under dense/images/), so any number larger than 1363 works (no need to set to exactly the same number). Attention! If you forget to set this number, or it is set smaller than the number of images, the program will yield RuntimeError: CUDA error: device-side assert triggered (which comes from torch.nn.Embedding).
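To see why, here is a tiny standalone illustration of the underlying torch.nn.Embedding behaviour (the embedding dimension 48 is just an example). On CPU an out-of-range index raises a plain IndexError, while on the GPU it surfaces as the device-side assert above:

import torch
from torch import nn

# N_vocab plays the role of num_embeddings: one learned vector per image id.
embedding_a = nn.Embedding(num_embeddings=1500, embedding_dim=48)
print(embedding_a(torch.tensor([0, 1362])).shape)   # torch.Size([2, 48]); ids 0..1362 are fine

# With N_vocab too small, image id 1362 is out of range:
embedding_bad = nn.Embedding(num_embeddings=1000, embedding_dim=48)
embedding_bad(torch.tensor([1362]))                 # IndexError on CPU; device-side assert on CUDA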

Pretrained models and logs

Download the pretrained models and training logs from the release.

🔎 Testing

Use eval.py to create the whole sequence of moving views. It will create the folder results/{dataset_name}/{scene_name}, run inference on all test data, and finally create a gif out of the rendered frames.
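The gif step is conceptually just stitching the rendered frames together. A minimal sketch (illustrative, not this repo's exact code; the folder name is hypothetical):

import os
import imageio

frames_dir = 'results/blender/lego'   # hypothetical results/{dataset_name}/{scene_name}
frame_files = sorted(f for f in os.listdir(frames_dir) if f.endswith('.png'))
frames = [imageio.imread(os.path.join(frames_dir, f)) for f in frame_files]
imageio.mimsave(os.path.join(frames_dir, 'lego.gif'), frames)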

Lego from Blender

All my experiments are done with image size 200x200, so the PSNRs are expected to be somewhat lower than the paper's.

  1. test_nerfa_color shows that NeRF-A is able to capture image-dependent color variations.


Left: NeRF, PSNR=23.17 (paper=23.38). Right: pretrained NeRF-A, PSNR=28.20 (paper=30.66).

  2. test_nerfu_occ shows that NeRF-U is able to decompose the scene into static and transient components when the scene has random occluders.


Left: NeRF, PSNR=21.94 (paper=19.35). Right: pretrained NeRF-U, PSNR=28.60 (paper=23.47).

  3. test_nerfw_all shows that NeRF-W is able to both handle color variation and decompose the scene into static and transient components (the color variation is not learnt that well though; adding more layers to the static rgb head might help).


Left: NeRF, PSNR=18.83 (paper=15.73). Right: pretrained NeRF-W, PSNR=24.86 (paper=22.19).

  4. Reference: Original NeRF (without --encode_a and --encode_t) trained on unperturbed data.


PSNR=30.93 (paper=32.89)

Brandenburg Gate from Phototourism dataset

See test_phototourism.ipynb for the reproduction of some of the paper's results.

Use eval.py (example) to create a flythrough video. You might need to design a camera path to make it look cooler; a rough sketch of one option follows the preview below.

(example gif: Brandenburg Gate flythrough)
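If you want to roll your own path, one simple recipe is to place cameras on a circle around the scene and aim them at the centre. This is a purely illustrative sketch, not this repo's API; the conventions, radius and height depend on your scene:

import numpy as np

def look_at(cam_pos, target=np.zeros(3), up=np.array([0., 0., 1.])):
    """Build a 3x4 camera-to-world matrix looking from cam_pos at target."""
    forward = target - cam_pos
    forward /= np.linalg.norm(forward)
    right = np.cross(forward, up)
    right /= np.linalg.norm(right)
    new_up = np.cross(right, forward)
    # Columns: right, up, backward (camera looks along -z), plus translation.
    return np.stack([right, new_up, -forward, cam_pos], axis=1)

def circular_path(n_frames=60, radius=3.0, height=1.0):
    """Camera-to-world matrices evenly spaced on a circle around the origin."""
    poses = []
    for t in np.linspace(0, 2 * np.pi, n_frames, endpoint=False):
        cam_pos = np.array([radius * np.cos(t), radius * np.sin(t), height])
        poses.append(look_at(cam_pos))
    return np.stack(poses)   # shape (n_frames, 3, 4)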

⚠️ Notes on differences with the paper

  • Network structure (nerf.py):

    • My base MLP uses 8 layers of 256 units, as in the original NeRF, while NeRF-W uses 512 units per layer.
    • The static rgb head uses 1 layer, as in the original NeRF, while NeRF-W uses 4 layers. Empirically I found that more layers overfit when there is data perturbation, as the model tends to explain the color change by the view change as well.
    • I use softplus activation for sigma (reason explained here) while NeRF-W uses relu.
    • I apply +beta_min all the way at the end, after compositing all the raw betas (see results['beta'] in rendering.py), whereas the paper adds beta_min to the raw betas first and then composites them. I think my implementation is the correct way: initially the network outputs low sigmas, so if beta_min is added first, the composited beta will be low too. Not only can values lower than beta_min be output, but the composited beta can even be zero if all sigmas are zero, which causes a problem in the loss computation (division by zero); see the sketch after this list. I'm not totally sure about this part, so if anyone finds a better implementation please tell me.
  • Training hyperparameters

    • I find that a larger (but not too large) beta_min achieves better results, so my default beta_min is 0.1 instead of the paper's 0.03.
    • I empirically add 3 to beta_loss (equation 13) to keep it positive.
    • When there is no transient head (NeRF-A), the loss is the average MSE of the coarse and fine models (not specified in the paper).
    • Other hyperparameters differ quite a lot from the paper (many are not specified; the authors say they use grid search to find the best ones). Please check each pretrained model in the release.
  • Phototourism evaluation

    • To evaluate on the test set, the authors optimize on the left half of each test image and evaluate on the right half (to train the embeddings of the test images). I didn't perform this additional training; I only evaluated on the training images. It should be easy to implement.
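Here is the sketch referred to above: a side-by-side of the two ways of handling beta_min, showing why compositing first and adding beta_min at the end avoids a zero beta early in training. Function names and tensor shapes are illustrative, not the actual rendering.py code:

import torch

def beta_composite_then_add(weights, raw_betas, beta_min=0.1):
    """This repo: composite the raw betas along the ray, then add beta_min."""
    return (weights * raw_betas).sum(-1) + beta_min

def beta_add_then_composite(weights, raw_betas, beta_min=0.1):
    """The paper: add beta_min to the raw betas first, then composite."""
    return (weights * (raw_betas + beta_min)).sum(-1)

# Early in training the sigmas, and hence the compositing weights, are near
# zero, so the paper-style beta can drop below beta_min or even reach zero:
weights = torch.zeros(1, 64)       # degenerate case: all weights are zero
raw_betas = torch.rand(1, 64)
print(beta_composite_then_add(weights, raw_betas))   # tensor([0.1000])
print(beta_add_then_composite(weights, raw_betas))   # tensor([0.]) -> division by zero in the loss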
