How to use underlying representation for a new task with different input/output dimensions? #148
-
This code and project are awesome! Thanks a lot. In terms of building upon this, I wonder how to access, edit, and train the underlying hash+NN representation for a new task. For example, say I have a task with a different number of input or output coordinates, e.g. some special video++-like representation that should be fitted directly, the way the image is fitted in the demo (e.g. 3 input coordinates (x, y, t) and 5 output coordinates (r, g, b, a, b)). Does this code give me access (ideally via Python bindings, but pointers to the C++ code would also help) to change the number of input and output coordinates, provide my own training data matching those dimensions, and train your hash+NN representation on the new task? If this is possible, pointers on how to do it would be very much appreciated; I'm currently quite lost on how to expose this ability in your code base. Making this easier (e.g. with easy Python bindings) would, I'm sure, be greatly appreciated by the research community, so people can build upon this awesome work as easily as possible. Cheers!
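For context, tiny-cuda-nn models are configured with a JSON-style config that describes the encoding and the network, while the input/output dimensionalities are passed separately at model-creation time. A minimal sketch of what such a setup could look like for a 3-input / 5-output fit; the option names follow tiny-cuda-nn's documented config format, but treat the concrete values as illustrative assumptions, not recommendations:

```python
# Illustrative tiny-cuda-nn-style config for a task with
# 3 input coordinates (x, y, t) and 5 output channels.
# In tiny-cuda-nn's API the input/output dimensionalities are passed
# to the model constructor separately from this config.
n_input_dims = 3   # (x, y, t)
n_output_dims = 5  # five output channels

config = {
    "encoding": {
        "otype": "HashGrid",          # multiresolution hash encoding
        "n_levels": 16,
        "n_features_per_level": 2,
        "log2_hashmap_size": 19,
        "base_resolution": 16,
        "per_level_scale": 2.0,
    },
    "network": {
        "otype": "FullyFusedMLP",
        "activation": "ReLU",
        "output_activation": "None",
        "n_neurons": 64,
        "n_hidden_layers": 2,
    },
}

# Feature width the MLP sees after the encoding:
encoded_width = (config["encoding"]["n_levels"]
                 * config["encoding"]["n_features_per_level"])
print(encoded_width)  # 32
```

The point being: nothing in the hash encoding itself is tied to 2D or 3D inputs or to RGB outputs; the dimensionalities are just parameters.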
Replies: 15 comments
-
30 minutes more of digging: an attempt to answer my own question (my current understanding, please correct me if I am wrong):
Please let me know whether this is correct. I would also LOVE to see either: … I'll leave this open until what I have said above is confirmed correct.
-
Correct!
a) Colleagues of ours have internally built a functional (but not quite ready) PyTorch wrapper around tiny-cuda-nn (both the encodings and the MLPs), which we hope to release soon-ish. Based on experience so far, there will be a slowdown compared to the native C++ API, but it is still significantly faster than Python-native MLPs.

b) I actually made one of those ages ago and found it difficult to get good performance, to the point where the hash encoding was much slower than the frequency encoding in these frameworks. The dynamic indexing required by the hash-table lookups really does not play to the strengths of these tensor-based frameworks. Plus, these frameworks don't (to my knowledge) expose good primitives for writing cache-local / shared-memory-based fused kernel code, so loads of bandwidth is wasted going back and forth to GPU RAM. That said, I am not a PyTorch/TF wizard by any stretch, so maybe somebody else can cook something up; I'd welcome it for sure! But ultimately, I think the easier way forward is to go with a proper wrapper, as we plan.
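To illustrate the dynamic indexing in question: each grid-vertex lookup computes a table index with a spatial hash of the integer coordinates and then gathers from the feature table at that index, an access pattern that tensor frameworks express as a scatter/gather rather than a fused, cache-local kernel. The primes below are the ones used for the instant-ngp spatial hash; the surrounding code is an illustrative sketch, not library API:

```python
# Sketch of the per-vertex hash lookup that a tensor framework must
# express as a gather over a feature table. The primes follow the
# instant-ngp spatial hash; everything else is illustrative.
PRIMES = (1, 2654435761, 805459861)

def spatial_hash(coords, log2_hashmap_size):
    """XOR the integer grid coordinates scaled by large primes,
    then wrap into the hash-table size."""
    h = 0
    for c, p in zip(coords, PRIMES):
        h ^= c * p
    return h % (1 << log2_hashmap_size)

# Looking up one grid vertex in a toy feature table:
table = [[0.01 * i, -0.01 * i] for i in range(1 << 14)]  # 2^14 entries, 2 features each
idx = spatial_hash((1023, 77, 5), log2_hashmap_size=14)
features = table[idx]  # the dynamic index a fused GPU kernel resolves per-thread
```

Eight such lookups (the corners of a cell) happen per point per level, which is why uncoalesced gathers dominate the cost in a framework-native implementation.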
-
Re (b): what was the issue you were seeing around dynamic indexing? I'd imagine you were probably using … Do you recall roughly how long it was taking? I'm interested in applications of the hash encoding technique that might not be quite so latency-sensitive as instant rendering. :)
-
Pretty much! My "quick and dirty" implementation used … I'm hesitant to share too many details because it has been so long, the hyperparameters of the hash tables were very different then, and the experiments were anything but systematic. What I feel comfortable saying:
But I feel it's important to clarify that I don't want to discourage TF/PyTorch implementations of the encoding at all! As you say, the perf vs. quality tradeoff may still be favorable -- even if slower than a dedicated op/kernel -- especially since the encoding takes just a couple dozen lines of code to implement in Python. What I mostly wanted to convey above was that I feel like a custom op will give most people a more favorable trade-off between performance and hackability.
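For concreteness, here is roughly what such a couple-dozen-line Python implementation could look like for a single 2D point (bilinear rather than trilinear interpolation). Everything below, including the names, the hyperparameters, and the plain-list "tensors", is an illustrative sketch, not code from this repository:

```python
import random

PRIMES = (1, 2654435761)  # spatial-hash primes (2D case)

def hash2d(ix, iy, table_size):
    return ((ix * PRIMES[0]) ^ (iy * PRIMES[1])) % table_size

def hash_encode(x, y, tables, base_res, per_level_scale):
    """Multiresolution hash encoding of one 2D point in [0, 1)^2.
    Returns the concatenated per-level features (bilinear interpolation)."""
    out = []
    for level, table in enumerate(tables):
        res = int(base_res * per_level_scale ** level)
        fx, fy = x * res, y * res
        ix, iy = int(fx), int(fy)   # lower-left cell corner
        wx, wy = fx - ix, fy - iy   # interpolation weights
        corners = [(ix, iy, (1 - wx) * (1 - wy)), (ix + 1, iy, wx * (1 - wy)),
                   (ix, iy + 1, (1 - wx) * wy),   (ix + 1, iy + 1, wx * wy)]
        n_feat = len(table[0])
        feats = [0.0] * n_feat
        for cx, cy, w in corners:           # gather 4 corners, blend features
            entry = table[hash2d(cx, cy, len(table))]
            for f in range(n_feat):
                feats[f] += w * entry[f]
        out.extend(feats)
    return out

# Tiny example: 4 levels, 2 features per level, 2^10 entries per level.
random.seed(0)
tables = [[[random.uniform(-1e-4, 1e-4) for _ in range(2)]
           for _ in range(1 << 10)] for _ in range(4)]
enc = hash_encode(0.3, 0.7, tables, base_res=16, per_level_scale=1.5)
print(len(enc))  # 8 = 4 levels * 2 features/level
```

A framework version would vectorize this over a batch of points with gather ops, which is exactly the access pattern discussed above.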
-
Makes a lot of sense. I think there's a lot of room for further exploration of the hash encoding method, so standalone ops for using it in existing frameworks would be very useful, especially since CUDA wizardry is a rare talent. Also, thank you, this is some incredible work y'all have done.
-
Just to hop back on the PyTorch wrapper of tiny-cuda-nn: would soon-ish be somewhere in the realm of next quarter, or beyond?
-
Definitely in the next quarter. Probably much sooner (~month scale).
-
I made a PyTorch binding of the hash-grid encoder here, but the native MLP in PyTorch is still the major time bottleneck. Looking forward to the official implementation :)
-
I just pushed a first version (call it "beta") for a PyTorch extension of tiny-cuda-nn. See this section of the README for installation/usage instructions and please do report problems you encounter along the way. :) |
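To give a flavor of the usage the README describes, below is a hedged sketch of instantiating a fused encoding+MLP through the extension. The `tinycudann` import and a CUDA-capable GPU are required for the guarded part to actually run, and the config values are illustrative assumptions rather than recommended settings:

```python
# Configs in tiny-cuda-nn's JSON-style format (illustrative values).
encoding_config = {"otype": "HashGrid", "n_levels": 16,
                   "n_features_per_level": 2, "log2_hashmap_size": 19,
                   "base_resolution": 16, "per_level_scale": 2.0}
network_config = {"otype": "FullyFusedMLP", "activation": "ReLU",
                  "output_activation": "None", "n_neurons": 64,
                  "n_hidden_layers": 2}

try:
    import tinycudann as tcnn  # the PyTorch extension; needs a CUDA GPU

    # Fused hash encoding + MLP, trainable like any torch.nn.Module.
    model = tcnn.NetworkWithInputEncoding(
        n_input_dims=3, n_output_dims=1,
        encoding_config=encoding_config, network_config=network_config)
except Exception:
    model = None  # extension not installed / no GPU; configs above still show the format
```

The returned module plugs into a normal PyTorch training loop (optimizer over `model.parameters()`, `model(x)` in the forward pass).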
-
Amazing! Will spend today playing with this and let you know how it goes!
-
So I played with this today and here are the results:

1.) Installing and running the 2D (image reconstruction) demo:

2.) Trying to incorporate it into a NeRF framework (steps to reproduce below):

Below I detail exactly how to reproduce these results, so that anyone else who is interested can build upon this, try to make it work, and let me know what I am doing wrong. Steps:
-
I can't spend any more time on this, so if someone can get this to work properly and let me know, that would be super. Especially if anyone finds the code above helpful :)
-
If people want to contribute to this, we're building this code open source together here: |
-
Perhaps it's worth trying to use … (You'll need to pull the latest tiny-cuda-nn to get fp32 encoding support.)
-
Thanks for the detailed answer! @Tom94