I'm sorry if this is the wrong place to post this, but I'm curious what these two training modes do. What is the Image training mode even doing? The testbed image trainer leads me to believe it's trying to replicate one specific image and become very good at reproducing it. How is this useful? What am I missing? How can I make the gigapixel images described in the paper?
In this lecture, one of the authors of the NeRF paper describes the image replication you are referring to as a toy problem:

https://youtu.be/nRyOzHpcr4Q?t=1454

As he explains, the question they were exploring is why teaching an image to a neural network, even just to overfit and memorize the input data, is surprisingly hard, and how an input encoding can help. In their publication the encoding was Fourier-analysis based, or frequency based as instant-ngp refers to it. If I understand correctly, this parallels the positional encodings used in transformer-based models like GPT: both provide structural information for inputs that would otherwise look unstructured to the network. You can compare this yourself by trying to fit an image representation into a vanilla MLP (fully connected neural network) without any such "positional encoding"; there is a minimal sketch of the frequency encoding below.

What is proposed with instant-ngp is, in part, a comparison between different methods of encoding. The authors found that multi-resolution spatial hash grid structures can actually replace frequency-based encodings with better performance (see the second sketch below). Data structures like these are typical in computer graphics for describing and accelerating high-dimensional (e.g., three-dimensional) data, and they are highly optimized for parallel computing on GPUs. Beyond the encoding, the work also builds on the authors' previous tiny-cuda-nn, which I believe they published as well; it describes optimizations specific to newer NVIDIA GPUs, with bigger L1/L2 caches and different shared-memory architectures, that fuse typical MLP operations such as activations. Together, these have made this work outperform prior methods in spectacular ways.

SDF refers to signed distance functions, and as far as I know they were made popular through various articles. SDF representations are nothing new (just stating that, if not obvious, to make sure I am not misleading you when I refer to their popularity), but they are popular now partly because there are specific optimizations you can do with them to produce better final pixels (if the SDF can be modeled in a convenient way, which usually isn't the case) and to make spatial calculations faster, since the representation lends itself better to certain geometric computation paradigms (the third sketch below shows the classic sphere-tracing trick).

Having said all this, I believe the main goal of the authors, along with the other examples like radiance caching for ray tracers, volume renderers, or NeRF itself, is to show how generic this encoding can be across different tasks. I hope this helps you understand, but take my comments with a pinch of salt :D I am not one of the authors.
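To make the frequency-encoding point concrete, here is a minimal NumPy sketch of my own (not the instant-ngp implementation): it maps 2D pixel coordinates to sin/cos features at geometrically increasing frequencies, roughly what the NeRF paper's positional encoding does, and these features are what the MLP would be fed instead of raw coordinates.

```python
import numpy as np

def frequency_encode(xy, num_bands=6):
    """NeRF-style frequency encoding: map 2D coordinates in [0, 1]
    to sin/cos features at geometrically increasing frequencies."""
    feats = [xy]  # keep the raw coordinates as well
    for band in range(num_bands):
        freq = (2.0 ** band) * np.pi
        feats.append(np.sin(freq * xy))
        feats.append(np.cos(freq * xy))
    return np.concatenate(feats, axis=-1)

# Encode every pixel coordinate of a tiny 4x4 "image".
h = w = 4
ys, xs = np.mgrid[0:h, 0:w]
coords = np.stack([xs.ravel() / (w - 1), ys.ravel() / (h - 1)], axis=-1)
print(frequency_encode(coords).shape)  # (16, 26): the MLP sees these, not raw (x, y)
```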
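Here is a simplified sketch of the multi-resolution hash grid idea as well (again my own toy code; the real implementation is fused CUDA inside tiny-cuda-nn). The prime-based XOR hash follows the form given in the instant-ngp paper; in the real thing the feature tables are trainable parameters optimized jointly with the MLP.

```python
import numpy as np

PRIMES = (1, 2654435761)  # per-dimension hash primes, as in the instant-ngp paper (2D case)

def hash_coords(ix, iy, table_size):
    """Spatially hash an integer grid coordinate into a feature-table index."""
    return ((ix * PRIMES[0]) ^ (iy * PRIMES[1])) % table_size

def hash_encode(xy, tables, resolutions):
    """Concatenate bilinearly interpolated features from several grid resolutions."""
    feats = []
    for table, res in zip(tables, resolutions):
        pos = xy * res
        ix, iy = int(pos[0]), int(pos[1])
        fx, fy = pos[0] - ix, pos[1] - iy  # bilinear interpolation weights
        corners = [(ix, iy), (ix + 1, iy), (ix, iy + 1), (ix + 1, iy + 1)]
        weights = [(1 - fx) * (1 - fy), fx * (1 - fy), (1 - fx) * fy, fx * fy]
        f = sum(w * table[hash_coords(cx, cy, len(table))]
                for (cx, cy), w in zip(corners, weights))
        feats.append(f)
    return np.concatenate(feats)  # coarse-to-fine features fed to the MLP

# Four resolution levels with tiny random tables of 2 features per entry.
rng = np.random.default_rng(0)
resolutions = [16, 32, 64, 128]
tables = [rng.normal(size=(2 ** 10, 2)) for _ in resolutions]
print(hash_encode(np.array([0.3, 0.7]), tables, resolutions).shape)  # (8,)
```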
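And on the SDF side, a tiny sketch of why the representation makes spatial calculations fast: because the function returns the distance to the nearest surface, a ray can safely advance by exactly that distance each iteration (sphere tracing). This is a hard-coded toy sphere of my own, not anything from the repo.

```python
import numpy as np

def sphere_sdf(p, center=np.array([0.0, 0.0, 3.0]), radius=1.0):
    """Signed distance to a sphere: negative inside, positive outside."""
    return np.linalg.norm(p - center) - radius

def sphere_trace(origin, direction, sdf, max_steps=64, eps=1e-4):
    """March a ray by stepping exactly the SDF value each iteration;
    the nearest surface is at least that far away, so we never overshoot."""
    t = 0.0
    for _ in range(max_steps):
        d = sdf(origin + t * direction)
        if d < eps:  # close enough: surface hit
            return t
        t += d       # safe step: radius of the largest empty sphere here
    return None      # no hit within the step budget

# A ray from the origin along +z hits the unit sphere centered at z = 3 at t = 2.
print(sphere_trace(np.zeros(3), np.array([0.0, 0.0, 1.0]), sphere_sdf))  # ~2.0
```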