Comparison with direct voxel lookups - Chapter 5.4 #156
-
Hi, congrats on this awesome project and TCNN. From Chapter 5.4, "Comparison with direct voxel lookups" (the ablation study), I quote: "we replace the entire neural network with a single linear matrix multiplication". I'm interested in exploring this tradeoff further. Could you clarify what you did? In particular, what were the grid parameters? Since the first MLP normally predicts direction-independent color information, that information now needs to be represented directly as trainable features in the grid. (For Plenoxels, I guess that would be like having a 1-level dense grid with 28-feature vectors, which is not possible here.) Also: do you still have this code around, maybe in a branch? Thank you!
-
Hi there, what we did amounts to simply setting the number of hidden layers of the neural networks to zero. (All other parameters---input/output dims, output activations, optimizers, F, T, etc.---being equal.)

Note that despite there being two neural networks (one for density and one for color), the composition of two linear transforms is still a linear transform. Therefore, it's equivalent to having just a single transformation matrix. (Although the computation structure is such that the predicted density can't depend on view direction---so it's fairer to say that there is one linear transform from Hashgrid->Density and another from Hashgrid+SphericalHarmonics->RGB, i.e. the single conceptual large linear transform has some of its matrix entries forced to zero.)

We've bundled the corresponding config. Cheers!
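To make the structural-zero point concrete, here is a minimal NumPy sketch of what the zero-hidden-layer setup computes. The dimensions and weight matrices below are purely illustrative (not the exact padded sizes or trained weights); the point is only that the two separate linear maps are equivalent to one larger linear map whose density row has zeros over the view-direction columns:

```python
import numpy as np

# Illustrative dimensions (not necessarily the exact padded sizes used in practice):
D_GRID = 32  # e.g. L=16 hash-grid levels with F=2 features each
D_SH = 16    # degree-4 spherical-harmonics encoding of the view direction

rng = np.random.default_rng(0)
grid_feat = rng.standard_normal(D_GRID)
sh_feat = rng.standard_normal(D_SH)

# Zero hidden layers: each "network" degenerates to a single matrix.
W_density = rng.standard_normal(D_GRID)            # Hashgrid -> density
W_rgb = rng.standard_normal((3, D_GRID + D_SH))    # Hashgrid + SH -> RGB

density = W_density @ grid_feat
rgb = W_rgb @ np.concatenate([grid_feat, sh_feat])

# Equivalent single linear transform: the density row has its entries over the
# SH (view-direction) columns forced to zero, so density stays view-independent.
W_single = np.zeros((4, D_GRID + D_SH))
W_single[0, :D_GRID] = W_density
W_single[1:, :] = W_rgb
out = W_single @ np.concatenate([grid_feat, sh_feat])

assert np.isclose(out[0], density)
assert np.allclose(out[1:], rgb)
```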
-
Thanks, I see. From the phrasing I initially thought it was completely sidestepping the training of network weights.
-
Yep! Although in a specialized implementation that doesn't need padding, the matrices would be 32x1 and 48x3, which amounts to "just" 176 FMA instructions. 48 of these (16x3) are inherent in the SH coefficient representation, and the remaining ones could conceivably be halved by lowering the output dimensionality of the hash encoding (e.g. by choosing F=1 or using L=8). That'd be the approach I'd attempt if FMA instructions are well and truly at a premium, for a total of 112 mults.
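Spelling that arithmetic out as a quick sanity check (purely illustrative, not code from the repo):

```python
# Back-of-the-envelope FMA count for the unpadded linear layers described above.
density_fmas = 32 * 1   # 32x1 matrix: hash-grid features -> density
rgb_fmas = 48 * 3       # 48x3 matrix: (hash-grid + SH features) -> RGB
assert density_fmas + rgb_fmas == 176

sh_fmas = 16 * 3        # inherent SH part of the RGB matrix (16 coeffs x 3 channels)
grid_fmas = density_fmas + (rgb_fmas - sh_fmas)  # FMAs attributable to hash-grid features

# Halving the hash-encoding output dimensionality (e.g. F=1 or L=8: 32 -> 16 features)
# halves the grid-dependent FMAs while the SH part stays fixed.
assert grid_fmas // 2 + sh_fmas == 112
```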