WIP: add high-precision GPU trilinear interpolation for 3D LUTs #1794
+116
−48
This addresses #1763.
The high-precision code path can be enabled by disabling the new OPTIMIZATION_NATIVE_GPU_TRILINEAR optimization flag, which is on by default.
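For illustration, selecting the high-precision path from the C++ API might look like the sketch below. The color space names are placeholders, and the exact way the flag is cleared is an assumption; `OPTIMIZATION_NATIVE_GPU_TRILINEAR` is the flag this PR introduces.

```cpp
#include <OpenColorIO/OpenColorIO.h>

namespace OCIO = OCIO_NAMESPACE;

// Sketch: build a GPU processor with the native-trilinear optimization
// cleared, so the generated shader uses the full-precision path instead.
// "lin_srgb" / "show_lut_look" are placeholder color space names.
OCIO::ConstGPUProcessorRcPtr MakeHighPrecisionGpuProcessor()
{
    OCIO::ConstConfigRcPtr config = OCIO::Config::CreateFromEnv();
    OCIO::ConstProcessorRcPtr proc =
        config->getProcessor("lin_srgb", "show_lut_look");

    // Start from the default flags and clear just the native-trilinear bit.
    const auto oflags = static_cast<OCIO::OptimizationFlags>(
        OCIO::OPTIMIZATION_DEFAULT & ~OCIO::OPTIMIZATION_NATIVE_GPU_TRILINEAR);

    return proc->getOptimizedGPUProcessor(oflags);
}
```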
The existing code path uses the GPU's native trilinear texture interpolation, which is faster but quantizes the lookup coordinates and can cause color banding. That remains the default; the new path optionally performs full-precision trilinear interpolation in the shader instead.
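To make the difference concrete, here is a minimal CPU-side sketch of full-precision trilinear interpolation over a cubic LUT. This is not the PR's generated shader code; the `RGB` struct, memory layout, and function name are illustrative. The key point is that the fractional weights are computed in full float precision, whereas the texture unit's fixed-point interpolator quantizes them.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct RGB { float r, g, b; };

// lut holds edgeLen^3 entries with red fastest-varying:
// index = (b * edgeLen + g) * edgeLen + r. Assumes edgeLen >= 2.
RGB SampleLut3DTrilinear(const std::vector<RGB> & lut, size_t edgeLen, RGB in)
{
    const float maxIdx = static_cast<float>(edgeLen - 1);

    // Map a [0,1] input channel into LUT index space, clamped.
    auto toIdx = [&](float v) { return std::min(std::max(v, 0.0f), 1.0f) * maxIdx; };
    const float fr = toIdx(in.r);
    const float fg = toIdx(in.g);
    const float fb = toIdx(in.b);

    // Lower corner of the enclosing cell.
    const size_t r0 = std::min(static_cast<size_t>(fr), edgeLen - 2);
    const size_t g0 = std::min(static_cast<size_t>(fg), edgeLen - 2);
    const size_t b0 = std::min(static_cast<size_t>(fb), edgeLen - 2);

    // Full-precision fractional weights. The native texture interpolator
    // quantizes these (often to ~8 bits), which is the source of the banding.
    const float wr = fr - static_cast<float>(r0);
    const float wg = fg - static_cast<float>(g0);
    const float wb = fb - static_cast<float>(b0);

    auto at = [&](size_t r, size_t g, size_t b) -> const RGB & {
        return lut[(b * edgeLen + g) * edgeLen + r];
    };
    auto lerp = [](const RGB & a, const RGB & c, float t) {
        return RGB{ a.r + (c.r - a.r) * t,
                    a.g + (c.g - a.g) * t,
                    a.b + (c.b - a.b) * t };
    };

    // Fetch the 8 cell corners and blend along r, then g, then b.
    const RGB c00 = lerp(at(r0, g0,     b0    ), at(r0 + 1, g0,     b0    ), wr);
    const RGB c10 = lerp(at(r0, g0 + 1, b0    ), at(r0 + 1, g0 + 1, b0    ), wr);
    const RGB c01 = lerp(at(r0, g0,     b0 + 1), at(r0 + 1, g0,     b0 + 1), wr);
    const RGB c11 = lerp(at(r0, g0 + 1, b0 + 1), at(r0 + 1, g0 + 1, b0 + 1), wr);
    return lerp(lerp(c00, c10, wg), lerp(c01, c11, wg), wb);
}
```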
WIP
This PR is a WIP. It functions, but it's not ready to merge yet:

- The main open question is how to get the optimization flags to `GetLut3DGPUShaderProgram()`. I tried several approaches, but every approach ended up affecting some public API. The current approach is to store the flags in `GPUProcessor::Impl` and then pass them to `Op::extractGpuShaderInfo()` via a new argument with a default value. Of all the approaches I tried, this seemed to have the lowest impact on both the code and the API (see the sketch after this list). I am very much open to suggestions for a better approach to get the optimization flags to `GetLut3DGPUShaderProgram()`.
- Additionally, I still need to add unit tests for the new code path.
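A hedged sketch of the plumbing described above, using simplified stand-ins for the library's internal op types; the actual signatures in the PR's diff may differ:

```cpp
#include <OpenColorIO/OpenColorIO.h>
#include <memory>
#include <vector>

namespace OCIO = OCIO_NAMESPACE;

// Assumption-laden illustration, not the real Op hierarchy: the
// shader-extraction hook gains a defaulted flags parameter, so existing
// call sites keep compiling unchanged.
class Op
{
public:
    virtual ~Op() = default;
    virtual void extractGpuShaderInfo(
        OCIO::GpuShaderCreatorRcPtr & shaderCreator,
        OCIO::OptimizationFlags oflags = OCIO::OPTIMIZATION_DEFAULT) const = 0;
};

class GPUProcessorImpl
{
public:
    void extractGpuShaderInfo(OCIO::GpuShaderCreatorRcPtr & shaderCreator) const
    {
        // Forward the flags cached at processor-creation time; a Lut3D op
        // can then choose native vs. full-precision trilinear inside
        // GetLut3DGPUShaderProgram().
        for (const auto & op : m_ops)
        {
            op->extractGpuShaderInfo(shaderCreator, m_oflags);
        }
    }

private:
    std::vector<std::shared_ptr<const Op>> m_ops;
    OCIO::OptimizationFlags m_oflags = OCIO::OPTIMIZATION_DEFAULT;
};
```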
Performance
Some initial naive performance tests indicate that the high-precision code is notably slower than GPU-native trilinear interpolation, but about on par with OCIO's tetrahedral interpolation. More testing is needed, however, such as with higher-resolution LUTs and on a variety of GPUs. I'll update with actual data once I've had a better go at this.