-
-
Notifications
You must be signed in to change notification settings - Fork 134
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
avfilter/tonemapx: fix more range handling #477
Conversation
d02fa6b
to
eecdbb4
Compare
eecdbb4
to
2fdd1c5
Compare
Now that I remember it, I had similar overflow with my custom impl before switching to the one in |
Just a nitpick, one drawback of our |
But the upstream one is also using averaging? |
I know zimg can use bilinear interpolation but I'm a bit concerned about doing something like this in our software filter as the performance penalty could be high, and out GPU implementation already using some sort of averaging sampler which performs similar to bilinear interpolation. |
It appears that the fix for range handling in #472 was incomplete, and there was a misunderstanding about the use of the mysterious
28672
. This value wasn't related to limited range, but rather an implementation borrowed fromvf_colorspace
. The decision not to use the full int16 range in that code was to provide buffer space for overflows and underflows, which I also encountered during debugging. While this approach works for upstream conversion, it's not straightforward and makes it impossible to use the CPU's saturate instructions for efficient clamping.Also, even though limited range videos in 10-bit generally use the ranges 64-940 for Luma and 64-960 for Chroma, some videos extend below black (sub-blacks) and above nominal peak values (super-whites), providing signals ranging from 4 to 1019. These values are still considered TV range but can cause underflows and overflows if not handled carefully. The Samsung video sample we've tested has signals in such extended limit range.
To fix these issues, this PR implements a more direct approach: all input values, whether limited or full range, will be scaled to a 0-32767 range after the YUV-to-RGB conversion, and the reverse function will always expect the 0-32767 range. Previously, the scaling for limited range was set incorrectly, leading to over-bright outputs when converting from full to limited range, and also overflow on the output end. Overflow and underflow issues are already naturally handled in the AVX and SSE code paths after the range scaling fix, but significant changes were needed for the NEON code path.
Previously, the NEON implementation used half-width int16 to perform YUV-to-RGB conversion. This no longer works with the updated coefficients and causes overflow in cases like extended limited signals. Now, YUV-to-RGB operations are performed using full-width int32 values like AVX and SSE and saturated back to int16. This makes the NEON code path slightly slower, but the performance impact on a fast CPU (M1) is marginal (~3% on average, with a worst case of ~5%), which should be acceptable.
Additionally, the coefficient matrix now uses full-width int32 because int16 values can't hold the matrix values when scaling to 32767 from limited to full range. Most code paths already interpret the value as full-width integers, so there’s no observable performance impact, and theoretically only a slight increase in memory usage.
Changes
Issues