
avfilter/tonemapx: fix more range handling #477

Merged
gnattu merged 1 commit into jellyfin from fix-more-range-tonemapx on Oct 9, 2024

Conversation

gnattu
Member

@gnattu gnattu commented Oct 9, 2024

It appears that the fix for range handling in #472 was incomplete, and there was a misunderstanding about the use of the mysterious 28672. This value wasn't related to limited range; rather, it is an implementation borrowed from vf_colorspace. The decision not to use the full int16 range in that code was made to leave headroom for overflows and underflows, which I also ran into during debugging. While this approach works for the upstream conversion, it's not straightforward and makes it impossible to use the CPU's saturating instructions for efficient clamping.
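
For illustration, a minimal sketch of the clamping difference (not the filter's actual code; the helper name is made up):

```c
#include <stdint.h>

/* With the vf_colorspace-style scale, 28672 (= 0.875 * 32768) is the top of
 * the representable range, so clamping needs explicit compares: */
static inline int16_t clamp_headroom(int32_t v)
{
    return (int16_t)(v < 0 ? 0 : (v > 28672 ? 28672 : v));
}

/* When values are scaled to 0..32767 instead, the upper bound coincides with
 * INT16_MAX, so a saturating int32 -> int16 narrow (e.g. _mm_packs_epi32 on
 * SSE or vqmovn_s32 on NEON) clamps the overflow side in a single instruction;
 * only negative underflow still needs a max-with-zero afterwards. */
```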

Also, even though limited-range 10-bit videos generally use the range 64-940 for luma and 64-960 for chroma, some videos extend below black (sub-blacks) and above the nominal peak (super-whites), producing signals ranging from 4 to 1019. These values are still considered TV range but can cause underflows and overflows if not handled carefully. The Samsung video sample we've tested carries signals in this extended limited range.
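
For reference, the nominal vs. extended 10-bit limits (illustrative constants, not filter code):

```c
/* Nominal 10-bit limited range vs. the extended signal seen in the sample: */
enum {
    BLACK_10    = 64,   /* nominal black            */
    PEAK_Y_10   = 940,  /* nominal luma peak        */
    SUB_BLACK   = 4,    /* sub-black sample value   */
    SUPER_WHITE = 1019  /* super-white sample value */
};

/* After black-level subtraction a sub-black sample goes negative
 * (4 - 64 = -60) and a super-white one exceeds the nominal span
 * (1019 - 64 = 955 vs. 940 - 64 = 876), so intermediates must either be wide
 * enough to hold the excursion or be saturated rather than allowed to wrap. */
```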

To fix these issues, this PR implements a more direct approach: all input values, whether limited or full range, are scaled to a 0-32767 range after the YUV-to-RGB conversion, and the reverse function always expects the 0-32767 range. Previously, the scaling for limited range was set incorrectly, leading to over-bright outputs when converting from full to limited range and to overflow on the output end. Overflow and underflow are handled naturally in the AVX and SSE code paths once the range scaling is fixed, but significant changes were needed for the NEON code path.
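
A sketch of the intended normalization (names and rounding are illustrative assumptions, not the filter's code):

```c
/* Both limited- and full-range input end up on a 0..32767 scale after
 * yuv2rgb; rgb2yuv applies the inverse factor. */
static double luma_scale_to_15bit(int depth, int limited)
{
    const int maxval = (1 << depth) - 1;                       /* 1023 for 10-bit */
    const int black  = limited ? 16  << (depth - 8) : 0;       /* 64              */
    const int peak   = limited ? 235 << (depth - 8) : maxval;  /* 940             */
    return 32767.0 / (peak - black);  /* ~37.4 for 10-bit limited, ~32.0 for full */
}
/* The output side uses the inverse, (peak - black) / 32767.0, which is what
 * removes the over-bright full -> limited results and the output-end overflow. */
```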

Previously, the NEON implementation used half-width int16 to perform the YUV-to-RGB conversion. This no longer works with the updated coefficients and overflows in cases like extended limited-range signals. Now, YUV-to-RGB operations are performed with full-width int32 values, as in the AVX and SSE paths, and saturated back to int16. This makes the NEON code path slightly slower, but the performance impact on a fast CPU (M1) is marginal (~3% on average, with a worst case of ~5%), which should be acceptable.
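
A minimal sketch of the widened NEON idea (the fixed-point shift and helper name are assumptions here, not the actual kernel):

```c
#include <arm_neon.h>
#include <stdint.h>

#define FP_SHIFT 14  /* assumed fixed-point shift, for illustration only */

/* Widen the int16 Y/U/V lanes to int32, accumulate against int32 coefficients,
 * then do a saturating rounding shift-narrow so overflow and underflow are
 * clamped back into the int16 range in a single instruction. */
static inline int16x4_t yuv2rgb_channel(int16x4_t y, int16x4_t u, int16x4_t v,
                                        int32_t cy, int32_t cu, int32_t cv)
{
    int32x4_t acc = vmulq_n_s32(vmovl_s16(y), cy);   /* full-width Y term   */
    acc = vmlaq_n_s32(acc, vmovl_s16(u), cu);        /* + full-width U term */
    acc = vmlaq_n_s32(acc, vmovl_s16(v), cv);        /* + full-width V term */
    return vqrshrn_n_s32(acc, FP_SHIFT);             /* saturate to int16   */
}
```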

Additionally, the coefficient matrix now uses full-width int32, because int16 can't hold the matrix values when scaling limited-range input up to 32767. Most code paths already interpret these values as full-width integers, so there's no observable performance impact and, in theory, only a slight increase in memory usage.
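
Rough arithmetic behind that choice (the fixed-point layout here is an assumption, not the filter's actual format):

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Mapping 10-bit limited-range luma (64..940, i.e. 876 codes) onto
     * 0..32767 needs a gain of about 32767 / 876 ~= 37.4. */
    const double gain   = 32767.0 / (940 - 64);
    const int    q_bits = 10;                    /* hypothetical fractional bits */
    const long   fixed  = (long)(gain * (1 << q_bits) + 0.5);

    /* Prints roughly 38300, which already exceeds INT16_MAX (32767),
     * hence the move to an int32 coefficient matrix. */
    printf("coefficient = %ld, INT16_MAX = %d\n", fixed, INT16_MAX);
    return 0;
}
```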

Changes

  • use full int32 for neon yuv2rgb and the matrix
  • fix range scaling in yuv2rgb/rgb2yuv matrix derivation
  • clean up some unused branching after p016 support removal

Issues

@gnattu gnattu requested a review from a team October 9, 2024 04:14
@gnattu gnattu force-pushed the fix-more-range-tonemapx branch from d02fa6b to eecdbb4 on October 9, 2024 05:57
@gnattu gnattu force-pushed the fix-more-range-tonemapx branch from eecdbb4 to 2fdd1c5 on October 9, 2024 06:13
@nyanmisaka
Member

         * General design:
         * - yuv2rgb converts from whatever range the input was ([16-235/240] or
         *   [0,255] or the 10/12bpp equivalents thereof) to an integer version
         *   of RGB in pseudo-restricted 15+sign bits. That means that the float
         *   range [0.0,1.0] is in [0,28672], and the remainder of the int16_t
         *   range is used for overflow/underflow outside the representable
         *   range of this RGB type. rgb2yuv is the exact opposite.

Now that I remember, I had similar overflows with my custom impl before switching to the one in vf_colorspace.

@nyanmisaka
Member

Just a nitpick, one drawback of our vf_tonemapx (420p*) compared to the original vf_tonemap (gbrp*) is that we don't use a more proper upscaling method for 4:2:0 chroma -> 4:4:4 chroma but simply do AVG. This may cause aliasing, which is more noticeable in red. Might be worth improving in the future.

@gnattu
Member Author

gnattu commented Oct 9, 2024

Just a nitpick, one drawback of our vf_tonemapx (420p*) compared to the original vf_tonemap (gbrp*) is that we don't use a more proper upscaling method for 4:2:0 chroma -> 4:4:4 chroma but simply do AVG. This may cause aliasing, which is more noticeable in red. Might be worth improving in the future.

But the upstream one is also using averaging?

@gnattu gnattu merged commit ac98a44 into jellyfin Oct 9, 2024
27 checks passed
@gnattu gnattu deleted the fix-more-range-tonemapx branch October 9, 2024 13:18
@nyanmisaka
Member

Just a nitpick, one drawback of our vf_tonemapx (420p*) compared to the original vf_tonemap (gbrp*) is that we don't use a more proper upscaling method for 4:2:0 chroma -> 4:4:4 chroma but simply do AVG. This may cause aliasing, which is more noticeable in red. Might be worth improving in the future.

But the upstream one is also using averaging?

Oh my bad, it's not the AVG in the YUV output but the YUV input - four luma samples share the same chroma sample without any interp method.

But the yuv420p10 -> gbrpf32 conversion is done before vf_tonemap, which includes a chroma upscaling step.

It's noticeable on the edge of the red lines. Video players usually apply an upscaling algorithm so it's only noticeable in the browser.

@gnattu
Member Author

gnattu commented Oct 9, 2024

Oh my bad, it's not the AVG in the YUV output but the YUV input - four luma samples share the same chroma sample without any interp method.

I know zimg can use bilinear interpolation, but I'm a bit concerned about doing something like this in our software filter as the performance penalty could be high, and our GPU implementation is already using some sort of averaging sampler which performs similarly to bilinear interpolation.

@nyanmisaka
Member

Oh my bad, it's not the AVG in the YUV output but the YUV input - four luma samples share the same chroma sample without any interp method.

I know zimg can use bilinear interpolation, but I'm a bit concerned about doing something like this in our software filter as the performance penalty could be high, and our GPU implementation is already using some sort of averaging sampler which performs similarly to bilinear interpolation.

This is what I am worried about: there is no free image sampler on the CPU. For now we may have to deal with it. Our GPU impl uses no interp for Y and linear samplers for UV. That's why it looks okayish.
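
For reference, a minimal sketch of what a linear chroma upsample could look like on the CPU (simplified siting and edge handling; purely illustrative, not a proposal for the filter's code):

```c
#include <stdint.h>

/* Upsample one chroma row 2x horizontally with linear interpolation; a
 * vertical pass would work the same way. The current filter instead lets
 * neighbouring luma positions reuse the same chroma sample. */
static void chroma_upsample_row_2x(const uint16_t *src, uint16_t *dst, int src_w)
{
    for (int x = 0; x < src_w; x++) {
        const uint16_t c0 = src[x];
        const uint16_t c1 = src[x + 1 < src_w ? x + 1 : x]; /* clamp at edge   */
        dst[2 * x]     = c0;                                /* co-sited sample */
        dst[2 * x + 1] = (uint16_t)((c0 + c1 + 1) >> 1);    /* halfway sample  */
    }
}
```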
