avfilter/vf_overlay_videotoolbox: add fast code path for bgra overlay #410
The previous implementation converted both the main and overlay frames to BGRA textures and then converted the result back to YUV, which is bandwidth-heavy.
Add a faster shader for the case where the overlay is in BGRA format; it calculates the YUV values directly in the shader. This eliminates the conversion of the main frame and the extra copy of the overlay frame, giving a performance improvement of more than 100% when overlaying 10-bit 1080p HEVC inputs on an M1 Max (190 fps -> 407 fps).
The RGB-to-YUV formula is currently hard-coded to a premultiplied BT.709 matrix.
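For illustration, the per-pixel arithmetic of such a fast path can be sketched in plain C. This is not the shader from this patch: the function name `overlay_bgra_on_yuv`, the `YCbCr` struct, the use of normalized floats, and the choice of limited-range scaling with 8-bit-style offsets (16/255, 128/255) are all assumptions made for the example. It only shows how a premultiplied BGRA pixel can be converted with BT.709 coefficients and blended onto a YUV pixel without converting the main frame to RGB.

```c
#include <stdio.h>

/* Hypothetical pixel type: all components normalized to [0, 1]. */
typedef struct { float y, cb, cr; } YCbCr;

/*
 * Blend one premultiplied RGBA overlay pixel onto one YCbCr main pixel.
 * Assumptions (not taken from the patch): BT.709 coefficients, limited
 * range, offsets expressed on an 8-bit scale.
 */
static YCbCr overlay_bgra_on_yuv(float r, float g, float b, float a, YCbCr dst)
{
    const float kr = 0.2126f, kb = 0.0722f;
    const float kg = 1.0f - kr - kb;

    /* Full-range BT.709 of the premultiplied color. */
    float y  = kr * r + kg * g + kb * b;
    float cb = (b - y) / (2.0f * (1.0f - kb));
    float cr = (r - y) / (2.0f * (1.0f - kr));

    /*
     * Because the color is premultiplied, the range offsets are scaled
     * by alpha, and the main pixel is weighted by (1 - alpha).
     */
    YCbCr out;
    out.y  = (219.0f / 255.0f) * y  + a * (16.0f  / 255.0f) + (1.0f - a) * dst.y;
    out.cb = (224.0f / 255.0f) * cb + a * (128.0f / 255.0f) + (1.0f - a) * dst.cb;
    out.cr = (224.0f / 255.0f) * cr + a * (128.0f / 255.0f) + (1.0f - a) * dst.cr;
    return out;
}
```

With this formulation a fully transparent overlay pixel leaves the main pixel unchanged, and a fully opaque one replaces it, so the main frame is only read and written in its native YUV layout.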