Rendering

boogie edited this page Jan 22, 2024 · 1 revision

Overview

The rkmpp decoders output DRM_PRIME frames, and not all players support this type of frame. There are two common ways to render a DRM_PRIME frame.

Rendering Through EGL

The EGL interface provided by MESA, or by a vendor's custom GL implementation (e.g. the ARM Mali blobs), can import DRM_PRIME frames and render them on the screen. However, this rendering scheme is not yet implemented in all players. In this case the rendering flow is as follows.

  Kernel (Rockchip BSP)            UserSpace (Linux)
+----------------------+      +------------------------+
| Rockchip HW Decoders |      |                        |
|           ↓          |      |                        |
|    mpp service ------+------+---------> mpp          |
|                      |      |           ↓            | 
|      rgamulti  <-----+------+-librga -> ffmpeg       | 
|                      |      |           ↓ (DRMPRIME) | 
|                      |      |         player         | 
|                      |      |           ↓            | 
|    GPU Driver  <-----+------+-------- mesa(EGL)      | 
|           |          |      |                        |
|           +----------+------+--> mesa (pan*/lima*)   |
|                      |      |           ↓            | 
|   DRM/KMS (vop2) <---+------+-------- DE (gnome/KDE) |
|           ↓          |      |                        |
|   Display Out        |      |                        |
|  (hdmi/dp/mipi,lvds) |      |                        |
+----------------------+      +------------------------+

Rendering Through KMS/GBM

This flow is exactly like the one above, except that the GPU and DE part is bypassed and the player renders directly on the kernel's DRM interface. Because the GPU is bypassed, this is the most efficient rendering method, but it is supported by very few players (only Kodi at the time of writing).

  Kernel (Rockchip BSP)            UserSpace (Linux)
+----------------------+      +------------------------+
| Rockchip HW Decoders |      |                        |
|           ↓          |      |                        |
|    mpp service ------+------+---------> mpp          |
|                      |      |           ↓            | 
|      rgamulti  <-----+------+-librga -> ffmpeg       | 
|                      |      |           ↓ (DRMPRIME) | 
|                      |      |         player         | 
|                      |      |           |            | 
|   DRM/KMS (vop2) <---+------+-----------+            |
|           ↓          |      |                        |
|   Display Out        |      |                        |
|  (hdmi/dp/mipi,lvds) |      |                        |
+----------------------+      +------------------------+

Soft Frames (Not Supported)

When an FFmpeg decoder decodes the picture on the CPU (e.g. FFmpeg's native h264 decoder or libdav1d), the frames are placed in a memory buffer and are generally copied around several times. This flow may be acceptable for 1080p or, up to a point, even 4K content, but the repeated copies introduce a lot of delay and choppiness in the video. Therefore soft frames are not a priority for the rkmpp decoders, because this is not the intended use of a hardware decoder.

Buffer Concerns and AFBC topic

As mentioned above, to get the desired performance neither of the two known rendering schemes should ever copy frames to another buffer. Even with zero-copy rendering, producing big frames such as 8K at 60fps requires a lot of DDR bandwidth and is not even possible at generic DDR4 speeds. To tackle this problem, ARM introduced a compression mechanism for frames called AFBC (Arm Frame Buffer Compression). The rkmpp decoders can generate AFBC compressed frames, but an AFBC capable mesa and/or an AFBC capable DRM driver (vop2) is essential to actually reach 8k@60fps smoothly.

If AFBC is not supported by the rendering chain, there is still the option of decoding the frames AFBC compressed and decompressing them with the RGA filter in ffmpeg. This method also gives 8K@60fps performance levels, at the cost of increasing the required memory size (this also depends on your DDR performance).
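To put the DDR bandwidth concern in numbers, the sketch below computes the raw, uncompressed bytes per second needed to move every frame once. The function name `frame_bytes_per_sec` is made up for this example, and the bits-per-pixel values are the nominal sizes of the pixel formats (NV12 = 12, NV15 = 15, P010 = 24). Real traffic is higher, since each frame is written by the decoder and read again by every later stage, so treat this only as a lower bound.

```shell
#!/usr/bin/env bash
# Raw (uncompressed) bytes per second needed to move every frame once.
# Arguments: width height fps bits_per_pixel
# Integer arithmetic only; sizes not divisible by 8 bits are truncated.
frame_bytes_per_sec() {
    local width=$1 height=$2 fps=$3 bpp=$4
    echo $(( width * height * bpp / 8 * fps ))
}

# Nominal bits per pixel:
#   NV12: 8-bit 4:2:0, NV15: packed 10-bit 4:2:0,
#   P010: 10-bit 4:2:0 stored in 16-bit words.
for fmt in NV12:12 NV15:15 P010:24; do
    name=${fmt%%:*}
    bpp=${fmt##*:}
    echo "$name 8K@60: $(frame_bytes_per_sec 7680 4320 60 "$bpp") bytes/s"
done
```

For 8K@60 this gives roughly 3.0 GB/s for NV12 and roughly 6.0 GB/s for P010 per pass over the frames, which is why avoiding copies and compressing the frames matter so much at this resolution.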

Usage on players

FFPlay for testing (Not Supported)

FFplay supports DRMPrime frames only when the decoder delivers them through the VAAPI interface. Otherwise FFplay cannot take advantage of the rkmpp accelerated frames.

So currently there is no proper support for FFplay.

MPV

mpv supports DRMPrime frames through EGL.

To get basic support, run mpv with the syntax below:

mpv --profile=fast --hwdec=rkmpp path-to-file

  • Limitations:
    • This gives rendering speeds of up to 4k@60fps. For faster rendering you should activate AFBC mode, because of the DDR bandwidth limits.
    • This will most likely not play 10bit files, because neither MESA nor mpv currently supports the NV15 and NV20 10bit plane formats that the rkmpp decoders generate.

To work around those issues, ffmpeg can use RGA filters to decompress the AFBC compressed frames and to convert the 10bit NV15 and NV20 frames to formats that mesa accepts more widely.

The mpv flags below run the rkmpp decoders in AFBC mode and pass the frames to the RGA filters. RGA converts NV15 frames to P010 and NV20 frames to P210.

mpv --profile=fast --hwdec=rkmpp --vd-lavc-o=afbc=on --vf=scale_rkrga=force_yuv=auto path-to-file

This gives true 10bit decoded rendering (if your display and mesa actually support it). However, because P010 and P210 are not very efficient picture formats, the above usage may still hit memory bandwidth limits at around 8k@55fps. Improving this requires direct AFBC rendering support in both mesa and mpv. I have tried several approaches, but could not find a proper way to get AFBC support into mpv and mesa in their current form. Maybe someone can take this up and improve it.

To improve performance further, a dynamic 10bit to 8bit conversion can be applied with RGA as below. This works exactly like the above, but converts NV15 frames to NV12 and NV20 frames to NV16.

mpv --profile=fast --hwdec=rkmpp --vd-lavc-o=afbc=on --vf=scale_rkrga=force_yuv=8bit path-to-file
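The three invocations above differ only in their decoder and filter options, so a small wrapper can make switching between them less error-prone. This is only a sketch: the function name `mpv_rkmpp_args` and the mode names `basic`, `10bit` and `8bit` are made up for this example, while the mpv options themselves are copied from the commands above.

```shell
#!/usr/bin/env bash
# Print the mpv options for one of the three setups described above.
# The mode names (basic, 10bit, 8bit) are hypothetical labels for this sketch.
mpv_rkmpp_args() {
    case "$1" in
        basic) echo "--profile=fast --hwdec=rkmpp" ;;
        10bit) echo "--profile=fast --hwdec=rkmpp --vd-lavc-o=afbc=on --vf=scale_rkrga=force_yuv=auto" ;;
        8bit)  echo "--profile=fast --hwdec=rkmpp --vd-lavc-o=afbc=on --vf=scale_rkrga=force_yuv=8bit" ;;
        *)     echo "usage: mpv_rkmpp_args basic|10bit|8bit" >&2; return 1 ;;
    esac
}

# Dry run: show the command instead of starting mpv.
echo "mpv $(mpv_rkmpp_args 8bit) path-to-file"
```

To actually play a file, replace the final `echo` with something like `mpv $(mpv_rkmpp_args 8bit) path-to-file` (unquoted, so that the options are split into separate arguments).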

  • Limitations:
    • In both cases, mesa expects 64 byte aligned picture buffers, but mpp currently provides dynamically aligned frame buffers, which may not be 64 byte aligned. If the picture width is not a multiple of 64 (for example 720x480), mesa may not accept the provided frames over EGL.
  • Tweaking mpv
    • --profile=fast is only required when your mesa is not fast enough to render the decoded frames. With a faster GL implementation such as mali or, hopefully, a future panthor, you do not have to enable it.
    • --swapchain-depth=8 may reduce dropped frames caused by a bottleneck somewhere in the rendering path, at the cost of added delay.
    • --msg-level=ffmpeg=debug or --msg-level=ffmpeg=trace can give extra useful information about the ffmpeg decoding process. The trace option might be overkill; debug should be enough.
    • --vo=gpu-next selects the new gpu backend in mpv, which uses libplacebo. This might give slightly better performance.
    • --ao=null --ao-null-untimed disables syncing the video to the audio. If you are testing from the command line and have no proper audio backend, this prevents frame drops caused by the missing audio sync.
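The 64 byte alignment limitation can be checked before playback with a little arithmetic. The sketch below assumes an 8bit luma plane, where one row of pixels takes as many bytes as the picture is wide; the actual stride is chosen by mpp and may differ. The function names `align_up` and `check_width` are made up for this example.

```shell
#!/usr/bin/env bash
# Round a value up to the next multiple of align (64 by default).
align_up() {
    local value=$1 align=${2:-64}
    echo $(( (value + align - 1) / align * align ))
}

# Report whether a picture width is already a multiple of 64.
# Assumption: 1 byte per luma pixel, so width in pixels == row size in bytes.
check_width() {
    local width=$1
    if (( width % 64 == 0 )); then
        echo "aligned"
    else
        echo "unaligned, next aligned width is $(align_up "$width")"
    fi
}

check_width 720    # 720 = 11 * 64 + 16, so this one is unaligned
check_width 1920   # 1920 = 30 * 64
```

The width of a real file can be read with `ffprobe -v error -select_streams v:0 -show_entries stream=width -of csv=p=0 path-to-file` and passed to `check_width`.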

Kodi

Kodi provides DRMPrime frames support through both EGL and KMS/GBM.

When the windowing system is X, there is no way to support DRMPrime frames, neither through EGL nor through GBM/KMS. When the windowing system is Wayland, you can get EGL support. When there is no windowing system and you start Kodi with GBM, you can get both EGL and GBM/KMS support.
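The rules above can be written down as a small lookup. This sketch keys on the `XDG_SESSION_TYPE` environment variable, which systemd based systems usually set to `x11`, `wayland` or `tty`. Treating `tty` as "no windowing system, Kodi started with GBM" is an assumption of this example, and the function name `kodi_prime_methods` is made up.

```shell
#!/usr/bin/env bash
# Map a session type to the DRM PRIME render methods Kodi can use,
# following the rules in the paragraph above.
kodi_prime_methods() {
    case "$1" in
        x11)     echo "none" ;;
        wayland) echo "EGL" ;;
        tty)     echo "EGL, Direct to Plane" ;;  # assumes Kodi is started with GBM
        *)       echo "unknown" ;;
    esac
}

# Fall back to "tty" when XDG_SESSION_TYPE is unset (assumption of this sketch).
session=${XDG_SESSION_TYPE:-tty}
echo "session=$session -> $(kodi_prime_methods "$session")"
```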

Kodi under Wayland

Use the following settings to get decoding over EGL.

settings->player->videos->render method
Allow using DRM PRIME Decoder=enable
Allow Hardware Acceleration with DRM PRIME=enable
Prime Render Method=EGL
  • Limitations:
    • The same restrictions as mpv without any RGA usage apply here as well. Unfortunately Kodi can not use FFmpeg filters and so can not make use of RGA.
    • 10bit formats will not work
    • AFBC improvements can not be used
    • Performance will be limited to 4k@60
    • Frames that are not 64 byte aligned will not be rendered

Kodi under GBM

This type of rendering is the fastest method you can get. To run Kodi with GBM support, the active Desktop Environment must be stopped so that Kodi can interact directly with KMS. Start Kodi with:

FFMPEG_RKMPP_DEC_OPT="afbc=on" kodi --windowing=gbm --audio-backend=alsa

Note: Forcing the audio backend to alsa is not necessary if you have a proper pipewire configuration.

Then configure kodi to render directly over KMS planes.

settings->player->videos->render method
Allow using DRM PRIME Decoder=enable
Allow Hardware Acceleration with DRM PRIME=enable
Prime Render Method=Direct to Plane

As you might notice, the decoder is running in AFBC mode here, so there is no performance restriction in this mode. You should be able to get 8k@60 without any DDR bandwidth limitation.

  • Limitations:
    • If your attached monitor's resolution is lower than 4K, you will not be able to render 8K frames properly, because vop2 activates its 8K rendering and scaling capabilities only if the attached monitor is actually an 8K monitor. This is a limitation of the rockchip hardware.
    • As a rule of thumb, 8K@60 is only achievable if the hardware actually allows it. On the rk3588 this means only HEVC and H264 frames; the AV1 and VP9 decoders are limited to 4K performance. If the device in use is not an RK3588, the limits depend on your actual hardware.

Moonlight

Moonlight automatically detects which FFmpeg decoders can create DRM Prime frames with hardware acceleration, so it picks up the rkmpp decoders on its own. If it does not, users can still force the rkmpp decoders with the environment variables below.

H264_DECODER_HINT=h264_rkmpp HEVC_DECODER_HINT=hevc_rkmpp AV1_DECODER_HINT=av1_rkmpp moonlight

But as mentioned, it is not necessary to force the codecs. Make sure you have the latest Moonlight, including the patch listed below that fixes a regression for V4L2 codecs.

Required Patches and Fixes in the toolchain.
