FAQ: Video and Audio Playback Performance Metrics
This document provides a standardized set of metrics for monitoring and evaluating playback performance across video and audio streams on various platforms. These metrics help assess synchronization between the display and audio pipelines, efficiency in frame and audio buffer handling, and overall playback quality. This approach enables consistent performance analysis and identification of playback issues for both video and audio-only playback.
- Definition: The frame rate of the display, representing the refresh rate at which frames are displayed.
- Use: Determines the frequency of display refreshes, crucial for synchronizing video playback.
- Definition: Total count of vertical synchronization (VSync) events during playback.
- Use: Tracks display refresh occurrences over a specific playback duration.
- Definition: The count of decoded frames successfully displayed on the screen.
- Use: Indicates the number of frames processed and displayed. This should typically match the number of VSyncs, or half of them, depending on the encoded frame rate relative to the display rate (for example, 30fps content on a 60Hz display yields half as many displayed frames as VSyncs).
- Definition: A frame rate conversion (FRC) occurs when the encoded frame rate differs from the display refresh rate, requiring the frame rate to be adjusted to match the display.
- Use: Measures the proportion of frame rate conversions to displayed frames. For example:
- A 24fps encoded stream shown at 48fps should maintain a 50:50 FRC-to-displayed frames ratio.
- A 60fps encoded stream displayed at 60fps should have an FRC-to-displayed frames ratio of 0:100.
- Definition: Count of frames repeated due to the next frame’s presentation timestamp (PTS) arriving too early, or video pause.
- Use: Indicates frame repetition instances, often due to timing issues where frames are displayed more than once because of early frame arrival or playback pauses.
- Definition: Count of frames discarded because their PTS was behind the system time clock (STC), the clock that drives presentation.
- Use: Indicates frame drops, often caused by performance lags, impacting playback smoothness.
- Definition: Count of times the video pipeline ran empty, meaning the display frame queue held one frame or fewer.
- Use: Tracks pipeline starvation, where playback cannot keep up with display demands, causing playback interruptions.
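The two FRC examples above can be read as a simple ratio calculation. The sketch below is one interpretation, not the HAL's definition: it assumes every VSync either presents a new decoded frame or re-presents the previous one as a frame rate conversion, and that the encoded rate does not exceed the display rate. The function name `expected_frc_ratio` is hypothetical.

```python
from fractions import Fraction


def expected_frc_ratio(encoded_fps, display_hz):
    """Return (frc_percent, displayed_percent) of all VSyncs.

    Assumption: each VSync shows either a new decoded frame ("displayed")
    or a repeat inserted by frame rate conversion ("FRC").
    """
    if encoded_fps <= 0 or display_hz <= 0:
        raise ValueError("rates must be positive")
    if encoded_fps > display_hz:
        raise ValueError("this sketch only covers encoded_fps <= display_hz")
    # Share of VSyncs that carry a new decoded frame.
    displayed = Fraction(encoded_fps) / Fraction(display_hz)
    # The remaining VSyncs are filled by frame rate conversion.
    frc = 1 - displayed
    return float(frc * 100), float(displayed * 100)


print(expected_frc_ratio(24, 48))  # (50.0, 50.0)
print(expected_frc_ratio(60, 60))  # (0.0, 100.0)
```

The two printed results correspond to the 50:50 and 0:100 ratios given in the examples.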
During playback, the following metric behaviors are expected:
- VSync Events: The number of VSync events should equal the playback duration multiplied by the VSync rate.
- Displayed Frames: Should correspond to the encoded frame rate in relation to the VSync rate. For example, 10 seconds of playback at a 60Hz VSync rate should yield around 600 VSync events, with displayed frames aligning proportionally (around 600 for a 60fps stream, or around 300 for a 30fps stream).
- Frame Rate Conversions: Frame rate conversions should match the ratio expected from the encoded and display frame rates.
- Repeated Frames, Dropped Frames, and Underruns: These values should remain at zero for optimal playback. Non-zero values highlight issues with timing (repeated frames), performance (dropped frames), or pipeline delays (underruns).
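The expected counts for a clean run follow from simple arithmetic. The sketch below is a minimal illustration, assuming displayed frames scale with the encoded frame rate and that a small tolerance is acceptable because the document only says "around 600". The function names and the default tolerance of 2 are hypothetical choices, not values from the HAL design.

```python
def expected_counts(duration_s, vsync_hz, encoded_fps):
    """Return the expected (vsync_events, displayed_frames) for a clean run."""
    # VSync events: playback duration multiplied by the display refresh rate.
    vsync_events = round(duration_s * vsync_hz)
    # Displayed frames: assumed to be one per encoded frame in the same period.
    displayed_frames = round(duration_s * encoded_fps)
    return vsync_events, displayed_frames


def within_tolerance(measured, expected, tolerance=2):
    """True if a measured counter is close enough to its expected value."""
    return abs(measured - expected) <= tolerance


vsyncs, frames = expected_counts(duration_s=10, vsync_hz=60, encoded_fps=30)
print(vsyncs, frames)  # 600 300
```

With these inputs, the displayed frame count is half the VSync count, which is the "half of them" case described for displayed frames.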
For audio playback, whether the stream is audio-only or audio is being decoded alongside video, the focus is on buffer management and pipeline stability:
- Definition: The count of times the audio buffer was empty, causing interruptions in audio playback.
- Use: Indicates situations where the audio playback pipeline fails to maintain a continuous stream, often due to decoding or buffer management issues, causing gaps or stutters in audio.
- Definition: The count of instances where the audio buffer exceeded its maximum capacity, typically causing delays or blocking further processing until the buffer is cleared.
- Use: Reveals audio buffer overflows, potentially indicating a blockage or backpressure within the pipeline. This may occur if the buffer fills up faster than it can be processed, leading to stalled playback or delayed audio.
- Free-Running Audio: Typically, in non-broadcast environments, audio is expected to be the master clock and run freely, without frame drops or repeats. Video playback adjusts to remain in sync with the audio, ideally by adjusting the display clock to align with the audio playback.
- Underruns and Overruns: For audio playback, underruns and overruns should ideally be zero. Frequent underruns indicate that the audio decoding process cannot supply data quickly enough, resulting in gaps or interruptions. Overruns suggest buffer management issues, possibly due to a backlog in audio processing or timing delays.
When using these metrics to validate playback performance and check for regressions, a set of standardized test streams should be used. For each test stream, baseline metrics (expected values for VSyncs, displayed frames, FRC, underruns, overruns, etc.) should be predefined. The measured values during playback can then be compared against these baselines to determine if the build passes or fails the playback test.
By establishing known output values for these metrics, you can:
- Quickly identify deviations from expected behavior, which may indicate performance or synchronization regressions.
- Verify that the latest build maintains playback quality and stability across video and audio metrics.
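A baseline comparison of this kind could be sketched as follows. This is an illustrative outline only: the `PlaybackMetrics` record, its field names, the `compare_to_baseline` function, and the tolerance of 2 on duration-dependent counters are all assumptions made for the example, not part of the HAL design.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PlaybackMetrics:
    """Hypothetical container for the counters described in this document."""
    vsync_events: int
    displayed_frames: int
    frame_rate_conversions: int
    repeated_frames: int
    dropped_frames: int
    video_underruns: int
    audio_underruns: int
    audio_overruns: int


# Counters that scale with playback duration get a small tolerance (assumed);
# error counters must match the baseline exactly.
TOLERANT_FIELDS = {"vsync_events", "displayed_frames", "frame_rate_conversions"}


def compare_to_baseline(measured, baseline, tolerance=2):
    """Return a list of failure messages; an empty list means the test passes."""
    failures = []
    for f in fields(PlaybackMetrics):
        got = getattr(measured, f.name)
        want = getattr(baseline, f.name)
        allowed = tolerance if f.name in TOLERANT_FIELDS else 0
        if abs(got - want) > allowed:
            failures.append(f"{f.name}: expected {want}, measured {got}")
    return failures


# Example baseline: 10 seconds of a 60fps stream on a 60Hz display.
baseline = PlaybackMetrics(600, 600, 0, 0, 0, 0, 0, 0)
measured = PlaybackMetrics(601, 598, 0, 0, 3, 0, 1, 0)
failures = compare_to_baseline(measured, baseline)
print("PASS" if not failures else "FAIL", failures)
```

In this example the VSync and displayed frame counts fall within tolerance, while the three dropped frames and the single audio underrun cause the run to fail.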
The initial design for this generic hardware abstraction layer (HAL) metrics collection is intended to provide a unified approach for gathering video and audio playback performance metrics across platforms. Standardizing collection across hardware could enhance cross-platform testing and optimization.
Author: S.Webster