Single-O Benchmarking #989

Closed
j0sh opened this issue Jul 17, 2019 · 2 comments
j0sh (Collaborator) commented Jul 17, 2019

Single-O Testing

This issue summarizes the current benchmarking progress with a single O and outlines further steps.

Single-O / Single-T

We are currently seeing a low success rate (~80%) when transcoding one 4-rendition stream [1] with a single O and a 16-core T (1O-1T). Two factors contribute to this low success rate:

  1. Segments are not always of a consistent length. When a segment is followed by a shorter one, the O is often still busy with the first segment when the shorter one arrives. This accounts for ~55% of the failures.

  2. The T cannot keep up in real time for a number of segments. This accounts for ~45% of remaining failures. In practice, this slowness exacerbates the first factor: an O would be able to better handle variations in segment lengths if it were faster in processing them.

[1] 240p, 360p, 576p, 720p
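The interaction between variable segment lengths and tight transcode margins can be illustrated with a toy simulation. All numbers below (segment durations, per-session overhead, processing speed) are assumptions for illustration, not measured values:

```python
def count_deadline_misses(durations, per_session_overhead, speed):
    """Toy model of back-to-back segment transcoding.

    durations: segment lengths in seconds, arriving back-to-back.
    per_session_overhead: fixed cost (s) of a fresh transcode session.
    speed: processing seconds per second of video (< 1.0 means the
    steady-state transcode itself is faster than real time).

    A segment "misses" if its transcode finishes after its real-time
    deadline (its arrival time plus its own duration).
    """
    busy_until = 0.0  # when the transcoder frees up
    arrival = 0.0     # when each segment arrives
    misses = 0
    for d in durations:
        start = max(arrival, busy_until)  # wait if still busy
        busy_until = start + per_session_overhead + d * speed
        if busy_until > arrival + d:
            misses += 1
        arrival += d
    return misses

# Uniform 4s segments just keep up, but a single shorter segment misses
# its deadline, and the backlog spills into the segment after it:
print(count_deadline_misses([4, 4, 4, 4], 0.8, 0.8))  # 0
print(count_deadline_misses([4, 4, 2, 4], 0.8, 0.8))  # 2
```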

The LPMS transcoder uses a new session for each segment. Setting up and tearing down a new transcode session frequently incurs significant overhead for both CPU and GPU encoding. For a 16 vCPU machine, the slowdown is roughly 1.75x over performing all the processing for a given stream in the same session. The slowdown is such that even a 16-core transcoder has difficulty keeping up with some incoming segments. (Related: livepeer/lpms#119 , although the slowdown is closer to 75-100% rather than the 10-15% mentioned in the issue)
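To see how a ~1.75x per-segment session penalty eats the real-time margin, consider a back-of-the-envelope check (the 2-second segment duration and the 1.3s baseline transcode time are assumed, illustrative numbers):

```python
segment_duration = 2.0       # seconds of video per segment (assumed)
reused_session_time = 1.3    # transcode time with one long-lived session (assumed)
per_segment_penalty = 1.75   # ~1.75x slowdown from a fresh session per segment

per_segment_time = reused_session_time * per_segment_penalty
print(per_segment_time)                     # 2.275
print(per_segment_time > segment_duration)  # True: falls behind real time
```

A transcode that comfortably beats real time with a reused session drops below real time once the per-segment session cost is paid.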

The O/T itself adds less than 2% of overhead; the running time is dominated by the LPMS transcoder. (Just to demonstrate how tight the transcoding margins are: statistically the success rate should go up to ~86% if the O/T overhead were excluded.)

With the particular bottleneck for 1O-1T identified as being the transcoder itself, we can proceed to some next steps for single-O testing.

Single-O Multi-T

Given that we know the bottleneck with the current testing configuration is the transcoder itself, we should dial down the transcoding configuration in order to benchmark other aspects of single-O. Two things remain to be tested:

  1. Multi-T bottlenecks. Given Ts that are "fast enough", we need to ensure that the O's load balancing among multiple Ts is reasonable. This is separate from the bottleneck of a single T, which we have already identified.

  2. Single-O bottlenecks. Once we have determined that multi-T load balancing is in a good place, we can increase the capacity attached to an O in order to discover other bottlenecks in the O.

We should be comfortable with our understanding of any multi-T limitations before benchmarking single-O further, and we should incorporate real transcoders into the testing so that we stress the transcoding pipeline in its entirety.
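As a reference point for what "reasonable" load balancing might look like, here is a minimal least-loaded dispatch sketch. This is a hypothetical policy for illustration only; it is not the actual O implementation or the ja/loadfactor algorithm:

```python
class LoadBalancer:
    """Least-loaded dispatch: each segment goes to the T with the
    fewest outstanding segments. Illustrative sketch, not O's code."""

    def __init__(self, transcoder_ids):
        # outstanding segment count per transcoder
        self.load = {t: 0 for t in transcoder_ids}

    def assign(self):
        # pick the transcoder with the lowest outstanding load
        t = min(self.load, key=self.load.get)
        self.load[t] += 1
        return t

    def complete(self, t):
        # a transcoder finished a segment; free up its slot
        self.load[t] -= 1
```

Under a strict 1-stream-1-T ratio, any sane policy degenerates to one stream per T, which is why multi-T tests at that ratio should match the 1O-1T baseline if the O itself is not a bottleneck.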

Testing Steps

One approach to doing multi-T benchmarking is this:

Identify a transcoding configuration that a given T can complete comfortably faster than real time. One option is the set of 240p, 360p, 576p: rough numbers show this typically runs about 45% faster than real time for a single stream on a 16 vCPU machine.

    • Verify single-T benchmark numbers with 240p, 360p, 576p or an equally comfortable alternative
    • Using the current goclient master branch, determine the baseline success rate for 1O-1T-1 stream at 240p, 360p, 576p. The success rate may not be 100% due to variation in segment lengths.
    • Increase the number of concurrent streams and concurrent transcoders, maintaining a 1-stream-1-T ratio. Hypothesis: for our testing, 1O-4T-4 streams is likely to see only a roughly 50% success rate on the master branch.
    • Run the same 1O-4T-4S test on the ja/loadfactor branch. As long as there are no bottlenecks for O, we should see a similar success rate as 1O-1T on master. If not, we may need another approach to multi-T load balancing.
    • Continue adding T/S until we hit diminishing returns for a single O (as marked by an increase in error rates).
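The stop condition in the last step ("diminishing returns, as marked by an increase in error rates") could be mechanized along these lines. Both `run_benchmark` and the 2-percentage-point drop threshold are hypothetical placeholders, not part of the existing test harness:

```python
def success_rate(results):
    """results: one boolean per segment (True = transcoded in time)."""
    return 100.0 * sum(results) / len(results)

def find_single_o_capacity(run_benchmark, max_ts=16, max_drop=2.0):
    """Grow a 1O-nT-nS configuration until the success rate drops by
    more than `max_drop` percentage points versus the previous run.

    run_benchmark: hypothetical callable n -> per-segment booleans for
    a 1O-nT-nS run. Returns the largest n before the drop.
    """
    prev = None
    for n in range(1, max_ts + 1):
        rate = success_rate(run_benchmark(n))
        if prev is not None and rate < prev - max_drop:
            return n - 1  # the previous configuration was the knee
        prev = rate
    return max_ts
```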
ya7ya (Contributor) commented Jul 18, 2019

I ran the benchmark to compare master against ja/loadfactor. Although the two have the same performance for a single stream, ja/loadfactor performed significantly better as concurrent streams were added (while maintaining one stream per T). Here is the comparison.

[Chart: master (blue) vs ja/loadfactor (red)]

| setup | renditions | concurrent streams | master success rate (%) | ja/loadfactor success rate (%) |
| --- | --- | --- | --- | --- |
| 1B 1O 1T | P240p30fps16x9, P360p30fps16x9, P576p30fps16x9 | 1 | 96.7061 | 96.8961 |
| 1B 1O 2T | P240p30fps16x9, P360p30fps16x9, P576p30fps16x9 | 2 | 93.7922 | 97.8885 |
| 1B 1O 3T | P240p30fps16x9, P360p30fps16x9, P576p30fps16x9 | 3 | 84.1216 | 97.3395 |
| 1B 1O 4T | P240p30fps16x9, P360p30fps16x9, P576p30fps16x9 | 4 | 74.0287 | 96.7061 |

j0sh (Collaborator, issue author) commented Sep 20, 2019

@darkdarkdragon Can we close this issue?
