GPU Load Balancing #1108

j0sh · 2019-10-02T06:36:12Z

The current load balancing algorithm for GPUs is a simple, stateless round robin. For each segment:

gpu = gpus[i++ % len(gpus)]

With the LPMS bottleneck fix, transcode sessions become stateful. A naive approach to maintaining "sticky" sessions is this:

if sessions[id] == nil: sessions[id] = gpus[i++ % len(gpus)]
gpu = sessions[id]
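
A minimal Go sketch of the naive sticky assignment above. The names `StickyBalancer`, `NewStickyBalancer`, and `Assign` are hypothetical and chosen for illustration; they are not taken from the go-livepeer or LPMS code. GPUs are represented as plain strings, and the sketch assumes at least one GPU is configured.

```go
package main

import (
	"fmt"
	"sync"
)

// StickyBalancer (hypothetical) pins each session ID to the GPU chosen for it
// on first use. New sessions are placed round robin.
type StickyBalancer struct {
	mu       sync.Mutex
	gpus     []string          // assumed non-empty
	next     int               // round-robin counter
	sessions map[string]string // session ID -> assigned GPU
}

// NewStickyBalancer returns a balancer over the given GPUs.
func NewStickyBalancer(gpus []string) *StickyBalancer {
	return &StickyBalancer{gpus: gpus, sessions: make(map[string]string)}
}

// Assign returns the GPU for a session, picking one round robin the first
// time the session ID is seen and reusing it afterwards.
func (b *StickyBalancer) Assign(id string) string {
	b.mu.Lock()
	defer b.mu.Unlock()
	if gpu, ok := b.sessions[id]; ok {
		return gpu
	}
	gpu := b.gpus[b.next%len(b.gpus)]
	b.next++
	b.sessions[id] = gpu
	return gpu
}

func main() {
	b := NewStickyBalancer([]string{"gpu0", "gpu1"})
	for _, id := range []string{"a", "b", "a", "c"} {
		fmt.Println(id, "->", b.Assign(id))
	}
}
```

Note that this sketch never removes entries from `sessions`, so the map grows without bound and assignments ignore how busy each GPU actually is.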

Can we do better than the naive approach? Probably.

Some challenges here are:

- Segments for a given session may be sporadic
  - O / T may not necessarily be the "primary" for a given stream
- Session assignment on GPUs may grow unbalanced
  - Sessions come and go, some may be long-lived, others short-lived
- Workload varies per session
  - Different transcoding profiles have different performance characteristics, segment lengths, etc.
- Hardware capabilities not necessarily known
  - Would be nice not to manually tune / configure
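
One possible direction for some of the challenges above, sketched in Go: place new sessions on the least-loaded GPU, weight each session by a caller-supplied cost estimate, and drop sessions that have been idle too long. This is an illustration only, not necessarily what was implemented. The names `LeastLoadedBalancer`, `NewLeastLoadedBalancer`, and `Assign`, the integer `cost` unit, and the idle timeout are all assumptions. The sketch is not safe for concurrent use and does not address unknown hardware capabilities.

```go
package main

import (
	"fmt"
	"time"
)

// session records where a session is pinned and what it is assumed to cost.
type session struct {
	gpu      int
	cost     int
	lastSeen int64 // Unix seconds of the most recent segment
}

// LeastLoadedBalancer (hypothetical) keeps sessions sticky, places new ones on
// the GPU with the lowest summed cost, and expires idle sessions.
type LeastLoadedBalancer struct {
	load     []int // summed cost of the sessions pinned to each GPU
	sessions map[string]*session
	idle     int64 // seconds without segments before a session is dropped
}

// NewLeastLoadedBalancer assumes numGPUs is at least 1.
func NewLeastLoadedBalancer(numGPUs int, idleSeconds int64) *LeastLoadedBalancer {
	return &LeastLoadedBalancer{
		load:     make([]int, numGPUs),
		sessions: make(map[string]*session),
		idle:     idleSeconds,
	}
}

// Assign returns the GPU index for a session. cost is an estimate supplied by
// the caller (for example, derived from the transcoding profile); now is a
// Unix timestamp in seconds.
func (b *LeastLoadedBalancer) Assign(id string, cost int, now int64) int {
	b.expire(now)
	if s, ok := b.sessions[id]; ok {
		s.lastSeen = now
		return s.gpu
	}
	best := 0
	for i, l := range b.load {
		if l < b.load[best] {
			best = i
		}
	}
	b.load[best] += cost
	b.sessions[id] = &session{gpu: best, cost: cost, lastSeen: now}
	return best
}

// expire removes sessions idle for longer than the timeout and releases
// their cost from the GPU they were pinned to.
func (b *LeastLoadedBalancer) expire(now int64) {
	for id, s := range b.sessions {
		if now-s.lastSeen > b.idle {
			b.load[s.gpu] -= s.cost
			delete(b.sessions, id)
		}
	}
}

func main() {
	b := NewLeastLoadedBalancer(2, 30)
	now := time.Now().Unix()
	fmt.Println("heavy ->", b.Assign("heavy", 10, now))
	fmt.Println("light1 ->", b.Assign("light1", 1, now))
	fmt.Println("light2 ->", b.Assign("light2", 1, now))
}
```

A session that goes quiet for longer than the timeout loses its pinning and may land on a different GPU when it resumes, which trades stickiness for balance when segments are sporadic.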

Need to determine:

TODOs:


j0sh added the Epic label Oct 2, 2019

j0sh mentioned this issue Nov 8, 2019

Integrate new LPMS API and load balancer. #1124

Merged

3 tasks

darkdarkdragon mentioned this issue Dec 16, 2019

Make RemoteTranscoderManager sticky for LB #1273

Closed

j0sh closed this as completed in #1124 Jan 10, 2020
