Schedulable Pool

Pool is a Schedulable entity that represents a tree of TaskSetManagers, i.e. it contains a collection of TaskSetManagers or of other Pools (which in turn contain TaskSetManagers).

A Pool has a mandatory name, a scheduling mode, and an initial minShare and weight, all of which are defined when it is created.

Note
An instance of Pool is created when TaskSchedulerImpl is initialized.
Note
The TaskScheduler Contract and the Schedulable Contract both require that their entities have a rootPool of type Pool.

taskSetSchedulingAlgorithm Attribute

Using the scheduling mode (given when a Pool object is created), Pool selects a SchedulingAlgorithm and sets taskSetSchedulingAlgorithm: FIFOSchedulingAlgorithm for FIFO mode and FairSchedulingAlgorithm for FAIR mode.

It throws an IllegalArgumentException when an unsupported scheduling mode is passed in:

Unsupported spark.scheduler.mode: [schedulingMode]
Tip
Read about the scheduling modes in SchedulingMode.
Note
taskSetSchedulingAlgorithm is used in getSortedTaskSetQueue.
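
The selection can be pictured with a minimal sketch. The types below are simplified, hypothetical stand-ins (Spark's own classes are internal), and the NONE mode is assumed here only to exercise the error path.

```scala
object PoolAlgorithmSketch {
  // Hypothetical stand-ins for the scheduling modes.
  sealed trait SchedulingMode
  case object FAIR extends SchedulingMode
  case object FIFO extends SchedulingMode
  case object NONE extends SchedulingMode

  // Hypothetical stand-ins for the two algorithms.
  sealed trait SchedulingAlgorithm
  final class FIFOSchedulingAlgorithm extends SchedulingAlgorithm
  final class FairSchedulingAlgorithm extends SchedulingAlgorithm

  // Picks the algorithm for a mode; any other mode is rejected.
  def selectAlgorithm(mode: SchedulingMode): SchedulingAlgorithm = mode match {
    case FAIR => new FairSchedulingAlgorithm
    case FIFO => new FIFOSchedulingAlgorithm
    case other =>
      throw new IllegalArgumentException(s"Unsupported spark.scheduler.mode: $other")
  }
}
```

In this sketch the algorithm is chosen once, from the mode alone, so a Pool would keep the same ordering policy for its whole lifetime.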

addSchedulable

Note
addSchedulable is part of the Schedulable Contract.

addSchedulable adds a Schedulable to the schedulableQueue and schedulableNameToSchedulable.

More importantly, it sets the Schedulable entity's parent to this Pool.
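
The three effects of addSchedulable can be sketched as follows. Schedulable is trimmed down to a hypothetical trait, TaskSetManagerStub is an illustrative placeholder, and the use of ConcurrentLinkedQueue for schedulableQueue is an assumption rather than something stated here.

```scala
import java.util.concurrent.{ConcurrentHashMap, ConcurrentLinkedQueue}

object AddSchedulableSketch {
  // Hypothetical, trimmed-down Schedulable: only what this example needs.
  trait Schedulable {
    def name: String
    var parent: Pool = null
  }

  // Illustrative leaf entity standing in for a TaskSetManager.
  final class TaskSetManagerStub(val name: String) extends Schedulable

  final class Pool(val name: String) extends Schedulable {
    // Queue type is an assumption made for this sketch.
    val schedulableQueue = new ConcurrentLinkedQueue[Schedulable]
    val schedulableNameToSchedulable = new ConcurrentHashMap[String, Schedulable]

    def addSchedulable(schedulable: Schedulable): Unit = {
      schedulableQueue.add(schedulable)                               // 1. enqueue
      schedulableNameToSchedulable.put(schedulable.name, schedulable) // 2. register by name
      schedulable.parent = this                                       // 3. link back to this Pool
    }
  }
}
```

Setting parent is what turns a flat collection into a tree: every entity can reach the Pool that owns it.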

Getting TaskSetManagers Sorted (getSortedTaskSetQueue method)

Note
getSortedTaskSetQueue is part of the Schedulable Contract.

getSortedTaskSetQueue sorts all the Schedulables in schedulableQueue using the SchedulingAlgorithm held in the internal taskSetSchedulingAlgorithm attribute.
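
A minimal sketch of the sorting step, using a hypothetical two-case model of Schedulable. It assumes that each sorted Schedulable is then asked for its own sorted TaskSetManagers, so nested Pools are flattened in order; that recursion is an assumption of this sketch and should be checked against the Spark sources.

```scala
import scala.collection.mutable.ArrayBuffer

object SortedQueueSketch {
  // Hypothetical minimal model: a Schedulable is either a leaf or a Pool.
  sealed trait Schedulable {
    def name: String
    def priority: Int
    def getSortedTaskSetQueue: Seq[TaskSetManagerStub]
  }

  // A leaf simply reports itself.
  final case class TaskSetManagerStub(name: String, priority: Int) extends Schedulable {
    def getSortedTaskSetQueue: Seq[TaskSetManagerStub] = Seq(this)
  }

  final class Pool(
      val name: String,
      val priority: Int,
      comparator: (Schedulable, Schedulable) => Boolean) extends Schedulable {
    val schedulableQueue = ArrayBuffer.empty[Schedulable]

    // Sort the direct children, then collect each child's own sorted queue.
    def getSortedTaskSetQueue: Seq[TaskSetManagerStub] =
      schedulableQueue.toSeq.sortWith(comparator).flatMap(_.getSortedTaskSetQueue)
  }
}
```

The comparator parameter plays the role of taskSetSchedulingAlgorithm; any function that says whether one Schedulable goes before another would fit.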

Schedulables by Name (schedulableNameToSchedulable registry)

schedulableNameToSchedulable = new ConcurrentHashMap[String, Schedulable]

schedulableNameToSchedulable is a lookup table of Schedulable objects by their names.

Besides the obvious usage in housekeeping methods such as addSchedulable, removeSchedulable, and getSchedulableByName from the Schedulable Contract, it is only used in SparkContext.getPoolForName.

SchedulingAlgorithm

SchedulingAlgorithm is the interface for a sorting algorithm to sort Schedulables.

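There are currently two SchedulingAlgorithms: FIFOSchedulingAlgorithm for FIFO scheduling mode and FairSchedulingAlgorithm for FAIR scheduling mode.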

FIFOSchedulingAlgorithm

FIFOSchedulingAlgorithm is a scheduling algorithm that compares Schedulables by their priority first and, when equal, by their stageId.

Note
priority and stageId are part of the Schedulable Contract.
Caution
FIXME A picture is worth a thousand words. How to picture the algorithm?
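
The FIFO comparison can be sketched as below. Entry is a hypothetical stand-in that carries only the two fields being compared, and the sketch assumes that smaller priority and stageId values sort first.

```scala
object FifoSketch {
  // Hypothetical stand-in carrying only the fields the comparison reads.
  final case class Entry(name: String, priority: Int, stageId: Int)

  // True when s1 should be scheduled before s2.
  // Assumption: lower priority value first, ties broken by lower stageId.
  def comparator(s1: Entry, s2: Entry): Boolean = {
    val byPriority = Integer.compare(s1.priority, s2.priority)
    val res = if (byPriority != 0) byPriority else Integer.compare(s1.stageId, s2.stageId)
    res < 0
  }
}
```

For example, sorting `Seq(Entry("late", 2, 0), Entry("stage3", 1, 3), Entry("stage1", 1, 1))` with `sortWith(FifoSketch.comparator)` puts stage1 first, then stage3, then late.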

FairSchedulingAlgorithm

FairSchedulingAlgorithm is a scheduling algorithm that compares Schedulables by their minShare, runningTasks, and weight.

Note
minShare, runningTasks, and weight are part of the Schedulable Contract.
Figure 1. FairSchedulingAlgorithm

For each input Schedulable, minShareRatio is computed as runningTasks divided by minShare (with minShare taken as at least 1), while taskToWeightRatio is runningTasks divided by weight.
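
A sketch of how the two ratios could drive the comparison. Entry is a hypothetical stand-in. The "needy" rule (an entity running fewer tasks than its minShare goes first) and the tie-break by name are assumptions of this sketch, not stated above, and should be checked against the Spark sources.

```scala
object FairSketch {
  // Hypothetical stand-in carrying only the fields the comparison reads.
  final case class Entry(name: String, minShare: Int, runningTasks: Int, weight: Int)

  // runningTasks / max(minShare, 1)
  def minShareRatio(s: Entry): Double = s.runningTasks.toDouble / math.max(s.minShare, 1.0)

  // runningTasks / weight
  def taskToWeightRatio(s: Entry): Double = s.runningTasks.toDouble / s.weight.toDouble

  // True when s1 should be scheduled before s2.
  def comparator(s1: Entry, s2: Entry): Boolean = {
    // Assumption: "needy" means running fewer tasks than minShare.
    val s1Needy = s1.runningTasks < s1.minShare
    val s2Needy = s2.runningTasks < s2.minShare
    if (s1Needy && !s2Needy) true
    else if (!s1Needy && s2Needy) false
    else {
      // Both needy: compare minShareRatio. Neither needy: compare taskToWeightRatio.
      val cmp =
        if (s1Needy) java.lang.Double.compare(minShareRatio(s1), minShareRatio(s2))
        else java.lang.Double.compare(taskToWeightRatio(s1), taskToWeightRatio(s2))
      // Assumption: ties are broken by name.
      if (cmp != 0) cmp < 0 else s1.name < s2.name
    }
  }
}
```

With these definitions, an entity with runningTasks = 4, minShare = 0 and weight = 2 has a minShareRatio of 4.0 (minShare is raised to 1) and a taskToWeightRatio of 2.0.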