FairSchedulableBuilder
is a SchedulableBuilder with the pools configured in an optional allocations configuration file.
It reads the allocations file using the internal buildFairSchedulerPool method.
Tip
|
Enable Add the following line to
Refer to Logging. |
buildPools
builds the rootPool
based on the allocations configuration file from the optional spark.scheduler.allocation.file or fairscheduler.xml
(on the classpath).
Note
|
buildPools is part of the SchedulableBuilder Contract.
|
Tip
|
Spark comes with fairscheduler.xml.template to use as a template for the allocations configuration file to start from.
|
addTaskSetManager
looks up the default pool (using Pool.getSchedulableByName).
Note
|
addTaskSetManager is part of the SchedulableBuilder Contract.
|
Note
|
Although the Pool.getSchedulableByName method may return no Schedulable for a name, the default root pool does exist as it is assumed it was registered before.
|
If properties
for the Schedulable
were given, spark.scheduler.pool
property is looked up and becomes the current pool name (or defaults to default
).
Note
|
spark.scheduler.pool is the only property supported. Refer to spark.scheduler.pool later in this document.
|
If the pool name is not available, it is registered with the pool name, FIFO
scheduling mode, minimum share 0
, and weight 1
.
After the new pool was registered, you should see the following INFO message in the logs:
INFO FairSchedulableBuilder: Created pool [poolName], schedulingMode: FIFO, minShare: 0, weight: 1
The manager
schedulable is registered to the pool (either the one that already existed or was created just now).
You should see the following INFO message in the logs:
INFO FairSchedulableBuilder: Added task set [manager.name] to pool [poolName]
SparkContext.setLocalProperty allows for setting properties per thread. This mechanism is used by FairSchedulableBuilder
to watch for spark.scheduler.pool
property to group jobs from a thread and submit them to a non-default pool.
val sc: SparkContext = ???
sc.setLocalProperty("spark.scheduler.pool", "myPool")
Tip
|
See addTaskSetManager for how this setting is used. |
The allocations configuration file is an XML file.
The default conf/fairscheduler.xml.template
looks as follows:
<?xml version="1.0"?>
<allocations>
<pool name="production">
<schedulingMode>FAIR</schedulingMode>
<weight>1</weight>
<minShare>2</minShare>
</pool>
<pool name="test">
<schedulingMode>FIFO</schedulingMode>
<weight>2</weight>
<minShare>3</minShare>
</pool>
</allocations>
Tip
|
The top-level element’s name allocations can be anything. Spark does not insist on allocations and accepts any name.
|
buildDefaultPool
method checks whether default
was defined already and if not it adds the default
pool with FIFO
scheduling mode, minimum share 0
, and weight 1
.
You should see the following INFO message in the logs:
INFO FairSchedulableBuilder: Created default pool default, schedulingMode: FIFO, minShare: 0, weight: 1
buildFairSchedulerPool(is: InputStream)
buildFairSchedulerPool
reads Pools from the allocations configuration file (as is
).
For each pool
element, it reads its name (from name
attribute) and assumes the default pool configuration to be FIFO
scheduling mode, minimum share 0
, and weight 1
(unless overrode later).
Caution
|
FIXME Why is the difference between minShare 0 and weight 1 vs rootPool in TaskSchedulerImpl.initialize - 0 and 0? It is definitely an inconsistency.
|
If schedulingMode
element exists and is not empty for the pool it becomes the current pool’s scheduling mode. It is case sensitive, i.e. with all uppercase letters.
If minShare
element exists and is not empty for the pool it becomes the current pool’s minShare
. It must be an integer number.
If weight
element exists and is not empty for the pool it becomes the current pool’s weight
. It must be an integer number.
The pool is then registered to rootPool
.
If all is successful, you should see the following INFO message in the logs:
INFO FairSchedulableBuilder: Created pool [poolName], schedulingMode: [schedulingMode], minShare: [minShare], weight: [weight]
spark.scheduler.allocation.file
is the file path of an optional scheduler configuration file that FairSchedulableBuilder.buildPools uses to build pools.