FairSchedulableBuilder - SchedulableBuilder for FAIR Scheduling Mode

FairSchedulableBuilder is a SchedulableBuilder that builds pools from an optional allocations configuration file.

It reads the allocations file using the internal buildFairSchedulerPool method.

Tip

Enable INFO logging level for org.apache.spark.scheduler.FairSchedulableBuilder logger to see what happens inside.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.scheduler.FairSchedulableBuilder=INFO

Refer to Logging.

buildPools

buildPools builds the rootPool from the allocations configuration file, which is either the file given by the optional spark.scheduler.allocation.file setting or fairscheduler.xml on the classpath.

Note
buildPools is part of the SchedulableBuilder Contract.
Tip
Spark comes with fairscheduler.xml.template that you can use as a starting point for the allocations configuration file.
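
The lookup order described for buildPools can be sketched in plain Scala as follows. This is only an illustration, not Spark's code: AllocationsLookup and its open method are hypothetical names, and the sketch assumes an explicitly configured path takes precedence over fairscheduler.xml on the classpath.

```scala
import java.io.{File, FileInputStream, InputStream}

// Hypothetical helper (not part of Spark) that mirrors the lookup order above.
object AllocationsLookup {
  // Name of the classpath resource used when no path is configured
  val DefaultFile = "fairscheduler.xml"

  // configuredPath stands in for the value of spark.scheduler.allocation.file
  def open(configuredPath: Option[String]): Option[InputStream] =
    configuredPath match {
      case Some(path) =>
        // An explicit setting wins; throws FileNotFoundException if the file is missing
        Some(new FileInputStream(new File(path)))
      case None =>
        // getResourceAsStream returns null when the resource is not on the classpath
        Option(getClass.getClassLoader.getResourceAsStream(DefaultFile))
    }
}
```

When neither source is available, open returns None, which corresponds to building no pools from a file.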

addTaskSetManager

Note
addTaskSetManager is part of the SchedulableBuilder Contract.
Note
Although Pool.getSchedulableByName may return no Schedulable for a given name, the default pool is assumed to exist because it was registered earlier.

If properties for the Schedulable were given, the spark.scheduler.pool property is looked up and its value becomes the current pool name (or default if the property is not set).

Note
spark.scheduler.pool is the only property supported. See the spark.scheduler.pool Property section below.

If no pool with that name exists, a new pool is registered under that name with FIFO scheduling mode, minimum share 0, and weight 1.

After the new pool is registered, you should see the following INFO message in the logs:

INFO FairSchedulableBuilder: Created pool [poolName], schedulingMode: FIFO, minShare: 0, weight: 1

The manager Schedulable is then added to the pool (either the one that already existed or the one just created).

You should see the following INFO message in the logs:

INFO FairSchedulableBuilder: Added task set [manager.name] to pool [poolName]
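
The pool selection steps above can be modelled with a small, Spark-free sketch. PoolSketch and RootPoolSketch are hypothetical stand-ins for Spark's Pool classes; only the property name, the default pool name, and the default values (FIFO, 0, 1) come from the description above.

```scala
import java.util.Properties
import scala.collection.mutable

// Simplified stand-in for a pool; not Spark's Pool class.
final case class PoolSketch(name: String, schedulingMode: String, minShare: Int, weight: Int) {
  // Names of the task sets added to this pool
  val taskSets: mutable.Buffer[String] = mutable.Buffer.empty
}

final class RootPoolSketch {
  private val pools = mutable.LinkedHashMap.empty[String, PoolSketch]

  def get(name: String): Option[PoolSketch] = pools.get(name)

  def register(pool: PoolSketch): Unit = pools(pool.name) = pool

  // Mirrors the described steps: resolve the pool name, create the pool
  // with defaults if it is missing, then add the task set to it.
  def addTaskSet(taskSetName: String, properties: Properties): PoolSketch = {
    val poolName =
      if (properties != null) properties.getProperty("spark.scheduler.pool", "default")
      else "default"
    val pool = get(poolName).getOrElse {
      val created = PoolSketch(poolName, "FIFO", 0, 1)
      register(created)
      created
    }
    pool.taskSets += taskSetName
    pool
  }
}
```

Calling addTaskSet twice with the same pool name reuses the pool created by the first call, which matches the "either existing or just created" wording above.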

spark.scheduler.pool Property

SparkContext.setLocalProperty sets properties per thread. FairSchedulableBuilder relies on this mechanism: it reads the spark.scheduler.pool property to group the jobs submitted from a thread and assign them to a non-default pool.

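import org.apache.spark.SparkContext
val sc: SparkContext = ???  // placeholder for an existing SparkContext
// Jobs submitted from the current thread are assigned to the myPool pool
sc.setLocalProperty("spark.scheduler.pool", "myPool")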
Tip
See addTaskSetManager for how this setting is used.

fairscheduler.xml Allocations Configuration File

The allocations configuration file is an XML file.

The default conf/fairscheduler.xml.template looks as follows:

<?xml version="1.0"?>
<allocations>
  <pool name="production">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
  <pool name="test">
    <schedulingMode>FIFO</schedulingMode>
    <weight>2</weight>
    <minShare>3</minShare>
  </pool>
</allocations>
Tip
The name of the top-level element (allocations here) can be anything. Spark does not require it to be allocations.

Ensure Default Pool is Registered (buildDefaultPool method)

buildDefaultPool checks whether the default pool is already defined and, if not, adds it with FIFO scheduling mode, minimum share 0, and weight 1.

You should see the following INFO message in the logs:

INFO FairSchedulableBuilder: Created default pool default, schedulingMode: FIFO, minShare: 0, weight: 1

Build Pools from XML Allocations File (buildFairSchedulerPool method)

buildFairSchedulerPool(is: InputStream)

buildFairSchedulerPool reads pools from the allocations configuration file (given as the is input stream).

For each pool element, it reads its name (from the name attribute) and assumes the default pool configuration of FIFO scheduling mode, minimum share 0, and weight 1 (unless overridden later).

Caution
FIXME Why is there a difference between minShare 0 and weight 1 here and 0 and 0 for rootPool in TaskSchedulerImpl.initialize? It is definitely an inconsistency.

If the schedulingMode element exists and is not empty for the pool, it becomes the current pool's scheduling mode. The value is case sensitive and must be in all uppercase letters (as in FIFO and FAIR in the template above).

If the minShare element exists and is not empty for the pool, it becomes the current pool's minShare. It must be an integer number.

If the weight element exists and is not empty for the pool, it becomes the current pool's weight. It must be an integer number.

The pool is then registered to rootPool.

If all is successful, you should see the following INFO message in the logs:

INFO FairSchedulableBuilder: Created pool [poolName], schedulingMode: [schedulingMode], minShare: [minShare], weight: [weight]
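
The per-pool defaults and overrides described above can be illustrated with a minimal sketch that parses an allocations document using the JDK's DOM parser. It is not Spark's implementation: PoolConfig, AllocationsSketch, and parse are hypothetical names, and the sketch takes a String rather than an InputStream for brevity.

```scala
import java.io.ByteArrayInputStream
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element

// Hypothetical value class holding the settings read for one pool element.
final case class PoolConfig(name: String, schedulingMode: String, minShare: Int, weight: Int)

object AllocationsSketch {
  // Text of the first child element with the given tag, if present and non-empty
  private def childText(pool: Element, tag: String): Option[String] = {
    val nodes = pool.getElementsByTagName(tag)
    if (nodes.getLength == 0) None
    else Option(nodes.item(0).getTextContent).map(_.trim).filter(_.nonEmpty)
  }

  def parse(xml: String): Seq[PoolConfig] = {
    val in = new ByteArrayInputStream(xml.getBytes("UTF-8"))
    val doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in)
    // Only pool elements are looked up; the root element's name is never checked
    val pools = doc.getElementsByTagName("pool")
    (0 until pools.getLength).map { i =>
      val pool = pools.item(i).asInstanceOf[Element]
      PoolConfig(
        name = pool.getAttribute("name"),
        // Defaults apply when an element is missing or empty
        schedulingMode = childText(pool, "schedulingMode").getOrElse("FIFO"),
        minShare = childText(pool, "minShare").map(_.toInt).getOrElse(0),
        weight = childText(pool, "weight").map(_.toInt).getOrElse(1))
    }
  }
}
```

In this sketch a non-integer minShare or weight fails with a NumberFormatException; how Spark itself reacts to such values is not covered here.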

Settings

spark.scheduler.allocation.file

spark.scheduler.allocation.file is the file path of an optional scheduler configuration file that FairSchedulableBuilder.buildPools uses to build pools.