Skip to content

Latest commit

 

History

History
41 lines (24 loc) · 2.08 KB

spark-taskscheduler-tasksets.adoc

File metadata and controls

41 lines (24 loc) · 2.08 KB

TaskSets

Introduction

A TaskSet is a collection of tasks that belong to a single stage and a stage attempt. It has also priority and properties attributes. Priority is used in FIFO scheduling mode (see Priority Field and FIFO Scheduling) while properties are the properties of the first job in the stage.

Caution
FIXME Where are properties of a TaskSet used?

A TaskSet represents the missing partitions of a stage.

The pair of a stage and a stage attempt uniquely describes a TaskSet and that is what you can see in the logs when a TaskSet is used:

TaskSet [stageId].[stageAttemptId]

A TaskSet contains a fully-independent sequence of tasks that can run right away based on the data that is already on the cluster, e.g. map output files from previous stages, though it may fail if this data becomes unavailable.

TaskSet can be submitted (consult TaskScheduler Contract).

removeRunningTask

Caution
FIXME Review TaskSet.removeRunningTask(tid)

Where TaskSets are used

  • DAGScheduler.submitMissingTasks

    • TaskSchedulerImpl.submitTasks

  • DAGScheduler.taskSetFailed

  • DAGScheduler.handleTaskSetFailed

  • TaskSchedulerImpl.createTaskSetManager

Priority Field and FIFO Scheduling

A TaskSet has priority field that turns into the priority field’s value of TaskSetManager (which is a Schedulable).

The priority field is used in FIFOSchedulingAlgorithm in which equal priorities give stages an advantage (not to say priority).

Note
FIFOSchedulingAlgorithm is only used for FIFO scheduling mode in a Pool (i.e. a schedulable collection of Schedulable objects).

Effectively, the priority field is the job’s id of the first job this stage was part of (for FIFO scheduling).