ExecutorRunnable

Note
Despite the name, ExecutorRunnable is no longer a java.lang.Runnable after SPARK-12447.
Tip

Enable INFO logging level for the org.apache.spark.deploy.yarn.ExecutorRunnable logger to see what happens inside.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.deploy.yarn.ExecutorRunnable=INFO

Refer to Logging.

Creating ExecutorRunnable Instance

ExecutorRunnable(
  container: Container,
  conf: Configuration,
  sparkConf: SparkConf,
  masterAddress: String,
  slaveId: String,
  hostname: String,
  executorMemory: Int,
  executorCores: Int,
  appId: String,
  securityMgr: SecurityManager,
  localResources: Map[String, LocalResource])

YarnAllocator creates an instance of ExecutorRunnable when launching Spark executors in allocated YARN containers.

A single ExecutorRunnable is created per YARN container in which a Spark executor is to run.

The input conf (Hadoop’s Configuration), sparkConf, and masterAddress directly correspond to the constructor arguments of YarnAllocator.

The input slaveId comes from an internal counter in YarnAllocator.

The input hostname is the host of the YARN container.

The input executorMemory and executorCores are passed in by YarnAllocator and ultimately come from the spark.executor.memory and spark.executor.cores configuration settings, respectively.
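As an illustration, the resolution of these two settings can be sketched in plain Scala, with a simple Map standing in for SparkConf (an assumption of this sketch; the real parsing in Spark supports more size suffixes and formats):

```scala
object ExecutorResources {
  // Resolve executor memory (in MB) and cores from configuration settings.
  // A plain Map stands in for SparkConf here, purely for illustration; only
  // the "g" and "m" size suffixes are handled for brevity.
  def resolve(settings: Map[String, String]): (Int, Int) = {
    val memory = settings.getOrElse("spark.executor.memory", "1g") match {
      case s if s.endsWith("g") => s.dropRight(1).toInt * 1024
      case s if s.endsWith("m") => s.dropRight(1).toInt
      case s                    => s.toInt
    }
    val cores = settings.getOrElse("spark.executor.cores", "1").toInt
    (memory, cores)
  }
}
```

For example, `ExecutorResources.resolve(Map("spark.executor.memory" -> "2g", "spark.executor.cores" -> "4"))` yields `(2048, 4)`.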

The input appId, securityMgr, and localResources are the same ones that YarnAllocator was created with.

prepareEnvironment

Caution
FIXME

Running ExecutorRunnable (run method)

When run is called, you should see the following INFO message in the logs:

INFO ExecutorRunnable: Starting Executor Container

It creates a YARN NMClient, initializes it with yarnConf, starts it, and then calls startContainer.

Starting CoarseGrainedExecutorBackend in Container (startContainer method)

startContainer(): java.util.Map[String, ByteBuffer]

startContainer uses the NMClient API to start a CoarseGrainedExecutorBackend in a YARN container.

When startContainer is executed, you should see the following INFO message in the logs:

INFO ExecutorRunnable: Setting up ContainerLaunchContext

It then creates a YARN ContainerLaunchContext, which represents all of the information the YARN NodeManager needs to launch a container. The local resources and environment of the context are the localResources and env passed in when the ExecutorRunnable was created. It also sets security tokens.

It prepares the command to launch CoarseGrainedExecutorBackend with all the details as provided when the ExecutorRunnable was created.

You should see the following INFO message in the logs:

INFO ExecutorRunnable:
===============================================================================
YARN executor launch context:
  env:
    [key] -> [value]
    ...

  command:
    [commands]
===============================================================================
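The boxed message above is plain string formatting; a minimal sketch of how such a message could be assembled (a simplification, not the actual Spark code) might look like:

```scala
// Assemble a "YARN executor launch context" style log message from an
// environment map and a command line. Simplified for illustration.
def launchContextMessage(env: Map[String, String], commands: Seq[String]): String = {
  val border = "=" * 79
  // Render each environment entry as "    key -> value".
  val envLines = env.map { case (k, v) => s"    $k -> $v" }.mkString("\n")
  s"""$border
     |YARN executor launch context:
     |  env:
     |$envLines
     |
     |  command:
     |    ${commands.mkString(" ")}
     |$border""".stripMargin
}
```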

The command is set on the just-created ContainerLaunchContext.

It sets application ACLs using YarnSparkHadoopUtil.getApplicationAclsForYarn.

If the external shuffle service is used, the executor registers with the YARN shuffle service already started on the NodeManager. The external shuffle service is set in the ContainerLaunchContext as service data under the spark_shuffle key.

Ultimately, startContainer requests the YARN NodeManager to start the YARN container for a Spark executor (as passed in when the ExecutorRunnable was created) with the ContainerLaunchContext context.

If any exception happens, a SparkException is thrown with the following message:

Exception while starting container [containerId] on host [hostname]
Note
startContainer is exclusively called as a part of running ExecutorRunnable.

Preparing Command to Launch CoarseGrainedExecutorBackend (prepareCommand method)

prepareCommand(
  masterAddress: String,
  slaveId: String,
  hostname: String,
  executorMemory: Int,
  executorCores: Int,
  appId: String): List[String]

prepareCommand is a private method that prepares the command used to start the org.apache.spark.executor.CoarseGrainedExecutorBackend application in a YARN container. All the input parameters of prepareCommand become command-line arguments of CoarseGrainedExecutorBackend.

The input executorMemory is expressed in megabytes (with the m suffix) and becomes the -Xmx JVM option.

It uses the optional spark.executor.extraJavaOptions for the JVM options.

If the optional SPARK_JAVA_OPTS environment variable is defined, it is added to the JVM options.

It uses the optional spark.executor.extraLibraryPath to set prefixEnv, resolving the path with Client.getClusterPath.

Caution
FIXME Client.getClusterPath?

It sets -Dspark.yarn.app.container.log.dir=<LOG_DIR>. It also sets the user classpath (using Client.getUserClasspath).

Caution
FIXME Client.getUserClasspath?

Finally, it creates the entire command to start org.apache.spark.executor.CoarseGrainedExecutorBackend with the following arguments:

  • --driver-url being the input masterAddress

  • --executor-id being the input slaveId

  • --hostname being the input hostname

  • --cores being the input executorCores

  • --app-id being the input appId
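Putting the arguments above together, a simplified, self-contained sketch of such a command builder (omitting the extra Java options, library paths, user classpath, and YARN’s <LOG_DIR> expansion handled by the real prepareCommand) could be:

```scala
// Simplified sketch of a prepareCommand-like builder: assembles the command
// line that starts CoarseGrainedExecutorBackend in a YARN container.
def prepareCommand(
    masterAddress: String,
    slaveId: String,
    hostname: String,
    executorMemory: Int, // in MB; becomes -Xmx
    executorCores: Int,
    appId: String): List[String] = {
  List(
    "java",
    "-server",
    s"-Xmx${executorMemory}m",
    "org.apache.spark.executor.CoarseGrainedExecutorBackend",
    "--driver-url", masterAddress,
    "--executor-id", slaveId,
    "--hostname", hostname,
    "--cores", executorCores.toString,
    "--app-id", appId)
}
```

Calling `prepareCommand("spark://CoarseGrainedScheduler@driver:7077", "1", "host1", 2048, 4, "app_1")` produces a command list with `-Xmx2048m` and the five `--` arguments listed above.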

Internal Registries

yarnConf

yarnConf is an instance of YARN’s YarnConfiguration. It is created when the ExecutorRunnable is created.