ExecutorRunnable starts a YARN container that runs a CoarseGrainedExecutorBackend application. If the external shuffle service is used, it is set in the ContainerLaunchContext as service data under the spark_shuffle key.
Note
|
Despite the name, ExecutorRunnable is no longer a java.lang.Runnable after SPARK-12447.
|
Tip
|
Enable INFO logging level for the org.apache.spark.deploy.yarn.ExecutorRunnable logger to see what happens inside.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.deploy.yarn.ExecutorRunnable=INFO

Refer to Logging. |
ExecutorRunnable(
container: Container,
conf: Configuration,
sparkConf: SparkConf,
masterAddress: String,
slaveId: String,
hostname: String,
executorMemory: Int,
executorCores: Int,
appId: String,
securityMgr: SecurityManager,
localResources: Map[String, LocalResource])
YarnAllocator creates an instance of ExecutorRunnable when launching Spark executors in allocated YARN containers. A single ExecutorRunnable is created for each YARN container that is to run a Spark executor.
The input conf (Hadoop's Configuration), sparkConf, and masterAddress directly correspond to the constructor arguments of YarnAllocator.

The input slaveId comes from an internal counter in YarnAllocator.

The input hostname is the host of the YARN container.

The input executorMemory and executorCores come from YarnAllocator, which in turn reads them from the spark.executor.memory and spark.executor.cores configuration settings.

The input appId, securityMgr, and localResources are the same as those YarnAllocator was created with.
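For illustration only, here is a minimal sketch of how an allocator-like caller could create and run an ExecutorRunnable for an allocated container, assuming the constructor shown above and a run() entry point that eventually calls startContainer (all values below are hypothetical):

// Hypothetical values; in Spark they come from YarnAllocator's state and configuration.
val executorRunnable = new ExecutorRunnable(
  container,                    // the allocated YARN Container
  hadoopConf,                   // Hadoop Configuration
  sparkConf,                    // the application's SparkConf
  masterAddress,                // driver URL, e.g. spark://CoarseGrainedScheduler@driver-host:port
  slaveId,                      // executor id from YarnAllocator's internal counter
  container.getNodeId.getHost,  // host of the YARN container
  executorMemory,               // from spark.executor.memory (in MB)
  executorCores,                // from spark.executor.cores
  appId,
  securityMgr,
  localResources)
executorRunnable.run()          // eventually calls startContainer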
When run, you should see the following INFO message in the logs:
INFO ExecutorRunnable: Starting Executor Container
It ultimately starts CoarseGrainedExecutorBackend in the container.
startContainer(): java.util.Map[String, ByteBuffer]
startContainer uses the NMClient API to start a CoarseGrainedExecutorBackend in a YARN container.
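A minimal sketch of that NMClient usage with the stock Hadoop YARN client API; yarnConf, container, and ctx (the ContainerLaunchContext) are assumed to be available:

import org.apache.hadoop.yarn.client.api.NMClient

// Create and initialize the NodeManager client.
val nmClient = NMClient.createNMClient()
nmClient.init(yarnConf)
nmClient.start()

// Ask the NodeManager to launch the container with the prepared launch context.
val serviceData: java.util.Map[String, java.nio.ByteBuffer] =
  nmClient.startContainer(container, ctx)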
When startContainer is executed, you should see the following INFO message in the logs:
INFO ExecutorRunnable: Setting up ContainerLaunchContext
It then creates a YARN ContainerLaunchContext (which represents all of the information the YARN NodeManager needs to launch a container) with the local resources and environment being the localResources and env, respectively, that were given when the ExecutorRunnable was created. It also sets security tokens.
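A sketch of building such a ContainerLaunchContext with the Hadoop YARN API; credentials stands in for the application's security tokens and is an assumption:

import java.nio.ByteBuffer
import scala.collection.JavaConverters._
import org.apache.hadoop.io.DataOutputBuffer
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext
import org.apache.hadoop.yarn.util.Records

val ctx = Records.newRecord(classOf[ContainerLaunchContext])
ctx.setLocalResources(localResources.asJava)  // localResources given to ExecutorRunnable
ctx.setEnvironment(env.asJava)                // environment prepared for the executor

// Serialize the security tokens into the launch context.
val dob = new DataOutputBuffer()
credentials.writeTokenStorageToStream(dob)
ctx.setTokens(ByteBuffer.wrap(dob.getData, 0, dob.getLength))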
It prepares the command to launch CoarseGrainedExecutorBackend with all the details as provided when the ExecutorRunnable was created.
You should see the following INFO message in the logs:
INFO ExecutorRunnable:
===============================================================================
YARN executor launch context:
env:
[key] -> [value]
...
command:
[commands]
===============================================================================
The command is set on the just-created ContainerLaunchContext.
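Setting the command on the launch context is a single call; commands below stands for the list produced by prepareCommand:

import scala.collection.JavaConverters._

// commands: List[String] as produced by prepareCommand
ctx.setCommands(commands.asJava)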
It sets application ACLs using YarnSparkHadoopUtil.getApplicationAclsForYarn.
If the external shuffle service is used, the executor registers with the YARN shuffle service already started on the NodeManager. The external shuffle service is set in the ContainerLaunchContext as service data under the spark_shuffle key.
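A sketch of these last two pieces of launch-context setup, the application ACLs and the spark_shuffle service data; acls and shuffleSecretBytes are assumed to be prepared elsewhere:

import java.nio.ByteBuffer
import scala.collection.JavaConverters._

// Application ACLs, e.g. from YarnSparkHadoopUtil.getApplicationAclsForYarn(securityMgr).
ctx.setApplicationACLs(acls.asJava)

// Register with the external shuffle service on the NodeManager by attaching
// service data under the "spark_shuffle" key.
ctx.setServiceData(Map("spark_shuffle" -> ByteBuffer.wrap(shuffleSecretBytes)).asJava)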
Ultimately, startContainer requests the YARN NodeManager to start the YARN container for a Spark executor (as given when the ExecutorRunnable was created) with the ContainerLaunchContext.
If any exception happens, a SparkException is thrown with the following message:

Exception while starting container [containerId] on host [hostname]
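The error handling amounts to wrapping whatever the NodeManager call throws into a SparkException, roughly as sketched below (the exact message formatting is an assumption):

import org.apache.spark.SparkException

try {
  nmClient.startContainer(container, ctx)
} catch {
  case e: Exception =>
    throw new SparkException(
      s"Exception while starting container ${container.getId} on host $hostname", e)
}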
Note
|
startContainer is exclusively called as part of running ExecutorRunnable.
|
prepareCommand(
masterAddress: String,
slaveId: String,
hostname: String,
executorMemory: Int,
executorCores: Int,
appId: String): List[String]
prepareCommand is a private method that prepares the command used to start the org.apache.spark.executor.CoarseGrainedExecutorBackend application in a YARN container. All the input parameters of prepareCommand become command-line arguments of the CoarseGrainedExecutorBackend application.
The input executorMemory is in megabytes and becomes the -Xmx JVM option (with an m suffix, e.g. -Xmx2048m).
It uses the optional spark.executor.extraJavaOptions for the JVM options.
If the optional SPARK_JAVA_OPTS environment variable is defined, it is added to the JVM options.
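A rough sketch of how these sources could combine into the JVM options; the ordering and the simple whitespace splitting are assumptions:

import scala.collection.mutable.ListBuffer

val javaOpts = ListBuffer[String]()

// executorMemory is in megabytes, e.g. 2048 => -Xmx2048m
javaOpts += s"-Xmx${executorMemory}m"

// Optional user-provided executor JVM options.
sparkConf.getOption("spark.executor.extraJavaOptions").foreach { opts =>
  javaOpts ++= opts.split(" ")
}

// Legacy SPARK_JAVA_OPTS environment variable, if defined.
sys.env.get("SPARK_JAVA_OPTS").foreach { opts =>
  javaOpts ++= opts.split(" ")
}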
It uses the optional spark.executor.extraLibraryPath to set prefixEnv. It uses Client.getClusterPath.
Caution
|
FIXME Client.getClusterPath ?
|
It sets -Dspark.yarn.app.container.log.dir=<LOG_DIR> in the JVM options.

It sets the user classpath (using Client.getUserClasspath).
Caution
|
FIXME Client.getUserClasspath ?
|
Finally, it creates the entire command to start org.apache.spark.executor.CoarseGrainedExecutorBackend with the following arguments (a sketch of the assembled command follows the list):

- --driver-url being the input masterAddress
- --executor-id being the input slaveId
- --hostname being the input hostname
- --cores being the input executorCores
- --app-id being the input appId
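For illustration, a hedged sketch of what the assembled command could look like; the {{JAVA_HOME}} expansion and the stdout/stderr redirection to <LOG_DIR> are modeled on typical YARN launch commands and are assumptions, not a verbatim copy of Spark's code:

// javaOpts is the JVM-option list sketched earlier (hypothetical assembly).
val commands: List[String] =
  List("{{JAVA_HOME}}/bin/java", "-server") ++
  javaOpts ++
  List(
    "org.apache.spark.executor.CoarseGrainedExecutorBackend",
    "--driver-url", masterAddress,
    "--executor-id", slaveId,
    "--hostname", hostname,
    "--cores", executorCores.toString,
    "--app-id", appId,
    "1><LOG_DIR>/stdout",
    "2><LOG_DIR>/stderr")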
yarnConf is an instance of YARN's YarnConfiguration. It is created when ExecutorRunnable is created.
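A minimal sketch of that initialization, assuming conf is the Hadoop Configuration passed to the constructor:

import org.apache.hadoop.yarn.conf.YarnConfiguration

// Wrap the Hadoop Configuration in a YARN-specific view.
val yarnConf = new YarnConfiguration(conf)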