The sbin/start-slave.sh script starts a Spark worker (aka slave) on the machine the script is executed on. It launches SPARK_WORKER_INSTANCES instances.
./sbin/start-slave.sh [masterURL]
The mandatory masterURL parameter is of the form spark://hostname:port, e.g. spark://localhost:7077. It is also possible to specify comma-separated master URLs of the form spark://hostname1:port1,hostname2:port2,… with each element being hostname:port.
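To make the comma-separated form concrete, here is a minimal sketch that splits a master URL into its hostname:port elements. The function name split_master_url is hypothetical and the logic only illustrates the URL shape described above; it is not how Spark itself parses the value.

```shell
# Split a master URL such as spark://host1:7077,host2:7077 into
# one hostname:port element per line. Illustration only.
split_master_url() {
  url="$1"
  case "$url" in
    spark://*) ;;
    *) echo "master URL must start with spark://" >&2; return 1 ;;
  esac
  # Drop the scheme, then turn the commas into newlines.
  printf '%s\n' "${url#spark://}" | tr ',' '\n'
}

split_master_url "spark://hostname1:7077,hostname2:7077"
```

Note that only the first element carries the spark:// prefix, which is why the scheme is stripped once before splitting.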
Internally, the script starts the sparkWorker RPC environment.
The order of importance of Spark configuration settings is as follows (from least to the most important):
The script uses the following system environment variables (directly or indirectly):
- SPARK_WORKER_INSTANCES (default: 1) - the number of worker instances to run on this slave.
- SPARK_WORKER_PORT - the base port number to listen on for the first worker. If set, subsequent workers will increment this number. If unset, Spark will pick a random port.
- SPARK_WORKER_WEBUI_PORT (default: 8081) - the base port for the web UI of the first worker. Subsequent workers will increment this number. If the port is in use, successive ports are tried until a free one is found.
- SPARK_WORKER_CORES - the number of cores a single worker instance can use.
- SPARK_WORKER_MEMORY (default: 1G) - the amount of memory to use, e.g. 1000M, 2G.
- SPARK_WORKER_DIR (default: $SPARK_HOME/work) - the directory to run apps in.
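The port rules for multiple worker instances can be sketched as follows. This assumes that instance N (counting from 1) gets the base port plus N - 1, which is one reading of "subsequent workers will increment this number"; the function name worker_ports is hypothetical.

```shell
# Compute the ports for worker instance N (1-based) from the base ports.
# Assumption: instance N uses base + (N - 1).
worker_ports() {
  instance="$1"
  webui_base="${SPARK_WORKER_WEBUI_PORT:-8081}"
  webui_port=$(( $webui_base + $instance - 1 ))
  if [ -z "${SPARK_WORKER_PORT:-}" ]; then
    # No base port set: the worker picks a random port on its own.
    echo "instance=$instance port=random webui=$webui_port"
  else
    port=$(( $SPARK_WORKER_PORT + $instance - 1 ))
    echo "instance=$instance port=$port webui=$webui_port"
  fi
}

SPARK_WORKER_PORT=7078
for i in 1 2 3; do
  worker_ports "$i"
done
```

The sketch does not model the retry on a busy web UI port; it only shows the starting port each instance would try.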
The script uses the following helper scripts:
- sbin/spark-config.sh
- bin/load-spark-env.sh
You can use the following command-line options:
- --host or -h - the hostname the worker is available under.
- --port or -p - command-line version of the SPARK_WORKER_PORT environment variable.
- --cores or -c (default: the number of processors available to the JVM) - command-line version of the SPARK_WORKER_CORES environment variable.
- --memory or -m - command-line version of the SPARK_WORKER_MEMORY environment variable.
- --work-dir or -d - command-line version of the SPARK_WORKER_DIR environment variable.
- --webui-port - command-line version of the SPARK_WORKER_WEBUI_PORT environment variable.
- --properties-file (default: conf/spark-defaults.conf) - the path to a custom Spark properties file.
- --help
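As a rough illustration of how the options combine, the sketch below assembles a start-slave.sh command line and prints it without running anything. The helper build_worker_command is hypothetical; the only options it uses are --cores and --memory from the list above, placed after the master URL.

```shell
# Build (but do not run) a start-slave.sh command line.
# Arguments: master URL, number of cores, amount of memory.
# Empty values are skipped so the worker falls back to its defaults.
build_worker_command() {
  master_url="$1"
  cores="$2"
  memory="$3"
  cmd="./sbin/start-slave.sh $master_url"
  if [ -n "$cores" ]; then cmd="$cmd --cores $cores"; fi
  if [ -n "$memory" ]; then cmd="$cmd --memory $memory"; fi
  echo "$cmd"
}

build_worker_command spark://localhost:7077 2 2G
# prints: ./sbin/start-slave.sh spark://localhost:7077 --cores 2 --memory 2G
```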
After loading the default SparkConf, if --properties-file or SPARK_WORKER_OPTS define spark.worker.ui.port, the value of the property is used as the port of the worker's web UI.
SPARK_WORKER_OPTS=-Dspark.worker.ui.port=21212 ./sbin/start-slave.sh spark://localhost:7077
or
$ cat worker.properties
spark.worker.ui.port=33333
$ ./sbin/start-slave.sh spark://localhost:7077 --properties-file worker.properties
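The override can be pictured with a small sketch: if the properties file defines spark.worker.ui.port, that value wins over the port that would otherwise be used. The function resolve_webui_port is hypothetical, and it assumes plain key=value lines only, which is narrower than what a real properties file allows.

```shell
# Resolve the worker web UI port: a spark.worker.ui.port entry in the
# properties file wins over the fallback port.
# Assumption: only plain key=value lines are handled.
resolve_webui_port() {
  properties_file="$1"
  fallback_port="$2"
  from_file=""
  if [ -n "$properties_file" ] && [ -f "$properties_file" ]; then
    # Take the last matching line if the key appears more than once.
    from_file="$(sed -n 's/^spark\.worker\.ui\.port=//p' "$properties_file" | tail -n 1)"
  fi
  echo "${from_file:-$fallback_port}"
}

# Without a properties file the fallback is returned unchanged.
resolve_webui_port "" 8081
```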
Ultimately, the script calls sbin/spark-daemon.sh start to kick off org.apache.spark.deploy.worker.Worker with --webui-port, --port and the master URL.
Upon starting, a Spark worker creates the default SparkConf.
It parses command-line arguments for the worker using the WorkerArguments class.
- SPARK_LOCAL_HOSTNAME - custom host name
- SPARK_LOCAL_IP - custom IP to use (when SPARK_LOCAL_HOSTNAME is not set or hostname resolves to incorrect IP)
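One way to read the two variables above is as a fallback chain: the custom host name first, then the custom IP, then whatever the system reports. The sketch below encodes that reading; the function worker_host is hypothetical and the order is an assumption drawn from the descriptions, not a copy of Spark's own lookup.

```shell
# Pick the address a worker would advertise.
# Assumed order: SPARK_LOCAL_HOSTNAME, then SPARK_LOCAL_IP,
# then the system host name.
worker_host() {
  if [ -n "${SPARK_LOCAL_HOSTNAME:-}" ]; then
    echo "$SPARK_LOCAL_HOSTNAME"
  elif [ -n "${SPARK_LOCAL_IP:-}" ]; then
    echo "$SPARK_LOCAL_IP"
  else
    hostname
  fi
}

SPARK_LOCAL_IP=192.168.1.10
worker_host
```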
It starts sparkWorker RPC Environment and waits until the RpcEnv terminates.
The org.apache.spark.deploy.worker.Worker class starts its own sparkWorker RPC environment with the Worker endpoint.
The ./sbin/start-slaves.sh script starts slave instances on each machine specified in the conf/slaves file.
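A minimal sketch of reading such a host list follows. It assumes conf/slaves holds one host per line and that # starts a comment; the function list_slaves is hypothetical and only prints the hosts, whereas the real script goes on to start a worker on each of them.

```shell
# Print the hosts listed in a slaves file, one per line.
# Assumption: one host per line, '#' starts a comment.
list_slaves() {
  sed -e 's/#.*$//' \
      -e 's/^[[:space:]]*//' \
      -e 's/[[:space:]]*$//' \
      -e '/^$/d' "$1"
}

# Demo with a temporary file standing in for conf/slaves.
demo="$(mktemp)"
printf '# workers\nnode1\n\nnode2  # rack 2\n' > "$demo"
list_slaves "$demo"
rm -f "$demo"
```

The demo prints node1 and node2, skipping the comment line, the blank line, and the trailing comment.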
It supports starting Tachyon using the --with-tachyon command-line option. It assumes the tachyon/bin/tachyon command is available in Spark's home directory.
The script uses the following helper scripts:
- sbin/spark-config.sh
- bin/load-spark-env.sh
- conf/spark-env.sh
The script uses the following environment variables (and sets them when unavailable):
- SPARK_PREFIX
- SPARK_HOME
- SPARK_CONF_DIR
- SPARK_MASTER_PORT
- SPARK_MASTER_IP
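"Sets them when unavailable" can be sketched with shell default expansion. The example below derives a master URL from SPARK_MASTER_IP and SPARK_MASTER_PORT; the fallbacks used (port 7077 and the local host name) are assumptions for illustration, and master_url is a hypothetical helper.

```shell
# Build the master URL from the environment, filling in defaults
# for unset variables.
# Assumed defaults: port 7077, host = local host name.
master_url() {
  port="${SPARK_MASTER_PORT:-7077}"
  host="${SPARK_MASTER_IP:-$(hostname)}"
  echo "spark://$host:$port"
}

SPARK_MASTER_IP=master.example.com
master_url
```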
The following command will launch 3 worker instances on each node. Each worker instance will use two cores.
SPARK_WORKER_INSTANCES=3 SPARK_WORKER_CORES=2 ./sbin/start-slaves.sh