Spark shell is an interactive shell for learning Apache Spark, running ad-hoc queries, and developing Spark applications. It is a convenient tool to explore the many features available in Spark and one of the many reasons why Spark is so helpful even for very simple tasks (see Why Spark).
There are variants of the Spark shell for different languages: spark-shell for Scala and pyspark for Python.
Note: This document uses spark-shell only.
spark-shell is based on the Scala REPL with automatic instantiation of the Spark context as sc and the SQL context as spark.
Note: Executing Spark shell boils down to executing Spark submit, so the command-line arguments of Spark submit become Spark shell’s, e.g. --verbose.
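For example, the following invocation (an illustrative sketch) passes the Spark submit options --master and --verbose straight through to Spark shell:
$ ./bin/spark-shell --verbose --master local[2]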
You start Spark shell using the spark-shell script (available in the bin directory).
$ ./bin/spark-shell
Spark context available as sc.
SQL context available as spark.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 1.6.0-SNAPSHOT
/_/
Using Scala version 2.11.7 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_66)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
Spark shell gives you the sc value, which is the SparkContext for the session.
scala> sc
res0: org.apache.spark.SparkContext = org.apache.spark.SparkContext@2ac0cb64
There is also spark, an instance of org.apache.spark.sql.SQLContext, to use Spark SQL. Refer to Spark SQL.
scala> spark
res1: org.apache.spark.sql.SQLContext = org.apache.spark.sql.hive.HiveContext@60ae950f
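As a quick sanity check (a minimal, illustrative session continuing from above), you can use sc for a small distributed computation and spark for a SQL query:
scala> sc.parallelize(1 to 100).reduce(_ + _)
res2: Int = 5050

scala> spark.sql("SELECT 1 + 1 AS sum").show()
+---+
|sum|
+---+
|  2|
+---+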
To close Spark shell, press Ctrl+D or type in :quit (or :q, or any other unambiguous prefix of :quit).
scala> :quit
One way to learn a tool like Spark shell is to read its error messages. Together with the source code, they can be a viable path to mastery.
Let’s give it a try using spark-shell.
When you try it out with an incorrect value for the master URL, you’re told about the --help and --verbose options.
➜ spark git:(master) ✗ ./bin/spark-shell --master mss
Error: Master must start with yarn, spark, mesos, or local
Run with --help for usage help or --verbose for debug output
You’re also told about the acceptable values for --master.
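For instance (an illustrative run), local[*] is accepted and runs Spark locally with as many worker threads as logical cores:
$ ./bin/spark-shell --master local[*]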
Let’s see what --verbose gives us.
➜ spark git:(master) ✗ ./bin/spark-shell --verbose --master mss
Using properties file: null
Parsed arguments:
master mss
deployMode null
executorMemory null
executorCores null
totalExecutorCores null
propertiesFile null
driverMemory null
driverCores null
driverExtraClassPath null
driverExtraLibraryPath null
driverExtraJavaOptions null
supervise false
queue null
numExecutors null
files null
pyFiles null
archives null
mainClass org.apache.spark.repl.Main
primaryResource spark-shell
name Spark shell
childArgs []
jars null
packages null
packagesExclusions null
repositories null
verbose true
Spark properties used, including those specified through
--conf and those from the properties file null:
Error: Master must start with yarn, spark, mesos, or local
Run with --help for usage help or --verbose for debug output
Tip: These nulls could instead be replaced with some other, more meaningful values.
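For instance (an illustrative invocation), specifying the corresponding Spark submit options fills some of them in, e.g. --driver-memory for driverMemory and --name for name:
$ ./bin/spark-shell --verbose --master local[2] --driver-memory 2g --name "My Spark shell"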