Spark shell

Spark shell is an interactive shell for learning Apache Spark, running ad-hoc queries, and developing Spark applications. It is a very convenient tool to explore the many things available in Spark and one of the reasons why Spark is so helpful even for very simple tasks (see Why Spark).

There are variants of the Spark shell for different languages: spark-shell for Scala and pyspark for Python.

Note
This document uses spark-shell only.

spark-shell is based on the Scala REPL with automatic instantiation of the Spark context as sc and the SQL context as spark.

Note

When you execute spark-shell, it executes Spark submit as follows:

org.apache.spark.deploy.SparkSubmit --class org.apache.spark.repl.Main --name Spark shell spark-shell

Set SPARK_PRINT_LAUNCH_COMMAND to see the entire command to be executed. Refer to Print Launch Command of Spark Scripts.
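For example, a quick way to see the launch command is to set the variable inline when starting the shell (any non-empty value should do):

$ SPARK_PRINT_LAUNCH_COMMAND=1 ./bin/spark-shell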

Spark shell boils down to executing Spark submit, and so the command-line arguments of Spark submit become Spark shell’s, e.g. --verbose.
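As an illustration only, the following invocation passes a few typical Spark submit options straight to spark-shell (the exact values are made up):

$ ./bin/spark-shell --master local[2] --driver-memory 2g --verbose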

Using Spark shell

You start Spark shell using the spark-shell script (available in the bin directory).

$ ./bin/spark-shell
Spark context available as sc.
SQL context available as spark.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.0-SNAPSHOT
      /_/

Using Scala version 2.11.7 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_66)
Type in expressions to have them evaluated.
Type :help for more information.

scala>

Spark shell gives you the sc value, which is the SparkContext for the session.

scala> sc
res0: org.apache.spark.SparkContext = org.apache.spark.SparkContext@2ac0cb64
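With sc at hand you can start computing right away, e.g. count the elements of a small RDD (an illustrative session):

scala> val n = sc.parallelize(1 to 5).count
n: Long = 5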

There is also spark, an instance of org.apache.spark.sql.SQLContext, to use Spark SQL. Refer to Spark SQL.

scala> spark
res1: org.apache.spark.sql.SQLContext = org.apache.spark.sql.hive.HiveContext@60ae950f
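With spark you can work with DataFrames and Spark SQL, for example (again, just a sketch):

scala> val numbers = spark.range(5)
numbers: org.apache.spark.sql.DataFrame = [id: bigint]

scala> numbers.show
+---+
| id|
+---+
|  0|
|  1|
|  2|
|  3|
|  4|
+---+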

To close Spark shell, press Ctrl+D or type :quit (any unambiguous prefix, e.g. :q, also works).

scala> :quit

Learning Spark interactively

One way to learn about a tool like Spark shell is to read its error messages. Together with the source code, they can be a viable way to reach mastery.

Let’s give it a try using spark-shell.

When you try it out with an incorrect value for the master URL, you’re told about the --help and --verbose options.

➜  spark git:(master) ✗ ./bin/spark-shell --master mss
Error: Master must start with yarn, spark, mesos, or local
Run with --help for usage help or --verbose for debug output

You’re also told about the acceptable values for --master.
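For reference, the master URL can take forms like these (a non-exhaustive, illustrative list):

local               # run Spark locally with a single thread
local[4]            # run locally with 4 threads (local[*] uses all available cores)
spark://host:7077   # connect to a Spark standalone cluster
mesos://host:5050   # connect to a Mesos cluster
yarn                # connect to a YARN cluster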

Let’s see what --verbose gives us.

➜  spark git:(master) ✗ ./bin/spark-shell --verbose --master mss
Using properties file: null
Parsed arguments:
  master                  mss
  deployMode              null
  executorMemory          null
  executorCores           null
  totalExecutorCores      null
  propertiesFile          null
  driverMemory            null
  driverCores             null
  driverExtraClassPath    null
  driverExtraLibraryPath  null
  driverExtraJavaOptions  null
  supervise               false
  queue                   null
  numExecutors            null
  files                   null
  pyFiles                 null
  archives                null
  mainClass               org.apache.spark.repl.Main
  primaryResource         spark-shell
  name                    Spark shell
  childArgs               []
  jars                    null
  packages                null
  packagesExclusions      null
  repositories            null
  verbose                 true

Spark properties used, including those specified through
 --conf and those from the properties file null:



Error: Master must start with yarn, spark, mesos, or local
Run with --help for usage help or --verbose for debug output
Tip
These nulls could instead be replaced with some other, more meaningful values.