You can download pre-packaged versions of Apache Spark from the project’s web site. The packages are built for different Hadoop versions, but only for Scala 2.10.
Note
Since [SPARK-6363][BUILD] Make Scala 2.11 the default Scala version, the default version of Scala is 2.11.
If you want a Scala 2.11 version of Apache Spark "users should download the Spark source package and build with Scala 2.11 support" (quoted from the Note at Download Spark).
The build process for Scala 2.11 takes around 15 mins (on a decent machine) and is so simple that you are unlikely to resist the urge to do it yourself.
The build command with sbt as the build tool is as follows:
./build/sbt -Pyarn -Phadoop-2.7 -Phive -Phive-thriftserver -DskipTests clean assembly
Building Spark with sbt on Java 8 takes about 10 minutes.
➜ spark git:(master) ✗ ./build/sbt -Pyarn -Phadoop-2.7 -Phive -Phive-thriftserver -DskipTests clean assembly
...
[success] Total time: 496 s, completed Dec 7, 2015 8:24:41 PM
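sbt reports the total build time in seconds, which is not that easy to compare with the minutes quoted in the text. A minimal sketch that converts the value follows; `to_min_sec` is a hypothetical helper name of mine, not part of sbt or Spark.

```shell
# Convert the "Total time: N s" value reported by sbt into minutes and seconds.
# to_min_sec is a hypothetical helper, not something sbt or Spark ships.
to_min_sec() {
  total=$1
  printf '%d min %d s\n' $((total / 60)) $((total % 60))
}

# 496 is the value from the sbt output above.
to_min_sec 496   # prints "8 min 16 s"
```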
The build command with Apache Maven is as follows:
$ ./build/mvn -Pyarn -P 'hadoop-2.7' -Phive -Phive-thriftserver -DskipTests clean install
After a couple of minutes your freshly baked distro is ready to fly!
I’m using Oracle Java 8 to build Spark.
➜ spark git:(master) ✗ java -version
java version "1.8.0_92"
Java(TM) SE Runtime Environment (build 1.8.0_92-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.92-b14, mixed mode)
➜ spark git:(master) ✗ ./build/mvn -Pyarn -Phadoop-2.7 -Phive -Phive-thriftserver -DskipTests clean install
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
Using `mvn` from path: /usr/local/bin/mvn
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO]
[INFO] Spark Project Parent POM
[INFO] Spark Project Tags
[INFO] Spark Project Sketch
[INFO] Spark Project Networking
[INFO] Spark Project Shuffle Streaming Service
[INFO] Spark Project Unsafe
[INFO] Spark Project Launcher
[INFO] Spark Project Core
[INFO] Spark Project GraphX
[INFO] Spark Project Streaming
[INFO] Spark Project Catalyst
[INFO] Spark Project SQL
[INFO] Spark Project ML Local Library
[INFO] Spark Project ML Library
[INFO] Spark Project Tools
[INFO] Spark Project Hive
[INFO] Spark Project REPL
[INFO] Spark Project YARN Shuffle Service
[INFO] Spark Project YARN
[INFO] Spark Project Hive Thrift Server
[INFO] Spark Project Assembly
[INFO] Spark Project External Flume Sink
[INFO] Spark Project External Flume
[INFO] Spark Project External Flume Assembly
[INFO] Spark Integration for Kafka 0.8
[INFO] Spark Project Examples
[INFO] Spark Project External Kafka Assembly
[INFO] Spark Integration for Kafka 0.10
[INFO] Spark Integration for Kafka 0.10 Assembly
[INFO] Spark Project Java 8 Tests
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building Spark Project Parent POM 2.0.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [ 4.186 s]
[INFO] Spark Project Tags ................................. SUCCESS [ 4.893 s]
[INFO] Spark Project Sketch ............................... SUCCESS [ 5.066 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 11.108 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 7.051 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [ 7.650 s]
[INFO] Spark Project Launcher ............................. SUCCESS [ 9.905 s]
[INFO] Spark Project Core ................................. SUCCESS [02:09 min]
[INFO] Spark Project GraphX ............................... SUCCESS [ 19.317 s]
[INFO] Spark Project Streaming ............................ SUCCESS [ 42.077 s]
[INFO] Spark Project Catalyst ............................. SUCCESS [01:32 min]
[INFO] Spark Project SQL .................................. SUCCESS [01:47 min]
[INFO] Spark Project ML Local Library ..................... SUCCESS [ 10.049 s]
[INFO] Spark Project ML Library ........................... SUCCESS [01:36 min]
[INFO] Spark Project Tools ................................ SUCCESS [ 3.520 s]
[INFO] Spark Project Hive ................................. SUCCESS [ 52.528 s]
[INFO] Spark Project REPL ................................. SUCCESS [ 7.243 s]
[INFO] Spark Project YARN Shuffle Service ................. SUCCESS [ 7.898 s]
[INFO] Spark Project YARN ................................. SUCCESS [ 15.380 s]
[INFO] Spark Project Hive Thrift Server ................... SUCCESS [ 24.876 s]
[INFO] Spark Project Assembly ............................. SUCCESS [ 2.971 s]
[INFO] Spark Project External Flume Sink .................. SUCCESS [ 7.377 s]
[INFO] Spark Project External Flume ....................... SUCCESS [ 10.752 s]
[INFO] Spark Project External Flume Assembly .............. SUCCESS [ 1.695 s]
[INFO] Spark Integration for Kafka 0.8 .................... SUCCESS [ 13.013 s]
[INFO] Spark Project Examples ............................. SUCCESS [ 31.728 s]
[INFO] Spark Project External Kafka Assembly .............. SUCCESS [ 3.472 s]
[INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [ 12.297 s]
[INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [ 3.789 s]
[INFO] Spark Project Java 8 Tests ......................... SUCCESS [ 4.267 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 12:29 min
[INFO] Finished at: 2016-07-07T22:29:56+02:00
[INFO] Final Memory: 110M/913M
[INFO] ------------------------------------------------------------------------
Note the messages that report the version of Spark (Building Spark Project Parent POM 2.0.0-SNAPSHOT), the Scala version (maven-clean-plugin:2.6.1:clean (default-clean) @ spark-parent_2.11), and the Spark modules built.
The above command gives you the latest version of Apache Spark 2.0.0-SNAPSHOT built for Scala 2.11.8 (see the configuration of the scala-2.11 profile).
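If you save the Maven output to a file, you can pull the Spark and Scala versions out of those messages rather than scrolling for them. The sketch below works on two sample lines modeled on the messages quoted above; the `[INFO] ---` decoration around the plugin line and the variable names are my assumptions, so adjust the patterns to your actual log.

```shell
# Two sample lines modeled on the Maven messages quoted in the text.
# In practice you would read them from a saved build log instead.
log='[INFO] Building Spark Project Parent POM 2.0.0-SNAPSHOT
[INFO] --- maven-clean-plugin:2.6.1:clean (default-clean) @ spark-parent_2.11 ---'

# Spark version: whatever follows "Building Spark Project Parent POM ".
spark_version=$(printf '%s\n' "$log" |
  sed -n 's/.*Building Spark Project Parent POM \(.*\)$/\1/p')

# Scala binary version: the suffix after the underscore in the artifact id.
scala_version=$(printf '%s\n' "$log" |
  sed -n 's/.*@ spark-parent_\([0-9.]*\).*/\1/p')

echo "Spark $spark_version, Scala $scala_version"
```

Note that the artifact id carries only the Scala binary version (2.11), not the full 2.11.8.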
Tip
You can also check the version of Spark using ./bin/spark-shell --version.
./make-distribution.sh is the shell script to make a distribution. It uses the same profiles as sbt and Maven.
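Since all three tools take the same profiles, you can keep the profile list in one place and derive each command line from it. This is only a sketch: it prints the commands without running them, and the variable names are mine.

```shell
# Build the -P flags once from the profiles used in the build commands in the text.
flags=""
for p in yarn hadoop-2.7 hive hive-thriftserver; do
  flags="$flags -P$p"
done

# The three invocations share the same flags; only the tool and goals differ.
sbt_cmd="./build/sbt$flags -DskipTests clean assembly"
mvn_cmd="./build/mvn$flags -DskipTests clean install"
dist_cmd="./make-distribution.sh$flags -DskipTests"

# Print the commands (they are not executed here).
printf '%s\n' "$sbt_cmd" "$mvn_cmd" "$dist_cmd"
```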
Use the --tgz option to have a tar gz version of the Spark distribution.
➜ spark git:(master) ✗ ./make-distribution.sh --tgz -Pyarn -Phadoop-2.7 -Phive -Phive-thriftserver -DskipTests
Once finished, you will have the distribution in the current directory, i.e. spark-2.0.0-SNAPSHOT-bin-2.7.2.tgz.
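A quick sanity check of the result could look like the sketch below. It assumes that the archive unpacks into a directory named after the file without the .tgz suffix, and it only touches the archive if it is actually present; the file name is the one from the text, so change it to whatever your build produced.

```shell
# File name taken from the text; adjust it to match your own build.
tgz=spark-2.0.0-SNAPSHOT-bin-2.7.2.tgz

# Assumption: the archive unpacks into a directory named like the file minus ".tgz".
dist_dir=$(basename "$tgz" .tgz)
echo "$dist_dir"

# Unpack and ask the distribution for its version, but only if the archive exists.
if [ -f "$tgz" ]; then
  tar -xzf "$tgz"
  "./$dist_dir/bin/spark-shell" --version
fi
```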