                               _               _             _                    _           _ 
                              | |             | |           | |                  (_)         | |
  ___   _ __     __ _   _ __  | | __  ______  | |_   _   _  | |_    ___    _ __   _    __ _  | |
 / __| | '_ \   / _` | | '__| | |/ / |______| | __| | | | | | __|  / _ \  | '__| | |  / _` | | |
 \__ \ | |_) | | (_| | | |    |   <           | |_  | |_| | | |_  | (_) | | |    | | | (_| | | |
 |___/ | .__/   \__,_| |_|    |_|\_\           \__|  \__,_|  \__|  \___/  |_|    |_|  \__,_| |_|
       | |                                                                                      
       |_|                                                                                      

This tutorial provides a quick introduction to using Spark. It demonstrates the basic functionality of the RDD and DataFrame APIs.

Initializing Spark

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName(appName).setMaster(master)
val sc = new SparkContext(conf)

Note: Only one SparkContext may be active per JVM. You must stop() the active SparkContext before creating a new one.
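
For example, a minimal sketch of replacing an active context (assuming sc is the SparkContext created above):

sc.stop()                        // stop the active SparkContext first
val sc2 = new SparkContext(conf) // then a new context can be created (reusing the same conf)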

We have tried to cover the basics of the Spark Core, SQL, Streaming, ML, and GraphX programming APIs.

* Create SQL Context
* Creating DataFrames (a minimal sketch follows this list)
* Creating Datasets
* Inferring the Schema using Reflection
* Programmatically Specifying the Schema
* DataFrame Operations in JSON file
* DataFrame Operations in Text file
* DataFrame Operations in CSV file
* DataFrame API
* Action
* Basic DataFrame functions
* DataFrame Operations
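
As a first taste of the DataFrame topics listed above, here is a minimal sketch; it assumes the sc created earlier and a hypothetical people.json sample file, not a file shipped with this repository:

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)          // create an SQLContext on top of the existing SparkContext
val df = sqlContext.read.json("people.json") // people.json is a hypothetical sample file
df.printSchema()                             // inspect the inferred schema
df.show()                                    // print the first rows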

Welcome to the spark-tutorial wiki!

We use the SparkCommon object from the Utils package to run the examples in this tutorial.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SparkCommon {

  // Shared local-mode configuration for all examples in this tutorial.
  lazy val conf = {
    new SparkConf(false)
      .setMaster("local[*]")
      .setAppName("Spark Tutorial")
  }

  // Single SparkContext reused across examples (only one may be active per JVM).
  lazy val sparkContext = new SparkContext(conf)

  // SQLContext for the DataFrame and Dataset examples.
  lazy val sparkSQLContext = SQLContext.getOrCreate(sparkContext)

  // StreamingContext with a 2-second batch interval, reused if one is already active.
  lazy val streamingContext = StreamingContext.getActive()
    .getOrElse(new StreamingContext(sparkContext, Seconds(2)))
}
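
Each example pulls the context it needs from SparkCommon. A minimal, illustrative usage sketch (the word counting here is not one of the repository's examples):

val sc = SparkCommon.sparkContext                          // reuse the shared SparkContext
val counts = sc.parallelize(Seq("spark", "rdd", "spark"))  // build a small RDD
  .map(word => (word, 1))
  .reduceByKey(_ + _)
counts.collect().foreach(println)                          // e.g. (spark,2), (rdd,1)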

Spark Core
