This tutorial provides a quick introduction to using Spark. It demonstrates the basic functionality of the RDD and DataFrame APIs.
This is the start of using Spark with Scala. From next week onwards we will keep growing this tutorial: first by adding more functionality, then by adding more modules. If you have any changes, feel free to send a pull request and we will do the merges :) Stay tuned.
#### Spark Core
- [Introduction](https://github.com/rklick-solutions/spark-tutorial/wiki/Spark-Core#introduction-to-apache-spark)
  - Features
  - Initializing
- [RDDs](https://github.com/rklick-solutions/spark-tutorial/wiki/Spark-Core#rdds)
  - [Create](https://github.com/rklick-solutions/spark-tutorial/wiki/Spark-Core#create-rdds)
  - Operations

#### Spark SQL
- Introduction
- Create SQL Context
- Basic Query
- DataFrames
  - Creating DataFrames
  - DataFrame API Functionality
  - Interoperating with RDDs
    - Inferring the Schema using Reflection
    - Programmatically Specifying the Schema
  - Data Sources
    - DataFrame Operations in JSON file
    - DataFrame Operations in Text file
    - DataFrame Operations in CSV file
- Datasets
  - Creating Datasets
  - Basic Operations
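To give a taste of what the sections above cover, here is a minimal Scala sketch that touches the RDD, DataFrame, and Dataset APIs in one small program. It assumes Spark 2.x or later (with `SparkSession`) running in local mode; the object name, app name, and sample data are illustrative only.

```scala
import org.apache.spark.sql.SparkSession

// Case classes used with Datasets should be defined at the top level
// so Spark can derive an encoder for them.
case class Person(name: String, age: Int)

object QuickStart {
  def main(args: Array[String]): Unit = {
    // Initializing: a SparkSession is the entry point for the DataFrame
    // and Dataset APIs, and exposes a SparkContext for the RDD API.
    val spark = SparkSession.builder()
      .appName("spark-tutorial-quickstart")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // RDD basics: create an RDD, apply a transformation, run an action.
    val rdd = spark.sparkContext.parallelize(Seq(1, 2, 3, 4, 5))
    val doubled = rdd.map(_ * 2)
    println(doubled.collect().mkString(", ")) // 2, 4, 6, 8, 10

    // DataFrame basics: create a DataFrame and filter it with a column expression.
    val df = Seq(("Alice", 30), ("Bob", 25)).toDF("name", "age")
    df.filter($"age" > 26).show()

    // Dataset basics: the same data viewed as a strongly typed Dataset.
    val people = df.as[Person]
    println(people.map(_.name).collect().mkString(", "))

    spark.stop()
  }
}
```

Run it with `spark-submit` or from `sbt run` with the `spark-core` and `spark-sql` dependencies on the classpath; later sections of the tutorial walk through each of these APIs in more depth.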