Summary

  1. Introduction

  2. Overview of Spark

  3. Anatomy of a Spark Application

  4. Spark Tools

  5. Spark Architecture

  6. Spark Services

    1. MemoryManager — Memory Management

    2. SparkEnv — Spark Runtime Environment

    3. DAGScheduler

    4. Task Scheduler

    5. Scheduler Backend

    6. Executor Backend

    7. BlockManager

    8. Dynamic Allocation (of Executors)

    9. Shuffle Manager

    10. ExternalClusterManager

    11. HTTP File Server

    12. Broadcast Manager

    13. Data Locality

    14. Cache Manager

    15. Spark, Akka and Netty

    16. OutputCommitCoordinator

    17. RPC Environment (RpcEnv)

    18. ContextCleaner

    19. MapOutputTracker

  7. Deployment Environments — Run Modes

    1. Spark local (pseudo-cluster)

    2. Spark on cluster

      1. Spark on YARN

      2. Spark Standalone

      3. Spark on Mesos

  8. Execution Model

  9. Optimising Spark

  10. Security

  11. Data Sources in Spark

  12. Spark Application Frameworks

    1. Spark SQL

      1. SparkSession — Entry Point to Datasets

      2. SQLConf

      3. Catalog

      4. Dataset

      5. DataSource API — Loading and Saving Datasets

      6. Functions - Computations on Rows

      7. Structured Streaming

      8. Joins

      9. Hive Integration

      10. SQL Parsers

      11. Caching

      12. Datasets vs RDDs

      13. SessionState

      14. SQLExecution Helper Object

      15. SQLContext

      16. Performance Optimizations

      17. Settings

    2. Spark Streaming

    3. Spark MLlib - Machine Learning in Spark

    4. Spark GraphX - Distributed Graph Computations

  13. Monitoring, Tuning and Debugging

  14. Varia

  15. Interactive Notebooks

  16. Spark Tips and Tricks

  17. Exercises

  18. Further Learning

  19. Spark Distributions

  20. Commercial Products using Apache Spark

  21. Spark Advanced Workshop

  22. Spark Talks Ideas (STI)