This repository contains supplementary material for arXiv:1807.03078.
You need Apache Spark and Jupyter Notebook installed on your machine or cluster. The other Python dependencies are listed in the notebook.
Local mode:
PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark \
--master local[*] \
--packages $PACK
Standalone mode:
PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook --debug --no-browser --port=$PORT1" pyspark \
--master $SPARKURL \
--packages $PACK \
--driver-memory $MEMDRIVER --executor-memory $MEMEXEC --executor-cores $EXECCORES --total-executor-cores $TOTALCORES
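The commands above rely on shell variables that must be defined before launching. The sketch below shows one way to set them and preview the resulting command without starting Spark. Every value is an illustrative placeholder, not taken from this repository: replace PACK with the Maven coordinates of the packages the notebook needs, and SPARKURL with the URL of your own standalone master.

```shell
# All values are hypothetical placeholders; adapt them to your cluster.
PACK="group:artifact:version"        # Maven coordinates for --packages (placeholder)
SPARKURL="spark://master-host:7077"  # standalone master URL (7077 is Spark's default master port)
PORT1=8888                           # port on which the Jupyter server listens
MEMDRIVER=4g                         # memory for the driver process
MEMEXEC=8g                           # memory per executor
EXECCORES=4                          # cores per executor
TOTALCORES=16                        # total cores requested across the cluster

# In standalone mode the number of executors is roughly
# total cores divided by cores per executor, resources permitting.
NEXEC=$((TOTALCORES / EXECCORES))

# Dry run: build the command and print it for inspection instead of running it.
CMD="pyspark --master $SPARKURL --packages $PACK --driver-memory $MEMDRIVER --executor-memory $MEMEXEC --executor-cores $EXECCORES --total-executor-cores $TOTALCORES"
echo "Jupyter port: $PORT1, expected executors: about $NEXEC"
echo "$CMD"
```

Keeping TOTALCORES a multiple of EXECCORES avoids leaving requested cores unused.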
Source your DESC environment, then open the JupyterLab web interface and run the notebook with the desc-pyspark kernel.