Analysing billion-objects catalogue interactively: Apache Spark for physicists

This repository contains supplementary material for arXiv:1807.03078.

How to run the notebook

You need Apache Spark and Jupyter Notebook installed on your machine or cluster. The other Python dependencies are listed in the notebook.
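Before launching, it can help to confirm that both tools are reachable from your shell. The snippet below is a minimal sketch, not part of the original material: `check_tool` is a hypothetical helper name, and it only tests whether the `pyspark` and `jupyter-notebook` executables used in the commands of this README are on your `PATH`.

```shell
# Hypothetical helper: report whether a command is available on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# The two executables the launch commands in this README rely on.
check_tool pyspark || true
check_tool jupyter-notebook || true
```

A "missing" line usually means the tool is not installed or its directory is not on `PATH` in the current shell.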

On a local machine

PACK="com.github.astrolabsoftware:spark-fits_2.11:0.7.2"
PYSPARK_DRIVER_PYTHON_OPTS="jupyter-notebook" pyspark \
     --master "local[*]" \
     --packages "$PACK"

On a cluster

Standalone mode:

PACK="com.github.astrolabsoftware:spark-fits_2.11:0.7.2"
PYSPARK_DRIVER_PYTHON_OPTS="jupyter-notebook --debug --no-browser --port=$PORT1" pyspark \
     --master "$SPARKURL" \
     --packages "$PACK" \
     --driver-memory "$MEMDRIVER" \
     --executor-memory "$MEMEXEC" \
     --executor-cores "$EXECCORES" \
     --total-executor-cores "$TOTALCORES"
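The standalone command relies on several shell variables that this README does not define. As a rough sketch, they could be set in the same shell session before running the command. Every value below is an illustrative assumption (the host name, port, memory sizes, and core counts are placeholders, not settings from the paper) and should be adapted to your cluster.

```shell
# All values are hypothetical placeholders; adapt them to your cluster.
PORT1=8888                                   # port the Jupyter server listens on
SPARKURL="spark://master.example.org:7077"   # standalone master URL (7077 is Spark's default master port)
MEMDRIVER="4g"                               # memory for the driver process
MEMEXEC="8g"                                 # memory per executor
EXECCORES=4                                  # cores per executor
TOTALCORES=16                                # cores requested across the whole application

# Upper bound on the number of executors these settings allow.
echo "executors (at most): $((TOTALCORES / EXECCORES))"
```

Because `--no-browser` is passed to Jupyter, the notebook is then opened manually in a browser, using the port chosen in `PORT1`.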

DESC members: working at NERSC

Source your DESC environment, then open the JupyterLab web interface and run the notebook with the desc-pyspark kernel.
