Hail is a genomic analysis tool that supports distributed parallel computing across multiple compute nodes. This guide demonstrates how to use Hail interactively on Alpine. The Python script used for the demonstration was downloaded from here.
-
First, clone the Hail support repository that we prepared and change into its directory. We chose to do this in the scratch directory.
cd /scratch/alpine/$USER
git clone https://github.com/kf-cuanschutz/Hail_support_cu_anschutz.git
cd Hail_support_cu_anschutz
-
For this demonstration, we request 2 nodes and 2 tasks so that the work is distributed across nodes. The Alpine CPU partition intended for testing is called "atesting". Please refer to the CU Boulder page for more information. We request a walltime of 10 minutes.
sinteractive --partition=atesting --nodes=2 --ntasks=2 --time=00:10:00
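Once the interactive session starts, it can help to confirm what was actually allocated. The following is a minimal Python sketch, assuming the session exposes the standard Slurm environment variables; the helper name `describe_allocation` is hypothetical, and any variable that is not set is reported as `None`.

```python
import os


def describe_allocation(env=os.environ):
    """Collect the Slurm variables that describe the current allocation."""
    keys = (
        "SLURM_JOB_PARTITION",
        "SLURM_JOB_NUM_NODES",
        "SLURM_NTASKS",
        "SLURM_JOB_NODELIST",
    )
    # Variables that are not set (for example outside a Slurm job) map to None.
    return {key: env.get(key) for key in keys}


if __name__ == "__main__":
    for key, value in describe_allocation().items():
        print(f"{key}={value}")
```

If the reported node count or partition differs from the request, it is worth fixing before starting Spark.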
-
Now we want to export all the TMP related variables to scratch:
export TMP=/gpfs/alpine1/scratch/$USER/cache_dir
mkdir -pv $TMP
export TEMP=$TMP
export TMPDIR=$TMP
export TEMPDIR=$TMP
export PIP_CACHE_DIR=$TMP
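A quick way to check that the exports took effect is to compare the variables from Python. This is only a sketch: the helper name `mismatched_temp_vars` is hypothetical, and it only checks the five variables exported above.

```python
import os
import tempfile

# The variables exported in the step above.
TEMP_VARS = ("TMP", "TEMP", "TMPDIR", "TEMPDIR", "PIP_CACHE_DIR")


def mismatched_temp_vars(expected, env=os.environ):
    """Return the names of temp variables that do not point at `expected`."""
    return [name for name in TEMP_VARS if env.get(name) != expected]


if __name__ == "__main__":
    expected = os.environ.get("TMP")
    print("Mismatched variables:", mismatched_temp_vars(expected))
    # tempfile consults TMPDIR, TEMP and TMP (in that order) when choosing a directory.
    print("Python temp directory:", tempfile.gettempdir())
```

An empty list means all five variables agree with `TMP`.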
-
If you need to use Hail, please submit a request by emailing [email protected] and we will install Hail for your lab. For this demonstration, we use an existing Hail installation that belongs to a lab.
module use --append /pl/active/CCPM/software/lmod-files
module load hail
-
Next, we save the names of the nodes we requested in a text file that we will use later.
scontrol show hostname > $SLURM_SUBMIT_DIR/nodelist.txt
export SLURM_NODEFILE=$SLURM_SUBMIT_DIR/nodelist.txt
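Before launching Spark, the node file can be inspected to confirm that it lists both nodes. Below is a minimal sketch, assuming the file holds one host name per line as written by the command above; `parse_nodes` is a hypothetical helper, not part of the repository.

```python
import os


def parse_nodes(text):
    """Return the host names found in node-file text, one per non-empty line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    nodefile = os.environ.get("SLURM_NODEFILE")
    if nodefile and os.path.exists(nodefile):
        with open(nodefile) as handle:
            nodes = parse_nodes(handle.read())
        print(f"{len(nodes)} node(s): {', '.join(nodes)}")
    else:
        print("SLURM_NODEFILE is not set or the file does not exist")
```

With the 2-node request used in this guide, the script should report two host names.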
-
We may now execute the Python script.
python slurm-spark-submit \
--jars $HAIL_HOME/backend/hail-all-spark.jar \
--conf spark.driver.extraClassPath=$HAIL_HOME/backend/hail-all-spark.jar \
--conf spark.executor.extraClassPath=./hail-all-spark.jar \
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
--conf spark.kryo.registrator=is.hail.kryo.HailKryoRegistrator \
--work-dir $SLURM_SUBMIT_DIR \
hail-script.py --temp_dir $TMP
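The contents of hail-script.py are not shown in this guide. As an illustration only, a script that accepts `--temp_dir` could forward it to Hail as sketched below. The function names are hypothetical and this is not the repository's actual script; it assumes the `tmp_dir` parameter of `hl.init` and the `hl.utils.range_table` helper from the Hail Python API.

```python
import argparse


def parse_args(argv=None):
    """Parse the --temp_dir option passed on the command line above."""
    parser = argparse.ArgumentParser(description="Hypothetical Hail driver script")
    parser.add_argument("--temp_dir", required=True,
                        help="Directory for Hail temporary files")
    return parser.parse_args(argv)


def start_hail(temp_dir):
    """Initialize Hail so that its temporary files go to `temp_dir`."""
    # Imported here so argument parsing can be exercised without Hail installed.
    import hail as hl
    hl.init(tmp_dir=temp_dir)
    return hl


def main(argv=None):
    args = parse_args(argv)
    hl = start_hail(args.temp_dir)
    # Small smoke test: build a 10-row table and count its rows.
    print(hl.utils.range_table(10).count())
```

A real script would call `main()` at the end of the file. Keeping the Hail import inside `start_hail` is a design choice that lets the argument handling be tested on a machine without Hail.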