Guide to using Hail on Alpine:

Hail is a genomic analysis tool that enables distributed parallel computing across multiple compute nodes. In this guide we demonstrate how to use Hail on Alpine interactively. The Python script used in the demonstration was downloaded from here.

Steps to run Hail interactively:

  1. Clone the Hail support repository that we put together and change into that directory. We choose to do this in the scratch directory.

    cd /scratch/alpine/$USER
    git clone https://github.com/kf-cuanschutz/Hail_support_cu_anschutz.git
    cd Hail_support_cu_anschutz
  2. For this demonstration we request 2 nodes with 2 tasks (one per node) to demonstrate the parallel distribution. The CPU partition for testing on Alpine is called "atesting"; please refer to the CU Boulder documentation for more information. We request the partition for a walltime of 10 minutes.

    sinteractive --partition=atesting --nodes=2 --ntasks=2 --time=00:10:00
  3. Now we export all the TMP-related variables so that they point to scratch (a quick verification is sketched after these steps):

    export TMP=/gpfs/alpine1/scratch/$USER/cache_dir
    mkdir -pv $TMP
    export TEMP=$TMP
    export TMPDIR=$TMP
    export TEMPDIR=$TMP
    export PIP_CACHE_DIR=$TMP
  4. If you need to use Hail, please submit a request by emailing [email protected] and we will install Hail for your lab. For this demonstration we use an already existing Hail install path that belongs to a lab. (A quick import check is sketched after these steps.)

    module use --append /pl/active/CCPM/software/lmod-files
    module load hail
  5. We save the names of the nodes we requested in a text file that we will reference later.

    scontrol show hostname > $SLURM_SUBMIT_DIR/nodelist.txt
    export SLURM_NODEFILE=$SLURM_SUBMIT_DIR/nodelist.txt
  6. We may now execute the Python script. A sketch of what a minimal Hail script taking these arguments might contain follows these steps.

    python slurm-spark-submit \
        --jars $HAIL_HOME/backend/hail-all-spark.jar \
        --conf spark.driver.extraClassPath=$HAIL_HOME/backend/hail-all-spark.jar \
        --conf spark.executor.extraClassPath=./hail-all-spark.jar \
        --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
        --conf spark.kryo.registrator=is.hail.kryo.HailKryoRegistrator \
        --work-dir $SLURM_SUBMIT_DIR \
        hail-script.py --temp_dir $TMP
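
Verification sketch for step 3: Python's standard tempfile module consults the TMPDIR, TEMP, and TMP environment variables when choosing a temporary directory, so the following minimal sketch confirms that the exports took effect:

    # Minimal sketch: confirm the TMP-related exports point at scratch.
    # tempfile.gettempdir() consults TMPDIR, then TEMP, then TMP, so it
    # should resolve to /gpfs/alpine1/scratch/$USER/cache_dir.
    import os
    import tempfile

    print("TMPDIR =", os.environ.get("TMPDIR"))
    print("tempfile resolves to:", tempfile.gettempdir())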
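Verification sketch for step 4: after "module load hail", a short import check confirms that the module put Hail on the Python path. We assume hl.version() is available in the installed release; it reports the version without starting a Spark session.

    # Minimal sketch: confirm the hail module is importable.
    # hl.version() (assumed available in this release) prints the Hail
    # version string without initializing a Spark backend.
    import hail as hl

    print(hl.version())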
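For reference, hail-script.py itself is not reproduced in this guide, but a minimal Hail script that accepts the --temp_dir flag used above could look like the sketch below. The flag name matches the command line above; everything else (the range_table smoke test and the exact hl.init arguments) is a hypothetical stand-in, not the original script.

    # Hypothetical stand-in for hail-script.py (a sketch, not the original).
    import argparse

    import hail as hl

    parser = argparse.ArgumentParser()
    parser.add_argument("--temp_dir", required=True,
                        help="scratch directory for Hail temporary files")
    args = parser.parse_args()

    # Attach to the Spark cluster that slurm-spark-submit launched across
    # the allocated nodes; tmp_dir keeps Hail's temporary files on scratch.
    hl.init(tmp_dir=args.temp_dir)

    # Smoke test: a small distributed table aggregated across partitions.
    ht = hl.utils.range_table(1000, n_partitions=4)
    print(ht.aggregate(hl.agg.sum(ht.idx)))  # expected: 499500

Because the table is split into four partitions, Spark can schedule the aggregation across the executors running on both allocated nodes, which is the parallel distribution this guide sets out to demonstrate.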