
Running on Palmetto (Clemson)

  1. Check out sailfish
git clone [email protected]:clemson-cal/sailfish.git
  2. Install dependency Python modules
pip3 install --user cffi # only needed for running in CPU mode
pip3 install --user cupy-cuda111 # only needed for running in GPU mode
  3. Get an interactive compute node
qsub -I -l select=1:ncpus=56:mem=200gb:ngpus=2:gpu_model=a100,walltime=24:00:00
  4. Load the cuda module
module load cuda/11.1.1-gcc/9.5.0
  5. Go to the sailfish directory and run something
cd sailfish
./scripts/main.py -g kitp-code-comparison

To redirect your data outputs to the zfs filesystem, add the flag -o /zfs/warpgate/your-username.
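
For example, the run command from step 5 with outputs sent to zfs would look like this (the username in the path is a placeholder):

./scripts/main.py -g kitp-code-comparison -o /zfs/warpgate/your-username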

Running on Pleiades (NASA)

  1. Check out sailfish
git clone [email protected]:clemson-cal/sailfish.git
  2. Install dependency Python modules

Pleiades has a good installation of Python 3.9.5 that already includes cffi. The system runs CUDA 11.0, so the matching cupy wheel is cupy-cuda110.

module load python3
pip3 install cupy-cuda110
  3. Get an interactive compute node (the command below will give you one of the nodes with 8 NVIDIA V100s)
qsub -I -l select=1:ncpus=1:model=sky_gpu:ngpus=8,walltime=0:30:00 -q devel@pbspl4
  4. Load modules
module load python3 cuda
  5. Go to the sailfish directory and run something
cd sailfish
./scripts/main.py -g kitp-code-comparison

Here is a sample submission script for running on Pleiades (note the use of the v100@pbspl4 queue).

#PBS -N r10-n2k-e00-q05
#PBS -l select=1:ncpus=1:model=sky_gpu:ngpus=4,walltime=24:00:00
#PBS -q v100@pbspl4

module load python3 cuda

cd $PBS_O_WORKDIR

./scripts/main.py kitp-code-comparison \
    --model which_diagnostics=forces \
      eccentricity=0.0 mass_ratio=0.5 \
      sink_radius=0.03 softening_length=0.03 \
    --new-timestep-cadence=10 --cfl=0.2 --patches=4 -g \
    --resolution=2000 --checkpoint=50.0 --timeseries=0.01 -e 3000 \
    -o data/r10-n2k-e00-q05 > r10-n2k-e00-q05.out
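
To submit this script to the queue (assuming it is saved as, say, run.pbs; the filename here is only an example):

qsub run.pbs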

To check on your runs on a GPU queue, you also need to specify the queue name in the qstat command, e.g.

qstat v100@pbspl4 -u your-username

Running on Greene (NYU)

  1. Check out sailfish
git clone [email protected]:clemson-cal/sailfish.git
  2. Create a Singularity overlay. Install dependency Python modules to the environment when instructed:
pip3 install cupy-cudaXXX # the cupy wheel must match the CUDA version of your Singularity image (e.g. cupy-cuda112 for CUDA 11.2)
  3. Write a SLURM job submission script that starts your Singularity image and runs sailfish, e.g. run.sbatch:
#!/bin/bash

#SBATCH --job-name=my-job
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=4
#SBATCH --time=48:00:00
#SBATCH --gres=gpu:4         # use gres=gpu:v100:4 to request only V100s
#SBATCH --mem=2GB

module purge

singularity exec --nv \
	    --overlay /scratch/<NetID>/my_singularity/my_example.ext3:ro \
	    /scratch/work/public/singularity/cuda11.2.2-cudnn8-devel-ubuntu20.04.sif \
	    /bin/bash -c \
        "source /ext3/env.sh; python ~/sailfish/scripts/main.py kitp-code-comparison --mode gpu --patches 4"  # patches match cpus-per-task/num gpus
  4. Submit to the queue
sbatch run.sbatch
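
To check on the job once it is queued, the standard SLURM query works, e.g. (using the same <NetID> placeholder as above):

squeue -u <NetID>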

Running on workstations

The primary difference when running on a workstation is installing the correct version of cupy for your machine.

NVIDIA GPUs

On a workstation running CUDA version 11.7 with an NVIDIA GPU, one would run

pip3 install --user cupy-cuda117 # only needed for running in GPU mode

To check which CUDA version is installed, you can run

nvidia-smi
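
Once cupy is installed, a quick sanity check is the one-liner below (a sketch; it should print the number of visible GPUs if the wheel matches your CUDA version):

python3 -c "import cupy; print(cupy.cuda.runtime.getDeviceCount())"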

AMD GPUs

At the moment, cupy only seems to support versions 4.3 and 5.0 of ROCm. You can check which version is installed using

apt show rocm-libs # on a system with the apt package manager
yum info rocm-libs # on a system with the yum package manager

Then, install the version of cupy corresponding to your ROCm version, e.g.

pip3 install --user cupy-rocm-5-0

General cupy install for the ROCm runtime (requires a modern GCC to handle the std::enable_if statements in the HIP libraries):

$ export HCC_AMDGPU_TARGET=<gfxid> # This is under the Name section when running rocminfo. MI50s give gfx906
$ export __HIP_PLATFORM_HCC__
$ export CUPY_INSTALL_USE_HIP=1
$ export ROCM_HOME=/opt/rocm # Typical rocm install location. Might vary from system to system
$ pip install cupy
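
After the install completes, one way to confirm that cupy picked up the HIP/ROCm backend is the sketch below; cupy.show_config() prints the detected platform and devices:

python3 -c "import cupy; cupy.show_config()"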

Note!

The cupy module translates the various cupy.cuda.runtime calls to the corresponding ROCm runtime calls automatically, so there is no disruption to the current implementation of sailfish, even on ROCm systems.
