- Check out sailfish
git clone [email protected]:clemson-cal/sailfish.git
- Install dependency Python modules
pip3 install --user cffi # only needed for running in CPU mode
pip3 install --user cupy-cuda111 # only needed for running in GPU mode
- Get an interactive compute node
qsub -I -l select=1:ncpus=56:mem=200gb:ngpus=2:gpu_model=a100,walltime=24:00:00
- Load the cuda module
module load cuda/11.1.1-gcc/9.5.0
- Go to the sailfish directory and run something
cd sailfish
./scripts/main.py -g kitp-code-comparison
To redirect your data outputs to the zfs filesystem, add the flag -o /zfs/warpgate/your-username.
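If you prefer to submit a batch job rather than work interactively, a PBS script along the following lines should work; the resource request, module, and run command are copied from the steps above, while the job name and output path are placeholders:
#!/bin/bash
#PBS -N my-sailfish-run
#PBS -l select=1:ncpus=56:mem=200gb:ngpus=2:gpu_model=a100,walltime=24:00:00
module load cuda/11.1.1-gcc/9.5.0
cd $PBS_O_WORKDIR   # assumes the job is submitted from the sailfish checkout
./scripts/main.py -g kitp-code-comparison -o /zfs/warpgate/your-username > my-sailfish-run.out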
The steps below are for NASA Pleiades.
- Check out sailfish
git clone [email protected]:clemson-cal/sailfish.git
- Install dependency Python modules
Pleiades has a good installation of Python 3.9.5 that already includes cffi. The nodes are on CUDA 11.0, so the matching cupy version is cupy-cuda110.
module load python3
pip3 install cupy-cuda110
- Get an interactive compute node (the command below requests one of the nodes with 8 NVIDIA V100s)
qsub -I -l select=1:ncpus=1:model=sky_gpu:ngpus=8,walltime=0:30:00 -q devel@pbspl4
- Load modules
module load python3 cuda
- Go to the sailfish directory and run something
cd sailfish
./scripts/main.py -g kitp-code-comparison
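Before launching a long run, it can be worth checking that cupy actually sees the GPUs on the node. This is just a generic sanity check, not part of the sailfish setup:
python3 -c "import cupy; print(cupy.cuda.runtime.getDeviceCount())"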
Here is a sample submission script for running on Pleiades (note the use of the v100@pbspl4 queue).
#!/bin/bash
#PBS -N r10-n2k-e00-q05
#PBS -l select=1:ncpus=1:model=sky_gpu:ngpus=4,walltime=24:00:00
#PBS -q v100@pbspl4
module load python3 cuda
cd $PBS_O_WORKDIR
./scripts/main.py kitp-code-comparison \
--model which_diagnostics=forces \
eccentricity=0.0 mass_ratio=0.5 \
sink_radius=0.03 softening_length=0.03 \
--new-timestep-cadence=10 --cfl=0.2 --patches=4 -g \
--resolution=2000 --checkpoint=50.0 --timeseries=0.01 -e 3000 \
-o data/r10-n2k-e00-q05 > r10-n2k-e00-q05.out
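Assuming the script is saved as run.pbs (a placeholder name), submit it with
qsub run.pbs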
To check on your runs on a GPU queue, you also need to specify the queue name for the qstat command, e.g.
qstat v100@pbspl4 -u your-username
The steps below apply to a cluster where GPU jobs run inside a Singularity container under SLURM.
- Check out sailfish
git clone [email protected]:clemson-cal/sailfish.git
- Create a Singularity overlay. Install dependency Python modules to the environment when instructed:
pip3 install cupy-cudaXXX # the cupy wheel must match the CUDA version of your Singularity image (e.g. cupy-cuda112 for CUDA 11.2)
- Write a SLURM job submission script that starts your Singularity image and runs Sailfish; e.g. run.sbatch:
#!/bin/bash
#SBATCH --job-name=my-job
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=4
#SBATCH --time=48:00:00
#SBATCH --gres=gpu:4 # gres=gpu:v100:4 to request only v100's
#SBATCH --mem=2GB
module purge
singularity exec --nv \
--overlay /scratch/<NetID>/my_singularity/my_example.ext3:ro \
/scratch/work/public/singularity/cuda11.2.2-cudnn8-devel-ubuntu20.04.sif \
/bin/bash -c \
"source /ext3/env.sh; python ~/sailfish/scripts/main.py kitp-code-comparison --mode gpu --patches 4" # patches match cpus-per-task/num gpus
- Submit to the queue
sbatch run.sbatch
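Once the job is submitted, the usual SLURM tools apply for monitoring it; for example (where <jobid> is the job number reported by sbatch):
squeue -u $USER              # list your queued and running jobs
tail -f slurm-<jobid>.out    # follow the job's standard output (SLURM's default output file name)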
The primary difference when running on a workstation is installing the correct version of cupy for your machine.
On a workstation running CUDA version 11.7 with an NVIDIA GPU, one would run
pip3 install --user cupy-cuda117 # only needed for running in GPU mode
To check which CUDA version is installed, you can run
nvidia-smi
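Once cupy is installed, you can confirm it was built against the CUDA runtime you expect; this is a generic cupy check, not something sailfish requires:
python3 -c "import cupy; cupy.show_config()"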
On a workstation with an AMD GPU you will need the ROCm build of cupy instead. At the moment, cupy only seems to support ROCm versions 4.3 and 5.0. You can check which version is installed using
apt show rocm-libs # on a system with the apt package manager
yum info rocm-libs # on a system with the yum package manager
Then, install the version of cupy corresponding to your ROCm version, e.g.
pip3 install --user cupy-rocm-5-0
For a general cupy install against the ROCm runtime (this requires a modern GCC to handle the std::enable_if statements in the HIP libraries):
$ export HCC_AMDGPU_TARGET=<gfxid> # This is under the Name section when running rocminfo. MI50s give gfx906
$ export __HIP_PLATFORM_HCC__
$ export CUPY_INSTALL_USE_HIP=1
$ export ROCM_HOME=/opt/rocm # Typical rocm install location. Might vary from system to system
$ pip install cupy
The cupy module translates the various cupy.cuda.runtime calls to the corresponding ROCm runtime calls automatically, so the current implementation of sailfish works without changes even on ROCm systems.
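A quick way to see this in action after the install is to exercise a couple of generic cupy calls directly (nothing here is sailfish-specific); on a ROCm build the reported runtime version should correspond to the HIP runtime:
python3 -c "import cupy as cp; print(cp.cuda.runtime.runtimeGetVersion()); print((cp.arange(10) ** 2).sum())"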