Running on Palmetto (Clemson)

  1. Check out sailfish
git clone [email protected]:clemson-cal/sailfish.git
  2. Install dependency Python modules
pip3 install --user cffi # only needed for running in CPU mode
pip3 install --user cupy-cuda111 # only needed for running in GPU mode
  3. Get an interactive compute node
qsub -I -l select=1:ncpus=56:mem=200gb:ngpus=2:gpu_model=a100,walltime=24:00:00
  4. Load the CUDA module and specify a reductions accelerator (see the cupy documentation):
module load cuda/11.1.1-gcc/9.5.0
export CUPY_ACCELERATORS=cub
  5. Go to the sailfish directory and run something
cd sailfish
bin/sailfish -g kitp-code-comparison

To redirect your data outputs to the zfs filesystem, add the flag -o /zfs/warpgate/your-username.
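For example, the run command from step 5 above becomes (substitute your own username in the path):

bin/sailfish -g kitp-code-comparison -o /zfs/warpgate/your-username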

Note: Palmetto has Python 3.6.8 installed in a system location. However, sailfish might soon require Python 3.7 or higher. If so, we will need to use Anaconda (which comes with Python 3.9.12), unless the Palmetto admins install a newer vanilla Python. Under Anaconda, the module loads and pip installs would be:

module load anaconda3/2022.05-gcc/9.5.0
module load cuda/11.6.2-gcc/9.5.0
pip3 install --user cupy-cuda116

There is no need to pip-install cffi since Anaconda includes one that will work.
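Putting the Anaconda variant together, a GPU session would then look roughly like this (same run command as above):

module load anaconda3/2022.05-gcc/9.5.0
module load cuda/11.6.2-gcc/9.5.0
export CUPY_ACCELERATORS=cub
cd sailfish
bin/sailfish -g kitp-code-comparison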

Running on Pleiades (NASA)

  1. Check out sailfish
git clone [email protected]:clemson-cal/sailfish.git
  2. Install dependency Python modules

Pleiades has a good installation of Python 3.9.5 that already includes cffi. The system CUDA version is 11.0; the matching cupy version is cupy-cuda110.

module load python3
pip3 install --upgrade --user cupy-cuda110 numpy loguru

Note: the final two dependencies are only needed for work on the experimental sailfish v0.6. NumPy needs to be upgraded from the system 1.20 to the newest 1.23, and sailfish v0.6 depends on the loguru module for logging.

  3. Get an interactive compute node (the command below requests one of the nodes with 8 NVIDIA V100s)
qsub -I -l select=1:ncpus=1:model=sky_gpu:ngpus=8,walltime=0:30:00 -q devel@pbspl4
  4. Load modules
module load python3 cuda
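Once the modules are loaded on the compute node, an optional sanity check that cupy can see the GPUs (it should print 8 on the sky_gpu nodes requested above) is:

python3 -c "import cupy; print(cupy.cuda.runtime.getDeviceCount())"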
  5. Go to the sailfish directory and run something
cd sailfish
bin/sailfish -g kitp-code-comparison

Here is a sample submission script for running on Pleiades (note the use of the v100@pbspl4 queue).

#PBS -N r10-n2k-e00-q05
#PBS -l select=1:ncpus=1:model=sky_gpu:ngpus=4,walltime=24:00:00
#PBS -q v100@pbspl4

module load python3 cuda

cd $PBS_O_WORKDIR

bin/sailfish kitp-code-comparison \
    --model which_diagnostics=forces \
      eccentricity=0.0 mass_ratio=0.5 \
      sink_radius=0.03 softening_length=0.03 \
    --new-timestep-cadence=10 --cfl=0.2 --patches=4 -g \
    --resolution=2000 --checkpoint=50.0 --timeseries=0.01 -e 3000 \
    -o data/r10-n2k-e00-q05 > r10-n2k-e00-q05.out
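If the script above is saved as, say, run.pbs (the name is arbitrary), submit it from the sailfish directory so that $PBS_O_WORKDIR resolves correctly:

qsub run.pbs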

To check on your runs on a GPU queue, you also need to specify the queue name for the qstat command, e.g.

qstat v100@pbspl4 -u your-username

Running on Greene (NYU)

  1. Check out sailfish
git clone [email protected]:clemson-cal/sailfish.git
  2. Create a Singularity overlay. Install the dependency Python modules into the overlay environment when instructed:
pip3 install cupy-cudaXXX # cupy wheel must match the Cuda version of your Singularity image (e.g. cupy-cuda112 for Cuda/11.2)
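One way to run that pip install is to enter the overlay environment read-write, reusing the overlay and image paths from the job script below (a sketch only; adjust the paths to your own overlay and Singularity image):

singularity exec --nv \
  --overlay /scratch/<NetID>/my_singularity/my_example.ext3:rw \
  /scratch/work/public/singularity/cuda11.2.2-cudnn8-devel-ubuntu20.04.sif \
  /bin/bash -c "source /ext3/env.sh; pip3 install cupy-cuda112"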
  3. Write a SLURM job submission script that starts your Singularity image and runs sailfish; e.g. run.sbatch:
#!/bin/bash

#SBATCH --job-name=my-job
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=4
#SBATCH --time=48:00:00
#SBATCH --gres=gpu:4         # gres=gpu:v100:4 to request only v100's
#SBATCH --mem=2GB

module purge

singularity exec --nv \
  --overlay /scratch/<NetID>/my_singularity/my_example.ext3:ro \
  /scratch/work/public/singularity/cuda11.2.2-cudnn8-devel-ubuntu20.04.sif \
  /bin/bash -c \
  "source /ext3/env.sh; python ~/sailfish/bin/sailfish kitp-code-comparison --mode=gpu --patches=4"

# Note: in GPU mode (only), choose --patches=<number of GPUs>
  4. Submit to the queue
sbatch run.sbatch
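The standard SLURM tools can then be used to monitor the job, e.g.

squeue -u <NetID>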

Running on ALICE (Leiden University)

  1. Log into the alice1 or alice2 login nodes
  2. Load Python module
module load Python/3.9.5-GCCcore-10.3.0
  3. Create a virtual environment (e.g. named venv_sailfish) and activate it
python -m venv venv_sailfish
source venv_sailfish/bin/activate
  4. Install dependencies
pip install --upgrade pip
pip install --upgrade setuptools
pip install wheel
pip install numpy
pip install cffi # if you want to run on CPUs
pip install cupy # if you want to run on GPUs
pip install matplotlib # if you want to run the plotting script on the cluster
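As an optional quick check that the CPU-side packages import inside the virtual environment (cupy itself is best exercised on a GPU node):

python -c "import numpy, cffi; print(numpy.__version__, cffi.__version__)"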
  5. Check out sailfish
git clone https://github.com/clemson-cal/sailfish.git
  6. Here is an example job submission script
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --mail-user="[email protected]"
#SBATCH --mail-type="ALL"
#SBATCH --time=00:01:00
#SBATCH --partition=gpu-short
#SBATCH --output=example%j.out
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=150gb
#SBATCH --gres=gpu:1

cd /home/your-username/
module load Python/3.9.5-GCCcore-10.3.0
source venv_sailfish/bin/activate
export CUPY_ACCELERATORS=cub
nvidia-smi

echo $SLURM_JOB_NODELIST # list the allocated node(s)
cd sailfish
bin/sailfish circumbinary-disk -g -c1.0 -e1.0 -o /home/your-username/data1 > example_output_file
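If the script is saved as, for example, run_example.sh (the name is arbitrary), submit it with:

sbatch run_example.sh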

Running on workstations

The primary difference when running on a workstation is installing the correct version of cupy for your machine.

NVIDIA GPUs

On a workstation running CUDA version 11.7 with an NVIDIA GPU, one would run

pip3 install --user cupy-cuda117 # only needed for running in GPU mode

To check which CUDA version is installed, you can run

nvidia-smi
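Once cupy is installed, an optional sanity check that the GPU is usable is a small array operation, e.g.

python3 -c "import cupy; print(cupy.arange(10).sum())"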

AMD GPUs

At the moment, cupy only seems to support versions 4.3 and 5.0 of ROCm. You can check which version is installed using

apt show rocm-libs # on a system with the apt package manager
yum info rocm-libs # on a system with the yum package manager

Then, install the version of cupy corresponding to your ROCm version, e.g.

pip3 install --user cupy-rocm-5-0

General cupy install for the ROCm runtime, built from source (requires a modern GCC to handle std::enable_if statements in the HIP libraries):

export HCC_AMDGPU_TARGET=<gfxid> # this is under the Name section when running rocminfo; MI50s give gfx906
export __HIP_PLATFORM_HCC__
export CUPY_INSTALL_USE_HIP=1
export ROCM_HOME=/opt/rocm # typical rocm install location, could vary from system to system
pip3 install cupy

Note: On ROCm, the cupy module redirects cupy.cuda.runtime calls to the ROCm runtime automatically, so pieces of sailfish source code that seem specialized to the CUDA runtime are actually agnostic to the hardware.
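Recent cupy versions expose an is_hip flag that makes this easy to confirm (an optional check; older versions may not have the attribute):

python3 -c "import cupy; print(cupy.cuda.runtime.is_hip)" # prints True on a ROCm build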