Commit

changes
xiyang-aads-lilly committed Aug 2, 2024
1 parent cff21cd commit 6e72b98
Showing 3 changed files with 5 additions and 7 deletions.
12 changes: 5 additions & 7 deletions experiments/demo_magtrain_slurm.sh
@@ -3,7 +3,7 @@
#SBATCH --job-name=llm_sft
#SBATCH --mail-type=ALL
#SBATCH [email protected]
-#SBATCH --nodes=4
+#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gpus-per-node=4
#SBATCH --gpus-per-task=4
@@ -12,8 +12,6 @@
#SBATCH --time=48:00:00
#SBATCH --output=/home/l069561/project/log/alignment/sft_%j.out
#SBATCH --partition=batch
-##SBATCH --exclusive
-##SBATCH --reservation=gatortrongpt

HOME=/home/l069561
SCRIPTPATH=${HOME}/project/alignment-handbook/experiments
@@ -26,8 +24,8 @@ source ${SCRIPTPATH}/util.sh

CONTAINER=${HOME}/container/pt2402.sif

-srun --jobid $SLURM_JOB_ID apptainer exec -B $SLURM_TMPDIR:/cache --nv $CONTAINER bash ${SCRIPTPATH}/demo_magtrain_llm_sft.sh
+# srun --jobid $SLURM_JOB_ID apptainer exec -B $SLURM_TMPDIR:/cache --nv $CONTAINER bash ${SCRIPTPATH}/demo_magtrain_llm_sft.sh

-# NSYS=nsys profile -t cuda,nvtx -o /cache/nsys
-# srun --jobid $SLURM_JOB_ID apptainer exec -B $SLURM_TMPDIR:/cache --nv $CONTAINER ${NSYS} bash ${SCRIPTPATH}/demo_magtrain_llm_sft.sh
-# cp $SLURM_TMPDIR/nsys-rep /home/l069561/project/log/
+# use nsys to profile training process
+srun --jobid $SLURM_JOB_ID apptainer exec -B $SLURM_TMPDIR:/cache --nv $CONTAINER nsys profile -t cuda,nvtx -o /cache/nsys_${SLURM_JOB_ID} bash ${SCRIPTPATH}/demo_magtrain_llm_sft.sh
+cp $SLURM_TMPDIR/nsys_${SLURM_JOB_ID}.nsys-rep ${HOME}/project/log/nsys/
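
Review note: the final `cp` assumes that `${HOME}/project/log/nsys/` already exists and that `nsys` actually wrote a report. If either is not true, the copy fails and the report is lost when the node-local scratch directory is cleaned up. A minimal sketch of a more defensive copy step follows. The helper name `collect_nsys_report` and its argument order are hypothetical and not part of this commit. The report file name follows the `-o /cache/nsys_${SLURM_JOB_ID}` option in the `srun` line above, where `/cache` is the bind mount of `$SLURM_TMPDIR`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): copy an Nsight Systems report
# out of node-local scratch, creating the destination directory if needed.
collect_nsys_report() {
    local src_dir="$1"   # directory nsys wrote into, e.g. "$SLURM_TMPDIR"
    local job_id="$2"    # job identifier used in the file name, e.g. "$SLURM_JOB_ID"
    local dest_dir="$3"  # persistent destination, e.g. "${HOME}/project/log/nsys"
    local report="${src_dir}/nsys_${job_id}.nsys-rep"

    # Fail loudly instead of silently losing the profile.
    if [[ ! -f "$report" ]]; then
        echo "nsys report not found: $report" >&2
        return 1
    fi

    # mkdir -p is a no-op when the directory already exists.
    mkdir -p "$dest_dir" && cp "$report" "$dest_dir/"
}
```

In the batch script this might be called as `collect_nsys_report "$SLURM_TMPDIR" "$SLURM_JOB_ID" "${HOME}/project/log/nsys"` in place of the bare `cp`. Quoting the variables also guards against paths that contain spaces.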
Empty file.
Empty file.