fixes to scripts
alexbarghi-nv committed Jan 8, 2024
1 parent 61f30a2 commit 89ac530
Showing 3 changed files with 2 additions and 15 deletions.
2 changes: 1 addition & 1 deletion benchmarks/cugraph/standalone/bulk_sampling/README.md
@@ -143,7 +143,7 @@ You will need to modify the bash scripts to run appopriately for your environmen
desired training workflow. The standard sbatch arguments are at the top of the script, such as
job name, queue, etc. These will need to be modified for your SLURM cluster.

-Next are arguments for the container image (which is currently set to the current DLFW image),
+Next are arguments for the container image (required),
and directories where the data and outputs are stored. The directories default to subdirectories
of the current working directory. But if there is a high-throughput storage system available,
using that storage for the samples and datasets is highly recommended.
13 changes: 0 additions & 13 deletions benchmarks/cugraph/standalone/bulk_sampling/run_sampling.sh
@@ -36,8 +36,6 @@ export CUDF_SPILL=1
export LIBCUDF_CUFILE_POLICY="OFF"
export GPUS_PER_NODE=8

-PATCH_CUGRAPH=1
-
export SCHEDULER_FILE=$SCHEDULER_FILE
export LOGS_DIR=$LOGS_DIR

@@ -60,17 +58,6 @@ else
${MG_UTILS_DIR}/run-dask-process.sh workers &
fi

-if [[ $PATCH_CUGRAPH == 1 ]]; then
-mkdir /opt/cugraph-patch
-git clone https://github.com/alexbarghi-nv/cugraph -b dlfw-patch-24.01 /opt/cugraph-patch
-
-rm /opt/rapids/cugraph/python/cugraph/cugraph/structure/graph_implementation/simpleDistributedGraph.py
-cp /opt/cugraph-patch/python/cugraph/cugraph/structure/graph_implementation/simpleDistributedGraph.py /opt/rapids/cugraph/python/cugraph/cugraph/structure/graph_implementation/simpleDistributedGraph.py
-rm /usr/local/lib/python3.10/dist-packages/cugraph/structure/graph_implementation/simpleDistributedGraph.py
-cp /opt/cugraph-patch/python/cugraph/cugraph/structure/graph_implementation/simpleDistributedGraph.py /usr/local/lib/python3.10/dist-packages/cugraph/structure/graph_implementation/simpleDistributedGraph.py
-
-fi
-
echo "properly waiting for workers to connect"
NUM_GPUS=$(python -c "import os; print(int(os.environ['SLURM_JOB_NUM_NODES'])*int(os.environ['GPUS_PER_NODE']))")
handleTimeout 120 python ${MG_UTILS_DIR}/wait_for_workers.py \
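The `NUM_GPUS` line above shells out to Python to multiply two environment variables. As a rough sketch only, and not something this commit changes, the same product can be computed with shell arithmetic, assuming both variables hold plain integers. The values assigned below are illustrative stand-ins for what SLURM and the script would normally provide.

```shell
#!/usr/bin/env bash
# Illustrative values: on a cluster, SLURM sets SLURM_JOB_NUM_NODES and
# run_sampling.sh exports GPUS_PER_NODE.
SLURM_JOB_NUM_NODES=2
GPUS_PER_NODE=8

# Integer product computed by the shell itself, with no Python subprocess.
NUM_GPUS=$(( SLURM_JOB_NUM_NODES * GPUS_PER_NODE ))
echo "NUM_GPUS=${NUM_GPUS}"   # prints NUM_GPUS=16
```

The Python form fails loudly with a `KeyError` when a variable is missing, whereas shell arithmetic treats an unset variable as 0, so the Python version may have been chosen deliberately.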
@@ -18,7 +18,7 @@
#SBATCH -N 1
#SBATCH -t 00:25:00

-CONTAINER_IMAGE="/lustre/fsw/rapids/abarghi/dlfw_patched.squash"
+CONTAINER_IMAGE=${CONTAINER_IMAGE:="please_specify_container"}
SCRIPTS_DIR=$(pwd)
LOGS_DIR=${LOGS_DIR:=$(pwd)"/logs"}
SAMPLES_DIR=${SAMPLES_DIR:=$(pwd)/samples}
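The new `CONTAINER_IMAGE` line relies on the `${VAR:=default}` expansion, which assigns the default only when the variable is unset or empty. The sketch below shows that behavior together with one possible guard against the placeholder value. The function `resolve_container_image` is hypothetical and not part of the commit; only the variable name and the placeholder string come from the diff.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the commit): resolve the container image and
# refuse to continue if only the placeholder is available.
resolve_container_image() {
    # ${VAR:=default} assigns the default only when VAR is unset or empty;
    # a value already present in the environment is left untouched.
    : "${CONTAINER_IMAGE:=please_specify_container}"

    if [[ "$CONTAINER_IMAGE" == "please_specify_container" ]]; then
        echo "CONTAINER_IMAGE is not set; export it before submitting." >&2
        return 1
    fi
    echo "$CONTAINER_IMAGE"
}
```

Under this sketch, exporting `CONTAINER_IMAGE` before submission overrides the placeholder, while leaving it unset fails early with a readable message rather than at container launch. This assumes the submitting environment is propagated to the job, which is the usual `sbatch` default.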