Merge branch 'branch-24.04' into remove-hardcoded-version
KyleFromNVIDIA committed Mar 8, 2024
2 parents 5912019 + 47119c3 commit b9ae2c1
Showing 178 changed files with 719 additions and 752 deletions.
2 changes: 1 addition & 1 deletion .dockerignore
@@ -1,2 +1,2 @@
# Ignore cmake builds from local machine that might have occured before attempting Docker build. Including these files will cause CMake cache conflict issues
/cpp/build
/cpp/build
2 changes: 1 addition & 1 deletion .github/workflows/add-to-project.yml
@@ -4,7 +4,7 @@ on:
issues:
types:
- opened

pull_request_target:
types:
- opened
4 changes: 3 additions & 1 deletion .pre-commit-config.yaml
@@ -5,11 +5,13 @@
exclude: '^thirdparty'
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.4.0
rev: v4.5.0
hooks:
- id: check-added-large-files
- id: debug-statements
- id: end-of-file-fixer
- id: mixed-line-ending
- id: trailing-whitespace
- repo: https://github.com/psf/black
rev: 22.10.0
hooks:
6 changes: 3 additions & 3 deletions benchmarks/cugraph-dgl/pytest-based/README.MD
@@ -1,10 +1,10 @@
## Run Benchmarks
## Run Benchmarks

#### SG
#### SG
```
pytest bench_cugraph_dgl_uniform_neighbor_sample.py -k "SG and fanout_10_25 and rmat_24_4" --benchmark-save='1_rmat_24_4.json'
```
#### MG
#### MG

```
DASK_NUM_WORKERS=2 pytest bench_cugraph_dgl_uniform_neighbor_sample.py -k "MG and fanout_10_25 and rmat_24_16" --benchmark-save='2_rmat_24_8.json'
16 changes: 8 additions & 8 deletions benchmarks/cugraph/standalone/bulk_sampling/README.md
@@ -16,21 +16,21 @@ Required:
the samples will be written to a new folder in /home/samples that
contains information about the sampling run as well as the time
of the run.

--dataset_root
The folder where datasets are stored. Uses the format described
in the input format section.

--datasets
Comma-separated list of datasets; can specify ogb or rmat (i.e. ogb_papers100M[2],rmat_22_16).
For ogb datasets, can provide replication factor using brackets.
Will attempt to read from dataset_root/<datset_name>.

Optional:
--fanouts
Comma-separated list of fanout values (i.e. [10, 25]).
The default fanout is [10, 25].

--batch_sizes
Comma-separated list of batch sizes (i.e. 500, 1000).
Defaults to "512,1024"
@@ -39,7 +39,7 @@ Optional:
Comma-separated list of seeds per call. Controls the number of input seed vertices processed
in a single sampling call.
Defaults to 524288

--reverse_edges
Whether to reverse the edges of the input edgelist. Should be set to False for PyG and True for DGL.
Defaults to False (PyG).
@@ -52,8 +52,8 @@ Optional:
--random_seed
Seed for random number generation.
Defaults to '62'
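
The option formats described above can be illustrated with a small parsing sketch. This is not taken from `cugraph_bulk_sampling.py`; the helper names `parse_datasets` and `parse_int_list` are hypothetical, and treating a missing bracket as a replication factor of 1 is an assumption made here for illustration.

```python
import re

# Hypothetical helpers for illustration only; the real script may parse
# its arguments differently.
_DATASET_SPEC = re.compile(r"^(?P<name>[A-Za-z0-9_]+)(?:\[(?P<factor>\d+)\])?$")


def parse_datasets(spec: str) -> list[tuple[str, int]]:
    """Split a --datasets value into (name, replication_factor) pairs."""
    parsed = []
    for item in spec.split(","):
        match = _DATASET_SPEC.match(item.strip())
        if match is None:
            raise ValueError(f"Unrecognized dataset spec: {item!r}")
        # Assumption: no bracket means the dataset is used once (factor 1).
        factor = int(match.group("factor") or 1)
        parsed.append((match.group("name"), factor))
    return parsed


def parse_int_list(value: str) -> list[int]:
    """Parse a comma-separated list such as "512,1024" or "[10, 25]"."""
    return [int(v) for v in value.strip("[] ").split(",") if v.strip()]


datasets = parse_datasets("ogb_papers100M[2],rmat_22_16")
fanouts = parse_int_list("[10, 25]")
batch_sizes = parse_int_list("512,1024")
```

Under these assumptions, the bracket suffix on an ogb dataset name is read as its replication factor, and both bracketed and bare comma-separated lists are accepted for `--fanouts` and `--batch_sizes`.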


### Input Format
The script expects its input data in the following format:
```
@@ -159,4 +159,4 @@ GPUs per node is currently unsupported by this script but should be possible in

### Output
The results of training will be outputted to the logs directory with an `output.txt` file for each worker.
These will be overwritten upon each run. Accuracy is only reported on rank 0.
These will be overwritten upon each run. Accuracy is only reported on rank 0.
6 changes: 3 additions & 3 deletions benchmarks/cugraph/standalone/bulk_sampling/run_sampling.sh
@@ -67,7 +67,7 @@ handleTimeout 120 python ${MG_UTILS_DIR}/wait_for_workers.py \

DASK_STARTUP_ERRORCODE=$LAST_EXITCODE

echo $SLURM_NODEID
echo $SLURM_NODEID
if [[ $SLURM_NODEID == 0 ]]; then
echo "Launching Python Script"
python ${SCRIPTS_DIR}/cugraph_bulk_sampling.py \
@@ -78,7 +78,7 @@ if [[ $SLURM_NODEID == 0 ]]; then
--batch_sizes $BATCH_SIZE \
--seeds_per_call_opts "524288" \
--num_epochs $NUM_EPOCHS \
--random_seed 42
--random_seed 42

echo "DONE" > ${SAMPLES_DIR}/status.txt
fi
@@ -108,4 +108,4 @@ sleep 2

if [[ $SLURM_NODEID == 0 ]]; then
rm ${SAMPLES_DIR}/status.txt
fi
fi
3 changes: 1 addition & 2 deletions benchmarks/cugraph/standalone/bulk_sampling/run_train_job.sh
@@ -16,7 +16,7 @@
#SBATCH -p luna
#SBATCH -J datascience_rapids_cugraphgnn-papers:bulkSamplingPyG
#SBATCH -N 1
#SBATCH -t 00:25:00
#SBATCH -t 00:25:00

CONTAINER_IMAGE=${CONTAINER_IMAGE:="please_specify_container"}
SCRIPTS_DIR=$(pwd)
@@ -81,4 +81,3 @@ srun \
--fanout $FANOUT \
--replication_factor $REPLICATION_FACTOR \
--num_epochs $NUM_EPOCHS

2 changes: 1 addition & 1 deletion benchmarks/dgl/README.md
@@ -13,4 +13,4 @@ pytest dgl_benchmark.py::bench_dgl_pure_gpu
## For UVA Benchmarks
```
pytest dgl_benchmark.py::bench_dgl_uva
```
```
10 changes: 5 additions & 5 deletions benchmarks/shared/build_cugraph_ucx/README.MD
@@ -6,10 +6,10 @@ docker build -f cugraph_ucx.dockerfile . -t cugraph_ucx
docker run --privileged -it --gpus=all --net=host cugraph_ucx /bin/bash

#### Client Bandwidth Test
python3 test_client_bandwidth.py
python3 test_client_bandwidth.py

```bash
(base) root@exp02:/home# python3 test_client_bandwidth.py
(base) root@exp02:/home# python3 test_client_bandwidth.py
2022-12-19 13:31:30,867 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
2022-12-19 13:31:30,867 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2022-12-19 13:31:30,891 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
@@ -30,8 +30,8 @@ Bandwidth = 5.2037 gb/s
#### Sampling Test
python3 test_cugraph_sampling.py
```bash
test_client_bandwidth.py test_cugraph_sampling.py
(base) root@exp02:/home# python3 test_cugraph_sampling.py
test_client_bandwidth.py test_cugraph_sampling.py
(base) root@exp02:/home# python3 test_cugraph_sampling.py
[1671456769.722931] [exp02:93 :0] parser.c:1989 UCX WARN unused environment variable: UCX_MEMTYPE_CACHE (maybe: UCX_MEMTYPE_CACHE?)
[1671456769.722931] [exp02:93 :0] parser.c:1989 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning)
2022-12-19 13:32:56,228 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
@@ -54,4 +54,4 @@ Sampling 1,000 took = 69.15879249572754 ms
Sampling 10,000 took = 89.63620662689209 ms
Sampling 100,000 took = 135.9888792037964 ms
----------------------------------------Completed Test----------------------------------------
```
```
4 changes: 2 additions & 2 deletions benchmarks/shared/build_cugraph_ucx/build-ucx.sh
@@ -1,5 +1,5 @@
#!/bin/bash
# Copyright (c) 2023, NVIDIA CORPORATION.
# Copyright (c) 2023-2024, NVIDIA CORPORATION.
# SPDX-License-Identifier: Apache-2.0
set -ex

@@ -16,4 +16,4 @@ mkdir build-linux && cd build-linux
--enable-mt --enable-numa --with-gnu-ld --with-rdmacm --with-verbs \
--with-cuda=${CUDA_HOME} \
${CONFIGURE_ARGS}
make -j install
make -j install
2 changes: 1 addition & 1 deletion benchmarks/shared/build_cugraph_ucx/cugraph_ucx.dockerfile
@@ -55,7 +55,7 @@ RUN gpuci_mamba_retry install -y -c pytorch -c rapidsai-nightly -c rapidsai -c c
tqdm


# Build ucx from source with IB support
# Build ucx from source with IB support
# on 1.14.x
RUN conda remove --force -y ucx ucx-proc

4 changes: 2 additions & 2 deletions ci/test.sh
@@ -1,5 +1,5 @@
#!/bin/bash
# Copyright (c) 2019-2023, NVIDIA CORPORATION.
# Copyright (c) 2019-2024, NVIDIA CORPORATION.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
@@ -105,7 +105,7 @@ if hasArg "--run-python-tests"; then
# rmat is not tested because of MG testing
pytest --cache-clear --junitxml=${CUGRAPH_ROOT}/junit-cugraph-pytests.xml -v --cov-config=.coveragerc --cov=cugraph_pyg --cov-report=xml:${WORKSPACE}/python/cugraph_pyg/cugraph-coverage.xml --cov-report term --ignore=raft --ignore=tests/mg --ignore=tests/int --ignore=tests/generators --benchmark-disable
echo "Ran Python pytest for cugraph_pyg : return code was: $?, test script exit code is now: $EXITCODE"

echo "Python pytest for cugraph-service (single-GPU only)..."
cd ${CUGRAPH_ROOT}/python/cugraph-service
pytest -sv --cache-clear --junitxml=${CUGRAPH_ROOT}/junit-cugraph-service-pytests.xml --benchmark-disable -k "not mg" ./tests
6 changes: 1 addition & 5 deletions cpp/cmake/thirdparty/get_nccl.cmake
@@ -1,5 +1,5 @@
#=============================================================================
# Copyright (c) 2021, NVIDIA CORPORATION.
# Copyright (c) 2021-2024, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -32,7 +32,3 @@ function(find_and_configure_nccl)
endfunction()

find_and_configure_nccl()



