cuGraph-DGL and WholeGraph Performance Testing with Feature Store Performance Improvements #4081

Merged
merged 554 commits into from
Mar 11, 2024
Changes from 250 commits
e9b39e4
Merge branch 'branch-23.10' into cugraph-pyg-loader-improvements
alexbarghi-nv Sep 1, 2023
d99b512
Merge branch 'fea_mfg' of https://github.com/seunghwak/cugraph into c…
alexbarghi-nv Sep 1, 2023
c15d580
the c api
alexbarghi-nv Sep 1, 2023
2ac8b86
work
alexbarghi-nv Sep 1, 2023
9135629
fix compile errors
alexbarghi-nv Sep 1, 2023
dfd1cb7
reformat
alexbarghi-nv Sep 1, 2023
6dfd4fe
rename test file from .cu to .cpp
seunghwak Sep 5, 2023
f600520
Merge branch 'branch-23.10' into cugraph-pyg-loader-improvements
alexbarghi-nv Sep 6, 2023
7d5821f
bug fixes
seunghwak Sep 6, 2023
58189ed
add fill wrapper
seunghwak Sep 6, 2023
39db98a
undo adding fill wrapper
seunghwak Sep 6, 2023
98c8e0a
sampling test from .cpp to .cu
seunghwak Sep 6, 2023
687d191
latest perf testing
alexbarghi-nv Sep 7, 2023
c151f95
fix a typo
seunghwak Sep 7, 2023
fc5a4f0
Merge branch 'branch-23.10' of github.com:rapidsai/cugraph into fea_mfg
seunghwak Sep 7, 2023
a7d1804
merge
alexbarghi-nv Sep 7, 2023
3cda233
Merge branch 'branch-23.10' of https://github.com/rapidsai/cugraph in…
alexbarghi-nv Sep 7, 2023
0a18cde
do merge
alexbarghi-nv Sep 7, 2023
094aaf9
do not return valid nzd vertices if doubly_compress is false
seunghwak Sep 7, 2023
cf57a6d
bug fix
seunghwak Sep 8, 2023
2b48b7e
test code
seunghwak Sep 8, 2023
79acc8e
Merge branch 'branch-23.10' of github.com:rapidsai/cugraph into fea_mfg
seunghwak Sep 8, 2023
11009c6
Merge branch 'branch-23.10' into cugraph-pyg-loader-improvements
alexbarghi-nv Sep 8, 2023
0481bfb
Merge branch 'branch-23.10' into cugraph-sample-convert
alexbarghi-nv Sep 8, 2023
2af9333
Merge branch 'fea_mfg' of https://github.com/seunghwak/cugraph into c…
alexbarghi-nv Sep 8, 2023
23cd2c2
bug fix
seunghwak Sep 8, 2023
6eaf67e
update documentation
seunghwak Sep 8, 2023
4dc0a92
fix c api issues
alexbarghi-nv Sep 11, 2023
2947b33
Merge branch 'branch-23.10' of https://github.com/rapidsai/cugraph in…
alexbarghi-nv Sep 11, 2023
0a2b2b7
C API fixes, Python/PLC API work
alexbarghi-nv Sep 11, 2023
db35940
adjust hop offsets when there is a jump in major vertex IDs between hops
seunghwak Sep 11, 2023
b8b72be
add sort only function
seunghwak Sep 12, 2023
38dd11e
Merge branch 'branch-23.10' of github.com:rapidsai/cugraph into fea_mfg
seunghwak Sep 12, 2023
2a799a6
Merge branch 'branch-23.10' into cugraph-pyg-loader-improvements
alexbarghi-nv Sep 12, 2023
c86ceac
various improvements
alexbarghi-nv Sep 12, 2023
37a37bf
Merge branch 'fea_mfg' of https://github.com/seunghwak/cugraph into c…
alexbarghi-nv Sep 12, 2023
002fe93
fix merge conflict
alexbarghi-nv Sep 19, 2023
5051dfc
fix bad merge
alexbarghi-nv Sep 19, 2023
6cdf92b
asdf
alexbarghi-nv Sep 19, 2023
6682cb4
clarifying comments
alexbarghi-nv Sep 19, 2023
0d12a28
t
alexbarghi-nv Sep 19, 2023
f5733f2
latest code
alexbarghi-nv Sep 19, 2023
52e2f57
bug fix
seunghwak Sep 19, 2023
befeb25
Merge branch 'branch-23.10' of github.com:rapidsai/cugraph into bug_o…
seunghwak Sep 19, 2023
8781612
additional bug fix
seunghwak Sep 19, 2023
f92b5f5
add additional checking to detect the previously neglected bugs
seunghwak Sep 19, 2023
2bd93d9
Merge branch 'bug_offsets' of https://github.com/seunghwak/cugraph in…
alexbarghi-nv Sep 19, 2023
3195298
wrap up sg API
alexbarghi-nv Sep 20, 2023
74195cb
test fix, cleanup
alexbarghi-nv Sep 20, 2023
374b103
refactor code into new shared utility
alexbarghi-nv Sep 20, 2023
bd625e3
get mg api working
alexbarghi-nv Sep 20, 2023
b2a4ed1
add offset mg test
alexbarghi-nv Sep 20, 2023
9fb7438
fix renumber map issue in C++
alexbarghi-nv Sep 20, 2023
c770a17
verify new compression formats for sg
alexbarghi-nv Sep 20, 2023
b569563
complete csr/csc tests for both sg/mg
alexbarghi-nv Sep 20, 2023
ab2a185
get the bulk sampler working again
alexbarghi-nv Sep 20, 2023
89a1b33
remove unwanted file
alexbarghi-nv Sep 20, 2023
a9d46ef
fix wrong dataframe issue
alexbarghi-nv Sep 21, 2023
17e9013
update sg bulk sampler tests
alexbarghi-nv Sep 21, 2023
c5543b2
fix mg bulk sampler tests
alexbarghi-nv Sep 21, 2023
6581f47
Merge branch 'branch-23.10' into cugraph-pyg-loader-improvements
alexbarghi-nv Sep 21, 2023
16e83bc
write draft of csr bulk sampler
alexbarghi-nv Sep 21, 2023
1e7098d
overhaul the writer methods
alexbarghi-nv Sep 22, 2023
ae94c35
remove unused method
alexbarghi-nv Sep 22, 2023
7beba4b
style
alexbarghi-nv Sep 22, 2023
16ed5ef
Merge branch 'branch-23.10' of https://github.com/rapidsai/cugraph in…
alexbarghi-nv Sep 22, 2023
79e3cef
remove notebook
alexbarghi-nv Sep 22, 2023
fd5cceb
add clarifying comment to c++
alexbarghi-nv Sep 22, 2023
a47691d
add future warnings
alexbarghi-nv Sep 22, 2023
195d063
cleanup
alexbarghi-nv Sep 22, 2023
0af1750
remove print statements
alexbarghi-nv Sep 22, 2023
d65632c
fix c api bug
alexbarghi-nv Sep 22, 2023
247d8d2
revert dataloader change
alexbarghi-nv Sep 22, 2023
72bebc2
fix empty df bug
alexbarghi-nv Sep 22, 2023
4d51751
style
alexbarghi-nv Sep 22, 2023
9dfa3fa
io
alexbarghi-nv Sep 22, 2023
10c8c1f
fix test failures, remove c++ compression enum
alexbarghi-nv Sep 23, 2023
08cf3e1
remove removed api from mg tests
alexbarghi-nv Sep 23, 2023
897e6d6
change to future warning
alexbarghi-nv Sep 23, 2023
bb5e621
resolve checking issues
alexbarghi-nv Sep 23, 2023
d20e593
Merge branch 'cugraph-pyg-loader-improvements' into cugraph-pyg-mfg
alexbarghi-nv Sep 23, 2023
eb3aadc
fix wrong index + off by 1 error, add check in test
alexbarghi-nv Sep 25, 2023
a124964
Merge branch 'branch-23.10' into cugraph-sample-convert
alexbarghi-nv Sep 25, 2023
6990c23
add annotations
alexbarghi-nv Sep 25, 2023
920bed7
docstring correction
alexbarghi-nv Sep 25, 2023
f8df56f
remove empty batch check
alexbarghi-nv Sep 25, 2023
ef2ec5b
fix capi sg test
alexbarghi-nv Sep 25, 2023
8e22ab9
disable broken tests, they are too expensive to fix and redundant
alexbarghi-nv Sep 25, 2023
13bdd43
Merge branch 'cugraph-sample-convert' of https://github.com/alexbargh…
alexbarghi-nv Sep 25, 2023
c48a14b
Merge branch 'branch-23.10' of https://github.com/rapidsai/cugraph in…
alexbarghi-nv Sep 25, 2023
cf612c7
update c code
alexbarghi-nv Sep 25, 2023
09a3bd8
Merge branch 'branch-23.10' into cugraph-pyg-mfg
alexbarghi-nv Sep 26, 2023
140b6e4
Merge branch 'branch-23.10' of https://github.com/rapidsai/cugraph in…
alexbarghi-nv Sep 27, 2023
e4544b6
Merge branch 'branch-23.10' into cugraph-sample-convert
alexbarghi-nv Sep 27, 2023
0ee3798
Resolve merge conflict
alexbarghi-nv Sep 27, 2023
6212869
fix bad merge
alexbarghi-nv Sep 27, 2023
0f1a144
initial rewrite
alexbarghi-nv Sep 27, 2023
b369e97
fixes, more testing
alexbarghi-nv Sep 27, 2023
13be49c
fix issue with num nodes and edges
alexbarghi-nv Sep 27, 2023
185143c
e2e smoke test
alexbarghi-nv Sep 28, 2023
99efb9c
Merge branch 'branch-23.10' into cugraph-pyg-mfg
alexbarghi-nv Sep 28, 2023
bc1f30b
Merge branch 'cugraph-sample-convert' into perf-testing-v2
alexbarghi-nv Sep 28, 2023
9ea6c6b
Merge branch 'branch-23.10' of https://github.com/rapidsai/cugraph in…
alexbarghi-nv Sep 28, 2023
a127643
Merge branch 'cugraph-pyg-mfg' of https://github.com/alexbarghi-nv/cu…
alexbarghi-nv Sep 28, 2023
262d1da
fix test column name issues
alexbarghi-nv Sep 29, 2023
7a05c10
Merge branch 'branch-23.10' into cugraph-pyg-mfg
alexbarghi-nv Sep 29, 2023
c440f64
resolve merge conflicts
alexbarghi-nv Sep 29, 2023
d0d0cb2
copyright
alexbarghi-nv Sep 29, 2023
b4e6d06
testing
alexbarghi-nv Sep 29, 2023
20f138c
Merge branch 'perf-testing-v2' of https://github.com/alexbarghi-nv/cu…
alexbarghi-nv Sep 29, 2023
7e770ad
debugging
alexbarghi-nv Sep 29, 2023
4ac962d
perf testing
alexbarghi-nv Oct 2, 2023
55b4e84
regex
alexbarghi-nv Nov 15, 2023
0fd367a
Merge branch 'perf-testing-v2' of https://github.com/alexbarghi-nv/cu…
alexbarghi-nv Nov 15, 2023
894831e
update to latest
alexbarghi-nv Nov 15, 2023
3cad3f2
fixes
alexbarghi-nv Nov 15, 2023
912d6ca
node loader
alexbarghi-nv Nov 29, 2023
ea60f94
Merge branch 'branch-23.12' of https://github.com/rapidsai/cugraph in…
alexbarghi-nv Nov 29, 2023
9972619
finish patch
alexbarghi-nv Nov 29, 2023
1c401d1
merge latest
alexbarghi-nv Dec 1, 2023
02c7210
bulk sampling
alexbarghi-nv Dec 1, 2023
b67d5ed
perf testing
alexbarghi-nv Dec 5, 2023
da389e0
minor fixes
alexbarghi-nv Dec 6, 2023
e29b4e8
get the native workflow working
alexbarghi-nv Dec 6, 2023
d358257
wrap up first version of cugraph trainer
alexbarghi-nv Dec 7, 2023
e08c46c
remove stats file
alexbarghi-nv Dec 7, 2023
a9fc5af
Fixes
alexbarghi-nv Dec 8, 2023
49094db
x
alexbarghi-nv Dec 12, 2023
b8e2354
output multiple epochs, train/test/val
alexbarghi-nv Dec 12, 2023
0fd156b
remove unwanted file
alexbarghi-nv Dec 12, 2023
663febe
Merge branch 'perf-testing-v2' of https://github.com/alexbarghi-nv/cu…
alexbarghi-nv Dec 12, 2023
2a3ee5a
revert file
alexbarghi-nv Dec 12, 2023
b424e7c
remove unwanted file
alexbarghi-nv Dec 12, 2023
b727fcb
remove cmake files
alexbarghi-nv Dec 12, 2023
d37f0d7
train/test
alexbarghi-nv Dec 12, 2023
d0ca16b
reformat
alexbarghi-nv Dec 12, 2023
06dc14d
add scripts
alexbarghi-nv Dec 13, 2023
a5f1b67
Merge branch 'perf-testing-v2' of https://github.com/alexbarghi-nv/cu…
alexbarghi-nv Dec 13, 2023
ad83725
reorganize, add scripts
alexbarghi-nv Dec 13, 2023
e3d28a6
init
alexbarghi-nv Dec 13, 2023
d15a4d4
update
alexbarghi-nv Dec 14, 2023
70a509a
Merge branch 'pyg-nightly-input-nodes-fix' of https://github.com/alex…
alexbarghi-nv Dec 14, 2023
ecc2db1
cugraph
alexbarghi-nv Dec 26, 2023
726c81d
loader debug
alexbarghi-nv Dec 26, 2023
c095769
fix small bugs in cugraph-pyg
alexbarghi-nv Dec 26, 2023
4be1875
c
alexbarghi-nv Dec 26, 2023
59f030d
fix fanout issues
alexbarghi-nv Dec 26, 2023
4bc7f90
remove experimental warnings
alexbarghi-nv Dec 27, 2023
a58d358
remove test files
alexbarghi-nv Dec 27, 2023
318212d
data preprocessing
alexbarghi-nv Dec 27, 2023
68ca511
commit
alexbarghi-nv Dec 27, 2023
dbbd791
Merge branch 'dlfw-patch-24.01' of https://github.com/alexbarghi-nv/c…
alexbarghi-nv Dec 27, 2023
d47c3ba
comment
alexbarghi-nv Dec 27, 2023
367c79c
fixing issues impacting accuracy
alexbarghi-nv Dec 29, 2023
ac1cfbd
add readme
alexbarghi-nv Dec 29, 2023
cc2635b
refactor
alexbarghi-nv Dec 29, 2023
f1ce3e1
Fix mixed experimental import
alexbarghi-nv Dec 29, 2023
e38fe66
update readme
alexbarghi-nv Dec 29, 2023
f3f68bd
update readme
alexbarghi-nv Dec 29, 2023
d2734c4
fix environment variables
alexbarghi-nv Dec 29, 2023
7222cba
remove unwanted file
alexbarghi-nv Dec 29, 2023
c2e8520
minor change to avoid timeout
alexbarghi-nv Dec 29, 2023
a4dad32
remove stats file
alexbarghi-nv Jan 3, 2024
2109bfb
Merge branch 'perf-testing-v2' of https://github.com/alexbarghi-nv/cu…
alexbarghi-nv Jan 3, 2024
6358f9b
switch versions of simple distributed graph for 24.02
alexbarghi-nv Jan 3, 2024
3898cb2
remove test python file
alexbarghi-nv Jan 3, 2024
3f266f5
remove mg utils dir
alexbarghi-nv Jan 3, 2024
864e55e
wait for workers
alexbarghi-nv Jan 3, 2024
67d6aa0
reformat
alexbarghi-nv Jan 3, 2024
78fc260
add copyrights
alexbarghi-nv Jan 3, 2024
d81a9a8
fix wrong file
alexbarghi-nv Jan 3, 2024
16f225a
remove stats file
alexbarghi-nv Jan 3, 2024
2189eb3
wg option
alexbarghi-nv Jan 3, 2024
b6b939f
add ability to construct fro wg embedding
alexbarghi-nv Jan 3, 2024
4366f9f
wholegraph
alexbarghi-nv Jan 5, 2024
4a5712d
generators
alexbarghi-nv Jan 5, 2024
259ec47
Merge branch 'branch-24.02' into perf-testing-v2
alexbarghi-nv Jan 5, 2024
afb000c
style
alexbarghi-nv Jan 5, 2024
cff6cdf
reformat
alexbarghi-nv Jan 5, 2024
18571fe
fix copyright
alexbarghi-nv Jan 5, 2024
40502de
split off feature transfer time
alexbarghi-nv Jan 5, 2024
ea46748
style
alexbarghi-nv Jan 5, 2024
61f30a2
Merge branch 'branch-24.02' into perf-testing-v2
alexbarghi-nv Jan 5, 2024
89ac530
fixes to scripts
alexbarghi-nv Jan 8, 2024
77b0788
compatibility issues
alexbarghi-nv Jan 8, 2024
e1b6651
perf testing dgl
alexbarghi-nv Jan 8, 2024
4e2a706
reset file
alexbarghi-nv Jan 8, 2024
18e43de
c
alexbarghi-nv Jan 8, 2024
c4c45db
copyright
alexbarghi-nv Jan 8, 2024
8ea5c92
whitespace
alexbarghi-nv Jan 8, 2024
379f498
Merge branch 'perf-testing-v2' into perf-testing-dgl
alexbarghi-nv Jan 9, 2024
3df7f82
update shell scripts for dgl
alexbarghi-nv Jan 9, 2024
f42507e
bugfixes
alexbarghi-nv Jan 16, 2024
8b3c694
resolve merge
alexbarghi-nv Jan 16, 2024
24f6cdb
debugging dgl hang
alexbarghi-nv Jan 17, 2024
5e5094a
fix merge conflicts
alexbarghi-nv Jan 18, 2024
fb573b3
fix get_accuracy_bug
VibhuJawa Jan 18, 2024
c7ff84e
Add print
VibhuJawa Jan 18, 2024
86b220f
Merge pull request #7 from VibhuJawa/perf-testing-dgl
alexbarghi-nv Jan 19, 2024
536bc5c
fixes
alexbarghi-nv Jan 19, 2024
d23576d
Merge branch 'branch-24.02' into perf-testing-dgl
alexbarghi-nv Jan 22, 2024
133b0d5
final fixes
alexbarghi-nv Jan 22, 2024
62e5eeb
remove unwanted files
alexbarghi-nv Jan 22, 2024
1bc597e
Merge branch 'branch-24.02' of https://github.com/rapidsai/cugraph in…
alexbarghi-nv Jan 23, 2024
d6033f8
Merge branch 'perf-testing-dgl' into perf-testing-wholegraph
alexbarghi-nv Jan 23, 2024
2434c93
wholegraph testing
alexbarghi-nv Jan 23, 2024
4e59a8f
remove stats file
alexbarghi-nv Jan 23, 2024
18df9b5
Merge branch 'branch-24.02' into perf-testing-dgl
alexbarghi-nv Jan 23, 2024
cea803e
wholegraph for dgl
alexbarghi-nv Jan 25, 2024
5557364
style
alexbarghi-nv Jan 25, 2024
1776cc8
Merge branch 'branch-24.02' into perf-testing-dgl
alexbarghi-nv Jan 25, 2024
cd44ee2
minor fixes, docs
alexbarghi-nv Jan 25, 2024
bd0a39c
Merge branch 'perf-testing-wholegraph' into perf-testing-dgl
alexbarghi-nv Jan 25, 2024
3c63406
Merge branch 'perf-testing-dgl' of https://github.com/alexbarghi-nv/c…
alexbarghi-nv Jan 25, 2024
74f1dd8
remove fixme
alexbarghi-nv Jan 25, 2024
b700efa
fixes, put data in wg
alexbarghi-nv Jan 25, 2024
228a21c
resolve merge conflict
alexbarghi-nv Jan 26, 2024
acf66b1
fix pyg
alexbarghi-nv Jan 27, 2024
601e042
fixes
alexbarghi-nv Jan 28, 2024
9682c3a
remove debug code
alexbarghi-nv Jan 28, 2024
7f49a37
c
alexbarghi-nv Jan 30, 2024
ad0fb96
remove unwanted file
alexbarghi-nv Jan 30, 2024
041bc7c
reformat
alexbarghi-nv Jan 30, 2024
1ca7615
Merge branch 'branch-24.02' into perf-testing-dgl
alexbarghi-nv Jan 30, 2024
97792d8
update copyright
alexbarghi-nv Jan 31, 2024
f1dbc82
Merge branch 'branch-24.02' into perf-testing-dgl
BradReesWork Feb 1, 2024
18aeafb
Merge branch 'branch-24.02' into perf-testing-dgl
BradReesWork Feb 2, 2024
270c18e
Merge branch 'branch-24.04' into perf-testing-dgl
alexbarghi-nv Feb 4, 2024
20f78e4
Merge branch 'branch-24.04' into perf-testing-dgl
alexbarghi-nv Feb 7, 2024
fec3f1c
Merge branch 'branch-24.04' into perf-testing-dgl
alexbarghi-nv Feb 8, 2024
b7e4497
update scripts
alexbarghi-nv Feb 16, 2024
84476fa
merge
alexbarghi-nv Feb 16, 2024
47b498b
debugging
alexbarghi-nv Feb 19, 2024
80090a0
remove file
alexbarghi-nv Feb 19, 2024
70d5168
separate split, use int16
alexbarghi-nv Feb 19, 2024
80c599b
wg
alexbarghi-nv Feb 19, 2024
f8d385f
172
alexbarghi-nv Feb 19, 2024
4c75f93
force integer value
alexbarghi-nv Feb 19, 2024
892eb89
bulksampling
alexbarghi-nv Mar 1, 2024
2773d4e
cleanup
alexbarghi-nv Mar 1, 2024
f6b86fb
cleanup
alexbarghi-nv Mar 1, 2024
8ea7d85
Merge branch 'branch-24.04' of https://github.com/rapidsai/cugraph in…
alexbarghi-nv Mar 1, 2024
188c516
c
alexbarghi-nv Mar 4, 2024
6909206
barrier
alexbarghi-nv Mar 4, 2024
9bad936
gpus per node split
alexbarghi-nv Mar 4, 2024
cc1ce9b
t
alexbarghi-nv Mar 4, 2024
0301357
fix bugs
alexbarghi-nv Mar 4, 2024
cf10bae
move status check to end
alexbarghi-nv Mar 4, 2024
933afd3
Merge branch 'branch-24.04' into perf-testing-dgl
alexbarghi-nv Mar 5, 2024
1fa73e9
fix conflicts
alexbarghi-nv Mar 8, 2024
2 changes: 1 addition & 1 deletion benchmarks/cugraph/standalone/bulk_sampling/README.md
@@ -152,7 +152,7 @@ Next are standard GNN training arguments such as `FANOUT`, `BATCH_SIZE`, etc. Y
 the number of training epochs here. These are followed by the `REPLICATION_FACTOR` argument, which
 can be used to create replications of the dataset for scale testing purposes.

-The final two arguments are `FRAMEWORK` which can be either "cuGraphPyG" or "PyG", and `GPUS_PER_NODE`
+The final two arguments are `FRAMEWORK` which can be "cuGraphDGL", "cuGraphPyG" or "PyG", and `GPUS_PER_NODE`
 which must be set to the correct value, even if this is provided by a SLURM argument. If `GPUS_PER_NODE`
 is not set to the correct number of GPUs, the script will hang indefinitely until it times out. Mismatched
 GPUs per node is currently unsupported by this script but should be possible in practice.
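The `GPUS_PER_NODE` requirement in the README text above can be illustrated with a short sketch. It assumes, as the benchmark's `main` does later in this diff, that the world size is the SLURM node count multiplied by the GPUs per node. The helper names `expected_world_size` and `check_framework` are hypothetical and not part of the PR.

```python
import os

# Framework names listed in the updated README.
FRAMEWORKS = ("cuGraphDGL", "cuGraphPyG", "PyG")


def expected_world_size(gpus_per_node, env=None):
    """Hypothetical helper: SLURM node count times GPUs per node."""
    env = os.environ if env is None else env
    if gpus_per_node < 1:
        raise ValueError("gpus_per_node must be a positive integer")
    return int(env["SLURM_JOB_NUM_NODES"]) * gpus_per_node


def check_framework(name):
    """Hypothetical helper: reject framework names the README does not list."""
    if name not in FRAMEWORKS:
        raise ValueError(f"unsupported framework: {name}")
    return name


# Two nodes with eight GPUs each give sixteen ranks.
world_size = expected_world_size(8, env={"SLURM_JOB_NUM_NODES": "2"})
```

If the value passed for `gpus_per_node` does not match the real GPU count, the computed world size is wrong, which is consistent with the hang the README warns about.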
@@ -123,6 +123,13 @@ def parse_args():
         required=True,
     )

+    parser.add_argument(
+        "--use_wholegraph",
+        action="store_true",
+        help="Whether to use WholeGraph feature storage",
+        required=False,
+    )
+
     parser.add_argument(
         "--model",
         type=str,
@@ -162,6 +169,13 @@ def parse_args():
         required=False,
     )

+    parser.add_argument(
+        "--skip_download",
+        action="store_true",
+        help="Whether to skip downloading",
+        required=False,
+    )
+
     return parser.parse_args()
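As a rough illustration of how the two new flags behave, the sketch below builds a parser containing only `--use_wholegraph` and `--skip_download` and derives the backend string the same way the diff does further down. The function name `build_parser` is an assumption made for this example; the real script defines many more arguments.

```python
import argparse


def build_parser():
    """Minimal parser with only the two flags added in this PR."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--use_wholegraph",
        action="store_true",
        help="Whether to use WholeGraph feature storage",
    )
    parser.add_argument(
        "--skip_download",
        action="store_true",
        help="Whether to skip downloading",
    )
    return parser


# store_true flags default to False and become True when present.
args = build_parser().parse_args(["--use_wholegraph"])
backend = "wholegraph" if args.use_wholegraph else "torch"
```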


@@ -186,16 +200,38 @@ def main(args):

     world_size = int(os.environ["SLURM_JOB_NUM_NODES"]) * args.gpus_per_node

+    if args.use_wholegraph:
+        # TODO support WG without cuGraph
+        if args.framework not in ["cuGraphPyG", "cuGraphDGL"]:
+            raise ValueError("WG feature store only supported with cuGraph backends")
+        from pylibwholegraph.torch.initialize import (
+            get_global_communicator,
+            get_local_node_communicator,
+            init,
+        )
+
+        logger.info("initializing WG comms...")
+        init(global_rank, world_size, local_rank, args.gpus_per_node)
+        wm_comm = get_global_communicator()
+        get_local_node_communicator()
+
+        wm_comm = wm_comm.wmb_comm
+        logger.info(f"rank {global_rank} successfully initialized WG comms")
+        wm_comm.barrier()
+
     dataset = OGBNPapers100MDataset(
         replication_factor=args.replication_factor,
         dataset_dir=args.dataset_dir,
         train_split=args.train_split,
         val_split=args.val_split,
         load_edge_index=(args.framework == "PyG"),
+        backend="wholegraph" if args.use_wholegraph else "torch",
     )

-    if global_rank == 0:
+    # Note: this does not generate WG files
+    if global_rank == 0 and not args.skip_download:
         dataset.download()
+
     dist.barrier()

     fanout = [int(f) for f in args.fanout.split("_")]
@@ -234,6 +270,28 @@ def main(args):
             replace=False,
             num_neighbors=fanout,
             batch_size=args.batch_size,
+            backend="wholegraph" if args.use_wholegraph else "torch",
         )
+    elif args.framework == "cuGraphDGL":
+        sample_dir = os.path.join(
+            args.sample_dir,
+            f"ogbn_papers100M[{args.replication_factor}]_b{args.batch_size}_f{fanout}",
+        )
+        from trainers.dgl import DGLCuGraphTrainer
+
+        trainer = DGLCuGraphTrainer(
+            model=args.model,
+            dataset=dataset,
+            sample_dir=sample_dir,
+            device=local_rank,
+            rank=global_rank,
+            world_size=world_size,
+            num_epochs=args.num_epochs,
+            shuffle=True,
+            replace=False,
+            num_neighbors=[int(f) for f in args.fanout.split("_")],
+            batch_size=args.batch_size,
+            backend="wholegraph" if args.use_wholegraph else "torch",
+        )
     else:
         raise ValueError("unsupported framework")
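The fanout parsing and sample directory naming used in the hunks above can be tried in isolation. This is a minimal sketch that reuses the two expressions from the diff; the function names `parse_fanout` and `sample_dir_name` are introduced here for illustration only.

```python
def parse_fanout(spec):
    """Split an underscore-separated fanout string such as '10_10_10'."""
    return [int(f) for f in spec.split("_")]


def sample_dir_name(replication_factor, batch_size, fanout):
    """Same f-string as in the diff; `fanout` is interpolated as a list."""
    return f"ogbn_papers100M[{replication_factor}]_b{batch_size}_f{fanout}"


fanout = parse_fanout("10_10_10")
name = sample_dir_name(2, 512, fanout)
```

Because `fanout` is a Python list when it is interpolated, the directory name contains the list's printed form, brackets and spaces included.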
@@ -200,19 +200,20 @@ def sample_graph(

     total_time = 0.0
     for epoch in range(num_epochs):
-        steps = [("train", train_df), ("test", test_df)]
+        steps = [("train", train_df)]
         if epoch == num_epochs - 1:
             steps.append(("val", val_df))
+            steps.append(("test", test_df))

         for step, batch_df in steps:
             batch_df = batch_df.sample(frac=1.0, random_state=seed)

-            if step == "val":
-                output_sample_path = os.path.join(output_path, "val", "samples")
-            else:
+            if step == "train":
                 output_sample_path = os.path.join(
                     output_path, f"epoch={epoch}", f"{step}", "samples"
                 )
+            else:
+                output_sample_path = os.path.join(output_path, step, "samples")
             os.makedirs(output_sample_path)

             sampler = BulkSampler(
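The effect of this hunk on the output layout can be sketched without any sampling at all. The function below, `plan_sample_paths`, is a hypothetical helper that mirrors only the control flow of the new code: training samples are written once per epoch, while validation and test samples are written once, on the final epoch.

```python
import os


def plan_sample_paths(output_path, num_epochs):
    """Return (epoch, step, directory) tuples in the order they would be written."""
    plan = []
    for epoch in range(num_epochs):
        steps = ["train"]
        if epoch == num_epochs - 1:
            # val and test are only sampled on the last epoch
            steps += ["val", "test"]
        for step in steps:
            if step == "train":
                path = os.path.join(output_path, f"epoch={epoch}", step, "samples")
            else:
                path = os.path.join(output_path, step, "samples")
            plan.append((epoch, step, path))
    return plan


plan = plan_sample_paths("out", num_epochs=2)
```

Before the change, the test split was resampled under every epoch directory; in the new flow it shares the epoch-independent layout already used for validation.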
@@ -372,7 +373,7 @@ def load_disk_dataset(
         can_edge_type = tuple(edge_type.split("__"))
         edge_index_dict[can_edge_type] = dask_cudf.read_parquet(
             Path(parquet_path) / edge_type / "edge_index.parquet"
-        ).repartition(n_workers * 2)
+        ).repartition(npartitions=n_workers * 2)

         edge_index_dict[can_edge_type]["src"] += node_offsets_replicated[
             can_edge_type[0]
@@ -431,7 +432,7 @@ def load_disk_dataset(
         if os.path.exists(node_label_path):
             node_labels[node_type] = (
                 dask_cudf.read_parquet(node_label_path)
-                .repartition(n_workers)
+                .repartition(npartitions=n_workers)
                 .drop("label", axis=1)
                 .persist()
             )
@@ -574,8 +575,8 @@ def benchmark_cugraph_bulk_sampling(
             "use_legacy_names": False,
             "include_hop_column": False,
         }
-    else:
-        # FIXME: Update these arguments when CSC mode is fixed in cuGraph-PyG (release 24.02)
+    elif sampling_target_framework == "cugraph_pyg":
+        # FIXME: Update these arguments when CSC mode is fixed in cuGraph-PyG (release 24.04)
         sampling_kwargs = {
             "deduplicate_sources": True,
             "prior_sources_behavior": "exclude",
@@ -585,8 +586,10 @@ def benchmark_cugraph_bulk_sampling(
             "use_legacy_names": False,
             "include_hop_column": True,
         }
+    else:
+        raise ValueError("Only cugraph_dgl_csr or cugraph_pyg are valid frameworks")

-    batches_per_partition = 600_000 // batch_size
+    batches_per_partition = 256
     execution_time, allocation_counts = sample_graph(
         G=G,
         label_df=dask_label_df,
@@ -761,9 +764,9 @@ def get_args():
     logger.setLevel(logging.INFO)

     args = get_args()
-    if args.sampling_target_framework not in ["cugraph_dgl_csr", None]:
+    if args.sampling_target_framework not in ["cugraph_dgl_csr", "cugraph_pyg"]:
         raise ValueError(
-            "sampling_target_framework must be one of cugraph_dgl_csr or None",
+            "sampling_target_framework must be one of cugraph_dgl_csr or cugraph_pyg",
             "Other frameworks are not supported at this time.",
         )

@@ -34,6 +34,7 @@ def __init__(
         train_split=0.8,
         val_split=0.5,
         load_edge_index=True,
+        backend="torch",
     ):
         self.__replication_factor = replication_factor
         self.__disk_x = None
@@ -43,6 +44,7 @@ def __init__(
         self.__train_split = train_split
         self.__val_split = val_split
         self.__load_edge_index = load_edge_index
+        self.__backend = backend

     def download(self):
         import logging
@@ -152,6 +154,27 @@ def download(self):
             )
             ldf.to_parquet(node_label_file_path)

+        # WholeGraph
+        wg_bin_file_path = os.path.join(dataset_path, "wgb", "paper")
+        if self.__replication_factor == 1:
+            wg_bin_rep_path = os.path.join(wg_bin_file_path, "node_feat.d")
+        else:
+            wg_bin_rep_path = os.path.join(
+                wg_bin_file_path, f"node_feat_{self.__replication_factor}x.d"
+            )
+
+        if not os.path.exists(wg_bin_rep_path):
+            os.makedirs(wg_bin_rep_path)
+            if dataset is None:
+                from ogb.nodeproppred import NodePropPredDataset
+
+                dataset = NodePropPredDataset(
+                    name="ogbn-papers100M", root=self.__dataset_dir
+                )
+            node_feat = dataset[0][0]["node_feat"]
+            for k in range(self.__replication_factor):
+                node_feat.tofile(os.path.join(wg_bin_rep_path, f"{k:04d}.bin"))
+
     @property
     def edge_index_dict(
         self,
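The on-disk layout that this hunk produces for WholeGraph can be summarized with a small path-only sketch. The helpers `wg_feature_dir` and `wg_feature_files` are hypothetical names; they reproduce the directory and file naming from the diff but do not write any data.

```python
import os


def wg_feature_dir(dataset_path, replication_factor):
    """Directory holding the raw binary node features for one replication factor."""
    base = os.path.join(dataset_path, "wgb", "paper")
    if replication_factor == 1:
        return os.path.join(base, "node_feat.d")
    return os.path.join(base, f"node_feat_{replication_factor}x.d")


def wg_feature_files(dataset_path, replication_factor):
    """One zero-padded .bin file per replica, as in the loop in the diff."""
    directory = wg_feature_dir(dataset_path, replication_factor)
    return [
        os.path.join(directory, f"{k:04d}.bin") for k in range(replication_factor)
    ]


files = wg_feature_files("data", 3)
```

Each replica is a copy of the same feature array, so a replication factor of 3 gives three identical files named `0000.bin` through `0002.bin`.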
@@ -224,21 +247,59 @@ def edge_index_dict(

     @property
     def x_dict(self) -> Dict[str, torch.Tensor]:
+        if self.__disk_x is None:
+            if self.__backend == "wholegraph":
+                self.__load_x_wg()
+            else:
+                self.__load_x_torch()
+
+        return self.__disk_x
+
+    def __load_x_torch(self) -> None:
         node_type_path = os.path.join(
             self.__dataset_dir, "ogbn_papers100M", "npy", "paper"
         )
+        if self.__replication_factor == 1:
+            full_path = os.path.join(node_type_path, "node_feat.npy")
+        else:
+            full_path = os.path.join(
+                node_type_path, f"node_feat_{self.__replication_factor}x.npy"
+            )
+
+        self.__disk_x = {"paper": torch.as_tensor(np.load(full_path, mmap_mode="r"))}
+
+    def __load_x_wg(self) -> None:
+        import logging
+
+        logger = logging.getLogger("OGBNPapers100MDataset")
+        logger.info("Loading x into WG embedding...")
+
+        import pylibwholegraph.torch as wgth
+
+        node_type_path = os.path.join(
+            self.__dataset_dir, "ogbn_papers100M", "wgb", "paper"
+        )
+        if self.__replication_factor == 1:
+            full_path = os.path.join(node_type_path, "node_feat.d")
+        else:
+            full_path = os.path.join(
+                node_type_path, f"node_feat_{self.__replication_factor}x.d"
+            )

-        if self.__disk_x is None:
-            if self.__replication_factor == 1:
-                full_path = os.path.join(node_type_path, "node_feat.npy")
-            else:
-                full_path = os.path.join(
-                    node_type_path, f"node_feat_{self.__replication_factor}x.npy"
-                )
+        file_list = [os.path.join(full_path, f) for f in os.listdir(full_path)]
+
+        x = wgth.create_embedding_from_filelist(
+            wgth.get_global_communicator(),
+            "distributed",  # TODO support other options
+            "cpu",  # TODO support GPU
+            file_list,
+            torch.float32,
+            128,
+        )

-            self.__disk_x = {"paper": np.load(full_path, mmap_mode="r")}
+        logger.info("created x wg embedding")

-        return self.__disk_x
+        self.__disk_x = {"paper": x}

     @property
     def y_dict(self) -> Dict[str, torch.Tensor]:
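The refactored `x_dict` property follows a load-once, dispatch-by-backend pattern. The sketch below shows that pattern with placeholder loaders. The class `LazyFeatureStore`, its `load_calls` counter, and the placeholder strings are assumptions for illustration and stand in for the real NumPy and WholeGraph loading.

```python
class LazyFeatureStore:
    """Illustrative stand-in for the dataset's cached feature loading."""

    def __init__(self, backend="torch"):
        self._backend = backend
        self._x = None
        self.load_calls = 0  # counts loader invocations, for demonstration only

    def _load_torch(self):
        # Placeholder for the memory-mapped .npy load in the diff.
        self.load_calls += 1
        self._x = {"paper": "features loaded from .npy"}

    def _load_wg(self):
        # Placeholder for creating a WholeGraph embedding from a file list.
        self.load_calls += 1
        self._x = {"paper": "features loaded into a WholeGraph embedding"}

    @property
    def x_dict(self):
        # Load on first access only, then return the cached dictionary.
        if self._x is None:
            if self._backend == "wholegraph":
                self._load_wg()
            else:
                self._load_torch()
        return self._x


store = LazyFeatureStore(backend="wholegraph")
first = store.x_dict
second = store.x_dict
```

Repeated accesses return the same cached object, so the loader for the selected backend runs at most once per dataset instance.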
@@ -321,7 +382,7 @@ def __get_labels(self):
             torch.as_tensor(node_label.node.values, device="cpu")
         ] = torch.as_tensor(node_label.label.values, device="cpu")

-        self.__y = {"paper": node_label_tensor.contiguous()}
+        self.__y = {"paper": node_label_tensor.to(torch.int16).contiguous()}

         train_ix, test_val_ix = train_test_split(
             torch.as_tensor(node_label.node.values),
15 changes: 15 additions & 0 deletions benchmarks/cugraph/standalone/bulk_sampling/models/dgl/__init__.py
@@ -0,0 +1,15 @@
+# Copyright (c) 2024, NVIDIA CORPORATION.
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+from .models_dgl import GraphSAGE
@@ -0,0 +1,69 @@
+# Copyright (c) 2024, NVIDIA CORPORATION.
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import torch
+import torch.nn.functional as F
+
+
+class GraphSAGE(torch.nn.Module):
+    """
+    GraphSAGE model implementation for DGL
+    supporting both native DGL and cuGraph-ops
+    backends.
+    """
+
+    def __init__(
+        self,
+        in_channels,
+        hidden_channels,
+        out_channels,
+        num_layers,
+        model_backend="dgl",
+    ):
+        if model_backend == "dgl":
+            from dgl.nn import SAGEConv
+        else:
+            from cugraph_dgl.nn import SAGEConv
+
+        super(GraphSAGE, self).__init__()
+        self.convs = torch.nn.ModuleList()
+        for _ in range(num_layers - 1):
+            self.convs.append(
+                SAGEConv(in_channels, hidden_channels, aggregator_type="mean")
+            )
+            in_channels = hidden_channels
+        self.convs.append(
+            SAGEConv(hidden_channels, out_channels, aggregator_type="mean")
+        )
+
+    def forward(self, blocks, x):
+        """
+        Runs the model forward pass given a list of blocks
+        and feature tensor.
+        """
+
+        for i, conv in enumerate(self.convs):
+            x = conv(blocks[i], x)
+            if i != len(self.convs) - 1:
+                x = F.relu(x)
+                x = F.dropout(x, p=0.5)
+        return x
+
+
+def create_model(feat_size, num_classes, num_layers, model_backend="dgl"):
+    model = GraphSAGE(
+        feat_size, 64, num_classes, num_layers, model_backend=model_backend
+    )
+    model = model.to("cuda")
+    model.train()
+    return model
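To see which layer shapes the `GraphSAGE` constructor above builds, the loop can be replayed without DGL or PyTorch. The helper `sage_layer_dims` is a hypothetical name, and the sizes used below are example values only: 64 matches the hidden size passed by `create_model`, while 128 and 172 are assumed input and output sizes.

```python
def sage_layer_dims(in_channels, hidden_channels, out_channels, num_layers):
    """Replay the constructor's loop and return (input, output) sizes per layer."""
    dims = []
    for _ in range(num_layers - 1):
        dims.append((in_channels, hidden_channels))
        in_channels = hidden_channels
    # As written in the diff, the last layer always takes hidden_channels inputs.
    dims.append((hidden_channels, out_channels))
    return dims


three_layer = sage_layer_dims(128, 64, 172, 3)
one_layer = sage_layer_dims(128, 64, 172, 1)
```

One consequence visible in the sketch: with `num_layers == 1` the single layer is sized for `hidden_channels` inputs, not `in_channels`, so the constructor as written appears to assume at least two layers.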