Adds benchmarks for nx-cugraph #3854

Merged
Changes from 28 commits
50 commits
a63d350
Adds initial benchmarks for cugraph-nx, still WIP.
rlratzel Aug 24, 2023
87045be
Adds benchmarks specifically for larger datasets that use the k param…
rlratzel Aug 26, 2023
c08d603
Update the docstrings of the similarity algorithms (#3817)
jnke2016 Aug 25, 2023
3448b96
Use `copy-pr-bot` (#3827)
ajschmidt8 Aug 29, 2023
eb7de38
Disable mg tests (#3833)
naimnv Aug 30, 2023
0b79ea3
Fix OD shortest distance matrix computation test failures. (#3813)
seunghwak Aug 30, 2023
3c8a8c6
Remove legacy betweenness centrality (#3829)
jnke2016 Aug 30, 2023
b496254
Update README.md (#3826)
lmeyerov Aug 30, 2023
f270817
Add `louvain_communities` to cugraph-nx (#3803)
eriknw Aug 31, 2023
d909d8d
[BUG] Fix Batch Renumbering of Empty Batches (#3823)
alexbarghi-nv Aug 31, 2023
c0df6e2
Simplify wheel build scripts and allow alphas of RAPIDS dependencies …
vyasr Aug 31, 2023
262e281
Remove Deprecated Sampling Options (#3816)
alexbarghi-nv Sep 1, 2023
ccc8653
Use new `raft::compiled_static` targets (#3842)
divyegala Sep 6, 2023
7c5f38b
[IMP] Add ability to get batch size from the loader in cuGraph-PyG (#…
alexbarghi-nv Sep 6, 2023
80b7ae0
Rename `cugraph-nx` to `nx-cugraph` (#3840)
eriknw Sep 6, 2023
a24341d
Migrate upstream models to `cugraph-pyg` (#3763)
tingyu66 Sep 6, 2023
5574eb6
Expose threshold in louvain (#3792)
ChuckHastings Sep 6, 2023
a2972b5
Remove the assumption made on the client data's keys (#3835)
jnke2016 Sep 7, 2023
4b18ddf
Adding metadata getter methods to datasets API (#3821)
nv-rliu Sep 8, 2023
6282965
Uses `conda mambabuild` rather than `mamba mambabuild` (#3853)
rlratzel Sep 8, 2023
5babc71
Merge remote-tracking branch 'upstream/branch-23.10' into branch-23.1…
rlratzel Sep 8, 2023
c3194c4
Renames dir to nx-cugraph for consistency with new package name.
rlratzel Sep 8, 2023
96dc4f2
Merge remote-tracking branch 'upstream/branch-23.10' into branch-23.1…
rlratzel Sep 26, 2023
5091a8b
Merge remote-tracking branch 'upstream/branch-23.10' into branch-23.1…
rlratzel Sep 27, 2023
9248af0
Adds benchmark for louvain using small graphs, adds support for Netwo…
rlratzel Sep 27, 2023
15b28ee
Adds benchmark for louvain_communities using medium size graphs.
rlratzel Sep 27, 2023
c458918
Removed unused imports, adds comment describing fixture args (ids, et…
rlratzel Sep 28, 2023
eb703ac
Merge remote-tracking branch 'upstream/branch-23.10' into branch-23.1…
rlratzel Sep 28, 2023
ca5e255
Merge remote-tracking branch 'upstream/branch-23.12' into branch-23.1…
rlratzel Sep 28, 2023
1232897
Removes FIXME, minor code cleanup.
rlratzel Sep 28, 2023
880ff6b
Initial black run on benchmark files.
rlratzel Sep 29, 2023
ade2937
Merge branch 'branch-23.10-cugraph_nx_benchmarks' of github.com:rlrat…
trxcllnt Sep 29, 2023
8295fd3
bump versions 23.10 -> 23.12
trxcllnt Sep 29, 2023
c8c8dc0
leave one thread free
trxcllnt Sep 29, 2023
f845799
Merge branch 'branch-23.12' of github.com:rapidsai/cugraph into rlrat…
trxcllnt Sep 29, 2023
7d6b350
fix dependencies.yaml merge from 23.10 -> 23.12
trxcllnt Sep 29, 2023
3fdb9ba
update ucx-py, dask, and distributed versions
trxcllnt Sep 29, 2023
c827b81
separate CUDA suffixes for pylibcugraphops
trxcllnt Sep 29, 2023
2ec390e
Merge branch 'branch-23.12' of github.com:rapidsai/cugraph into rlrat…
trxcllnt Sep 29, 2023
c00c3f1
WholeGraph Feature Store for cuGraph-PyG and cuGraph-DGL (#3874)
alexbarghi-nv Sep 30, 2023
dc87455
increase timeout
jnke2016 Oct 3, 2023
a863835
Integrate renumbering and compression to `cugraph-dgl` to accelerate …
tingyu66 Oct 3, 2023
a564957
add env var to the wheel run
jnke2016 Oct 3, 2023
3f1547f
Merge remote-tracking branch 'upstream/branch-23.10' into branch-23.1…
jnke2016 Oct 3, 2023
74c63fb
increase timeouts
jnke2016 Oct 4, 2023
8f00bbd
Adds k as a param for BC bench.
rlratzel Oct 6, 2023
9955b7c
Merge branch 'branch-23.10-cugraph_nx_benchmarks' of https://github.c…
rlratzel Oct 6, 2023
bf74a73
Merge remote-tracking branch 'upstream/branch-23.12' into branch-23.1…
rlratzel Oct 6, 2023
cbdbd8a
Merge remote-tracking branch 'jnke2016/branch-23.10_increase-timeout'…
rlratzel Oct 6, 2023
80b9cf2
Updates pylibwholegraph version to 23.12.
rlratzel Oct 6, 2023
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -16,7 +16,7 @@ repos:
      - id: black
        language_version: python3
        args: [--target-version=py38]
        files: ^python/
        files: ^(python/.*|benchmarks/.*)$
  - repo: https://github.com/PyCQA/flake8
    rev: 6.0.0
    hooks:
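
Review note on the widened `files` pattern: pre-commit treats `files` as a Python regular expression applied to each repo-relative path. The sketch below, which assumes `re.search` semantics, shows which paths the old and new patterns would select. The two `benchmarks/` paths come from this PR; the other two paths are hypothetical and only there for contrast.

```python
import re

# Patterns copied from the hunk above: the old value, then the new value.
OLD_FILES = r"^python/"
NEW_FILES = r"^(python/.*|benchmarks/.*)$"

# Repo-relative sample paths. The first and last entries are hypothetical.
paths = [
    "python/some_pkg/module.py",
    "benchmarks/nx-cugraph/pytest-based/bench_algos.py",
    "benchmarks/shared/python/cugraph_benchmarking/params.py",
    "cpp/src/example.cu",
]


def selected(pattern, candidates):
    """Return the candidates that the regex matches via re.search."""
    return [p for p in candidates if re.search(pattern, p)]


old_hits = selected(OLD_FILES, paths)
new_hits = selected(NEW_FILES, paths)
print(old_hits)
print(new_hits)
```

With the new pattern, black also runs on the benchmark files, which is consistent with the "Initial black run on benchmark files." commit in this PR.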
213 changes: 213 additions & 0 deletions benchmarks/nx-cugraph/pytest-based/bench_algos.py
@@ -0,0 +1,213 @@
# Copyright (c) 2023, NVIDIA CORPORATION.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import networkx as nx
import pandas as pd
import pytest
from cugraph import datasets

# FIXME: promote these to cugraph.datasets so the following steps aren't
# necessary
#
# These datasets can be downloaded using the script in the 'datasets' dir:
#
# cd <repo dir>/datasets
# ./get_test_data.sh --benchmark
#
# Then set the following env var so the dataset utils can find their location:
#
# export RAPIDS_DATASET_ROOT_DIR=<repo dir>/datasets
#
from cugraph_benchmarking.params import (
    hollywood,
    europe_osm,
    cit_patents,
    soc_livejournal,
)

################################################################################
# Fixtures and helpers
backend_params = ["cugraph", None]

dataset_params = [
    pytest.param(datasets.karate, marks=[pytest.mark.small, pytest.mark.undirected]),
    pytest.param(datasets.netscience, marks=[pytest.mark.small, pytest.mark.directed]),
    pytest.param(
        datasets.email_Eu_core, marks=[pytest.mark.small, pytest.mark.directed]
    ),
    pytest.param(cit_patents, marks=[pytest.mark.medium, pytest.mark.directed]),
    pytest.param(hollywood, marks=[pytest.mark.medium, pytest.mark.undirected]),
    pytest.param(europe_osm, marks=[pytest.mark.medium, pytest.mark.undirected]),
    pytest.param(soc_livejournal, marks=[pytest.mark.large, pytest.mark.directed]),
]


def nx_graph_from_dataset(dataset_obj):
    """
    Read the dataset specified by dataset_obj and return an nx.Graph or
    nx.DiGraph instance, chosen by the dataset's is_directed metadata.
    """
    create_using = nx.DiGraph if dataset_obj.metadata["is_directed"] else nx.Graph
    names = dataset_obj.metadata["col_names"]
    dtypes = dataset_obj.metadata["col_types"]
    # Only pass a header row index to pandas if the metadata provides an int.
    if isinstance(dataset_obj.metadata["header"], int):
        header = dataset_obj.metadata["header"]
    else:
        header = None

    pandas_edgelist = pd.read_csv(
        dataset_obj.get_path(),
        delimiter=dataset_obj.metadata["delim"],
        names=names,
        dtype=dict(zip(names, dtypes)),
        header=header,
    )
    G = nx.from_pandas_edgelist(
        pandas_edgelist, source=names[0], target=names[1], create_using=create_using
    )
    return G


# Test IDs are generated using the lambda assigned to the ids arg to provide an
# easier-to-read name from the Dataset obj string repr.
# See: https://docs.pytest.org/en/stable/reference/reference.html#pytest-fixture
@pytest.fixture(scope="module", params=dataset_params, ids=lambda ds: f"ds={str(ds)}")
def graph_obj(request):
    """
    Returns an NX Graph or DiGraph obj from the dataset instance parameter.
    """
    dataset = request.param
    return nx_graph_from_dataset(dataset)


# FIXME: this is needed for networkx <3.2. networkx >=3.2 simply allows the
# backend to be specified using a parameter. For now, use the same technique
# for all NX versions.
try:
    from networkx.classes import backends  # NX <3.2

    _using_legacy_dispatcher = True
except ImportError:
    backends = None
    _using_legacy_dispatcher = False


def get_legacy_backend_selector(backend_name):
    """
    Returns a callable that wraps an algo function with either the default
    dispatch decorator, or the "testing" decorator which unconditionally
    dispatches.
    This is only supported for NetworkX <3.2
    """
    backends.plugin_name = "cugraph"
    orig_dispatch = backends._dispatch
    testing_dispatch = backends.test_override_dispatch

    # Testing with the networkx <3.2 dispatch mechanism is based on decorating
    # networkx APIs. The decorator is either one that only uses a backend if
    # the input graph type is for that backend (the default decorator), or the
    # "testing" decorator, which unconditionally converts a graph type to the
    # type needed by the backend then calls the backend. If the cugraph backend
    # is specified, create a callable that decorates the benchmarked function
    # with the testing decorator.
    #
    # Because both the default and testing decorators assume they are only
    # applied once and do bookkeeping to ensure algos are not registered
    # multiple times, the callable also clears bookkeeping so the decorators
    # can be reapplied multiple times. This is obviously a hack and networkx
    # >=3.2 makes this use case properly supported.
    if backend_name == "cugraph":

        def wrapper(*args, **kwargs):
            backends._registered_algorithms = {}
            return testing_dispatch(*args, **kwargs)

    else:

        def wrapper(*args, **kwargs):
            backends._registered_algorithms = {}
            return orig_dispatch(*args, **kwargs)

    return wrapper


def get_backend_selector(backend_name):
    """
    Returns a callable that wraps an algo function in order to set the
    "backend" kwarg on it.
    This is only supported for NetworkX >= 3.2
    """

    def get_callable_for_func(func):
        def wrapper(*args, **kwargs):
            kwargs["backend"] = backend_name
            return func(*args, **kwargs)

        return wrapper

    return get_callable_for_func


@pytest.fixture(
    scope="module", params=backend_params, ids=lambda backend: f"backend={backend}"
)
def backend_selector(request):
    """
    Returns a callable that takes an algo function and wraps it in another
    function that calls the algo using the appropriate backend.
    """
    backend_name = request.param
    if _using_legacy_dispatcher:
        return get_legacy_backend_selector(backend_name)
    else:
        return get_backend_selector(backend_name)


################################################################################
# Benchmarks
normalized_params = [True, False]
k_params = [10, 100, 1000]


@pytest.mark.parametrize("normalized", normalized_params, ids=lambda norm: f"{norm=}")
def bench_betweenness_centrality(benchmark, graph_obj, backend_selector, normalized):
    result = benchmark(
        backend_selector(nx.betweenness_centrality),
        graph_obj,
        weight=None,
        normalized=normalized,
    )
    assert type(result) is dict


@pytest.mark.parametrize("normalized", normalized_params, ids=lambda norm: f"{norm=}")
def bench_edge_betweenness_centrality(
    benchmark, graph_obj, backend_selector, normalized
):
    result = benchmark(
        backend_selector(nx.edge_betweenness_centrality),
        graph_obj,
        weight=None,
        normalized=normalized,
    )
    assert type(result) is dict


def bench_louvain_communities(benchmark, graph_obj, backend_selector):
    # The cugraph backend for louvain_communities only supports undirected graphs
    if isinstance(graph_obj, nx.DiGraph):
        G = graph_obj.to_undirected()
    else:
        G = graph_obj
    result = benchmark(backend_selector(nx.community.louvain_communities), G)
    assert type(result) is list
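
Review note on the NetworkX >=3.2 path: the selector returned by `get_backend_selector` is a decorator factory that forces the `backend` keyword on every call. The sketch below reproduces that wrapper and exercises it against `fake_algo`, a hypothetical stand-in for a dispatchable NetworkX function, so it runs without NetworkX or a GPU. It is an illustration of the wrapping pattern only, not of the real dispatch machinery.

```python
def get_backend_selector(backend_name):
    """Return a decorator that sets backend=backend_name on each call."""

    def get_callable_for_func(func):
        def wrapper(*args, **kwargs):
            kwargs["backend"] = backend_name
            return func(*args, **kwargs)

        return wrapper

    return get_callable_for_func


# Hypothetical stand-in for a NetworkX algorithm. It only reports which
# backend it was asked to use; it does no graph computation.
def fake_algo(G, *, weight=None, backend=None):
    return {"nodes": len(G), "weight": weight, "backend": backend}


G = [0, 1, 2]  # placeholder "graph"; any sized object works for the stub

results = []
for name in ["cugraph", None]:  # same values as backend_params
    selector = get_backend_selector(name)
    results.append(selector(fake_algo)(G, weight=None))

print(results)
```

Because the wrapper overwrites `kwargs["backend"]`, a `backend` value passed by the caller would be replaced by the fixture's value. That matches how the benchmarks use it, since they never pass `backend` themselves.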
4 changes: 3 additions & 1 deletion benchmarks/pytest.ini
@@ -14,8 +14,10 @@ markers =
    managedmem_off: RMM managed memory disabled
    poolallocator_on: RMM pool allocator enabled
    poolallocator_off: RMM pool allocator disabled
    small: small datasets
    tiny: tiny datasets
    small: small datasets
    medium: medium datasets
    large: large datasets
    directed: directed datasets
    undirected: undirected datasets
    matrix_types: inputs are matrices
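
Review note on the size markers: registering `medium` and `large` lets a run be narrowed with pytest's `-m` option, for example `-m "small and undirected"`. The sketch below copies the marker sets from `dataset_params` in `bench_algos.py` and applies a plain AND filter to show which datasets such a selection would keep. The `select` helper is hypothetical and only mimics the AND case; it is not how pytest evaluates marker expressions internally.

```python
# Marker sets copied from dataset_params in bench_algos.py.
DATASET_MARKS = {
    "karate": {"small", "undirected"},
    "netscience": {"small", "directed"},
    "email_Eu_core": {"small", "directed"},
    "cit_patents": {"medium", "directed"},
    "hollywood": {"medium", "undirected"},
    "europe_osm": {"medium", "undirected"},
    "soc_livejournal": {"large", "directed"},
}


def select(required):
    """Return names whose marks include every marker in `required`."""
    return [name for name, marks in DATASET_MARKS.items() if required <= marks]


print(select({"small", "undirected"}))
print(select({"medium"}))
print(select({"large", "directed"}))
```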
20 changes: 19 additions & 1 deletion benchmarks/shared/python/cugraph_benchmarking/params.py
@@ -15,9 +15,11 @@

from pylibcugraph.testing.utils import gen_fixture_params
from cugraph.testing import RAPIDS_DATASET_ROOT_DIR_PATH
from cugraph.experimental.datasets import (
from cugraph.datasets import (
    Dataset,
    karate,
    netscience,
    email_Eu_core,
)

# Create Dataset objects from .csv files.
@@ -27,18 +29,22 @@
    csv_file=RAPIDS_DATASET_ROOT_DIR_PATH / "csv/undirected/hollywood.csv",
    csv_col_names=["src", "dst"],
    csv_col_dtypes=["int32", "int32"])
hollywood.metadata["is_directed"] = False
europe_osm = Dataset(
    csv_file=RAPIDS_DATASET_ROOT_DIR_PATH / "csv/undirected/europe_osm.csv",
    csv_col_names=["src", "dst"],
    csv_col_dtypes=["int32", "int32"])
europe_osm.metadata["is_directed"] = False
cit_patents = Dataset(
    csv_file=RAPIDS_DATASET_ROOT_DIR_PATH / "csv/directed/cit-Patents.csv",
    csv_col_names=["src", "dst"],
    csv_col_dtypes=["int32", "int32"])
cit_patents.metadata["is_directed"] = True
soc_livejournal = Dataset(
    csv_file=RAPIDS_DATASET_ROOT_DIR_PATH / "csv/directed/soc-LiveJournal1.csv",
    csv_col_names=["src", "dst"],
    csv_col_dtypes=["int32", "int32"])
soc_livejournal.metadata["is_directed"] = True

# Assume all "file_data" (.csv file on disk) datasets are too small to be useful for MG.
undirected_datasets = [
@@ -62,6 +68,18 @@
]

directed_datasets = [
    pytest.param(netscience,
                 marks=[pytest.mark.small,
                        pytest.mark.directed,
                        pytest.mark.file_data,
                        pytest.mark.sg,
                        ]),
    pytest.param(email_Eu_core,
                 marks=[pytest.mark.small,
                        pytest.mark.directed,
                        pytest.mark.file_data,
                        pytest.mark.sg,
                        ]),
    pytest.param(cit_patents,
                 marks=[pytest.mark.small,
                        pytest.mark.directed,