nx-cugraph: Updates nxcg.Graph classes for API-compatibility with NetworkX Graph classes, needed for zero code change graph generators #4629
Conversation
Hey Erik, thanks for the PR. I have a few feedback items:
- I don't think we should be introducing a new class for this; I think this should just be the expected behavior for nx-cugraph. If we don't feel confident in this being a stable solution and want to give users an escape hatch, then I'd rather rename the original graph classes (perhaps call those `GPUGraph`?). Edit: I should clarify that I think the separation the new class provides in the implementation is good. I was referring to exposing a new class to end users; I think sticking with one family of Graph classes for users to interact with would be better.
- If there's a technical reason we need to keep the new class, I'd prefer to change the name since "ZeroGraph" feels too close to something like a Null graph IMO. I don't think users will make the connection between "zero" and "zero-code-change" here. I've been using the acronym "ZCC" in place of the marketing term "zero-code-change", so maybe "ZCCGraph"? But again, I'd rather just make this the "Graph" class and rename the legacy one if necessary.
- edit: Similar to the above edit, I was referring to the case where we need to expose the new class to end users for some reason; in that case, we should change the name.
- Am I correct to assume that this will break if the user disables caching in NetworkX? If so, this worries me, since NetworkX outputs a warning that one could reasonably interpret as "use caching at your own risk".
- When I use this PR with networkx/networkx#7585, I'm still not able to run NX functions that mutate graph inputs. Am I missing a config var setting for this? I've updated the description in #4558 to include the test script I'm using. The test script passes when used with networkx/networkx#7578 and #4558
NotImplementedError: Backend 'cugraph' does not implement `pad_graph'. This call will mutate an input, so automatic conversions between backends (if configured) will not be attempted, because this may change behavior. You may specify a backend to use by passing e.g. `backend='networkx'` keyword, but this may also change behavior by not mutating inputs.
Some examples when using networkx/networkx#7585:

In [1]: import networkx as nx
In [2]: import nx_cugraph as nxcg
In [3]: import pandas as pd
In [4]: import logging
In [5]: nxl = logging.getLogger("networkx")
In [6]: nxl.addHandler(logging.StreamHandler())
In [7]: nxl.setLevel(logging.DEBUG)
In [8]: # Configure graph generators to use cugraph.
In [9]: # Can also be set via NETWORKX_BACKEND_PRIORITY_GENERATORS environment variable
In [10]: nx.config.backend_priority.generators = ["cugraph"]
In [11]: # DataFrame with int and string dtypes for edge data
In [12]: df = pd.DataFrame({"source": ['a', 'b'], "target": ['b', 'c'], "x": [1, 2], "y": ["a", "b"]})
In [13]: # Only use int edge data; is on GPU
In [14]: G = nx.from_pandas_edgelist(df, edge_attr=["x"])
Call to `from_pandas_edgelist' has inputs from no backends, and will try to use backends in the following order: ['cugraph', 'networkx']
Using backend 'cugraph' for call to `from_pandas_edgelist' with arguments: (df= source target x y
0 a b 1 a
1 b c 2 b, source='source', target='target', edge_attr=['x'], create_using=None, edge_key=None)
In [15]: G._is_on_gpu
Out[15]: True
In [16]: G._is_on_cpu
Out[16]: False
In [17]: # Use both int and str edge data; is on CPU
In [18]: G = nx.from_pandas_edgelist(df, edge_attr=["x", "y"])
Call to `from_pandas_edgelist' has inputs from no backends, and will try to use backends in the following order: ['cugraph', 'networkx']
Using backend 'cugraph' for call to `from_pandas_edgelist' with arguments: (df= source target x y
0 a b 1 a
1 b c 2 b, source='source', target='target', edge_attr=['x', 'y'], create_using=None, edge_key=None)
Backend 'cugraph' raised NotImplementedError when calling `from_pandas_edgelist'
Trying next backend: 'networkx'
Call to `empty_graph' has inputs from no backends, and will try to use backends in the following order: ['cugraph', 'networkx']
Using backend 'cugraph' for call to `empty_graph' with arguments: (n=0, create_using=None, default=<class 'networkx.classes.graph.Graph'>)
In [19]: G._is_on_gpu
Out[19]: False
In [20]: G._is_on_cpu
Out[20]: True
In [21]: # This algorithm only uses int edge data; use GPU
In [22]: %time nx.pagerank(G, weight="x")
Call to `pagerank' has inputs from {'cugraph'} backends, and will try to use backends in the following order: ['cugraph', 'networkx']
Using backend 'cugraph' for call to `pagerank' with arguments: (G=<nx_cugraph.classes.zero.ZeroGraph object at 0x7f471355b650>, alpha=0.85, personalization=None, max_iter=100, tol=1e-06, nstart=None, weight='x', dangling=None)
CPU times: user 55.3 ms, sys: 36.2 ms, total: 91.5 ms
Wall time: 121 ms
Out[22]: {'a': 0.18783806264400482, 'b': 0.48648586869239807, 'c': 0.3256761133670807}
In [23]: # The previous call cached to the GPU, so subsequent calls using this data are faster
In [24]: %time nx.pagerank(G, weight="x")
Call to `pagerank' has inputs from {'cugraph'} backends, and will try to use backends in the following order: ['cugraph', 'networkx']
Using backend 'cugraph' for call to `pagerank' with arguments: (G=<nx_cugraph.classes.zero.ZeroGraph object at 0x7f471355b650>, alpha=0.85, personalization=None, max_iter=100, tol=1e-06, nstart=None, weight='x', dangling=None)
CPU times: user 10.4 ms, sys: 0 ns, total: 10.4 ms
Wall time: 10.3 ms
Out[24]: {'a': 0.18783806264400482, 'b': 0.48648586869239807, 'c': 0.3256761133670807}
In [25]: # Creating an empty graph also returns an nx-cugraph graph
In [26]: G = nx.empty_graph()
Call to `empty_graph' has inputs from no backends, and will try to use backends in the following order: ['cugraph', 'networkx']
Using backend 'cugraph' for call to `empty_graph' with arguments: (n=0, create_using=None, default=<class 'networkx.classes.graph.Graph'>)
In [27]: # Mutation always forces us to the CPU (for now)
In [28]: G.add_edge(0, 1, x=1)
In [29]: G._is_on_gpu
Out[29]: False
In [30]: G._is_on_cpu
Out[30]: True
In [31]: # But we can get the GPU graph
In [32]: G._cugraph # Suggestions for a better name?
Out[32]: <nx_cugraph.classes.graph.Graph at 0x7f471462c790>
In [33]: G._is_on_gpu
Out[33]: True
In [34]: G._is_on_cpu
Out[34]: True
# This graph can be used (with no conversion) by normal networkx algorithms that nx-cugraph does not implement
In [35]: nx.harmonic_centrality(G)
Call to `harmonic_centrality' has inputs from {'cugraph'} backends, and will try to use backends in the following order: ['cugraph', 'networkx']
Backend 'cugraph' does not implement `harmonic_centrality'
Trying next backend: 'networkx'
Converting input graphs from 'cugraph' backend to 'networkx' backend for call to `harmonic_centrality'
Caching converted graph (from 'cugraph' to 'networkx' backend) in call to `harmonic_centrality' for 'G' argument
Call to `shortest_path_length' has inputs from {'cugraph'} backends, and will try to use backends in the following order: ['cugraph', 'networkx']
Using backend 'cugraph' for call to `shortest_path_length' with arguments: (G=<nx_cugraph.classes.zero.ZeroGraph object at 0x7f4713ebc890>, source=0, target=None, weight=None, method='dijkstra')
Call to `shortest_path_length' has inputs from {'cugraph'} backends, and will try to use backends in the following order: ['cugraph', 'networkx']
Using backend 'cugraph' for call to `shortest_path_length' with arguments: (G=<nx_cugraph.classes.zero.ZeroGraph object at 0x7f4713ebc890>, source=1, target=None, weight=None, method='dijkstra')
Out[35]: {0: 1.0, 1: 1.0}
In [36]: # We can also use string or any Python object as edge data
In [37]: G.add_edge(0, 2, y="hi")
In [38]: G.add_edge(1, 2, z=["x", object()])
# Let's compare performance of using `G.add_edge` in a tight loop
In [39]: %%timeit
...: G = nx.empty_graph(backend="networkx")
...: for i in range(1_000_000):
...: G.add_edge(i, i+1)
...:
1.13 s ± 6.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [40]: %%timeit
...: G = nx.empty_graph(backend="cugraph")
...: for i in range(1_000_000):
...: G.add_edge(i, i+1)
...:
1.67 s ± 6.84 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Some replies:
In addition to refining the final UX, this has a couple of rough edges to smooth out and we should do more thorough testing (and optimizing), but as the examples above show, it is extremely capable today. I have gone through a couple of iterations to try to get CI to pass.
In addition to renaming and moving graph classes around, this smooths out *many* rough edges for zero-code change use cases. There is more to be done, but this is excellent progress and should be good enough to use. Now to see about getting CI to pass!
Thanks, I'm very excited to get this functionality! Comments/suggestions/questions below.
NX_CUGRAPH_USE_COMPAT_GRAPHS=False pytest --capture=no --cache-clear --benchmark-disable "$@" tests
NX_CUGRAPH_USE_COMPAT_GRAPHS=True pytest --capture=no --cache-clear --benchmark-disable "$@" tests
I think we should minimize the public API surface area and expose only the "compat" graph. We're not getting backwards compatibility by exposing both classes anyway, so we can expose `GPUGraph` (which I'm assuming is the non-compat graph) to users later if needed.
Or...do we get additional coverage by testing both compat and non-compat separately? I'd still prefer to start small and expand the API/our support burden as needed, but I can appreciate if it's used to get additional test coverage.
Yes, this is used to get additional test coverage. We can change the environment variable to something like `_NX_CUGRAPH_USE_COMPAT_GRAPHS` and use it as an internal implementation detail.
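A small sketch of how such an internal toggle might be read at import time; the parsing and the default value shown here are assumptions, not nx-cugraph's actual behavior:

```python
import os

# Hypothetical reading of the internal toggle discussed above; the variable
# name matches the suggestion, but the default and accepted values are assumptions.
_USE_COMPAT_GRAPHS = os.environ.get(
    "_NX_CUGRAPH_USE_COMPAT_GRAPHS", "true"
).strip().lower() in {"true", "1", "yes"}
```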
@@ -108,7 +108,7 @@ echo "nx-cugraph coverage from networkx tests: $_coverage"
 echo $_coverage | awk '{ if ($NF == "0.0%") exit 1 }'
 # Ensure all algorithms were called by comparing covered lines to function lines.
 # Run our tests again (they're fast enough) to add their coverage, then create coverage.json
-pytest \
+NX_CUGRAPH_USE_COMPAT_GRAPHS=False pytest \
Can you explain why this is False? It reads as if we're not getting test coverage from the NX suite for the "compat" Graph, which seems bad, but maybe it's necessary for now?
This should work for both `True` and `False`, and it would be great to be able to test both options. `False` (not the "compat" graph) matches how we were testing before.
> it would be great to be able to test both options
Why can't we test both? If we have to pick one, shouldn't we pick compat since (I'm assuming) it will be used much more?
def _reify_networkx(self) -> None:
    """Copy graph to host (CPU) if necessary."""
    if self.__dict__["_node"] is None:
        # After we make this into an nx graph, we rely on the cache being correct
I think explaining this a bit more would help me understand the implementation better. Assuming you're referring to `__networkx_cache__`, I'm not seeing how that comes into play. Is it because `self._cudagraph` is actually a graph instance in the cache, or is there some other dependency on the cache here?
If we have graphs in the cache and reified networkx data structures (e.g., `_adj` and `_node`), then we want all the graphs to be consistent. Hence, the graph can be on host and device at the same time; it doesn't move back and forth keeping only one. Consistency requires the cache being "correct", such as being cleared when the networkx data structures are mutated. Notably, the inner dicts of the data structures are just dicts. The cache is cleared when using methods such as `G.add_edge`, but not when doing e.g. `G[u][v][key] = val`. If we wanted to be super careful and strict, we could make our own mutable mapping class that clears the cache upon mutation (see the sketch below), but this would come at a (possibly significant) performance penalty. This is what the cache warning message discusses.
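A minimal sketch of that strict option, assuming a dict-backed `__networkx_cache__`; the class and parameter names here are hypothetical, not nx-cugraph internals:

```python
from collections.abc import MutableMapping


class CacheClearingDict(MutableMapping):
    """Dict-like wrapper that invalidates a graph's cache on any mutation."""

    def __init__(self, data, cache):
        self._data = data    # the wrapped inner dict (e.g., edge data)
        self._cache = cache  # e.g., G.__networkx_cache__

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._cache.clear()  # any mutation invalidates cached conversions
        self._data[key] = value

    def __delitem__(self, key):
        self._cache.clear()
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)
```

Wrapping every inner dict this way would keep the cache honest even for `G[u][v][key] = val`, at the cost of a Python-level indirection on every access, which is the performance penalty mentioned above.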
Right, I understand that, but I was asking more because the comment about the cache just seemed out of place. After looking at the code again, I remembered that `self._cudagraph` is actually a property that uses the cache, which I'm assuming is what the comment is referring to. Assuming this is correct, I think I just needed a reminder that `self._cudagraph` could end up using a cached graph.
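To make that concrete, here is a toy sketch of the pattern, under the assumption of a lazily built, cache-backed device graph; the class name, helper, and cache key are illustrative only, not the real nx-cugraph implementation:

```python
class _CompatGraphSketch:
    """Toy stand-in for a compat graph; not the real nx-cugraph class."""

    def __init__(self):
        self.__networkx_cache__ = {}  # cleared whenever the graph is mutated

    def _convert_to_device(self):
        # Placeholder for the real CPU -> GPU conversion.
        return object()

    @property
    def _cudagraph(self):
        # Reuse a previously converted device graph if the cache still holds it;
        # otherwise convert and cache it. A cleared cache forces reconversion.
        cache = self.__networkx_cache__
        if "device_graph" not in cache:
            cache["device_graph"] = self._convert_to_device()
        return cache["device_graph"]
```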
Can you also update the description and title of this PR since the implementation is quite different now?
Packaging changes look fine, once @rlratzel’s review questions are resolved.
/merge
This is an alternative approach to #4558 for enabling GPU-accelerated NetworkX to "just work". It has similarities to #4558, but I opted to make separate classes such as `ZeroGraph`, which I think makes for cleaner separation and gives us and users more control.

There are a few lingering TODOs and code comments to tidy up, but I don't think there are any show-stoppers. I have not updated methods (such as `number_of_nodes`) to optimistically try to use the GPU where possible, because this is not strictly necessary, but we should update these soon.

I have run NetworkX tests with these classes using networkx/networkx#7585 and networkx/networkx#7600. We need the behavior in 7585; 7600 is only useful for testing.
There are only 3 new failing tests, and 3 tests that hang (I'll run them overnight to see if they finish). Here's a test summary:
Note that 25 tests that were failing now pass. I have not investigated test failures, xfails, or xpasses yet. I would like to add tests too.
We rely heavily on the networkx cache. I think this is preferred.
It is late for me. I will describe and show how and why this works later.
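A rough conceptual sketch of the dual-residency behavior shown in the examples above; the `_is_on_gpu` / `_is_on_cpu` attributes and the `_node` check come from the examples and the `_reify_networkx` snippet, while everything else is an illustrative assumption:

```python
class _DualResidencySketch:
    """Toy illustration of a graph that may live on CPU, GPU, or both."""

    def __init__(self):
        self._node = None          # host (CPU) node data; None means not materialized
        self._device_graph = None  # device (GPU) representation; None means not materialized

    @property
    def _is_on_cpu(self):
        return self._node is not None

    @property
    def _is_on_gpu(self):
        return self._device_graph is not None
```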
I opted for `zero=` and `ZeroGraph` because I find them delightful! Renaming is trivial if other terms are preferred.

CC @quasiben