From 10086dc7f87bf16b129b8533347318cf7dc2832c Mon Sep 17 00:00:00 2001 From: Rick Ratzel Date: Mon, 2 Oct 2023 23:59:15 -0500 Subject: [PATCH] Updates to show nx vs. nx-cugraph for betweenness_centrality. --- notebooks/demo/nx_cugraph_demo.ipynb | 529 +++++++++++++++++++++------ 1 file changed, 417 insertions(+), 112 deletions(-) diff --git a/notebooks/demo/nx_cugraph_demo.ipynb b/notebooks/demo/nx_cugraph_demo.ipynb index 104da2fe1b0..05911564092 100644 --- a/notebooks/demo/nx_cugraph_demo.ipynb +++ b/notebooks/demo/nx_cugraph_demo.ipynb @@ -4,41 +4,136 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# `nx-cugraph`: Accelerating graph analysis on networkX " + "# `nx-cugraph`: a NetworkX backend that provides GPU acceleration with RAPIDS cuGraph" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "In this notebook, we will show you how to use `nx-cugraph`, a collaboration between cuGraph and NetworkX, to accelerate graph analytics for NetworkX users.\n", + "This notebook will demonstrate the `nx-cugraph` NetworkX backend using the NetworkX betweenness_centrality algorithm.\n", "\n", - "Recently, NetworkX introduced a dispatcher that allows users to dispatch their workload to different backends. `nx-cugraph` is one of these backends, and uses GPU compute resources provided by `cugraph` to greatly improve performance.\n", + "## Background\n", + "Networkx version 3.0 introduced a dispatching mechanism that allows users to configure NetworkX to dispatch various algorithms to third-party backends. Backends can provide different implementations of graph algorithms, allowing users to take advantage of capabilities not available in NetworkX. 
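The dispatching idea described above can be sketched in plain Python. This is a simplified illustration only — `make_dispatcher`, the backend registry, and the toy `degree` "algorithm" below are hypothetical and are not NetworkX's actual dispatcher internals:

```python
# Simplified sketch of backend dispatching (NOT NetworkX internals):
# a dispatcher looks up an algorithm in a registry of backend
# implementations and falls back to the default implementation.
def make_dispatcher(default_impl, backend_impls):
    def dispatch(*args, backend=None, **kwargs):
        # Use the requested backend if registered, else the default.
        impl = backend_impls.get(backend, default_impl)
        return impl(*args, **kwargs)
    return dispatch

# Toy "algorithm" with a default and a (pretend) accelerated backend:
degree = make_dispatcher(
    default_impl=lambda g: {n: len(nbrs) for n, nbrs in g.items()},
    backend_impls={"fast": lambda g: {n: len(nbrs) for n, nbrs in g.items()}},
)

g = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
print(degree(g))                  # default implementation
print(degree(g, backend="fast"))  # dispatched to the "fast" backend
```

Real backends register themselves with NetworkX at install time; the point here is only that the caller's code is unchanged whether or not a backend handles the call.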
`nx-cugraph` is a NetworkX backend provided by the [RAPIDS](https://rapids.ai) cuGraph project that adds GPU acceleration to greatly improve performance.\n", "\n", - "The main requirements for using `nx-cugraph` are \n", + "## System Requirements\n", + "Using `nx-cugraph` with this notebook requires the following: \n", + "- NVIDIA GPU, Pascal architecture or later\n", + "- CUDA 11.2, 11.4, 11.5, 11.8, or 12.0\n", "- Python versions 3.9, 3.10, or 3.11\n", - "The latest development version of NetworkX\n", - "cuGraph version 23.10 (nightly) \n", + "- NetworkX >= version 3.2\n", + "  - _NetworkX 3.0 supports dispatching and is compatible with `nx-cugraph`, but this notebook will demonstrate features added in 3.2_\n", + "  - At the time of this writing, NetworkX 3.2 is only available from source and can be installed by following the [development version install instructions](https://github.com/networkx/networkx/blob/main/INSTALL.rst#install-the-development-version).\n", + "- Pandas\n", "\n", - "Follow the instructions from `INSTALL.rst` on the NetworkX repository for instructions on installing the development version of `networkx`, which is `networkx-3.2rc0-dev0`." + "More details about system requirements can be found in the [RAPIDS System Requirements documentation](https://docs.rapids.ai/install#system-req)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip show networkx" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip show cugraph" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "## Installation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Assuming NetworkX >= 3.2 has been installed using the [development version install instructions](https://github.com/networkx/networkx/blob/main/INSTALL.rst#install-the-development-version), `nx-cugraph` can be installed using either `conda` or `pip`.
\n", + "\n", + "#### conda\n", + "```\n", + "conda install -c rapidsai-nightly nx-cugraph\n", + "```\n", + "#### pip\n", + "```\n", + "python -m pip install nx-cugraph-cu11 --extra-index-url https://rapidsdev:WeArePyPI@pypi.k8s.rapids.ai/simple\n", + "```\n", + "\n", + "If you installed any of the packages described here since running this notebook, you may need to restart the kernel to have them visible to this notebook." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [] + }, + "source": [ + "## Notebook Helper Functions\n", + "\n", + "A few helper functions will be defined here that will be used in order to help keep this notebook easy to read." ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 1, "metadata": {}, "outputs": [], "source": [ - "!pip show cugraph" + "import sys\n", + "def reimport_networkx():\n", + " \"\"\"\n", + " Re-imports networkx for demonstrating different backend configuration\n", + " options applied at import-time. This is only needed for demonstration\n", + " purposes since other mechanisms are available for runtime configuration.\n", + " \"\"\"\n", + " # Using importlib.reload(networkx) has several caveats (described here:\n", + " # https://docs.python.org/3/library/imp.html?highlight=reload#imp.reload)\n", + " # which result in backend configuration not being re-applied correctly.\n", + " # Instead, manually remove all modules and re-import\n", + " nx_mods = [m for m in sys.modules.keys()\n", + " if (m.startswith(\"networkx\") or m.startswith(\"nx_cugraph\"))]\n", + " for m in nx_mods:\n", + " sys.modules.pop(m)\n", + " import networkx\n", + " return networkx\n", + "\n", + "\n", + "from pathlib import Path\n", + "import requests\n", + "import gzip\n", + "import pandas as pd\n", + "def create_cit_patents_graph(verbose=True):\n", + " \"\"\"\n", + " Downloads the cit-Patents dataset (if not previously downloaded), reads\n", + " it, and creates a nx.DiGraph from it and returns it.\n", + " 
cit-Patents is described here:\n", "    https://snap.stanford.edu/data/cit-Patents.html\n", "    \"\"\"\n", "    url = \"https://snap.stanford.edu/data/cit-Patents.txt.gz\"\n", "    gz_file_name = Path(url.split(\"/\")[-1])\n", "    csv_file_name = Path(gz_file_name.stem)\n", "    if csv_file_name.exists():\n", "        if verbose: print(f\"{csv_file_name} already exists, not downloading.\")\n", "    else:\n", "        if verbose: print(f\"downloading {url}...\", end=\"\", flush=True)\n", "        req = requests.get(url)\n", "        open(gz_file_name, \"wb\").write(req.content)\n", "        if verbose: print(\"done\")\n", "        if verbose: print(f\"unzipping {gz_file_name}...\", end=\"\", flush=True)\n", "        with gzip.open(gz_file_name, \"rb\") as gz_in:\n", "            with open(csv_file_name, \"wb\") as txt_out:\n", "                txt_out.write(gz_in.read())\n", "        if verbose: print(\"done\")\n", "\n", "    if verbose: print(\"reading csv to dataframe...\", end=\"\", flush=True)\n", "    pandas_edgelist = pd.read_csv(\n", "        csv_file_name.name,\n", "        skiprows=4,\n", "        delimiter=\"\\t\",\n", "        names=[\"src\", \"dst\"],\n", "        dtype={\"src\":\"int32\", \"dst\":\"int32\"},\n", "    )\n", "    if verbose: print(\"done\")\n", "    if verbose: print(\"creating NX graph from dataframe...\", end=\"\", flush=True)\n", "    G = nx.from_pandas_edgelist(\n", "        pandas_edgelist, source=\"src\", target=\"dst\", create_using=nx.DiGraph\n", "    )\n", "    if verbose: print(\"done\")\n", "    return G" ] }, { @@ -47,60 +142,97 @@ "tags": [] }, "source": [ - "## Installing the latest development version of networkx" + "## Running `betweenness_centrality`\n", + "Let's start by running `betweenness_centrality` on the Karate Club graph using the default NetworkX implementation."
] }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "tags": [] + }, "source": [ - "The following cells will\n", - "- uninstall the current version of networkx \n", - "- clone the networkx repository one directory below the current working directory\n", - "- upgrade networkx to be the current development version with `pip install`\n", + "### Zachary's Karate Club\n", "\n", - "You may need to restart the kernel to have the development version of networkX be visible." + "Zachary's Karate Club is a small dataset consisting of 34 nodes and 78 edges which represent the friendships between members of a karate club. This dataset is small enough to make comparing results between NetworkX and `nx-cugraph` easy." ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 2, "metadata": {}, "outputs": [], "source": [ - "!pip uninstall --yes networkx" + "import networkx as nx\n", + "karate_club_graph = nx.karate_club_graph()" ] }, { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], + "cell_type": "markdown", + "metadata": { + "tags": [] + }, "source": [ - "# Skip if current networkx repository already exists\n", - "!(cd ..; \\\n", - "git clone https://github.com/networkx/networkx.git)" + "Having `networkx` compute the `betweenness_centrality` values for each node on this graph is quick and easy." ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 3, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "2.48 ms ± 2.19 µs per loop (mean ± std. dev. 
of 7 runs, 100 loops each)\n" + ] + } + ], "source": [ - "!(cd ../networkx; \\\n", - "pip install --upgrade --target=/opt/conda/envs/rapids/lib/python3.10/site-packages -e .[default])\n", - "# below: old pip install command,\n", - "# pip install --target=/opt/conda/envs/rapids/lib/python3.10/site-packages -e .[default])" + "%%timeit global karate_nx_bc_results\n", + "karate_nx_bc_results = nx.betweenness_centrality(karate_club_graph)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [] + }, + "source": [ + "### Automatic GPU acceleration\n", + "When `nx-cugraph` is installed, `networkx` will detect it on import and make it available as a backend for APIs supported by that backend. However, `networkx` does not assume the user always wants to use a particular backend, and instead looks at various configuration mechanisms in place for users to specify how `networkx` should use installed backends. Since `networkx` was not configured to use a backend for the above `betweenness_centrality` call, it used the default implementation provided by NetworkX.\n", + "\n", + "The first configuration mechanism to be demonstrated below is the `NETWORKX_AUTOMATIC_BACKENDS` environment variable. This environment variable directs `networkx` to use the backend specified everywhere it's supported, and does not require the user to modify any of their existing NetworkX code.\n", + "\n", + "To use it, a user sets `NETWORKX_AUTOMATIC_BACKENDS` in their shell to the backend they'd like to use. If a user has more than one backend installed, the environment variable can also accept a comma-separated list of backends, ordered by priority in which `networkx` should use them, where the first backend that supports a particular API call will be used. 
For example:\n", "```\n", "bash> export NETWORKX_AUTOMATIC_BACKENDS=cugraph\n", "bash> python my_nx_app.py  # uses nx-cugraph wherever possible, then falls back to default implementation where it's not.\n", "```\n", "or in the case of multiple backends installed\n", "```\n", "bash> export NETWORKX_AUTOMATIC_BACKENDS=cugraph,graphblas\n", "bash> python my_nx_app.py  # uses nx-cugraph if possible, then nx-graphblas if possible, then default implementation.\n", "```\n", "\n", "`networkx` looks at the environment variable and the installed backends at import time, and will not re-examine the environment after that. Because `networkx` was already imported in this notebook, the `reimport_networkx()` utility will be called after the `os.environ` dictionary is updated to simulate an environment variable being set in the shell.\n", "\n", "**Please note, this is only needed for demonstration purposes to compare runs both with and without fully-automatic backend use enabled.**" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "import os\n", "os.environ[\"NETWORKX_AUTOMATIC_BACKENDS\"] = \"cugraph\"\n", "nx = reimport_networkx()\n", "# reimporting nx requires reinstantiating Graphs since python considers\n", "# types from the prior nx import != types from the reimported nx\n", "karate_club_graph = nx.karate_club_graph()" ] }, { @@ -109,198 +241,371 @@ "tags": [] }, "source": [ - "## Using networkx" + "Once the environment is updated, re-running the same `betweenness_centrality` call on the same graph requires no code changes." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "47.6 ms ± 10.4 ms per loop (mean ± std. dev.
of 7 runs, 1 loop each)\n" ] } ], "source": [ "%%timeit global karate_cg_bc_results\n", "karate_cg_bc_results = nx.betweenness_centrality(karate_club_graph)" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "We see that the same computation actually took *longer* using `nx-cugraph`. This is not too surprising given how small the graph is, since there's a small amount of overhead to copy data to and from the GPU, which becomes more obvious on very small graphs. We'll see with a larger graph how this overhead becomes negligible." ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "### Results Comparison\n", "\n", "Let's examine the results of each run to see how they compare. \n", "The `betweenness_centrality` results are a dictionary mapping vertex IDs to betweenness_centrality scores. The score itself is usually not as important as the relative rank of each vertex ID (e.g. vertex A is ranked higher than vertex B in both sets of results)."
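Since the relative rank matters more than the raw floating point scores, a small helper can make the comparison explicit. This is a sketch for illustration only — `same_ranking` and the toy score dictionaries below are hypothetical, not part of the notebook or of NetworkX:

```python
def same_ranking(results_a, results_b):
    """Return True if two {vertex: score} dicts order their vertices
    identically when sorted by score, highest first."""
    order_a = sorted(results_a, key=results_a.get, reverse=True)
    order_b = sorted(results_b, key=results_b.get, reverse=True)
    return order_a == order_b

# Toy scores with tiny floating point differences but identical ranking,
# similar in spirit to comparing the NetworkX and nx-cugraph results:
nx_scores = {0: 0.4376352813, 33: 0.3040749759, 32: 0.1452471140}
cg_scores = {0: 0.4376353323, 33: 0.3040749728, 32: 0.1452471465}
print(same_ranking(nx_scores, cg_scores))  # True
```

Note that vertices with tied scores (e.g. many 0.0 values) have no unique rank, so a robust comparison of real results would need to treat ties as interchangeable.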
] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 6, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "NX: (0, 0.43763528138528146), CG: (0, 0.4376353323459625)\n", + "NX: (33, 0.30407497594997596), CG: (33, 0.3040749728679657)\n", + "NX: (32, 0.145247113997114), CG: (32, 0.14524714648723602)\n", + "NX: (2, 0.14365680615680618), CG: (2, 0.14365680515766144)\n", + "NX: (31, 0.13827561327561325), CG: (31, 0.138275608420372)\n", + "NX: (8, 0.05592682780182781), CG: (8, 0.055926837027072906)\n", + "NX: (1, 0.053936688311688304), CG: (1, 0.05393669009208679)\n", + "NX: (13, 0.04586339586339586), CG: (13, 0.04586339741945267)\n", + "NX: (19, 0.03247504810004811), CG: (19, 0.032475050538778305)\n", + "NX: (5, 0.02998737373737374), CG: (5, 0.02998737432062626)\n", + "NX: (6, 0.029987373737373736), CG: (6, 0.02998737432062626)\n", + "NX: (27, 0.02233345358345358), CG: (27, 0.022333452478051186)\n", + "NX: (23, 0.017613636363636363), CG: (23, 0.017613636329770088)\n", + "NX: (30, 0.014411976911976909), CG: (30, 0.014411977492272854)\n", + "NX: (3, 0.011909271284271283), CG: (3, 0.011909271590411663)\n", + "NX: (25, 0.0038404882154882154), CG: (25, 0.0038404883816838264)\n", + "NX: (29, 0.0029220779220779218), CG: (29, 0.0029220778960734606)\n", + "NX: (24, 0.0022095959595959595), CG: (24, 0.0022095961030572653)\n", + "NX: (28, 0.0017947330447330447), CG: (28, 0.0017947331070899963)\n", + "NX: (9, 0.0008477633477633478), CG: (9, 0.0008477633818984032)\n", + "NX: (4, 0.0006313131313131313), CG: (4, 0.0006313131307251751)\n", + "NX: (10, 0.0006313131313131313), CG: (10, 0.0006313131307251751)\n", + "NX: (7, 0.0), CG: (7, 0.0)\n", + "NX: (11, 0.0), CG: (11, 0.0)\n", + "NX: (12, 0.0), CG: (12, 0.0)\n", + "NX: (14, 0.0), CG: (14, 0.0)\n", + "NX: (15, 0.0), CG: (15, 0.0)\n", + "NX: (16, 0.0), CG: (16, 0.0)\n", + "NX: (17, 0.0), CG: (17, 0.0)\n", + "NX: (18, 0.0), CG: (18, 0.0)\n", + "NX: (20, 
0.0), CG: (20, 0.0)\n", "NX: (21, 0.0), CG: (21, 0.0)\n", "NX: (22, 0.0), CG: (22, 0.0)\n", "NX: (26, 0.0), CG: (26, 0.0)\n" ] } ], "source": [ "# The lists contain tuples of (vertex ID, betweenness_centrality score),\n", "# sorted based on the score.\n", "nx_sorted = sorted(karate_nx_bc_results.items(), key=lambda t:t[1], reverse=True)\n", "cg_sorted = sorted(karate_cg_bc_results.items(), key=lambda t:t[1], reverse=True)\n", "\n", "for i in range(len(nx_sorted)):\n", "    print(f\"NX: {nx_sorted[i]}, CG: {cg_sorted[i]}\")" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "Here we can see that the results match as expected: both runs rank the vertices identically, and the scores agree to within floating point precision. \n", "\n", "For larger graphs, results are harder to compare given that `betweenness_centrality` is an approximation algorithm influenced by the random selection of source vertices used to compute the betweenness_centrality score of each vertex. The argument `k` limits the number of source vertices sampled (and therefore the number of shortest paths considered), since using every vertex as a source would be prohibitively expensive for large graphs. For small graphs, `k` need not be specified, which allows `betweenness_centrality` to use all vertices and makes for an easier comparison." ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "### `betweenness_centrality` on larger graphs - The U.S.
Patent Citation Network dataset is much larger with over 3.7M nodes and over 16.5M edges, and demonstrates how `nx-cugraph` enables NetworkX to run `betweenness_centrality` on graphs this large (and larger) in seconds instead of minutes.\n", + "\n", + "#### NetworkX default implementation" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 7, "metadata": {}, - "outputs": [], - "source": [ - "# Use instead of above cell if benchmark datasets are added\n", - "Gnx = hollywood.get_graph(create_using=nx.Graph)" + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "downloading https://snap.stanford.edu/data/cit-Patents.txt.gz...done\n", + "unzipping cit-Patents.txt.gz...done\n", + "reading csv to dataframe...done\n", + "creating NX graph from dataframe...done\n" + ] + } + ], + "source": [ + "import os\n", + "# Unset NETWORKX_AUTOMATIC_BACKENDS so the default NetworkX implementation is used\n", + "os.environ.pop(\"NETWORKX_AUTOMATIC_BACKENDS\", None)\n", + "nx = reimport_networkx()\n", + "# Create the cit-Patents graph - this will also download the dataset if not previously downloaded\n", + "cit_patents_graph = create_cit_patents_graph()" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 8, "metadata": {}, "outputs": [], "source": [ - "%%timeit\n", - "bc = nx.betweenness_centrality(Gnx, k=50)" + "# Since this is a large graph, a k value must be set so the computation returns in a reasonable time\n", + "k = 40" ] }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "tags": [] + }, "source": [ - "Running betweenness_centrality using native networkX on the email-Eu-core dataset took around XXX. You, as the user, can choose the backend as an optional argument, providing you've installing your specific library, as seen below. 
A dispatchable networkX algorithm will use the chosen dispatcher to convert networkx Graph to the backend equivalent, dispatch the work, then convert back to the required nx form." + "Because this run will take time, `%%timeit` is restricted to a single pass.\n", "\n", "*NOTE: this run may take approximately 1 minute*" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1min 5s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)\n" ] } ], "source": [ "%%timeit -n 1 -r 1\n", "results = nx.betweenness_centrality(cit_patents_graph, k=k)" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "Note that `%%timeit` disables garbage collection by default, which does not reflect how the code would run in a typical application. To see a more realistic real-world run time, `gc` can be enabled.\n", "\n", "*NOTE: this run may take approximately 7 minutes!*" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "import gc" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "6min 49s ± 0 ns per loop (mean ± std. dev.
of 1 run, 1 loop each)\n" + ] + } + ], "source": [ - "## Customization" + "%%timeit -r 1 gc.enable()\n", + "nx.betweenness_centrality(cit_patents_graph, k=k)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "You can also use certain environment variable to control your backend choice, fallback settings, and testing options\n", - "- NETWORKX_TEST_BACKEND - (was previously NETWORKX_GRAPH_CONVERT) \n", - "- NETWORKX_FALLBACK_TO_NX - if True, then networkx will be used if an algorithm isn't supported by the chosen backend\n", - "- NETWORKX_AUTOMATIC_BACKENDS - Enables automatic conversions to backends" + "#### `nx-cugraph`\n", + "\n", + "Running on a GPU using `nx-cugraph` can result in a tremendous speedup, especially when graph sizes get larger than a few thousand nodes or `k` values become larger to increase accuracy.\n", + "\n", + "Rather than setting the `NETWORKX_AUTOMATIC_BACKENDS` environment variable and re-importing again, this example will demonstrate the `backend=` keyword argument to explicitly direct the NetworkX dispatcher to use the `cugraph` backend." ] }, { - "cell_type": "markdown", + "cell_type": "code", + "execution_count": 12, "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "10.3 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)\n" + ] + } + ], "source": [ - "This approach is easy to use, since you change only the algorithmic calls in your workload without restructuring the rest of your work. You can specify when and where to use the backend this way. While an optional argument is convenient, there might be scenarios where you don't want to edit your script or pipeline. Fortunately, the dispatcher allows for environment variables to configure your backend, fallback, and acceleration preferences." 
+ "%%timeit -r 1 gc.enable()\n", + "nx.betweenness_centrality(cit_patents_graph, k=k, backend=\"cugraph\")" ] }, { - "cell_type": "markdown", + "cell_type": "code", + "execution_count": 13, "metadata": {}, + "outputs": [], "source": [ - "### Utilities" + "k = 150" ] }, { - "cell_type": "markdown", + "cell_type": "code", + "execution_count": 14, "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "11.6 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)\n" + ] + } + ], "source": [ - "The `nx-cugraph` package allows conversion from networkx Graph objects to cugraph-compatible Graph objects. They are distinct from cugraph Graph objects and are more limited, but as the backend grows, so will its capabilities." + "%%timeit -r 1 gc.enable()\n", + "nx.betweenness_centrality(cit_patents_graph, k=k, backend=\"cugraph\")" ] }, { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], + "cell_type": "markdown", + "metadata": { + "tags": [] + }, "source": [ - "%%timeit\n", - "Gcg = nxcg.from_networkx(Gnx)" + "For the same graph and the same `k` value, the `\"cugraph\"` backend returns results in seconds instead of minutes. Increasing the `k` value has very little relative impact to runtime due to the high parallel processing ability of the GPU, allowing the user to get improved accuracy for virtually no additional cost." ] }, { - "cell_type": "code", - "execution_count": null, + "cell_type": "markdown", "metadata": {}, - "outputs": [], "source": [ - "Gcg = nxcg.from_networkx(Gnx)" + "### Type-based dispatching\n", + "\n", + "NetworkX also supports automatically dispatching to backends associated with specific graph types. 
This requires the user to write code for a specific backend, and therefore requires the backend to be installed, but has the advantage of ensuring a particular behavior without the potential for runtime conversions.\n", + "\n", + "To use type-based dispatching with `nx-cugraph`, the user must import the backend directly in their code to access the utilities provided to create a Graph instance specifically for the `nx-cugraph` backend." ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 15, "metadata": {}, "outputs": [], "source": [ - "%%timeit\n", - "res = nx.betweenness_centrality(Gcg, k=50)" + "import nx_cugraph as nxcg" ] }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "tags": [] + }, "source": [ - "## What algorithms are supported?" + "The `from_networkx()` API will copy the data from the NetworkX graph instance to the GPU and return a new `nx-cugraph` graph instance. By passing an explicit `nx-cugraph` graph, the NetworkX dispatcher will automatically call the `\"cugraph\"` backend (and only the `\"cugraph\"` backend) without requiring future conversions to copy data to the GPU." ] }, { - "cell_type": "markdown", + "cell_type": "code", + "execution_count": 16, "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "7.9 s ± 11.4 ms per loop (mean ± std. dev. 
of 2 runs, 1 loop each)\n" ] } ], "source": [ "%%timeit -r 2 global nxcg_cit_patents_graph\n", "nxcg_cit_patents_graph = nxcg.from_networkx(cit_patents_graph)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3.32 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)\n" ] } ], "source": [ "%%timeit -r 1 gc.enable()\n", "nx.betweenness_centrality(nxcg_cit_patents_graph, k=k)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conclusion\n", "\n", "This notebook demonstrated `nx-cugraph`'s support for `betweenness_centrality`. At the time of this writing, `nx-cugraph` also provides support for `edge_betweenness_centrality` and `louvain_communities`. Other algorithms are scheduled to be supported based on their availability in the cuGraph [pylibcugraph](https://github.com/rapidsai/cugraph/tree/branch-23.10/python/pylibcugraph/pylibcugraph) package and demand by the NetworkX community.\n", "\n", "#### Benchmark Results\n", "The results included in this notebook were generated on a workstation with a Quadro RTX 8000 with 48GB of GPU memory, and an \"Intel(R) Xeon(R) Gold 6128 CPU @ 3.40GHz\" with 45 GB RAM.\n", "\n" ] } ], @@ -320,7 +625,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.11" + "version": "3.10.12" } }, "nbformat": 4,