Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
created a combined similarity notebook
Showing 1 changed file with 197 additions and 0 deletions.
notebooks/algorithms/link_prediction/similarity_combined.ipynb
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,197 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Similarity Compared\n", | ||
"----\n", | ||
"\n", | ||
"In this notebook we will execute all the link prediction algorithms available in cuGraph and compare the results. These algorithms by default look at each possible pair of vertices in the graph and compare them based on the number of neigbors they share in common normalized by differing calculations of their individual neigborhoods :\n", | ||
"\n", | ||
"| Author Credit | Date | Update | cuGraph Version | Test Hardware |\n", | ||
"| --------------|------------|------------------|-----------------|-----------------------|\n", | ||
"| Don Acosta | 09/21/2023 | created | 23.10 nightly | AMPERE A6000 CUDA 11.7|\n", | ||
"\n", | ||
"\n", | ||
"**Note: On large graphs these algorithms can take prohibitive time or memory. The notebook will show how to run on defined pairs instead.**" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Load the required dependencies." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import cugraph\n", | ||
"from cugraph.datasets import dining_prefs\n", | ||
"# only needed to display results in a table \n", | ||
"from IPython.display import display_html " | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Function that calls all the cuGraph similarity/link prediction algorithms " | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"def compute_similarity(G,pairs=None):\n", | ||
" _jdf = cugraph.jaccard(G,pairs)\n", | ||
" _odf = cugraph.overlap(G,pairs)\n", | ||
" _sdf = cugraph.sorensen_coefficient(G,pairs)\n", | ||
" return _jdf, _odf, _sdf" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Function to put all the results in a convenient table" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Print function\n", | ||
"def print_similarity(jdf,odf,sdf,num_records=5):\n", | ||
"\n", | ||
" js_top = jdf.sort_values(by='jaccard_coeff', ascending=False).head(num_records).to_pandas()\n", | ||
" os_top = odf.sort_values(by='overlap_coeff', ascending=False).head(num_records).to_pandas()\n", | ||
" ss_top = sdf.sort_values(by='sorensen_coeff', ascending=False).head(num_records).to_pandas()\n", | ||
" \n", | ||
" df1_styler = js_top.style.set_table_attributes(\"style='display:inline'\").set_caption('Jaccard').hide(axis='index')\n", | ||
" df2_styler = os_top.style.set_table_attributes(\"style='display:inline'\").set_caption('Overlap').hide(axis='index')\n", | ||
" df3_styler = ss_top.style.set_table_attributes(\"style='display:inline'\").set_caption('Sørensen').hide(axis='index')\n", | ||
"\n", | ||
" display_html(df1_styler._repr_html_()+df2_styler._repr_html_()+df3_styler._repr_html_(), raw=True)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import cugraph\n", | ||
"from cugraph.datasets import dining_prefs\n", | ||
"gdf = dining_prefs.get_edgelist()\n", | ||
"G = cugraph.Graph()\n", | ||
"G = cugraph.from_cudf_edgelist(gdf, source='src', destination='dst', edge_attr = 'wgt')\n", | ||
"jdf = cugraph.jaccard(G)\n", | ||
"odf = cugraph.overlap(G)\n", | ||
"sdf = cugraph.sorensen_coefficient(G)\n", | ||
"print(jdf.head())\n", | ||
"print(odf.head())\n", | ||
"print(sdf.head())" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Create the graph from the Dining preferences data set." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"G = dining_prefs.get_graph()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Run the three similarity Algorithms and print out the five links with the highest scores." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"jdf, odf, sdf = compute_similarity(G)\n", | ||
"print_similarity(jdf,odf,sdf)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Now find the the complete set of two-hop neigbors and compare them instead of just using the existion one-hop edges. In a larger graph, this will run considerably faster since the default " | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# this cugraph algorithm pulls a set containing every pair of vertices\n", | ||
"# that are within 2-hops of each other\n", | ||
"two_hops_pairs = G.get_two_hop_neighbors()\n", | ||
"\n", | ||
"jdf_hops, odf_hops, sdf_hops = compute_similarity(G,pairs=two_hops_pairs)\n", | ||
"print_similarity(jdf_hops,odf_hops,sdf_hops)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"---\n", | ||
"### It's that easy with cuGraph\n", | ||
"\n", | ||
"Copyright (c) 2023, NVIDIA CORPORATION.\n", | ||
"\n", | ||
"Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0\n", | ||
"\n", | ||
"Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.\n", | ||
"___" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "cugraph_0802", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.10.12" | ||
}, | ||
"orig_nbformat": 4 | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
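
For readers without a GPU at hand, the three coefficients the notebook compares can be sketched in plain Python. This is a minimal illustration using the standard textbook definitions, not cuGraph's implementation. The toy `adjacency` graph and the helper names (`jaccard`, `overlap`, `sorensen`, `two_hop_pairs`) are hypothetical and are not part of the notebook or of cuGraph. The `two_hop_pairs` helper assumes "two-hop pair" means two distinct vertices that share at least one neighbor; the output of `G.get_two_hop_neighbors()` may differ in ordering and in which pairs it includes.

```python
from itertools import combinations

# Hypothetical toy graph (NOT the dining_prefs data): undirected adjacency sets.
adjacency = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b"},
}


def jaccard(adj, u, v):
    """Shared neighbors divided by the size of the union of both neighborhoods."""
    union = len(adj[u] | adj[v])
    return len(adj[u] & adj[v]) / union if union else 0.0


def overlap(adj, u, v):
    """Shared neighbors divided by the size of the smaller neighborhood."""
    smaller = min(len(adj[u]), len(adj[v]))
    return len(adj[u] & adj[v]) / smaller if smaller else 0.0


def sorensen(adj, u, v):
    """Twice the shared neighbors divided by the sum of both neighborhood sizes."""
    total = len(adj[u]) + len(adj[v])
    return 2 * len(adj[u] & adj[v]) / total if total else 0.0


def two_hop_pairs(adj):
    """Assumed definition: pairs of distinct vertices sharing at least one neighbor."""
    pairs = set()
    for neighbors in adj.values():
        # Any two neighbors of the same vertex are two hops apart through it.
        pairs.update(combinations(sorted(neighbors), 2))
    return sorted(pairs)


# Score only the two-hop pairs, mirroring compute_similarity(G, pairs=...).
scores = {
    (u, v): (jaccard(adjacency, u, v), overlap(adjacency, u, v), sorensen(adjacency, u, v))
    for u, v in two_hop_pairs(adjacency)
}

for pair, (j, o, s) in scores.items():
    print(pair, f"jaccard={j:.3f} overlap={o:.3f} sorensen={s:.3f}")
```

All three scores use the same numerator (the shared-neighbor count) and differ only in the normalizing denominator, which is why the rankings in the notebook's three tables can differ for the same pairs.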