created a combined similarity notebook
acostadon committed Sep 21, 2023
1 parent d930321 commit 69b7ae2
Showing 1 changed file with 197 additions and 0 deletions.
197 changes: 197 additions & 0 deletions notebooks/algorithms/link_prediction/similarity_combined.ipynb
@@ -0,0 +1,197 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Similarity Compared\n",
"----\n",
"\n",
"In this notebook we will execute all the link prediction algorithms available in cuGraph and compare the results. These algorithms by default look at each possible pair of vertices in the graph and compare them based on the number of neigbors they share in common normalized by differing calculations of their individual neigborhoods :\n",
"\n",
"| Author Credit | Date | Update | cuGraph Version | Test Hardware |\n",
"| --------------|------------|------------------|-----------------|-----------------------|\n",
"| Don Acosta | 09/21/2023 | created | 23.10 nightly | AMPERE A6000 CUDA 11.7|\n",
"\n",
"\n",
"**Note: On large graphs these algorithms can take prohibitive time or memory. The notebook will show how to run on defined pairs instead.**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Load the required dependencies."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import cugraph\n",
"from cugraph.datasets import dining_prefs\n",
"# only needed to display results in a table \n",
"from IPython.display import display_html "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Function that calls all the cuGraph similarity/link prediction algorithms "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def compute_similarity(G,pairs=None):\n",
" _jdf = cugraph.jaccard(G,pairs)\n",
" _odf = cugraph.overlap(G,pairs)\n",
" _sdf = cugraph.sorensen_coefficient(G,pairs)\n",
" return _jdf, _odf, _sdf"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Function to put all the results in a convenient table"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Print function\n",
"def print_similarity(jdf,odf,sdf,num_records=5):\n",
"\n",
" js_top = jdf.sort_values(by='jaccard_coeff', ascending=False).head(num_records).to_pandas()\n",
" os_top = odf.sort_values(by='overlap_coeff', ascending=False).head(num_records).to_pandas()\n",
" ss_top = sdf.sort_values(by='sorensen_coeff', ascending=False).head(num_records).to_pandas()\n",
" \n",
" df1_styler = js_top.style.set_table_attributes(\"style='display:inline'\").set_caption('Jaccard').hide(axis='index')\n",
" df2_styler = os_top.style.set_table_attributes(\"style='display:inline'\").set_caption('Overlap').hide(axis='index')\n",
" df3_styler = ss_top.style.set_table_attributes(\"style='display:inline'\").set_caption('Sørensen').hide(axis='index')\n",
"\n",
" display_html(df1_styler._repr_html_()+df2_styler._repr_html_()+df3_styler._repr_html_(), raw=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import cugraph\n",
"from cugraph.datasets import dining_prefs\n",
"gdf = dining_prefs.get_edgelist()\n",
"G = cugraph.Graph()\n",
"G = cugraph.from_cudf_edgelist(gdf, source='src', destination='dst', edge_attr = 'wgt')\n",
"jdf = cugraph.jaccard(G)\n",
"odf = cugraph.overlap(G)\n",
"sdf = cugraph.sorensen_coefficient(G)\n",
"print(jdf.head())\n",
"print(odf.head())\n",
"print(sdf.head())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create the graph from the Dining preferences data set."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"G = dining_prefs.get_graph()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Run the three similarity Algorithms and print out the five links with the highest scores."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"jdf, odf, sdf = compute_similarity(G)\n",
"print_similarity(jdf,odf,sdf)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now find the the complete set of two-hop neigbors and compare them instead of just using the existion one-hop edges. In a larger graph, this will run considerably faster since the default "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# this cugraph algorithm pulls a set containing every pair of vertices\n",
"# that are within 2-hops of each other\n",
"two_hops_pairs = G.get_two_hop_neighbors()\n",
"\n",
"jdf_hops, odf_hops, sdf_hops = compute_similarity(G,pairs=two_hops_pairs)\n",
"print_similarity(jdf_hops,odf_hops,sdf_hops)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"### It's that easy with cuGraph\n",
"\n",
"Copyright (c) 2023, NVIDIA CORPORATION.\n",
"\n",
"Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0\n",
"\n",
"Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.\n",
"___"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "cugraph_0802",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}
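
The notebook compares three coefficients that all start from the number of neighbors two vertices share and differ only in how that count is normalized. As a rough illustration, the plain-Python sketch below computes them from the standard set-based definitions for one pair of vertices on a small made-up graph. It does not use cuGraph and is not its implementation: the helper names `build_neighbors` and `similarity` and the toy edge list are assumptions introduced here, and cuGraph's handling of weights or edge cases may differ.

```python
# Plain-Python sketch of the three coefficients; not cuGraph's implementation.
from collections import defaultdict


def build_neighbors(edges):
    """Return {vertex: set of neighbors} for an undirected edge list."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    return neighbors


def similarity(neighbors, u, v):
    """Return (jaccard, overlap, sorensen) for the vertex pair (u, v).

    Assumes both vertices have at least one neighbor, so no divisor is zero.
    """
    a, b = neighbors[u], neighbors[v]
    shared = len(a & b)                        # neighbors common to both
    jaccard = shared / len(a | b)              # normalize by the union
    overlap = shared / min(len(a), len(b))     # normalize by the smaller set
    sorensen = 2 * shared / (len(a) + len(b))  # normalize by the average size
    return jaccard, overlap, sorensen


# Hypothetical toy graph, unrelated to the dining_prefs data set.
toy_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
toy_neighbors = build_neighbors(toy_edges)
toy_scores = similarity(toy_neighbors, "a", "b")
print(toy_scores)
```

Because the numerator is the same in all three, a pair that shares no neighbors scores zero under every coefficient; only the ranking among pairs that do share neighbors can change between them.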

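The last code cell passes a list of two-hop pairs to the similarity functions. To make the idea concrete, here is a minimal plain-Python sketch that enumerates pairs of vertices sharing at least one neighbor, meaning pairs joined by a path of exactly two edges. The function name `two_hop_pairs` and the toy edge list are hypothetical, and the result is only an approximation of the concept: what `G.get_two_hop_neighbors()` actually returns (column names, pair ordering, whether both directions appear) should be checked against the cuGraph documentation.

```python
# Plain-Python sketch of two-hop pair enumeration; not cuGraph's implementation.
from collections import defaultdict
from itertools import combinations


def two_hop_pairs(edges):
    """Return the set of vertex pairs that share at least one neighbor."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    pairs = set()
    for shared in neighbors.values():
        # Any two distinct neighbors of the same vertex are two hops apart.
        for u, v in combinations(sorted(shared), 2):
            pairs.add((u, v))
    return pairs


# Hypothetical toy path graph a - b - c - d.
path_edges = [("a", "b"), ("b", "c"), ("c", "d")]
path_pairs = two_hop_pairs(path_edges)
print(sorted(path_pairs))
```

Only pairs with a common neighbor can receive a nonzero score from the three coefficients, which is why restricting the comparison to such pairs loses nothing of interest.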