
Commit

descriptions added per review comments
acostadon committed Nov 20, 2024
1 parent 18ea124 commit b814c44
Showing 1 changed file with 74 additions and 18 deletions.
92 changes: 74 additions & 18 deletions notebooks/demo/centrality_patentsview.ipynb
@@ -17,6 +17,21 @@
},
"source": [
"# Downloading the data\n",
"\n",
"Citation: U.S. Patent and Trademark Office. “Data Download Tables.” PatentsView. Accessed [10/06/2024]. https://patentsview.org/download/data-download-tables.\n",
"\n",
"Both files are used under the Creative Commons Attribution 4.0 International license (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/\n",
"\n",
"\n",
"The first file, g_patent.tsv.zip, contains summary data for each patent, such as its id, title, and the location of the original patent document. The table description is available on the [PatentsView site](https://patentsview.org/download/data-download-dictionary).\n",
"\n",
"The second file, g_us_patent_citation.tsv.zip, contains a record for every citation between U.S. patents. The description of this table is also available on the [PatentsView site](https://patentsview.org/download/data-download-dictionary)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Remove the comment character \"#\" from the lines below and run the cell to download the data and extract it into the directory the notebook expects."
]
},
@@ -29,24 +44,18 @@
"outputs": [],
"source": [
"#!wget https://s3.amazonaws.com/data.patentsview.org/download/g_patent.tsv.zip\n",
"#!unzip ./_patent.tsv.zip\n",
"#!unzip ./g_patent.tsv.zip\n",
"#!wget https://s3.amazonaws.com/data.patentsview.org/download/g_us_patent_citation.tsv.zip\n",
"#!unzip ./g_us_patent_citation.tsv.zip"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will create the dataframes using cudf and create the graphs with cuGraph."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# We will create the dataframes using cudf and create the graphs with cuGraph\n",
"import cudf\n",
"import cugraph"
]
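If `wget` or `unzip` is not installed, the same download-and-extract step can be done with the Python standard library. A minimal sketch; the helper name `fetch_and_extract` is hypothetical and not part of the notebook:

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path


def fetch_and_extract(url, dest_dir="."):
    """Download the zip archive at `url` into `dest_dir`, extract it there,
    and return the names of the extracted members (hypothetical helper)."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Save the archive under its original file name, e.g. g_patent.tsv.zip.
    archive = dest / url.rsplit("/", 1)[-1]
    with urllib.request.urlopen(url) as response, open(archive, "wb") as out:
        shutil.copyfileobj(response, out)  # stream to disk instead of reading into memory
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
        return zf.namelist()


# Example (not run here): fetch both PatentsView tables into the working directory.
# BASE = "https://s3.amazonaws.com/data.patentsview.org/download/"
# for name in ("g_patent.tsv.zip", "g_us_patent_citation.tsv.zip"):
#     fetch_and_extract(BASE + name)
```

Streaming with `shutil.copyfileobj` keeps memory use flat, which matters because the citation table is large.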
@@ -273,6 +282,13 @@
"first_hop_df, first_set = next_hop(seed_series)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Show how many citations link the starting patent(s) to patents that cite them or that they cite (the number of rows in the first-hop edge list)."
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -282,6 +298,13 @@
"len(first_hop_df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For demonstration purposes we will only use the second-hop edge/patent list. However, the next_hop function can go out as many hops as needed to build a relevant graph for other data sets. The cell below shows how, in this case, we could go out four levels of separation."
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -290,7 +313,7 @@
"source": [
"second_hop_df, second_hop_seeds = next_hop(first_set)\n",
"third_hop_df, third_hop_seeds = next_hop(second_hop_seeds)\n",
"fourth_hop_df, fourth_hop_seeds = next_hop(third_hop_seeds)\n"
"fourth_hop_df, fourth_hop_seeds = next_hop(third_hop_seeds)"
]
},
{
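The body of `next_hop` is not part of this diff. The sketch below shows one plausible way such a hop expansion can work on a plain Python edge list of (citing, cited) pairs. The names `one_hop` and `expand` are hypothetical, and following citations in both directions is an assumption based on the notebook's description of patents that "cite or are cited by" the seeds:

```python
def one_hop(edges, seeds):
    """Return the (citing, cited) edges that touch any seed, plus the seeds
    together with every patent those edges reach (hypothetical helper)."""
    seeds = set(seeds)
    # Follow citations in both directions: a seed may be the citing or the cited patent.
    hop_edges = [(s, t) for s, t in edges if s in seeds or t in seeds]
    reached = seeds | {node for edge in hop_edges for node in edge}
    return hop_edges, reached


def expand(edges, seeds, hops):
    """Apply one_hop repeatedly; after `hops` rounds the node set covers every
    patent within `hops` citation links of the seeds."""
    edges = list(edges)
    hop_edges, frontier = [], set(seeds)
    for _ in range(hops):
        hop_edges, frontier = one_hop(edges, frontier)
    return hop_edges, frontier


edges = [(1, 2), (2, 3), (3, 4), (4, 5), (6, 1)]
first_edges, first_set = one_hop(edges, {1})
print(len(first_edges), sorted(first_set))    # 2 [1, 2, 6]
second_edges, second_set = one_hop(edges, first_set)
print(len(second_edges), sorted(second_set))  # 3 [1, 2, 3, 6]
```

Each extra hop can grow the edge list quickly in a dense citation network, which is one reason to stop at two hops. In cudf the same edge filter would typically be written with `Series.isin` on the source and target columns rather than a Python loop.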
@@ -329,7 +352,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The contents of the dataframe at 2 hops"
"The contents of the second-hop dataframe, which is the one we will use."
]
},
{
@@ -341,6 +364,13 @@
"second_hop_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we build a directed graph in cuGraph from the second-hop dataframe created above."
]
},
{
"cell_type": "code",
"execution_count": null,
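A directed graph built from an edge list comes down to out- and in-adjacency. The sketch below shows that structure in plain Python, together with degree centrality as the simplest measure computed on it. It is illustrative only: `build_digraph` and `degree_centrality` are hypothetical names, normalizing by n - 1 is one common convention rather than necessarily cuGraph's, and the notebook's own measures come from cuGraph:

```python
from collections import defaultdict


def build_digraph(edges):
    """Build out- and in-adjacency sets from (source, target) pairs.
    Duplicate edges collapse into one, as in a simple directed graph."""
    out_adj, in_adj = defaultdict(set), defaultdict(set)
    for s, t in edges:
        out_adj[s].add(t)
        in_adj[t].add(s)
    nodes = set(out_adj) | set(in_adj)
    return nodes, out_adj, in_adj


def degree_centrality(nodes, out_adj, in_adj):
    """(in-degree + out-degree) / (n - 1) for every node."""
    n = len(nodes)
    if n < 2:
        return {v: 0.0 for v in nodes}
    # .get avoids inserting empty sets into the defaultdicts for missing keys.
    return {v: (len(out_adj.get(v, ())) + len(in_adj.get(v, ()))) / (n - 1)
            for v in nodes}


nodes, out_adj, in_adj = build_digraph([(1, 2), (1, 3), (4, 1)])
print(degree_centrality(nodes, out_adj, in_adj)[1])  # 1.0
```

In the citation graph, an edge points from the citing patent to the cited one, so in-degree counts how often a patent is cited.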
@@ -351,6 +381,13 @@
"G = cugraph.from_cudf_edgelist(second_hop_df,create_using=cugraph.Graph(directed=True),source='source', destination='target')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We use the compute_centrality function defined above to calculate the centrality measures and note the execution time."
]
},
{
"cell_type": "code",
"execution_count": null,
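The definition of `compute_centrality` is not shown in this diff. As a reference point for one of the measures it returns (`pr`, the PageRank results), here is a minimal pure-Python PageRank sketch using power iteration. It is not cuGraph's implementation; the function name `pagerank`, the damping factor 0.85 (the conventional choice), the tolerance, and the uniform redistribution of rank from nodes without out-edges are assumptions made for illustration:

```python
def pagerank(edges, alpha=0.85, tol=1e-10, max_iter=100):
    """PageRank by power iteration over (source, target) pairs.
    Nodes without out-edges spread their rank evenly over all nodes."""
    out_adj = {}
    for s, t in edges:
        out_adj.setdefault(s, set()).add(t)
        out_adj.setdefault(t, set())
    n = len(out_adj)
    if n == 0:
        return {}
    rank = dict.fromkeys(out_adj, 1.0 / n)
    for _ in range(max_iter):
        dangling = sum(rank[v] for v, targets in out_adj.items() if not targets)
        # Teleport term plus the evenly spread rank of dangling nodes.
        new = dict.fromkeys(out_adj, (1.0 - alpha) / n + alpha * dangling / n)
        for v, targets in out_adj.items():
            for t in targets:
                new[t] += alpha * rank[v] / len(targets)
        converged = sum(abs(new[v] - rank[v]) for v in out_adj) < tol
        rank = new
        if converged:
            break
    return rank


pr = pagerank([(2, 1), (3, 1), (4, 1)])
print(max(pr, key=pr.get))  # 1
```

In a citation graph, rank flows from citing to cited patents, so heavily cited patents (and patents cited by heavily cited ones) score highest.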
@@ -361,6 +398,13 @@
"dc, bc, kc, pr, ev = compute_centrality(G)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We import the formatting package and print the top 10 patents for each centrality measure."
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -376,7 +420,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Calls the function that draws the graph with the specified number of the most central nodes labeled"
"Now call the function that draws the graph, labeling the specified number of most central nodes.\n",
"The final parameter (pr here, the PageRank results) selects which algorithm's results are used to draw the graph."
]
},
{
@@ -388,6 +433,24 @@
"draw_centrality_graph(second_hop_df,12, pr)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's run edge betweenness centrality to find the most central edges in the graph."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"G_2_hops = cugraph.from_cudf_edgelist(second_hop_df,create_using=cugraph.Graph(directed=True),source='source', destination='target')\n",
"results=cugraph.edge_betweenness_centrality(G_2_hops).sort_values(ascending=False,by=['betweenness_centrality'])\n",
"results.head(10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
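In the notebook, `cugraph.edge_betweenness_centrality` does the work. To show what the scores mean, here is a small pure-Python sketch of Brandes' dependency accumulation for edges on a directed, unweighted graph: each edge receives the fraction of shortest paths, over all ordered source/target pairs, that pass through it. The function name `edge_betweenness` is hypothetical, and the values are unnormalized. cuGraph may apply a normalization, so compare rankings rather than raw values:

```python
from collections import deque


def edge_betweenness(edges):
    """Unnormalized edge betweenness for a directed, unweighted graph (Brandes)."""
    adj = {}
    for s, t in edges:
        adj.setdefault(s, set()).add(t)
        adj.setdefault(t, set())
    eb = {(v, w): 0.0 for v in adj for w in adj[v]}
    for source in adj:
        # BFS from source: count shortest paths (sigma) and record predecessors.
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        preds = {v: [] for v in adj}
        sigma[source], dist[source] = 1, 0
        order, queue = [], deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, pushing each node's dependency
        # onto the edges from its shortest-path predecessors.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                eb[(v, w)] += c
                delta[v] += c
    return eb


scores = edge_betweenness([(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)])
print(max(scores, key=scores.get))  # (4, 5)
```

In this toy graph, every path into node 5 must use the edge (4, 5), so that edge gets the top score. Edges like that act as bridges between otherwise separate citation clusters.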
@@ -411,13 +474,6 @@
"len(title_df)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Lets run edge betweenness centrality to find the central edges in the graph."
]
},
{
"cell_type": "code",
"execution_count": null,
