[ENH] from_cudf_edgelist should have a num_nodes argument #1206
Hi Francesco, thanks for your message. The good news is that you can bypass this limitation and keep singletons if you set renumbering to False:
>>> import cudf
>>> import cugraph
>>> df = cudf.DataFrame([[0, 9, 1], [9, 0, 1]], columns=['source', 'destination', 'weight'])
>>> G = cugraph.Graph()
>>> G.from_cudf_edgelist(df, edge_attr='weight', renumber=False)
>>> G.number_of_nodes()
10
In this case, vertices 1 to 8 would be isolated vertices.
If you need both renumbering and resizing/growing the graph, the fastest way is probably to try adding it yourself; the cuGraph team will be happy to review your PR and assist. We'll also keep the issue on our radar and prioritize it for future releases. Hope that helps, and thanks again.
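To check in advance which vertices the renumber=False trick will leave isolated, one can take the range 0..max_id and remove every ID that occurs as an edge endpoint. This is a plain-Python sketch with no cuGraph calls; the helper name `isolated_vertices` is hypothetical, and it only mirrors the "max ID + 1" behaviour described in this thread.

```python
def isolated_vertices(edges):
    """Return the vertex IDs in [0, max_id] that appear in no edge.

    With renumbering disabled, every integer ID from 0 up to the largest
    ID becomes a vertex, so these are the vertices left isolated.
    """
    present = {v for edge in edges for v in edge}
    if not present:
        return []  # an empty edge list gives no vertices at all
    return [v for v in range(max(present) + 1) if v not in present]


edges = [(0, 9), (9, 0)]  # same edges as the cudf example
print(isolated_vertices(edges))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

Note that the empty case returns nothing: without at least one edge touching the largest ID, there is no way to express the intended vertex count this way.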
P.S. Support for disconnected graphs is not well tested in cuGraph at the moment. It is quite possible that some analytics still fail when such inputs are passed. This is something we have been tracking in #305 and plan to keep improving in the next release.
Thank you for your quick reply! Relying on the vertex with the largest index is indeed a workable hack, but it still requires some manual action from the user. Anyway, thanks for your suggestion!
@flandolfi thanks for the great question. We are just getting ready to start improving a lot of basic functionality around graph creation. You are correct that an isolated node will not appear in an edge list and will therefore not be created.
@flandolfi The issue that I'm running into with specifying the number of nodes that should be in a graph when using …
Now, if your data consists of integers that are just not contiguous, then setting renumber to False will fill in all missing values with isolated nodes. Those nodes just need to be within the range of existing IDs, not past the largest one. The number of nodes created in the graph is equal to the max node ID + 1. Renumbering simply packs the values so that they become contiguous integers.
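The two rules described above (max ID + 1 without renumbering, packed contiguous IDs with it) can be modelled in a few lines of plain Python. This is an illustrative sketch, not cuGraph's implementation: the helpers `renumber` and `node_count` are hypothetical, and sorting labels before numbering is just one simple choice that may not match cuGraph's internal ordering.

```python
def renumber(edges):
    """Map arbitrary vertex labels to contiguous integers 0..k-1.

    Labels are sorted before numbering; this is one simple choice and
    not necessarily the order cuGraph uses internally.
    """
    labels = sorted({v for edge in edges for v in edge})
    mapping = {label: i for i, label in enumerate(labels)}
    packed = [(mapping[s], mapping[d]) for s, d in edges]
    return packed, mapping


def node_count(edges, renumber_ids):
    """Vertex count implied by an edge list under the two rules above."""
    labels = {v for edge in edges for v in edge}
    if not labels:
        return 0  # an empty edge list yields an empty graph either way
    if renumber_ids:
        return len(labels)  # only labels that occur in some edge
    return max(labels) + 1  # every ID in [0, max_id]; gaps become isolated


edges = [(0, 9), (9, 0), (3, 5)]
print(node_count(edges, renumber_ids=False))  # 10
print(node_count(edges, renumber_ids=True))   # 4
print(renumber(edges)[0])                     # [(0, 3), (3, 0), (1, 2)]
```

Neither rule can produce a vertex whose ID is larger than every ID in the edge list, which is exactly the case this issue is about.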
Many graph/network file formats (e.g., GraphML, the Graph Markup Language) first define all the nodes by their ID, attributes, labels, etc., and then specify the edges of the graph only by the IDs of their end-nodes. This avoids repeating known information (similarly to normalization in databases). So, in your example, I would find all the IPs in the "header", along with their unique node IDs, followed by the list of edges (which can be empty). Again, renumbering is a trick, and it works in most cases. I just believe there should be a way to specify the nodes in advance. Notice that this does not mean that you have to change the signature of from_cudf_edgelist; for example:
import cudf
import cugraph as cx

# Every node label, declared up front (192.168.1.1 to 192.168.1.254)
ips = [f'192.168.1.{i}' for i in range(1, 255)]

df = ...  # Load the edge list into `df` (a cudf.DataFrame)

G = cx.Graph()
G.add_nodes_from(ips)                     # declare all nodes first
G.from_cudf_edgelist(df, renumber=False)  # then add the edges
print(G.number_of_nodes())  # Proposed behaviour: prints `len(ips)`
This approach would separate the two tasks (adding nodes and adding edges), thus allowing an incremental definition of the graph (also, one could "update" the current graph by multiple calls of …
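The proposed semantics (declare nodes first, add edges afterwards, possibly in several calls) can be modelled in plain Python to check that isolated nodes survive even when labels are renumbered internally. This sketches the *proposal* only: `SketchGraph` and its methods are hypothetical and are not cuGraph's `Graph` API, and insertion-order numbering stands in for whatever renumbering a real library would use.

```python
class SketchGraph:
    """Hypothetical graph that keeps declared nodes separate from edges."""

    def __init__(self):
        self._labels = {}  # label -> contiguous internal ID, in insertion order
        self._edges = []   # edges stored as pairs of internal IDs

    def _id(self, label):
        # Assign the next free integer the first time a label is seen.
        if label not in self._labels:
            self._labels[label] = len(self._labels)
        return self._labels[label]

    def add_nodes_from(self, labels):
        for label in labels:
            self._id(label)

    def add_edges_from(self, edges):
        # Can be called repeatedly to grow the graph incrementally.
        for src, dst in edges:
            self._edges.append((self._id(src), self._id(dst)))

    def number_of_nodes(self):
        return len(self._labels)

    def number_of_edges(self):
        return len(self._edges)


ips = [f'192.168.1.{i}' for i in range(1, 255)]
G = SketchGraph()
G.add_nodes_from(ips)
G.add_edges_from([('192.168.1.1', '192.168.1.2')])
print(G.number_of_nodes())  # 254, including every isolated IP
```

Because labels are registered before any edge arrives, renumbering can no longer drop isolated vertices, and an empty edge list still yields `len(ips)` vertices.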
This issue is being addressed by enhancement issue #1372.
Hi,
I noticed that from_cudf_edgelist does not allow the user to specify the number of nodes in the input graph. This could lead to problems when converting empty graphs or graphs with isolated nodes (with identifiers greater than the maximum value in the DataFrame). Renumbering is of no help. How can we specify such a value?
I tried changing the value of G.node_count or passing a list of identifiers to G.add_nodes_from() (both before and after from_cudf_edgelist), but executing louvain/leiden/ecg produces an error. For example:
The only workaround at the moment is to add self-loops to every (missing) node.
Kind regards,
Francesco
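The self-loop workaround mentioned in the issue can be sketched as a small preprocessing step, assuming integer vertex IDs in 0..num_nodes-1 and the column names used elsewhere in this thread (`source`, `destination`, `weight`). The helper `pad_with_self_loops` and its `loop_weight` parameter are hypothetical; the returned dict of lists could then be passed to `cudf.DataFrame(...)` before calling from_cudf_edgelist.

```python
def pad_with_self_loops(columns, num_nodes, loop_weight=1.0):
    """Append a self-loop for every vertex in [0, num_nodes) missing from the edges.

    `columns` is a dict of equal-length lists with 'source', 'destination'
    and 'weight' keys; the result has the same shape.
    """
    present = set(columns['source']) | set(columns['destination'])
    missing = [v for v in range(num_nodes) if v not in present]
    return {
        'source': list(columns['source']) + missing,
        'destination': list(columns['destination']) + missing,
        'weight': list(columns['weight']) + [loop_weight] * len(missing),
    }


edges = {'source': [0, 2], 'destination': [2, 0], 'weight': [1.0, 1.0]}
padded = pad_with_self_loops(edges, num_nodes=5)
print(padded['source'])  # [0, 2, 1, 3, 4]
```

Keep in mind that the added self-loops carry weight, so weighted analytics such as Louvain modularity may be affected by them.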