Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update C API graph creation function signatures #3982

Merged
Merged
Show file tree
Hide file tree
Changes from 8 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions cpp/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -196,6 +196,8 @@ set(CUGRAPH_SOURCES
src/community/detail/mis_mg.cu
src/detail/utility_wrappers.cu
src/structure/graph_view_mg.cu
src/structure/remove_self_loops.cu
src/structure/sort_and_remove_multi_edges.cu
src/utilities/path_retrieval.cu
src/structure/legacy/graph.cu
src/linear_assignment/legacy/hungarian.cu
Expand Down
65 changes: 65 additions & 0 deletions cpp/include/cugraph/graph_functions.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -973,4 +973,69 @@ renumber_sampled_edgelist(
label_offsets,
bool do_expensive_check = false);

/**
* @brief Remove self loops from an edge list
*
* @tparam vertex_t Type of vertex identifiers. Needs to be an integral type.
* @tparam edge_t Type of edge identifiers. Needs to be an integral type.
* @tparam weight_t Type of edge weight. Currently float and double are supported.
* @tparam edge_type_t Type of edge type. Needs to be an integral type.
*
* @param handle RAFT handle object to encapsulate resources (e.g. CUDA stream, communicator, and
* handles to various CUDA libraries) to run graph algorithms.
* @param edgelist_srcs List of source vertex ids
* @param edgelist_dsts List of destination vertex ids
* @param edgelist_weights Optional list of edge weights
* @param edgelist_edge_ids Optional list of edge ids
* @param edgelist_edge_types Optional list of edge types
* @return Tuple of vectors storing edge sources, destinations, optional weights,
* optional edge ids, optional edge types.
*/
template <typename vertex_t, typename edge_t, typename weight_t, typename edge_type_t>
std::tuple<rmm::device_uvector<vertex_t>,
rmm::device_uvector<vertex_t>,
std::optional<rmm::device_uvector<weight_t>>,
std::optional<rmm::device_uvector<edge_t>>,
std::optional<rmm::device_uvector<edge_type_t>>>
remove_self_loops(raft::handle_t const& handle,
rmm::device_uvector<vertex_t>&& edgelist_srcs,
rmm::device_uvector<vertex_t>&& edgelist_dsts,
std::optional<rmm::device_uvector<weight_t>>&& edgelist_weights,
std::optional<rmm::device_uvector<edge_t>>&& edgelist_edge_ids,
std::optional<rmm::device_uvector<edge_type_t>>&& edgelist_edge_types);

/**
* @brief Sort the edges and remove all but one edge when a multi-edge exists
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment is no longer correct.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in next push

*
* In an MG context it is assumed that edges have been shuffled to the proper GPU,
* in which case any multi-edges will be on the same GPU.
*
* @tparam vertex_t Type of vertex identifiers. Needs to be an integral type.
* @tparam edge_t Type of edge identifiers. Needs to be an integral type.
* @tparam weight_t Type of edge weight. Currently float and double are supported.
* @tparam edge_type_t Type of edge type. Needs to be an integral type.
*
* @param handle RAFT handle object to encapsulate resources (e.g. CUDA stream, communicator, and
* handles to various CUDA libraries) to run graph algorithms.
* @param edgelist_srcs List of source vertex ids
* @param edgelist_dsts List of destination vertex ids
* @param edgelist_weights Optional list of edge weights
* @param edgelist_edge_ids Optional list of edge ids
* @param edgelist_edge_types Optional list of edge types
* @return Tuple of vectors storing edge sources, destinations, optional weights,
* optional edge ids, optional edge types.
*/
template <typename vertex_t, typename edge_t, typename weight_t, typename edge_type_t>
std::tuple<rmm::device_uvector<vertex_t>,
rmm::device_uvector<vertex_t>,
std::optional<rmm::device_uvector<weight_t>>,
std::optional<rmm::device_uvector<edge_t>>,
std::optional<rmm::device_uvector<edge_type_t>>>
sort_and_remove_multi_edges(raft::handle_t const& handle,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized this but now it seems like "sort" here is a misnomer. We are not really globally sorting anymore, so I think we'd better rename this function to remove_multi_edges.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And may better document that this function is not stable (which means this does not preserve the order in contrast to std::remove or thrust::remove).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in next push

rmm::device_uvector<vertex_t>&& edgelist_srcs,
rmm::device_uvector<vertex_t>&& edgelist_dsts,
std::optional<rmm::device_uvector<weight_t>>&& edgelist_weights,
std::optional<rmm::device_uvector<edge_t>>&& edgelist_edge_ids,
std::optional<rmm::device_uvector<edge_type_t>>&& edgelist_edge_types);

} // namespace cugraph
193 changes: 178 additions & 15 deletions cpp/include/cugraph_c/graph.h
Original file line number Diff line number Diff line change
Expand Up @@ -35,10 +35,11 @@ typedef struct {
bool_t is_multigraph;
} cugraph_graph_properties_t;

// FIXME: Add support for specifying isolated vertices
/**
* @brief Construct an SG graph
*
* @deprecated This API will be deleted, use cugraph_graph_create_sg instead
*
* @param [in] handle Handle for accessing resources
* @param [in] properties Properties of the constructed graph
* @param [in] src Device array containing the source vertex ids.
Expand All @@ -51,11 +52,11 @@ typedef struct {
argument that can be NULL if edge types are not used.
* @param [in] store_transposed If true create the graph initially in transposed format
* @param [in] renumber If true, renumber vertices to make an efficient data structure.
* If false, do not renumber. Renumbering is required if the vertices are not sequential
* integer values from 0 to num_vertices.
* If false, do not renumber. Renumbering enables some significant optimizations within
* the graph primitives library, so it is strongly encouraged. Renumbering is required if
* the vertices are not sequential integer values from 0 to num_vertices.
* @param [in] do_expensive_check If true, do expensive checks to validate the input data
* is consistent with software assumptions. If false bypass these checks.
* @param [in] properties Properties of the graph
* @param [out] graph A pointer to the graph object
* @param [out] error Pointer to an error object storing details of any error. Will
* be populated if error code is not CUGRAPH_SUCCESS
Expand All @@ -76,9 +77,63 @@ cugraph_error_code_t cugraph_sg_graph_create(
cugraph_graph_t** graph,
cugraph_error_t** error);

/**
* @brief Construct an SG graph
*
* @param [in] handle Handle for accessing resources
* @param [in] properties Properties of the constructed graph
* @param [in] vertices Optional device array containing a list of vertex ids
* (specify NULL if we should create vertex ids from the
* unique contents of @p src and @p dst)
Comment on lines +85 to +87
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does NULL vertices behave with and w/o renumbering? What if edge data only contain unique ids of e.g. 1, 3, and 5? Are 0, 2, and 4 implied if renumber is False?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If vertices is NULL, this will provide the same behavior as today. That is:

  • Renumber enabled will ignore vertices 0, 2, 4... they will not exist in the graph. We will create a graph with 3 vertices which will be internally numbered 0, 1, 2. Any function that is called will translate any vertex id inputs to the mapping 0, 1, 2 (e.g. if you call BFS and specify a seed of 1, it will be mapped to whichever id that vertex 1 was mapped to), and any vertex ids outputs will be renumbered from the internal mapping (0, 1, 2) to the external mapping (1, 3, 5)
  • Renumber disabled will use the vertex ids exactly as specified, so vertices 0, 2 and 4 will be isolated vertices in the graph. However, if your graph also had a vertex 6... that vertex would be lost even with renumber disabled, since there is no mechanism to discover that vertex 6 exists in your graph with the current (and proposed API).

* @param [in] src Device array containing the source vertex ids.
* @param [in] dst Device array containing the destination vertex ids
* @param [in] weights Device array containing the edge weights. Note that an unweighted
* graph can be created by passing weights == NULL.
* @param [in] edge_ids Device array containing the edge ids for each edge. Optional
argument that can be NULL if edge ids are not used.
* @param [in] edge_type_ids Device array containing the edge types for each edge. Optional
argument that can be NULL if edge types are not used.
* @param [in] store_transposed If true create the graph initially in transposed format
* @param [in] renumber If true, renumber vertices to make an efficient data structure.
* If false, do not renumber. Renumbering enables some significant optimizations within
* the graph primitives library, so it is strongly encouraged. Renumbering is required if
* the vertices are not sequential integer values from 0 to num_vertices.
* @param [in] drop_self_loops If true, drop any self loops that exist in the provided edge list.
* @param [in] drop_multi_edges If true, drop any multi edges that exist in the provided edge list.
* Note that setting this flag will arbitrarily select one instance of a multi edge to be the
* edge that survives. If the edges have properties that should be honored (e.g. sum the
weights,
* or take the maximum weight), the caller should do that on not rely on this flag.
* @param [in] do_expensive_check If true, do expensive checks to validate the input data
* is consistent with software assumptions. If false bypass these checks.
* @param [out] graph A pointer to the graph object
* @param [out] error Pointer to an error object storing details of any error. Will
* be populated if error code is not CUGRAPH_SUCCESS
*
* @return error code
*/
cugraph_error_code_t cugraph_graph_create_sg(
const cugraph_resource_handle_t* handle,
const cugraph_graph_properties_t* properties,
const cugraph_type_erased_device_array_view_t* vertices,
const cugraph_type_erased_device_array_view_t* src,
const cugraph_type_erased_device_array_view_t* dst,
const cugraph_type_erased_device_array_view_t* weights,
const cugraph_type_erased_device_array_view_t* edge_ids,
const cugraph_type_erased_device_array_view_t* edge_type_ids,
bool_t store_transposed,
bool_t renumber,
bool_t drop_self_loops,
bool_t drop_multi_edges,
bool_t do_expensive_check,
cugraph_graph_t** graph,
cugraph_error_t** error);

/**
* @brief Construct an SG graph from a CSR input
*
* @deprecated This API will be deleted, use cugraph_graph_create_sg_from_csr instead
*
* @param [in] handle Handle for accessing resources
* @param [in] properties Properties of the constructed graph
* @param [in] offsets Device array containing the CSR offsets array
Expand All @@ -91,11 +146,11 @@ cugraph_error_code_t cugraph_sg_graph_create(
argument that can be NULL if edge types are not used.
* @param [in] store_transposed If true create the graph initially in transposed format
* @param [in] renumber If true, renumber vertices to make an efficient data structure.
* If false, do not renumber. Renumbering is required if the vertices are not sequential
* integer values from 0 to num_vertices.
* If false, do not renumber. Renumbering enables some significant optimizations within
* the graph primitives library, so it is strongly encouraged. Renumbering is required if
* the vertices are not sequential integer values from 0 to num_vertices.
* @param [in] do_expensive_check If true, do expensive checks to validate the input data
* is consistent with software assumptions. If false bypass these checks.
* @param [in] properties Properties of the graph
* @param [out] graph A pointer to the graph object
* @param [out] error Pointer to an error object storing details of any error. Will
* be populated if error code is not CUGRAPH_SUCCESS
Expand All @@ -117,18 +172,50 @@ cugraph_error_code_t cugraph_sg_graph_create_from_csr(
cugraph_error_t** error);

/**
* @brief Destroy an SG graph
* @brief Construct an SG graph from a CSR input
*
* @param [in] graph A pointer to the graph object to destroy
* @param [in] handle Handle for accessing resources
* @param [in] properties Properties of the constructed graph
* @param [in] offsets Device array containing the CSR offsets array
* @param [in] indices Device array containing the destination vertex ids
* @param [in] weights Device array containing the edge weights. Note that an unweighted
* graph can be created by passing weights == NULL.
* @param [in] edge_ids Device array containing the edge ids for each edge. Optional
argument that can be NULL if edge ids are not used.
* @param [in] edge_type_ids Device array containing the edge types for each edge. Optional
argument that can be NULL if edge types are not used.
* @param [in] store_transposed If true create the graph initially in transposed format
* @param [in] renumber If true, renumber vertices to make an efficient data structure.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does renumber=True work for CSR? Wouldn't offsets already imply 0..N-1 vertex ids?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renumber=True would still work for CSR input. Yes, offsets does imply vertex ids in the range [0, N). But our renumbering logic adds some performance optimizations by putting the vertex ids in an order that makes the CUDA kernels more efficient. So there is value in specifying renumber=True even if the vertex ids are densely packed.

I will be fixing the C API implementation of this function so that it will properly reflect the isolated vertices that can be implied in CSR format.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we should update the documentation to better reflect that renumbering is more than just packing integers and highly recommended in most practical use cases.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated documentation for each of the renumber parameters in this file in the next push.

* If false, do not renumber. Renumbering enables some significant optimizations within
* the graph primitives library, so it is strongly encouraged. Renumbering is required if
* the vertices are not sequential integer values from 0 to num_vertices.
* @param [in] do_expensive_check If true, do expensive checks to validate the input data
* is consistent with software assumptions. If false bypass these checks.
* @param [out] graph A pointer to the graph object
* @param [out] error Pointer to an error object storing details of any error. Will
* be populated if error code is not CUGRAPH_SUCCESS
*
* @return error code
*/
// FIXME: This should probably just be cugraph_graph_free
// but didn't want to confuse with original cugraph_free_graph
void cugraph_sg_graph_free(cugraph_graph_t* graph);
cugraph_error_code_t cugraph_graph_create_sg_from_csr(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since cugraph_graph_create_sg supports flags to drop both self loops and multi edges through the methods cugraph::remove_self_loops and cugraph::sort_and_remove_multi_edges, shouldn't cugraph_graph_create_sg_from_csr also support those flags? In fact, cugraph_graph_create_sg_from_csr extracts the COO from the CSR before calling cugraph::create_graph_from_edgelist therefore, we can drop self loops and multi edges (if set to True) right before the graph question.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could add that feature. I was assuming that if you already have a CSR as input then you have already done whatever ETL steps are desired.

const cugraph_resource_handle_t* handle,
const cugraph_graph_properties_t* properties,
const cugraph_type_erased_device_array_view_t* offsets,
const cugraph_type_erased_device_array_view_t* indices,
const cugraph_type_erased_device_array_view_t* weights,
const cugraph_type_erased_device_array_view_t* edge_ids,
const cugraph_type_erased_device_array_view_t* edge_type_ids,
bool_t store_transposed,
bool_t renumber,
bool_t do_expensive_check,
cugraph_graph_t** graph,
cugraph_error_t** error);

// FIXME: Add support for specifying isolated vertices
/**
* @brief Construct an MG graph
*
* @deprecated This API will be deleted, use cugraph_graph_create_mg instead
*
* @param [in] handle Handle for accessing resources
* @param [in] properties Properties of the constructed graph
* @param [in] src Device array containing the source vertex ids
Expand Down Expand Up @@ -165,13 +252,89 @@ cugraph_error_code_t cugraph_mg_graph_create(
cugraph_graph_t** graph,
cugraph_error_t** error);

/**
* @brief Construct an MG graph
*
* @param [in] handle Handle for accessing resources
* @param [in] properties Properties of the constructed graph
* @param [in] vertices List of device arrays containing the unique vertex ids.
* If NULL we will construct this internally using the unique
* entries specified in src and dst
* All entries in this list will be concatenated on this GPU
* into a single array.
ChuckHastings marked this conversation as resolved.
Show resolved Hide resolved
* @param [in] src List of device array containing the source vertex ids
* All entries in this list will be concatenated on this GPU
* into a single array.
* @param [in] dst List of device array containing the destination vertex ids
* All entries in this list will be concatenated on this GPU
* into a single array.
* @param [in] weights List of device array containing the edge weights. Note that an
* unweighted graph can be created by passing weights == NULL. If a weighted graph is to be
* created, the weights device array should be created on each rank, but the pointer can be NULL and
* the size 0 if there are no inputs provided by this rank All entries in this list will be
* concatenated on this GPU into a single array.
* @param [in] edge_ids List of device array containing the edge ids for each edge. Optional
* argument that can be NULL if edge ids are not used.
* All entries in this list will be concatenated on this GPU
* into a single array.
* @param [in] edge_type_ids List of device array containing the edge types for each edge.
* Optional argument that can be NULL if edge types are not used. All entries in this list will be
* concatenated on this GPU into a single array.
* @param [in] store_transposed If true create the graph initially in transposed format
* @param [in] num_arrays The number of arrays specified in @p vertices, @p src, @p dst, @p
* weights, @p edge_ids and @p edge_type_ids
* @param [in] drop_self_loops If true, drop any self loops that exist in the provided edge list.
* @param [in] drop_multi_edges If true, drop any multi edges that exist in the provided edge list.
* Note that setting this flag will arbitrarily select one instance of a multi edge to be the
* edge that survives. If the edges have properties that should be honored (e.g. sum the
* weights, or take the maximum weight), the caller should do that on not rely on this flag.
* @param [in] do_expensive_check If true, do expensive checks to validate the input data
* is consistent with software assumptions. If false bypass these checks.
* @param [out] graph A pointer to the graph object
* @param [out] error Pointer to an error object storing details of any error. Will
* be populated if error code is not CUGRAPH_SUCCESS
* @return error code
*/
cugraph_error_code_t cugraph_graph_create_mg(
cugraph_resource_handle_t const* handle,
cugraph_graph_properties_t const* properties,
cugraph_type_erased_device_array_view_t const* const* vertices,
cugraph_type_erased_device_array_view_t const* const* src,
cugraph_type_erased_device_array_view_t const* const* dst,
cugraph_type_erased_device_array_view_t const* const* weights,
cugraph_type_erased_device_array_view_t const* const* edge_ids,
cugraph_type_erased_device_array_view_t const* const* edge_type_ids,
bool_t store_transposed,
size_t num_arrays,
bool_t drop_self_loops,
bool_t drop_multi_edges,
bool_t do_expensive_check,
cugraph_graph_t** graph,
cugraph_error_t** error);

/**
* @brief Destroy an graph
*
* @param [in] graph A pointer to the graph object to destroy
*/
void cugraph_graph_free(cugraph_graph_t* graph);

/**
* @brief Destroy an SG graph
*
* @deprecated This API will be deleted, use cugraph_graph_free instead
*
* @param [in] graph A pointer to the graph object to destroy
*/
void cugraph_sg_graph_free(cugraph_graph_t* graph);

/**
* @brief Destroy an MG graph
*
* @deprecated This API will be deleted, use cugraph_graph_free instead
*
* @param [in] graph A pointer to the graph object to destroy
*/
// FIXME: This should probably just be cugraph_graph_free
// but didn't want to confuse with original cugraph_free_graph
void cugraph_mg_graph_free(cugraph_graph_t* graph);

/**
Expand Down
12 changes: 12 additions & 0 deletions cpp/include/cugraph_c/resource_handle.h
Original file line number Diff line number Diff line change
Expand Up @@ -57,6 +57,18 @@ typedef struct cugraph_resource_handle_ {
*/
cugraph_resource_handle_t* cugraph_create_resource_handle(void* raft_handle);

/**
* @brief get comm_size from resource handle
*
* If the resource handle has been configured for multi-gpu, this will return
* the comm_size for this cluster. If the resource handle has not been configured for
* multi-gpu this will always return 1.
*
* @param [in] handle Handle for accessing resources
* @return comm_size
*/
int cugraph_resource_handle_get_comm_size(const cugraph_resource_handle_t* handle);

/**
* @brief get rank from resource handle
*
Expand Down
Loading
Loading