add new all-pairs similarity algorithm
ChuckHastings committed Feb 8, 2024
1 parent 9ad7389 commit 880f1d6
Showing 22 changed files with 2,062 additions and 31 deletions.
120 changes: 120 additions & 0 deletions cpp/include/cugraph/algorithms.hpp
@@ -2137,6 +2137,126 @@ rmm::device_uvector<weight_t> overlap_coefficients(
std::tuple<raft::device_span<vertex_t const>, raft::device_span<vertex_t const>> vertex_pairs,
bool do_expensive_check = false);

/**
* @brief Compute Jaccard all pairs similarity coefficient
*
* Similarity is computed for all pairs of vertices. If the vertices
* variable is specified it will be all pairs based on two hop neighbors
* of these seeds. If the vertices variable is not specified it will be
* all pairs of all two hop neighbors.
*
* If topk is specified, only the top-scoring vertex pairs are returned;
* otherwise all vertex pairs are returned.
*
* @throws cugraph::logic_error when an error occurs.
*
* @tparam vertex_t Type of vertex identifiers. Needs to be an integral type.
* @tparam edge_t Type of edge identifiers. Needs to be an integral type.
* @tparam weight_t Type of edge weights. Needs to be a floating point type.
* @tparam multi_gpu Flag indicating whether template instantiation should target single-GPU (false)
* or multi-GPU (true).
* @param handle RAFT handle object to encapsulate resources (e.g. CUDA stream, communicator, and
* handles to various CUDA libraries) to run graph algorithms.
* @param graph_view Graph view object.
* @param edge_weight_view Optional view object holding edge weights for @p graph_view. If @p
* edge_weight_view.has_value() == true, use the weights associated with the graph. If false, assume
* a weight of 1 for all edges.
* @param vertices Optional device span defining the seed vertices.
* @param topk Optional number of top-scoring vertex pairs to return.
* @param do_expensive_check A flag to run expensive checks for input arguments (if set to `true`).
* @return tuple of three device vectors (first vertex, second vertex, similarity score), one entry
* per vertex pair
*/
template <typename vertex_t, typename edge_t, typename weight_t, bool multi_gpu>
std::
tuple<rmm::device_uvector<vertex_t>, rmm::device_uvector<vertex_t>, rmm::device_uvector<weight_t>>
jaccard_all_pairs_coefficients(
raft::handle_t const& handle,
graph_view_t<vertex_t, edge_t, false, multi_gpu> const& graph_view,
std::optional<edge_property_view_t<edge_t, weight_t const*>> edge_weight_view,
std::optional<raft::device_span<vertex_t const>> vertices,
std::optional<size_t> topk,
bool do_expensive_check = false);

/**
* @brief Compute Sorensen all pairs similarity coefficient
*
* Similarity is computed for all pairs of vertices. If the vertices
* variable is specified it will be all pairs based on two hop neighbors
* of these seeds. If the vertices variable is not specified it will be
* all pairs of all two hop neighbors.
*
* If topk is specified, only the top-scoring vertex pairs are returned;
* otherwise all vertex pairs are returned.
*
* @throws cugraph::logic_error when an error occurs.
*
* @tparam vertex_t Type of vertex identifiers. Needs to be an integral type.
* @tparam edge_t Type of edge identifiers. Needs to be an integral type.
* @tparam weight_t Type of edge weights. Needs to be a floating point type.
* @tparam multi_gpu Flag indicating whether template instantiation should target single-GPU (false)
* or multi-GPU (true).
* @param handle RAFT handle object to encapsulate resources (e.g. CUDA stream, communicator, and
* handles to various CUDA libraries) to run graph algorithms.
* @param graph_view Graph view object.
* @param edge_weight_view Optional view object holding edge weights for @p graph_view. If @p
* edge_weight_view.has_value() == true, use the weights associated with the graph. If false, assume
* a weight of 1 for all edges.
* @param vertices Optional device span defining the seed vertices.
* @param topk Optional number of top-scoring vertex pairs to return.
* @param do_expensive_check A flag to run expensive checks for input arguments (if set to `true`).
* @return tuple of three device vectors (first vertex, second vertex, similarity score), one entry
* per vertex pair
*/
template <typename vertex_t, typename edge_t, typename weight_t, bool multi_gpu>
std::
tuple<rmm::device_uvector<vertex_t>, rmm::device_uvector<vertex_t>, rmm::device_uvector<weight_t>>
sorensen_all_pairs_coefficients(
raft::handle_t const& handle,
graph_view_t<vertex_t, edge_t, false, multi_gpu> const& graph_view,
std::optional<edge_property_view_t<edge_t, weight_t const*>> edge_weight_view,
std::optional<raft::device_span<vertex_t const>> vertices,
std::optional<size_t> topk,
bool do_expensive_check = false);

/**
* @brief Compute overlap all pairs similarity coefficient
*
* Similarity is computed for all pairs of vertices. If the vertices
* variable is specified it will be all pairs based on two hop neighbors
* of these seeds. If the vertices variable is not specified it will be
* all pairs of all two hop neighbors.
*
* If topk is specified, only the top-scoring vertex pairs are returned;
* otherwise all vertex pairs are returned.
*
* @throws cugraph::logic_error when an error occurs.
*
* @tparam vertex_t Type of vertex identifiers. Needs to be an integral type.
* @tparam edge_t Type of edge identifiers. Needs to be an integral type.
* @tparam weight_t Type of edge weights. Needs to be a floating point type.
* @tparam multi_gpu Flag indicating whether template instantiation should target single-GPU (false)
* or multi-GPU (true).
* @param handle RAFT handle object to encapsulate resources (e.g. CUDA stream, communicator, and
* handles to various CUDA libraries) to run graph algorithms.
* @param graph_view Graph view object.
* @param edge_weight_view Optional view object holding edge weights for @p graph_view. If @p
* edge_weight_view.has_value() == true, use the weights associated with the graph. If false, assume
* a weight of 1 for all edges.
* @param vertices Optional device span defining the seed vertices.
* @param topk Optional number of top-scoring vertex pairs to return.
* @param do_expensive_check A flag to run expensive checks for input arguments (if set to `true`).
* @return tuple of three device vectors (first vertex, second vertex, similarity score), one entry
* per vertex pair
*/
template <typename vertex_t, typename edge_t, typename weight_t, bool multi_gpu>
std::
tuple<rmm::device_uvector<vertex_t>, rmm::device_uvector<vertex_t>, rmm::device_uvector<weight_t>>
overlap_all_pairs_coefficients(
raft::handle_t const& handle,
graph_view_t<vertex_t, edge_t, false, multi_gpu> const& graph_view,
std::optional<edge_property_view_t<edge_t, weight_t const*>> edge_weight_view,
std::optional<raft::device_span<vertex_t const>> vertices,
std::optional<size_t> topk,
bool do_expensive_check = false);

/*
* @brief Enumerate K-hop neighbors
*
2 changes: 1 addition & 1 deletion cpp/include/cugraph_c/graph_functions.h
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2022-2023, NVIDIA CORPORATION.
* Copyright (c) 2022-2024, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
126 changes: 125 additions & 1 deletion cpp/include/cugraph_c/similarity_algorithms.h
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2022-2023, NVIDIA CORPORATION.
* Copyright (c) 2022-2024, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -36,6 +36,16 @@ typedef struct {
int32_t align_;
} cugraph_similarity_result_t;

/**
* @ingroup similarity
* @brief Get the vertex pairs from the similarity result.
*
* @param [in] result The result from a similarity algorithm
* @return vertex pairs
*/
cugraph_vertex_pairs_t* cugraph_similarity_result_get_vertex_pairs(
cugraph_similarity_result_t* result);

/**
* @ingroup similarity
* @brief Get the similarity coefficient array
@@ -135,6 +145,120 @@ cugraph_error_code_t cugraph_overlap_coefficients(const cugraph_resource_handle_
cugraph_similarity_result_t** result,
cugraph_error_t** error);

/**
* @brief Perform All-Pairs Jaccard similarity computation
*
* Compute the similarity for all vertex pairs derived from an optionally specified
* vertex list. This function will identify the two-hop neighbors of the specified
* vertices (all vertices in the graph if not specified) and compute similarity
* for those vertices.
*
* If the topk parameter is specified then the result will only contain the top k
* highest scoring results.
*
* Note that Jaccard similarity must run on a symmetric graph.
*
* @param [in] handle Handle for accessing resources
* @param [in] graph Pointer to graph
* @param [in] vertices Vertex list for input. If null then compute based on
* all vertices in the graph.
* @param [in] use_weight If true consider the edge weight in the graph, if false use an
* edge weight of 1
* @param [in] topk Specify how many answers to return. Specifying SIZE_MAX
* will return all values.
* @param [in] do_expensive_check A flag to run expensive checks for input arguments (if set to
* `true`).
* @param [out] result Opaque pointer to similarity results
* @param [out] error Pointer to an error object storing details of any error. Will
* be populated if error code is not CUGRAPH_SUCCESS
* @return error code
*/
cugraph_error_code_t cugraph_all_pairs_jaccard_coefficients(
const cugraph_resource_handle_t* handle,
cugraph_graph_t* graph,
const cugraph_type_erased_device_array_view_t* vertices,
bool_t use_weight,
size_t topk,
bool_t do_expensive_check,
cugraph_similarity_result_t** result,
cugraph_error_t** error);

/**
* @brief Perform All-Pairs Sorensen similarity computation
*
* Compute the similarity for all vertex pairs derived from an optionally specified
* vertex list. This function will identify the two-hop neighbors of the specified
* vertices (all vertices in the graph if not specified) and compute similarity
* for those vertices.
*
* If the topk parameter is specified then the result will only contain the top k
* highest scoring results.
*
* Note that Sorensen similarity must run on a symmetric graph.
*
* @param [in] handle Handle for accessing resources
* @param [in] graph Pointer to graph
* @param [in] vertices Vertex list for input. If null then compute based on
* all vertices in the graph.
* @param [in] use_weight If true consider the edge weight in the graph, if false use an
* edge weight of 1
* @param [in] topk Specify how many answers to return. Specifying SIZE_MAX
* will return all values.
* @param [in] do_expensive_check A flag to run expensive checks for input arguments (if set to
* `true`).
* @param [out] result Opaque pointer to similarity results
* @param [out] error Pointer to an error object storing details of any error. Will
* be populated if error code is not CUGRAPH_SUCCESS
* @return error code
*/
cugraph_error_code_t cugraph_all_pairs_sorensen_coefficients(
const cugraph_resource_handle_t* handle,
cugraph_graph_t* graph,
const cugraph_type_erased_device_array_view_t* vertices,
bool_t use_weight,
size_t topk,
bool_t do_expensive_check,
cugraph_similarity_result_t** result,
cugraph_error_t** error);

/**
* @brief Perform All-Pairs overlap similarity computation
*
* Compute the similarity for all vertex pairs derived from an optionally specified
* vertex list. This function will identify the two-hop neighbors of the specified
* vertices (all vertices in the graph if not specified) and compute similarity
* for those vertices.
*
* If the topk parameter is specified then the result will only contain the top k
* highest scoring results.
*
* Note that overlap similarity must run on a symmetric graph.
*
* @param [in] handle Handle for accessing resources
* @param [in] graph Pointer to graph
* @param [in] vertices Vertex list for input. If null then compute based on
* all vertices in the graph.
* @param [in] use_weight If true consider the edge weight in the graph, if false use an
* edge weight of 1
* @param [in] topk Specify how many answers to return. Specifying SIZE_MAX
* will return all values.
* @param [in] do_expensive_check A flag to run expensive checks for input arguments (if set to
* `true`).
* @param [out] result Opaque pointer to similarity results
* @param [out] error Pointer to an error object storing details of any error. Will
* be populated if error code is not CUGRAPH_SUCCESS
* @return error code
*/
cugraph_error_code_t cugraph_all_pairs_overlap_coefficients(
const cugraph_resource_handle_t* handle,
cugraph_graph_t* graph,
const cugraph_type_erased_device_array_view_t* vertices,
bool_t use_weight,
size_t topk,
bool_t do_expensive_check,
cugraph_similarity_result_t** result,
cugraph_error_t** error);

#ifdef __cplusplus
}
#endif
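The C declarations above take `topk` as a plain `size_t` and document `SIZE_MAX` as "return all values", while the C++ declarations in `algorithms.hpp` take `std::optional<size_t>`. One plausible way to bridge the two conventions is sketched below. The helper name `topk_from_c_api` is hypothetical and the mapping is an assumption drawn from the two doc comments, not from the C API implementation, which this page does not show.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical helper, not taken from the cuGraph sources: translate the C API's
// "SIZE_MAX means no limit" convention into the std::optional used by the C++ API.
inline std::optional<std::size_t> topk_from_c_api(std::size_t topk)
{
  if (topk == SIZE_MAX) { return std::nullopt; }  // sentinel: caller wants every pair
  return topk;                                    // any other value is an explicit limit
}
```

A sentinel keeps the C signature free of extra flags or pointer arguments, at the cost of making one value of `topk` unusable as a real limit.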
2 changes: 1 addition & 1 deletion cpp/src/c_api/graph_functions.cpp
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2022-2023, NVIDIA CORPORATION.
* Copyright (c) 2022-2024, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.