update neighbor sample API
jnke2016 committed Aug 30, 2024
1 parent 8c17009 commit 7b95c5e
Showing 13 changed files with 1,355 additions and 215 deletions.
131 changes: 117 additions & 14 deletions cpp/include/cugraph/sampling_functions.hpp
@@ -43,7 +43,8 @@ enum class prior_sources_behavior_t { DEFAULT = 0, CARRY_OVER, EXCLUDE };
/**
* @brief Uniform Neighborhood Sampling.
*
* @deprecated This API will be deleted, use neighbor_sample instead
* @deprecated This API will be deleted, use cugraph_homogeneous_neighbor_sample with
* 'is_biased' set to false instead
*
* This function traverses from a set of starting vertices, traversing outgoing edges and
* randomly selects from these outgoing neighbors to extract a subgraph.
@@ -142,7 +143,8 @@ uniform_neighbor_sample(
/**
* @brief Biased Neighborhood Sampling.
*
* @deprecated This API will be deleted, use neighbor_sample instead
* @deprecated This API will be deleted, use cugraph_homogeneous_neighbor_sample with
* 'is_biased' set to true instead
*
* This function traverses from a set of starting vertices, traversing outgoing edges and
* randomly selects (with edge biases) from these outgoing neighbors to extract a subgraph.
@@ -244,14 +246,12 @@ biased_neighbor_sample(
bool dedupe_sources = false,
bool do_expensive_check = false);


/**
* @brief Neighborhood Sampling.
* @brief Heterogeneous Neighborhood Sampling.
*
* This function traverses from a set of starting vertices, traversing outgoing edges and
* randomly selects (with edge biases or not) from these outgoing neighbors to extract a subgraph.
* When branching out to select outgoing neighbors, either fan_out or heterogeneous_fan_out must
* be provided but not both.
* The branching out to select outgoing neighbors is performed with heterogeneous fanouts.
*
* Output from this function is a tuple of vectors (src, dst, weight, edge_id, edge_type, hop,
* label, offsets), identifying the randomly selected edges. src is the source vertex, dst is the
@@ -301,12 +301,10 @@ biased_neighbor_sample(
* @param label_to_output_comm_rank Optional tuple of device spans mapping label to a particular
* output rank. Element 0 of the tuple identifies the label, Element 1 of the tuple identifies the
* output rank. The label span must be sorted in ascending order.
* @param fan_out Host span defining branching out (fan-out) degree per source vertex for each
* level. When fan_out is provided, the sampling method uses the same fanout value for each type.
* @param heterogeneous_fan_out Tuple of host spans defining branching out (fan-out) degree per
* source vertex for each level in CSR style format. The first element of the tuple is the offset
* array per edge type id and the second element correspond to the fanout values.
* When heterogeneous_fan_out is provided, different fan_out values can be used for each edge type.
* array per edge type id and the second element corresponds to the fanout values.
* The sampling method can use different fan_out values for each edge type.
* The fan-out offsets size must be proportional to the number of edge types and fan_out values.
* @param return_hops boolean flag specifying if the hop information should be returned.
* @param prior_sources_behavior Enum type defining how to handle prior sources, (defaults to
@@ -321,7 +319,7 @@ biased_neighbor_sample(
* optional weight_t weight, optional edge_t edge id, optional edge_type_t edge type,
* optional int32_t hop, optional label_t label, optional size_t offsets)
*/
// FIXME: Add flag for bias=True/False

template <typename vertex_t,
typename edge_t,
typename weight_t,
@@ -338,7 +336,7 @@ std::tuple<rmm::device_uvector<vertex_t>,
std::optional<rmm::device_uvector<int32_t>>,
std::optional<rmm::device_uvector<label_t>>,
std::optional<rmm::device_uvector<size_t>>>
neighbor_sample(
heterogeneous_neighbor_sample(
raft::handle_t const& handle,
raft::random::RngState& rng_state,
graph_view_t<vertex_t, edge_t, store_transposed, multi_gpu> const& graph_view,
@@ -350,15 +348,120 @@ neighbor_sample(
std::optional<raft::device_span<label_t const>> starting_vertex_labels,
std::optional<std::tuple<raft::device_span<label_t const>, raft::device_span<int32_t const>>>
label_to_output_comm_rank,
std::optional<raft::host_span<int32_t const>> fan_out,
std::optional<std::tuple<raft::host_span<int32_t const>, raft::host_span<int32_t const>>>
std::tuple<raft::host_span<int32_t const>, raft::host_span<int32_t const>>
heterogeneous_fan_out,
bool return_hops,
bool with_replacement = true,
prior_sources_behavior_t prior_sources_behavior = prior_sources_behavior_t::DEFAULT,
bool dedupe_sources = false,
bool do_expensive_check = false);
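
The CSR-style layout described for heterogeneous_fan_out can be sketched on the host as below. This is only an illustration under stated assumptions: FlatFanOut, flatten_fan_out and fan_out_at are hypothetical names that do not exist in cugraph, and the indexing (the fan-out for edge type t at hop h sits at values[offsets[t] + h]) is one reading of the doc comment, not a confirmed contract.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container (not a cugraph type): the two arrays of the tuple.
struct FlatFanOut {
  std::vector<int32_t> offsets;  // size == number of edge types + 1
  std::vector<int32_t> values;   // per-type fan-out values, concatenated
};

// Hypothetical helper: flatten one fan-out list per edge type into CSR style.
inline FlatFanOut flatten_fan_out(std::vector<std::vector<int32_t>> const& per_type)
{
  FlatFanOut out;
  out.offsets.reserve(per_type.size() + 1);
  out.offsets.push_back(0);
  for (auto const& hops : per_type) {
    out.values.insert(out.values.end(), hops.begin(), hops.end());
    out.offsets.push_back(static_cast<int32_t>(out.values.size()));
  }
  return out;
}

// Assumed lookup: fan-out for edge type `type` at hop `hop`.
inline int32_t fan_out_at(FlatFanOut const& f, std::size_t type, std::size_t hop)
{
  assert(type + 1 < f.offsets.size());
  auto const begin = static_cast<std::size_t>(f.offsets[type]);
  auto const end   = static_cast<std::size_t>(f.offsets[type + 1]);
  assert(begin + hop < end);
  return f.values[begin + hop];
}
```

With two edge types and two hops each, flatten_fan_out({{10, 5}, {3, 2}}) yields offsets {0, 2, 4} and values {10, 5, 3, 2}; host spans over those two vectors would then form the tuple argument.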

/**
* @brief Homogeneous Neighborhood Sampling.
*
* This function traverses from a set of starting vertices, traversing outgoing edges and
* randomly selects (with edge biases or not) from these outgoing neighbors to extract a subgraph.
* The branching out to select outgoing neighbors is performed with homogeneous fanouts.
*
* Output from this function is a tuple of vectors (src, dst, weight, edge_id, edge_type, hop,
* label, offsets), identifying the randomly selected edges. src is the source vertex, dst is the
* destination vertex, weight (optional) is the edge weight, edge_id (optional) identifies the edge
* id, edge_type (optional) identifies the edge type, hop identifies which hop the edge was
* encountered in. The label output (optional) identifies the vertex label. The offsets array
* (optional) will be described below and is dependent upon the input parameters.
*
* If @p starting_vertex_labels is not specified then no organization is applied to the output, the
* label and offsets values in the return set will be std::nullopt.
*
* If @p starting_vertex_labels is specified and @p label_to_output_comm_rank is not specified then
* the label output has values. This will also result in the output being sorted by vertex label.
* The offsets array in the return will be a CSR-style offsets array to identify the beginning of
* each label range in the data. `labels.size() == (offsets.size() - 1)`.
*
* If @p starting_vertex_labels is specified and @p label_to_output_comm_rank is specified then the
* label output has values. This will also result in the output being sorted by vertex label. The
* offsets array in the return will be a CSR-style offsets array to identify the beginning of each
* label range in the data. `labels.size() == (offsets.size() - 1)`. Additionally, the data will
* be shuffled so that all data with a particular label will be on the specified rank.
*
* @tparam vertex_t Type of vertex identifiers. Needs to be an integral type.
* @tparam edge_t Type of edge identifiers. Needs to be an integral type.
* @tparam weight_t Type of edge weights. Needs to be a floating point type.
* @tparam edge_type_t Type of edge type. Needs to be an integral type.
* @tparam label_t Type of label. Needs to be an integral type.
* @tparam store_transposed Flag indicating whether sources (if false) or destinations (if
* true) are major indices
* @tparam multi_gpu Flag indicating whether template instantiation should target single-GPU (false)
* or multi-GPU (true).
* @param handle RAFT handle object to encapsulate resources (e.g. CUDA stream, communicator, and
* handles to various CUDA libraries) to run graph algorithms.
* @param rng_state A pre-initialized raft::random::RngState object for generating random numbers
* @param graph_view Graph View object to generate NBR Sampling on.
* @param edge_weight_view Optional view object holding edge weights for @p graph_view.
* @param edge_id_view Optional view object holding edge ids for @p graph_view.
* @param edge_type_view Optional view object holding edge types for @p graph_view.
* @param edge_bias_view Optional view object holding edge biases (to be used in biased sampling) for @p
* graph_view. Bias values should be non-negative and the sum of edge bias values from any vertex
* should not exceed std::numeric_limits<bias_t>::max(). 0 bias value indicates that the
* corresponding edge can never be selected. passing std::nullopt as the edge biases will result in
* uniform sampling.
* @param starting_vertices Device span of starting vertex IDs for the sampling.
* In a multi-gpu context the starting vertices should be local to this GPU.
* @param starting_vertex_labels Optional device span of labels associated with each starting vertex
* for the sampling.
* @param label_to_output_comm_rank Optional tuple of device spans mapping label to a particular
* output rank. Element 0 of the tuple identifies the label, Element 1 of the tuple identifies the
* output rank. The label span must be sorted in ascending order.
* @param fan_out Host span defining branching out (fan-out) degree per source vertex for each
* level. The sampling method uses the same fanout value for each type.
* @param return_hops boolean flag specifying if the hop information should be returned.
* @param prior_sources_behavior Enum type defining how to handle prior sources, (defaults to
* DEFAULT)
* @param dedupe_sources boolean flag, if true then if a vertex v appears as a destination in hop X
* multiple times with the same label, it will only be passed once (for each label) as a source
* for the next hop. Default is false.
* @param with_replacement boolean flag specifying if random sampling is done with replacement
* (true) or without replacement (false). Default is true.
* @param do_expensive_check A flag to run expensive checks for input arguments (if set to `true`).
* @return tuple device vectors (vertex_t source_vertex, vertex_t destination_vertex,
* optional weight_t weight, optional edge_t edge id, optional edge_type_t edge type,
* optional int32_t hop, optional label_t label, optional size_t offsets)
*/

template <typename vertex_t,
typename edge_t,
typename weight_t,
typename edge_type_t,
typename bias_t,
typename label_t,
bool store_transposed,
bool multi_gpu>
std::tuple<rmm::device_uvector<vertex_t>,
rmm::device_uvector<vertex_t>,
std::optional<rmm::device_uvector<weight_t>>,
std::optional<rmm::device_uvector<edge_t>>,
std::optional<rmm::device_uvector<edge_type_t>>,
std::optional<rmm::device_uvector<int32_t>>,
std::optional<rmm::device_uvector<label_t>>,
std::optional<rmm::device_uvector<size_t>>>
homogeneous_neighbor_sample(
raft::handle_t const& handle,
raft::random::RngState& rng_state,
graph_view_t<vertex_t, edge_t, store_transposed, multi_gpu> const& graph_view,
std::optional<edge_property_view_t<edge_t, weight_t const*>> edge_weight_view,
std::optional<edge_property_view_t<edge_t, edge_t const*>> edge_id_view,
std::optional<edge_property_view_t<edge_t, edge_type_t const*>> edge_type_view,
std::optional<edge_property_view_t<edge_t, bias_t const*>> edge_bias_view,
raft::device_span<vertex_t const> starting_vertices,
std::optional<raft::device_span<label_t const>> starting_vertex_labels,
std::optional<std::tuple<raft::device_span<label_t const>, raft::device_span<int32_t const>>>
label_to_output_comm_rank,
raft::host_span<int32_t const> fan_out,
bool return_hops,
bool with_replacement = true,
prior_sources_behavior_t prior_sources_behavior = prior_sources_behavior_t::DEFAULT,
bool dedupe_sources = false,
bool do_expensive_check = false);
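
The doc comment above states that, when starting_vertex_labels is given, the returned offsets array is CSR-style with `labels.size() == (offsets.size() - 1)`. A minimal host-side sketch of reading such an array follows. It assumes the offsets have already been copied from the device into a std::vector, and edges_per_label is a hypothetical helper, not a cugraph function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: number of sampled edges belonging to each label,
// derived from a CSR-style offsets array copied to the host.
inline std::vector<std::size_t> edges_per_label(std::vector<std::size_t> const& offsets)
{
  assert(!offsets.empty());  // a valid offsets array has at least one entry
  std::vector<std::size_t> counts(offsets.size() - 1);
  for (std::size_t i = 0; i + 1 < offsets.size(); ++i) {
    assert(offsets[i] <= offsets[i + 1]);  // offsets must be non-decreasing
    // Edges for label i occupy the half-open range [offsets[i], offsets[i + 1]).
    counts[i] = offsets[i + 1] - offsets[i];
  }
  return counts;
}
```

For offsets {0, 3, 3, 7} there are three labels owning 3, 0 and 4 sampled edges respectively; the same half-open ranges index into the src, dst and other output vectors.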

/*
* @brief renumber sampled edge list and compress to the (D)CSR|(D)CSC format.
*
86 changes: 75 additions & 11 deletions cpp/include/cugraph_c/sampling_algorithms.h
@@ -366,12 +366,13 @@ cugraph_error_code_t cugraph_create_heterogeneous_fan_out(
*
* @param [in] heterogeneous_fanout The edge type size and fanout values
*/
void cugraph_heterogeneous_fanout_free(cugraph_sample_heterogeneous_fan_out_t* heterogeneous_fanout);
void cugraph_heterogeneous_fan_out_free(cugraph_sample_heterogeneous_fan_out_t* heterogeneous_fanout);

/**
* @brief Uniform Neighborhood Sampling
*
* @deprecated This API will be deleted, use cugraph_neighbor_sample instead
* @deprecated This API will be deleted, use cugraph_homogeneous_neighbor_sample with
* 'is_biased' set to false instead
*
* Returns a sample of the neighborhood around specified start vertices. Optionally, each
* start vertex can be associated with a label, allowing the caller to specify multiple batches
@@ -428,8 +429,9 @@ cugraph_error_code_t cugraph_uniform_neighbor_sample(
/**
* @brief Biased Neighborhood Sampling
*
* @deprecated This API will be deleted, use cugraph_neighbor_sample instead
*
* @deprecated This API will be deleted, use cugraph_homogeneous_neighbor_sample with
* 'is_biased' set to true instead
*
* Returns a sample of the neighborhood around specified start vertices. Optionally, each
* start vertex can be associated with a label, allowing the caller to specify multiple batches
* of sampling requests in the same function call - which should improve GPU utilization.
@@ -487,9 +489,10 @@ cugraph_error_code_t cugraph_biased_neighbor_sample(
cugraph_error_t** error);

/**
* @brief Neighborhood Sampling
* @brief Heterogeneous Neighborhood Sampling
*
* Returns a sample of the neighborhood around specified start vertices with edge biases or not.
* Returns a sample of the neighborhood around specified start vertices with edge biases or not
* and heterogeneous fanout types.
* Optionally, each start vertex can be associated with a label, allowing the caller to specify
* multiple batches of sampling requests in the same function call - which should improve GPU
* utilization.
@@ -518,11 +521,9 @@ cugraph_error_code_t cugraph_biased_neighbor_sample(
* label_to_comm_rank[i]. If not specified then the output data will not be shuffled between ranks.
* @param [in] label_offsets Device array of the offsets for each label in the seed list. This
* parameter is only used with the retain_seeds option.
* @param [in] fanout Host array defining the fan out at each step in the sampling algorithm.
* We only support fanout values of type INT32
* @param [in] heterogeneous_fanout Tuple of host arrays defining the fan out at each step in the
* sampling algorithm. in CSR style format. The first element of the tuple is the offset array per
* edge type id and the second element correspond to the fanout values.
* edge type id and the second element corresponds to the fanout values.
* We only support type INT32 for both the offsets and the fanout values array.
* @param [in] sampling_options
* Opaque pointer defining the sampling options.
@@ -536,7 +537,7 @@ cugraph_error_code_t cugraph_biased_neighbor_sample(
* be populated if error code is not CUGRAPH_SUCCESS
* @return error code
*/
cugraph_error_code_t cugraph_neighbor_sample(
cugraph_error_code_t cugraph_heterogeneous_neighbor_sample(
const cugraph_resource_handle_t* handle,
cugraph_rng_state_t* rng_state,
cugraph_graph_t* graph,
@@ -546,14 +547,77 @@ cugraph_error_code_t cugraph_neighbor_sample(
const cugraph_type_erased_device_array_view_t* label_list,
const cugraph_type_erased_device_array_view_t* label_to_comm_rank,
const cugraph_type_erased_device_array_view_t* label_offsets,
const cugraph_type_erased_host_array_view_t* fan_out,
const cugraph_sample_heterogeneous_fan_out_t* heterogeneous_fanout,
const cugraph_sampling_options_t* options,
bool_t is_biased,
bool_t do_expensive_check,
cugraph_sample_result_t** result,
cugraph_error_t** error);
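
The label_list and label_to_comm_rank parameters pair each label with an output rank, and label_list must be sorted in ascending order. One plausible reason for the sorting requirement is that it allows a binary-search lookup; the sketch below illustrates that idea on host vectors. It is an assumption about usage, not cugraph's implementation, and output_rank_for_label is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper: find the output rank for `label`, where
// label_to_comm_rank[i] is the rank assigned to label_list[i].
inline std::optional<int32_t> output_rank_for_label(
  std::vector<int32_t> const& label_list,
  std::vector<int32_t> const& label_to_comm_rank,
  int32_t label)
{
  assert(label_list.size() == label_to_comm_rank.size());
  assert(std::is_sorted(label_list.begin(), label_list.end()));
  // Binary search is valid only because label_list is sorted ascending.
  auto it = std::lower_bound(label_list.begin(), label_list.end(), label);
  if (it == label_list.end() || *it != label) { return std::nullopt; }
  return label_to_comm_rank[static_cast<std::size_t>(it - label_list.begin())];
}
```

With labels {0, 2, 5} mapped to ranks {1, 0, 1}, label 2 resolves to rank 0 and an unknown label such as 3 yields std::nullopt.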

/**
* @brief Homogeneous Neighborhood Sampling
*
* Returns a sample of the neighborhood around specified start vertices with edge biases or not
* and homogeneous fanout types.
* Optionally, each start vertex can be associated with a label, allowing the caller to specify
* multiple batches of sampling requests in the same function call - which should improve GPU
* utilization.
*
* If label is NULL then all start vertices will be considered part of the same batch and the
* return value will not have a label column.
*
* @param [in] handle Handle for accessing resources
* @param [in,out] rng_state State of the random number generator, updated with each call
* @param [in] graph Pointer to graph. NOTE: Graph might be modified if the storage
* needs to be transposed
* @param [in] edge_biases Device array of edge biases to use for sampling. If NULL
* use the edge weight as the bias. If set to NULL, edges will be sampled uniformly.
* @param [in] start_vertices Device array of start vertices for the sampling
* @param [in] start_vertex_labels Device array of start vertex labels for the sampling. The
* labels associated with each start vertex will be included in the output associated with results
* that were derived from that start vertex. We only support label of type INT32. If label is
* NULL, the return data will not be labeled.
* @param [in] label_list Device array of the labels included in @p start_vertex_labels. If
* @p label_to_comm_rank is not specified this parameter is ignored. If specified, label_list
* must be sorted in ascending order.
* @param [in] label_to_comm_rank Device array identifying which comm rank the output for a
* particular label should be shuffled in the output. If not specified the data is not organized in
* output. If specified then all data from @p label_list[i] will be shuffled to rank
* @p label_to_comm_rank[i]. This cannot be specified unless @p start_vertex_labels is also
* specified. If not specified then the output data will not be shuffled between ranks.
* @param [in] label_offsets Device array of the offsets for each label in the seed list. This
* parameter is only used with the retain_seeds option.
* @param [in] fanout Host array defining the fan out at each step in the sampling algorithm.
* We only support fanout values of type INT32
* @param [in] sampling_options
* Opaque pointer defining the sampling options.
* @param [in] is_biased
* A flag specifying whether to run biased neighborhood sampling
* (if set to true) or uniform neighbor sampling.
* @param [in] do_expensive_check
* A flag to run expensive checks for input arguments (if set to true)
* @param [out] result Output from the homogeneous neighbor sampling call
* @param [out] error Pointer to an error object storing details of any error. Will
* be populated if error code is not CUGRAPH_SUCCESS
* @return error code
*/
cugraph_error_code_t cugraph_homogeneous_neighbor_sample(
const cugraph_resource_handle_t* handle,
cugraph_rng_state_t* rng_state,
cugraph_graph_t* graph,
const cugraph_edge_property_view_t* edge_biases,
const cugraph_type_erased_device_array_view_t* start_vertices,
const cugraph_type_erased_device_array_view_t* start_vertex_labels,
const cugraph_type_erased_device_array_view_t* label_list,
const cugraph_type_erased_device_array_view_t* label_to_comm_rank,
const cugraph_type_erased_device_array_view_t* label_offsets,
const cugraph_type_erased_host_array_view_t* fan_out,
const cugraph_sampling_options_t* options,
bool_t is_biased,
bool_t do_expensive_check,
cugraph_sample_result_t** result,
cugraph_error_t** error);
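
The difference between biased and uniform selection can be sketched on the host for a single source vertex. The semantics follow the edge_bias_view description in the C++ header (non-negative biases, a bias of 0 means the edge is never selected, no biases means uniform sampling). The sketch samples with replacement only, sample_neighbors is a hypothetical name, and nothing here reflects how cugraph samples on the GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <random>
#include <vector>

// Hypothetical helper: draw `fan_out` neighbor indices (with replacement)
// for one source vertex that has `num_neighbors` outgoing edges.
inline std::vector<std::size_t> sample_neighbors(
  std::size_t num_neighbors,
  std::optional<std::vector<double>> const& biases,
  std::size_t fan_out,
  std::mt19937& rng)
{
  assert(num_neighbors > 0);
  std::vector<std::size_t> picks;
  picks.reserve(fan_out);
  if (biases) {
    // Biased: selection probability is proportional to the edge bias.
    assert(biases->size() == num_neighbors);
    std::discrete_distribution<std::size_t> dist(biases->begin(), biases->end());
    for (std::size_t i = 0; i < fan_out; ++i) { picks.push_back(dist(rng)); }
  } else {
    // No biases supplied: every neighbor is equally likely.
    std::uniform_int_distribution<std::size_t> dist(0, num_neighbors - 1);
    for (std::size_t i = 0; i < fan_out; ++i) { picks.push_back(dist(rng)); }
  }
  return picks;
}
```

Calling it with biases {1.0, 0.0, 3.0} selects neighbor 2 about three times as often as neighbor 0 and should not return neighbor 1; passing std::nullopt falls back to uniform selection.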

/**
* @deprecated This call should be replaced with cugraph_sample_result_get_majors
* @brief Get the source vertices from the sampling algorithm result