Define and Implement C++ API for negative sampling #4523

Merged
merged 25 commits into from
Aug 21, 2024
29cbe97
Define C++ API for negative sampling
ChuckHastings Jul 5, 2024
983a881
first cut at negative sampling implementation (untested)... fixed API
ChuckHastings Jul 10, 2024
5504c74
rename utility_wrapper.cuh
ChuckHastings Jul 10, 2024
912ae6f
Working SG negative sampling tests
ChuckHastings Jul 15, 2024
0ce0712
add MG tests
ChuckHastings Jul 17, 2024
0d89269
Merge branch 'branch-24.08' into negative_sampling_api
ChuckHastings Jul 17, 2024
a31f5a9
Add C API and PLC for negative sampling
ChuckHastings Jul 23, 2024
fdad347
Merge branch 'branch-24.08' into negative_sampling_api
ChuckHastings Jul 23, 2024
6a90844
Fix filename change lost in merge
ChuckHastings Jul 23, 2024
2f23ac1
Negative sampling now working for SG, MG with 1/2/4 GPUs
ChuckHastings Aug 5, 2024
51d30db
Merge branch 'branch-24.10' into negative_sampling_api
ChuckHastings Aug 5, 2024
06c3d5d
Refactor to do biased sampling by vertex partitions instead of exposi…
ChuckHastings Aug 9, 2024
b9ab33a
Merge branch 'branch-24.10' into negative_sampling_api
ChuckHastings Aug 9, 2024
4de3bda
address other PR comments
ChuckHastings Aug 10, 2024
b38d4c6
Fix a few straggling references to remove_false_negatives, refactor a…
ChuckHastings Aug 13, 2024
16bbd7b
Merge branch 'branch-24.10' into negative_sampling_api
ChuckHastings Aug 13, 2024
905d1b6
refactor negative sampling based on PR comments
ChuckHastings Aug 15, 2024
dbc0b38
start refactoring to make tests .cpp files
ChuckHastings Aug 15, 2024
5f35987
move MG validation code into validation_utilitices.cu
ChuckHastings Aug 17, 2024
06e71c4
rename sampling file
ChuckHastings Aug 17, 2024
94990e1
remove reference of device structure from host API
ChuckHastings Aug 17, 2024
3305835
Merge branch 'branch-24.10' into negative_sampling_api
ChuckHastings Aug 17, 2024
996b9ac
update to accomodate GPUs with no bias
ChuckHastings Aug 19, 2024
8a28b0b
move num_samples parameter, add tests for edge masking, some cosmetic…
ChuckHastings Aug 20, 2024
b381784
Merge branch 'branch-24.10' into negative_sampling_api
ChuckHastings Aug 20, 2024
7 changes: 7 additions & 0 deletions cpp/CMakeLists.txt
@@ -332,6 +332,12 @@ set(CUGRAPH_SOURCES
src/sampling/neighbor_sampling_sg_v32_e64.cpp
src/sampling/neighbor_sampling_sg_v32_e32.cpp
src/sampling/neighbor_sampling_sg_v64_e64.cpp
src/sampling/negative_sampling_sg_v32_e64.cu
src/sampling/negative_sampling_sg_v32_e32.cu
src/sampling/negative_sampling_sg_v64_e64.cu
src/sampling/negative_sampling_mg_v32_e64.cu
src/sampling/negative_sampling_mg_v32_e32.cu
src/sampling/negative_sampling_mg_v64_e64.cu
src/sampling/renumber_sampled_edgelist_sg_v64_e64.cu
src/sampling/renumber_sampled_edgelist_sg_v32_e32.cu
src/sampling/sampling_post_processing_sg_v64_e64.cu
@@ -656,6 +662,7 @@ add_library(cugraph_c
src/c_api/louvain.cpp
src/c_api/triangle_count.cpp
src/c_api/neighbor_sampling.cpp
src/c_api/negative_sampling.cpp
src/c_api/labeling_result.cpp
src/c_api/weakly_connected_components.cpp
src/c_api/strongly_connected_components.cpp
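The six new `negative_sampling_{sg,mg}_v*_e*.cu` entries follow the naming used by the neighboring sampling sources, which suggests one explicit template instantiation per supported vertex/edge type combination, each in its own translation unit. A minimal sketch of that pattern, assuming a hypothetical function `id_bytes` that stands in for the real templated algorithm:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for a templated algorithm; this is NOT a cuGraph function.
template <typename vertex_t, typename edge_t, bool multi_gpu>
std::size_t id_bytes()
{
  static_assert(std::is_integral<vertex_t>::value, "vertex_t must be an integral type");
  static_assert(std::is_integral<edge_t>::value, "edge_t must be an integral type");
  return sizeof(vertex_t) + sizeof(edge_t);
}

// One explicit instantiation per supported combination. In a layout like the
// CMake list above, each line would sit in its own source file (assumption).
template std::size_t id_bytes<std::int32_t, std::int32_t, false>();  // sg, v32, e32
template std::size_t id_bytes<std::int32_t, std::int64_t, false>();  // sg, v32, e64
template std::size_t id_bytes<std::int64_t, std::int64_t, false>();  // sg, v64, e64
template std::size_t id_bytes<std::int32_t, std::int32_t, true>();   // mg, v32, e32
template std::size_t id_bytes<std::int32_t, std::int64_t, true>();   // mg, v32, e64
template std::size_t id_bytes<std::int64_t, std::int64_t, true>();   // mg, v64, e64
```

Splitting instantiations across files lets the build compile them in parallel and keeps the template definition out of the public header.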
4 changes: 2 additions & 2 deletions cpp/include/cugraph/graph_view.hpp
@@ -636,7 +636,7 @@ class graph_view_t<vertex_t, edge_t, store_transposed, multi_gpu, std::enable_if
/* (edge_srcs, edge_dsts) should be pre-shuffled */
raft::device_span<vertex_t const> edge_srcs,
raft::device_span<vertex_t const> edge_dsts,
bool do_expensive_check = false);
bool do_expensive_check = false) const;

rmm::device_uvector<edge_t> compute_multiplicity(
raft::handle_t const& handle,
@@ -945,7 +945,7 @@ class graph_view_t<vertex_t, edge_t, store_transposed, multi_gpu, std::enable_if
rmm::device_uvector<bool> has_edge(raft::handle_t const& handle,
raft::device_span<vertex_t const> edge_srcs,
raft::device_span<vertex_t const> edge_dsts,
bool do_expensive_check = false);
bool do_expensive_check = false) const;

rmm::device_uvector<edge_t> compute_multiplicity(raft::handle_t const& handle,
raft::device_span<vertex_t const> edge_srcs,
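The `const` qualifier added to `has_edge` matters because `negative_sampling` receives the graph view as a `const&`; a non-const member function could not be called through that reference. A minimal host-only sketch of the same situation, assuming a hypothetical `toy_graph_view` class (not `cugraph::graph_view_t`):

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical, host-only stand-in for a graph view.
class toy_graph_view {
 public:
  explicit toy_graph_view(std::vector<std::pair<int, int>> edges) : edges_(std::move(edges)) {}

  // Declared const: the query does not modify the view, so it can be called
  // through a const reference.
  std::vector<bool> has_edge(std::vector<int> const& srcs, std::vector<int> const& dsts) const
  {
    std::vector<bool> found(srcs.size(), false);
    for (std::size_t i = 0; i < srcs.size(); ++i) {
      for (auto const& e : edges_) {
        if (e.first == srcs[i] && e.second == dsts[i]) {
          found[i] = true;
          break;
        }
      }
    }
    return found;
  }

 private:
  std::vector<std::pair<int, int>> edges_;
};

// Takes the view by const reference. This compiles only because has_edge
// is a const member function.
inline std::size_t count_existing(toy_graph_view const& view,
                                  std::vector<int> const& srcs,
                                  std::vector<int> const& dsts)
{
  std::size_t n = 0;
  for (bool b : view.has_edge(srcs, dsts)) { n += b ? 1 : 0; }
  return n;
}
```

Removing `const` from `toy_graph_view::has_edge` makes `count_existing` fail to compile, which is the class of error this two-line change avoids.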
57 changes: 57 additions & 0 deletions cpp/include/cugraph/sampling_functions.hpp
@@ -743,4 +743,61 @@ lookup_endpoints_from_edge_ids_and_types(
raft::device_span<edge_t const> edge_ids_to_lookup,
raft::device_span<edge_type_t const> edge_types_to_lookup);

/**
* @brief Negative Sampling
*
* This function generates negative samples for a graph.
*
* Negative sampling is done by generating a random graph according to the specified
* parameters and optionally removing samples that represent actual edges in the graph.
*
* Sampling occurs by creating a list of source vertex ids from biased sampling
* of the source vertex space, and destination vertex ids from biased sampling of the
* destination vertex space, and using this as the putative list of edges. We
* can then optionally remove duplicates and remove actual edges in the graph to generate
* the final list. If necessary we will repeat the process to end with a resulting
* edge list of the appropriate size.
*
* @tparam vertex_t Type of vertex identifiers. Needs to be an integral type.
* @tparam edge_t Type of edge identifiers. Needs to be an integral type.
* @tparam weight_t Type of the bias values.
* @tparam store_transposed Flag indicating whether sources (if false) or destinations (if
* true) are major indices
* @tparam multi_gpu Flag indicating whether template instantiation should target single-GPU (false)
* or multi-GPU (true)
*
* @param handle RAFT handle object to encapsulate resources (e.g. CUDA stream, communicator, and
* handles to various CUDA libraries) to run graph algorithms.
* @param rng_state RNG state
* @param graph_view Graph view object to generate negative samples for
* @param src_biases Optional bias for randomly selecting source vertices. If std::nullopt, vertices
* will be selected uniformly. In a multi-GPU environment the biases should be partitioned based
* on the vertex partitions.
* @param dst_biases Optional bias for randomly selecting destination vertices. If std::nullopt,
* vertices will be selected uniformly. In a multi-GPU environment the biases should be partitioned
* based on the vertex partitions.
* @param num_samples Number of negative samples to generate
* @param remove_duplicates If true, remove duplicate samples
* @param remove_existing_edges If true, remove samples that are actually edges in the graph
* @param exact_number_of_samples If true, repeat generation until we get the exact number of
* negative samples
* @param do_expensive_check A flag to run expensive checks for input arguments (if set to `true`).
*
* @return tuple containing source vertex ids and destination vertex ids for the negative samples
*/
template <typename vertex_t,
typename edge_t,
typename weight_t,
bool store_transposed,
bool multi_gpu>
std::tuple<rmm::device_uvector<vertex_t>, rmm::device_uvector<vertex_t>> negative_sampling(
raft::handle_t const& handle,
raft::random::RngState& rng_state,
graph_view_t<vertex_t, edge_t, store_transposed, multi_gpu> const& graph_view,
std::optional<raft::device_span<weight_t const>> src_biases,
std::optional<raft::device_span<weight_t const>> dst_biases,
size_t num_samples,
bool remove_duplicates,
bool remove_existing_edges,
bool exact_number_of_samples,
bool do_expensive_check);

} // namespace cugraph
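The documentation comment for `negative_sampling` describes a generate, filter, repeat loop. A host-only reference sketch of that loop follows; it is not the cuGraph implementation (which runs on the GPU and, in multi-GPU mode, across vertex partitions). The function name, the use of `std::discrete_distribution`, and the `max_rounds` cap that guarantees termination are all assumptions made for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <set>
#include <utility>
#include <vector>

using edge = std::pair<int, int>;

// Hypothetical host-only reference. Empty bias vectors mean uniform selection
// over the vertex range [0, num_vertices).
inline std::vector<edge> negative_sampling_reference(std::mt19937& rng,
                                                     int num_vertices,
                                                     std::set<edge> const& graph_edges,
                                                     std::vector<double> const& src_biases,
                                                     std::vector<double> const& dst_biases,
                                                     std::size_t num_samples,
                                                     bool remove_duplicates,
                                                     bool remove_existing_edges,
                                                     bool exact_number_of_samples,
                                                     std::size_t max_rounds = 100)
{
  auto make_dist = [num_vertices](std::vector<double> const& b) {
    std::vector<double> w = b.empty() ? std::vector<double>(num_vertices, 1.0) : b;
    return std::discrete_distribution<int>(w.begin(), w.end());
  };
  auto src_dist = make_dist(src_biases);
  auto dst_dist = make_dist(dst_biases);

  std::vector<edge> result;
  std::set<edge> seen;  // samples already accepted, used for de-duplication
  for (std::size_t round = 0; round < max_rounds && result.size() < num_samples; ++round) {
    std::size_t const needed = num_samples - result.size();
    for (std::size_t i = 0; i < needed; ++i) {
      edge e{src_dist(rng), dst_dist(rng)};  // putative edge
      if (remove_existing_edges && graph_edges.count(e) > 0) { continue; }
      if (remove_duplicates && !seen.insert(e).second) { continue; }
      result.push_back(e);
    }
    // Without exact_number_of_samples, generate once, filter, and stop.
    if (!exact_number_of_samples) { break; }
  }
  return result;
}
```

With `exact_number_of_samples` set to false the result can be shorter than `num_samples`, because filtered samples are not replaced.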
115 changes: 115 additions & 0 deletions cpp/include/cugraph_c/coo.h
@@ -0,0 +1,115 @@
/*
* Copyright (c) 2024, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

#pragma once

#include <cugraph_c/array.h>
#include <cugraph_c/graph.h>
#include <cugraph_c/random.h>
#include <cugraph_c/resource_handle.h>

#ifdef __cplusplus
extern "C" {
#endif

/**
* @brief Opaque COO definition
*/
typedef struct {
int32_t align_;
} cugraph_coo_t;

/**
* @brief Opaque COO list definition
*/
typedef struct {
int32_t align_;
} cugraph_coo_list_t;

/**
* @brief Get the source vertex ids
*
* @param [in] coo Opaque pointer to COO
* @return type erased array view of source vertex ids
*/
cugraph_type_erased_device_array_view_t* cugraph_coo_get_sources(cugraph_coo_t* coo);

/**
* @brief Get the destination vertex ids
*
* @param [in] coo Opaque pointer to COO
* @return type erased array view of destination vertex ids
*/
cugraph_type_erased_device_array_view_t* cugraph_coo_get_destinations(cugraph_coo_t* coo);

/**
* @brief Get the edge weights
*
* @param [in] coo Opaque pointer to COO
* @return type erased array view of edge weights, NULL if no edge weights in COO
*/
cugraph_type_erased_device_array_view_t* cugraph_coo_get_edge_weights(cugraph_coo_t* coo);

/**
* @brief Get the edge id
*
* @param [in] coo Opaque pointer to COO
* @return type erased array view of edge id, NULL if no edge ids in COO
*/
cugraph_type_erased_device_array_view_t* cugraph_coo_get_edge_id(cugraph_coo_t* coo);

/**
* @brief Get the edge type
*
* @param [in] coo Opaque pointer to COO
* @return type erased array view of edge type, NULL if no edge types in COO
*/
cugraph_type_erased_device_array_view_t* cugraph_coo_get_edge_type(cugraph_coo_t* coo);

/**
* @brief Get the number of COO objects in the list
*
* @param [in] coo_list Opaque pointer to COO list
* @return number of elements
*/
size_t cugraph_coo_list_size(const cugraph_coo_list_t* coo_list);

/**
* @brief Get a COO from the list
*
* @param [in] coo_list Opaque pointer to COO list
* @param [in] index Index of desired COO from list
* @return a cugraph_coo_t* object from the list
*/
cugraph_coo_t* cugraph_coo_list_element(cugraph_coo_list_t* coo_list, size_t index);

/**
* @brief Free coo object
*
* @param [in] coo Opaque pointer to COO
*/
void cugraph_coo_free(cugraph_coo_t* coo);

/**
* @brief Free coo list
*
* @param [in] coo_list Opaque pointer to list of COO objects
*/
void cugraph_coo_list_free(cugraph_coo_list_t* coo_list);

#ifdef __cplusplus
}
#endif
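The `cugraph_coo_t` typedef above is an opaque placeholder: C callers only ever hold a pointer to it, and the library casts that pointer back to its real internal type. A minimal sketch of the pattern, assuming hypothetical `toy_coo_*` names and a hypothetical host-side internal layout (the actual cuGraph internals hold device arrays):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Public, C-visible side: an opaque placeholder type, shaped like the one in coo.h.
typedef struct {
  int32_t align_;
} toy_coo_t;

// Private, C++ side: the real object. Hypothetical layout, for illustration only.
struct toy_coo_impl {
  std::vector<int32_t> src;
  std::vector<int32_t> dst;
};

// Creation hands out the implementation pointer disguised as the opaque type.
inline toy_coo_t* toy_coo_create(std::vector<int32_t> src, std::vector<int32_t> dst)
{
  return reinterpret_cast<toy_coo_t*>(new toy_coo_impl{std::move(src), std::move(dst)});
}

// Accessors cast back to the implementation type before touching any data.
inline std::size_t toy_coo_num_edges(toy_coo_t const* coo)
{
  return reinterpret_cast<toy_coo_impl const*>(coo)->src.size();
}

inline int32_t const* toy_coo_get_sources(toy_coo_t const* coo)
{
  return reinterpret_cast<toy_coo_impl const*>(coo)->src.data();
}

// The matching free function destroys the real object.
inline void toy_coo_free(toy_coo_t* coo) { delete reinterpret_cast<toy_coo_impl*>(coo); }
```

Because the placeholder's members are never read, the internal layout can change without breaking C callers, which is also why moving these declarations into their own header is safe for existing users of `graph_generators.h`.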
86 changes: 1 addition & 85 deletions cpp/include/cugraph_c/graph_generators.h
@@ -17,6 +17,7 @@
#pragma once

#include <cugraph_c/array.h>
#include <cugraph_c/coo.h>
#include <cugraph_c/graph.h>
#include <cugraph_c/random.h>
#include <cugraph_c/resource_handle.h>
@@ -27,91 +28,6 @@ extern "C" {

typedef enum { POWER_LAW = 0, UNIFORM } cugraph_generator_distribution_t;

/**
* @brief Opaque COO definition
*/
typedef struct {
int32_t align_;
} cugraph_coo_t;

/**
* @brief Opaque COO list definition
*/
typedef struct {
int32_t align_;
} cugraph_coo_list_t;

/**
* @brief Get the source vertex ids
*
* @param [in] coo Opaque pointer to COO
* @return type erased array view of source vertex ids
*/
cugraph_type_erased_device_array_view_t* cugraph_coo_get_sources(cugraph_coo_t* coo);

/**
* @brief Get the destination vertex ids
*
* @param [in] coo Opaque pointer to COO
* @return type erased array view of destination vertex ids
*/
cugraph_type_erased_device_array_view_t* cugraph_coo_get_destinations(cugraph_coo_t* coo);

/**
* @brief Get the edge weights
*
* @param [in] coo Opaque pointer to COO
* @return type erased array view of edge weights, NULL if no edge weights in COO
*/
cugraph_type_erased_device_array_view_t* cugraph_coo_get_edge_weights(cugraph_coo_t* coo);

/**
* @brief Get the edge id
*
* @param [in] coo Opaque pointer to COO
* @return type erased array view of edge id, NULL if no edge ids in COO
*/
cugraph_type_erased_device_array_view_t* cugraph_coo_get_edge_id(cugraph_coo_t* coo);

/**
* @brief Get the edge type
*
* @param [in] coo Opaque pointer to COO
* @return type erased array view of edge type, NULL if no edge types in COO
*/
cugraph_type_erased_device_array_view_t* cugraph_coo_get_edge_type(cugraph_coo_t* coo);

/**
* @brief Get the number of coo object in the list
*
* @param [in] coo_list Opaque pointer to COO list
* @return number of elements
*/
size_t cugraph_coo_list_size(const cugraph_coo_list_t* coo_list);

/**
* @brief Get a COO from the list
*
* @param [in] coo_list Opaque pointer to COO list
* @param [in] index Index of desired COO from list
* @return a cugraph_coo_t* object from the list
*/
cugraph_coo_t* cugraph_coo_list_element(cugraph_coo_list_t* coo_list, size_t index);

/**
* @brief Free coo object
*
* @param [in] coo Opaque pointer to COO
*/
void cugraph_coo_free(cugraph_coo_t* coo);

/**
* @brief Free coo list
*
* @param [in] coo_list Opaque pointer to list of COO objects
*/
void cugraph_coo_list_free(cugraph_coo_list_t* coo_list);

/**
* @brief Generate RMAT edge list
*
52 changes: 52 additions & 0 deletions cpp/include/cugraph_c/sampling_algorithms.h
@@ -16,6 +16,7 @@

#pragma once

#include <cugraph_c/coo.h>
#include <cugraph_c/error.h>
#include <cugraph_c/graph.h>
#include <cugraph_c/properties.h>
@@ -674,6 +675,57 @@ cugraph_error_code_t cugraph_select_random_vertices(const cugraph_resource_handl
cugraph_type_erased_device_array_t** vertices,
cugraph_error_t** error);

/**
* @ingroup samplingC
* @brief Perform negative sampling
*
* Negative sampling generates a COO structure defining edges according to the specified parameters
*
* @param [in] handle Handle for accessing resources
* @param [in,out] rng_state State of the random number generator, updated with each
* call
* @param [in] graph Pointer to graph
* @param [in] vertices Vertex ids for the source and destination biases. If
* @p src_biases and @p dst_biases are not specified this is
* ignored. If @p vertices is specified then vertices[i] is
* the vertex id of src_biases[i] and dst_biases[i]. If
* @p vertices is not specified then i is the vertex id of
* src_biases[i] and dst_biases[i]
* @param [in] src_biases Bias for selecting source vertices. If NULL, do uniform
* sampling. If provided, the probability of vertex i will be
* src_biases[i] / (sum of all source biases)
* @param [in] dst_biases Bias for selecting destination vertices. If NULL, do
* uniform sampling. If provided, the probability of vertex i
* will be dst_biases[i] / (sum of all destination biases)
* @param [in] num_samples Number of negative samples to generate
* @param [in] remove_duplicates If true, remove duplicates from sampled edges
* @param [in] remove_existing_edges If true, remove sampled edges that actually exist in
* the graph
* @param [in] exact_number_of_samples If true, result should contain exactly @p num_samples. If
* false the code will generate @p num_samples and then do
* any filtering as specified
* @param [in] do_expensive_check A flag to run expensive checks for input arguments (if
* set to true)
* @param [out] result Opaque pointer to generated COO
* @param [out] error Pointer to an error object storing details of any error.
* Will be populated if error code is not CUGRAPH_SUCCESS
* @return error code
*/
cugraph_error_code_t cugraph_negative_sampling(
const cugraph_resource_handle_t* handle,
cugraph_rng_state_t* rng_state,
cugraph_graph_t* graph,
const cugraph_type_erased_device_array_view_t* vertices,
const cugraph_type_erased_device_array_view_t* src_biases,
const cugraph_type_erased_device_array_view_t* dst_biases,
size_t num_samples,
bool_t remove_duplicates,
bool_t remove_existing_edges,
bool_t exact_number_of_samples,
bool_t do_expensive_check,
cugraph_coo_t** result,
cugraph_error_t** error);

#ifdef __cplusplus
}
#endif
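The `vertices`, `src_biases` and `dst_biases` parameters of `cugraph_negative_sampling` together define a selection probability per vertex. A host-only sketch of that mapping, assuming a hypothetical helper `selection_probabilities`; treating vertices that are not listed in `vertices` as having bias 0 is an assumption, since the header does not state it:

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: expand (vertices, biases) into one probability per vertex id.
// If `vertices` is empty, biases[i] is taken to belong to vertex i.
// ASSUMPTION: vertices that do not appear in `vertices` get bias 0.
inline std::vector<double> selection_probabilities(std::size_t num_vertices,
                                                   std::vector<int> const& vertices,
                                                   std::vector<double> const& biases)
{
  std::vector<double> dense(num_vertices, 0.0);
  for (std::size_t i = 0; i < biases.size(); ++i) {
    std::size_t const v = vertices.empty() ? i : static_cast<std::size_t>(vertices[i]);
    dense[v] = biases[i];
  }
  // Normalize: probability of a vertex is its bias divided by the sum of all biases.
  double const total = std::accumulate(dense.begin(), dense.end(), 0.0);
  if (total > 0.0) {
    for (double& p : dense) { p /= total; }
  }
  return dense;
}
```

For example, with four vertices, `vertices = {3, 1}` and biases `{1.0, 3.0}`, vertex 1 is selected with probability 0.75 and vertex 3 with probability 0.25 under these assumptions.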