Define and Implement C++ API for negative sampling #4523

Merged on Aug 21, 2024. 25 commits; showing changes from 9 commits.

Commits
29cbe97
Define C++ API for negative sampling
ChuckHastings Jul 5, 2024
983a881
first cut at negative sampling implementation (untested)... fixed API
ChuckHastings Jul 10, 2024
5504c74
rename utility_wrapper.cuh
ChuckHastings Jul 10, 2024
912ae6f
Working SG negative sampling tests
ChuckHastings Jul 15, 2024
0ce0712
add MG tests
ChuckHastings Jul 17, 2024
0d89269
Merge branch 'branch-24.08' into negative_sampling_api
ChuckHastings Jul 17, 2024
a31f5a9
Add C API and PLC for negative sampling
ChuckHastings Jul 23, 2024
fdad347
Merge branch 'branch-24.08' into negative_sampling_api
ChuckHastings Jul 23, 2024
6a90844
Fix filename change lost in merge
ChuckHastings Jul 23, 2024
2f23ac1
Negative sampling now working for SG, MG with 1/2/4 GPUs
ChuckHastings Aug 5, 2024
51d30db
Merge branch 'branch-24.10' into negative_sampling_api
ChuckHastings Aug 5, 2024
06c3d5d
Refactor to do biased sampling by vertex partitions instead of exposi…
ChuckHastings Aug 9, 2024
b9ab33a
Merge branch 'branch-24.10' into negative_sampling_api
ChuckHastings Aug 9, 2024
4de3bda
address other PR comments
ChuckHastings Aug 10, 2024
b38d4c6
Fix a few straggling references to remove_false_negatives, refactor a…
ChuckHastings Aug 13, 2024
16bbd7b
Merge branch 'branch-24.10' into negative_sampling_api
ChuckHastings Aug 13, 2024
905d1b6
refactor negative sampling based on PR comments
ChuckHastings Aug 15, 2024
dbc0b38
start refactoring to make tests .cpp files
ChuckHastings Aug 15, 2024
5f35987
move MG validation code into validation_utilitices.cu
ChuckHastings Aug 17, 2024
06e71c4
rename sampling file
ChuckHastings Aug 17, 2024
94990e1
remove reference of device structure from host API
ChuckHastings Aug 17, 2024
3305835
Merge branch 'branch-24.10' into negative_sampling_api
ChuckHastings Aug 17, 2024
996b9ac
update to accomodate GPUs with no bias
ChuckHastings Aug 19, 2024
8a28b0b
move num_samples parameter, add tests for edge masking, some cosmetic…
ChuckHastings Aug 20, 2024
b381784
Merge branch 'branch-24.10' into negative_sampling_api
ChuckHastings Aug 20, 2024
7 changes: 7 additions & 0 deletions cpp/CMakeLists.txt
@@ -329,6 +329,12 @@ set(CUGRAPH_SOURCES
src/sampling/neighbor_sampling_sg_v32_e64.cpp
src/sampling/neighbor_sampling_sg_v32_e32.cpp
src/sampling/neighbor_sampling_sg_v64_e64.cpp
src/sampling/negative_sampling_sg_v32_e64.cu
src/sampling/negative_sampling_sg_v32_e32.cu
src/sampling/negative_sampling_sg_v64_e64.cu
src/sampling/negative_sampling_mg_v32_e64.cu
src/sampling/negative_sampling_mg_v32_e32.cu
src/sampling/negative_sampling_mg_v64_e64.cu
src/sampling/renumber_sampled_edgelist_sg_v64_e64.cu
src/sampling/renumber_sampled_edgelist_sg_v32_e32.cu
src/sampling/sampling_post_processing_sg_v64_e64.cu
@@ -653,6 +659,7 @@ add_library(cugraph_c
src/c_api/louvain.cpp
src/c_api/triangle_count.cpp
src/c_api/neighbor_sampling.cpp
src/c_api/negative_sampling.cpp
src/c_api/labeling_result.cpp
src/c_api/weakly_connected_components.cpp
src/c_api/strongly_connected_components.cpp
23 changes: 23 additions & 0 deletions cpp/include/cugraph/detail/utility_wrappers.hpp
@@ -50,6 +50,29 @@ void uniform_random_fill(rmm::cuda_stream_view const& stream_view,
value_t max_value,
raft::random::RngState& rng_state);

/**
* @brief Fill a buffer with biased random values
*
* Fills a buffer with values based on the specified biases.
* The probability of selecting the value `i` is determined by
* `biases[i] / sum(biases)`.
*
* @tparam value_t type of the value to operate on
* @tparam bias_t type of the bias
*
* @param[in] handle RAFT handle object to encapsulate resources (e.g. CUDA stream,
* communicator, and handles to various CUDA libraries) to run graph algorithms.
* @param[in] rng_state The RngState instance holding pseudo-random number generator state.
* @param[out] output The random values
* @param[in] biases The biased values
*
*/
template <typename value_t, typename bias_t>
void biased_random_fill(raft::handle_t const& handle,
raft::random::RngState& rng_state,
raft::device_span<value_t> output,
raft::device_span<bias_t const> biases);

Contributor:
Something very minor, but should we place input vectors (e.g. biases) before output vectors (e.g. output)? AFAIK, that is a convention in the C++ API.

Author (Collaborator):
I may delete this function. I implemented a different way. After I've refactored the implementation I'll revisit whether we need this function or not. If we keep it I'll make that change.
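
For readers following the thread, the documented contract of `biased_random_fill` (value `i` is selected with probability `biases[i] / sum(biases)`) can be illustrated with a small host-side sketch. This is not the cuGraph or RAFT implementation: `biased_random_fill_host` is a hypothetical name, it runs on the CPU with `std::mt19937_64`, and it uses the common prefix-sum plus binary-search technique as an assumed stand-in. Inputs are listed before the output, as the review comment suggests.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <random>
#include <stdexcept>
#include <vector>

// Hypothetical host-side illustration only; NOT the cuGraph/RAFT implementation.
// Draws `count` indices so that index i is chosen with probability
// biases[i] / sum(biases).
std::vector<std::size_t> biased_random_fill_host(std::mt19937_64& rng,
                                                 std::vector<double> const& biases,
                                                 std::size_t count)
{
  // Inclusive prefix sum: cdf[i] = biases[0] + ... + biases[i].
  std::vector<double> cdf(biases.size());
  std::partial_sum(biases.begin(), biases.end(), cdf.begin());
  if (cdf.empty() || cdf.back() <= 0.0) {
    throw std::invalid_argument("biases must be non-empty with a positive sum");
  }

  std::uniform_real_distribution<double> dist(0.0, cdf.back());
  std::vector<std::size_t> output(count);
  for (auto& value : output) {
    double r;
    // Guard against an implementation rounding up to the upper bound.
    do { r = dist(rng); } while (r >= cdf.back());
    // First index whose cumulative bias exceeds r; zero-bias entries are never hit.
    auto it = std::upper_bound(cdf.begin(), cdf.end(), r);
    value = static_cast<std::size_t>(std::distance(cdf.begin(), it));
  }
  return output;
}
```

With biases `{0.0, 1.0, 3.0}`, index 0 is never produced and index 2 is drawn roughly three times as often as index 1.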


/**
* @brief Fill a buffer with a constant value
*
4 changes: 2 additions & 2 deletions cpp/include/cugraph/graph_view.hpp
@@ -636,7 +636,7 @@ class graph_view_t<vertex_t, edge_t, store_transposed, multi_gpu, std::enable_if
/* (edge_srcs, edge_dsts) should be pre-shuffled */
raft::device_span<vertex_t const> edge_srcs,
raft::device_span<vertex_t const> edge_dsts,
- bool do_expensive_check = false);
+ bool do_expensive_check = false) const;

rmm::device_uvector<edge_t> compute_multiplicity(
raft::handle_t const& handle,
@@ -945,7 +945,7 @@ class graph_view_t<vertex_t, edge_t, store_transposed, multi_gpu, std::enable_if
rmm::device_uvector<bool> has_edge(raft::handle_t const& handle,
raft::device_span<vertex_t const> edge_srcs,
raft::device_span<vertex_t const> edge_dsts,
- bool do_expensive_check = false);
+ bool do_expensive_check = false) const;

rmm::device_uvector<edge_t> compute_multiplicity(raft::handle_t const& handle,
raft::device_span<vertex_t const> edge_srcs,
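
The two hunks above only add a `const` qualifier to `has_edge`. A likely motivation, stated here as an assumption, is that the new `negative_sampling` API receives the graph view by `const&`, and a non-const member function cannot be called through a const reference. The sketch below shows the language rule with hypothetical stand-in types (`demo_graph_view_t`, `count_existing`); it does not use cuGraph.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a read-only graph view; not the cuGraph class.
class demo_graph_view_t {
 public:
  explicit demo_graph_view_t(std::vector<std::pair<int, int>> edges) : edges_(std::move(edges)) {}

  // const-qualified, so it can be invoked through a const reference.
  // Without `const`, the call inside count_existing() would not compile.
  std::vector<bool> has_edge(std::vector<int> const& srcs, std::vector<int> const& dsts) const
  {
    std::vector<bool> flags(std::min(srcs.size(), dsts.size()));
    for (std::size_t i = 0; i < flags.size(); ++i) {
      flags[i] = std::find(edges_.begin(), edges_.end(), std::make_pair(srcs[i], dsts[i])) !=
                 edges_.end();
    }
    return flags;
  }

 private:
  std::vector<std::pair<int, int>> edges_;
};

// Takes the view by const reference, like an algorithm that only reads the graph.
std::size_t count_existing(demo_graph_view_t const& view,
                           std::vector<int> const& srcs,
                           std::vector<int> const& dsts)
{
  auto flags = view.has_edge(srcs, dsts);
  return static_cast<std::size_t>(std::count(flags.begin(), flags.end(), true));
}
```
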
56 changes: 56 additions & 0 deletions cpp/include/cugraph/sampling_functions.hpp
@@ -743,4 +743,60 @@ lookup_endpoints_from_edge_ids_and_types(
raft::device_span<edge_t const> edge_ids_to_lookup,
raft::device_span<edge_type_t const> edge_types_to_lookup);

/**
* @brief Negative Sampling
*
* This function generates negative samples for a graph.
*
* Negative sampling is done by generating a random graph according to the specified
* parameters and optionally removing the false negatives.
*
* Sampling occurs by creating a list of source vertex ids from biased sampling
* of the source vertex space, and destination vertex ids from biased sampling of the
* destination vertex space, and using this as the putative list of edges. We
* then can optionally remove duplicates and remove false negatives to generate

Contributor:

Is "false negative" the right terminology here? AFAIK, a false negative is something that should be reported but is missing. Here, isn't it the opposite (edges that shouldn't appear actually appear)?

Author (Collaborator):

Changed to remove_existing_edges

Contributor:

Did you commit the change? I still see false negatives.

Author (Collaborator):

Missed the change in the documentation... I'll search the documentation and push a fix once I resolve the testing issue on one of the build configurations.

* the final list. If necessary we will repeat the process to end with a resulting
* edge list of the appropriate size.
*
* @tparam vertex_t Type of vertex identifiers. Needs to be an integral type.
* @tparam edge_t Type of edge identifiers. Needs to be an integral type.
* @tparam store_transposed Flag indicating whether sources (if false) or destinations (if
* true) are major indices
* @tparam multi_gpu Flag indicating whether template instantiation should target single-GPU (false)
* or multi-GPU (true)
*
* @param handle RAFT handle object to encapsulate resources (e.g. CUDA stream, communicator, and
* handles to various CUDA libraries) to run graph algorithms.
* @param graph_view Graph view object to generate negative samples for
* @param rng_state RNG state
* @param num_samples Number of negative samples to generate
* @param src_bias Optional bias for randomly selecting source vertices. If std::nullopt vertices
* will be selected uniformly
* @param dst_bias Optional bias for randomly selecting destination vertices. If std::nullopt
* vertices will be selected uniformly

Contributor:

What are the ranges for multi-GPU?

Author (Collaborator):

Added comment on that.

* @param remove_duplicates If true, remove duplicate samples
* @param remove_false_negatives If true, remove false negatives (samples that are actually edges in
* the graph)

Contributor:

Are these false negatives? Should we better say something like @param remove_positive_samples If true, remove positive samples (edges that exist in the input graph).

False negatives can be mis-interpreted as something that should be reported but missing. I guess here false negatives mean something that is reported as negative samples but should not be reported (as it is positive samples, this actually means false positive negative samples).

Author (Collaborator):

Changed to remove_existing_edges

* @param exact_number_of_samples If true, repeat generation until we get the exact number of
* negative samples
* @param do_expensive_check A flag to run expensive checks for input arguments (if set to `true`).
*
* @return tuple containing source vertex ids and destination vertex ids for the negative samples
*/
template <typename vertex_t,
typename edge_t,
typename weight_t,
bool store_transposed,
bool multi_gpu>
std::tuple<rmm::device_uvector<vertex_t>, rmm::device_uvector<vertex_t>> negative_sampling(
raft::handle_t const& handle,
raft::random::RngState& rng_state,
graph_view_t<vertex_t, edge_t, store_transposed, multi_gpu> const& graph_view,
size_t num_samples,

Contributor:

Sort of our convention is to list scalar parameters at the end. We may better move num_samples after dst_biases.

Author (Collaborator):

Moved in next push

std::optional<raft::device_span<weight_t const>> src_bias,
std::optional<raft::device_span<weight_t const>> dst_bias,

Contributor:

Sorry for nitpicking, but should these be src_biases and dst_biases, to be consistent with the rest of the C++ API (plural forms for vectors with multiple elements)?

Author (Collaborator):

Changed to be plural.

bool remove_duplicates,
bool remove_false_negatives,
bool exact_number_of_samples,
bool do_expensive_check);

} // namespace cugraph
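
The procedure described in the doc comment (sample sources and destinations with bias, optionally drop duplicates and edges already in the graph, repeat until enough samples exist) can be sketched on the host as follows. This is an assumption-laden illustration, not the PR's GPU implementation: `negative_sampling_host`, `edge_t_pair`, `edge_list_t` and the `max_rounds` safety cap are hypothetical, `std::discrete_distribution` stands in for the device-side biased sampler, and the flag is named `remove_existing_edges` as agreed in the review thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <set>
#include <utility>
#include <vector>

using edge_t_pair = std::pair<std::int32_t, std::int32_t>;
using edge_list_t = std::vector<edge_t_pair>;

// Hypothetical single-process sketch of the documented procedure.
edge_list_t negative_sampling_host(std::mt19937_64& rng,
                                   std::set<edge_t_pair> const& graph_edges,
                                   std::vector<double> const& src_biases,
                                   std::vector<double> const& dst_biases,
                                   std::size_t num_samples,
                                   bool remove_duplicates,
                                   bool remove_existing_edges,
                                   bool exact_number_of_samples,
                                   std::size_t max_rounds = 100)  // assumed cap to guarantee termination
{
  // Vertex v is drawn with probability biases[v] / sum(biases).
  std::discrete_distribution<std::int32_t> src_dist(src_biases.begin(), src_biases.end());
  std::discrete_distribution<std::int32_t> dst_dist(dst_biases.begin(), dst_biases.end());

  edge_list_t result;
  std::set<edge_t_pair> seen;  // used only when remove_duplicates is true

  for (std::size_t round = 0; round < max_rounds && result.size() < num_samples; ++round) {
    std::size_t const needed = num_samples - result.size();
    for (std::size_t i = 0; i < needed; ++i) {
      edge_t_pair candidate{src_dist(rng), dst_dist(rng)};
      // Drop candidates that are real edges of the input graph.
      if (remove_existing_edges && graph_edges.count(candidate) > 0) { continue; }
      // Drop candidates that were already emitted.
      if (remove_duplicates && !seen.insert(candidate).second) { continue; }
      result.push_back(candidate);
    }
    // Without the exact-count request, a single pass is enough.
    if (!exact_number_of_samples) { break; }
  }
  return result;
}
```

When `exact_number_of_samples` is false, the result may hold fewer than `num_samples` pairs because filtered candidates are not replaced.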
115 changes: 115 additions & 0 deletions cpp/include/cugraph_c/coo.h
@@ -0,0 +1,115 @@
/*
* Copyright (c) 2024, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

#pragma once

#include <cugraph_c/array.h>
#include <cugraph_c/graph.h>
#include <cugraph_c/random.h>
#include <cugraph_c/resource_handle.h>

#ifdef __cplusplus
extern "C" {
#endif

/**
* @brief Opaque COO definition
*/
typedef struct {
int32_t align_;
} cugraph_coo_t;

/**
* @brief Opaque COO list definition
*/
typedef struct {
int32_t align_;
} cugraph_coo_list_t;

/**
* @brief Get the source vertex ids
*
* @param [in] coo Opaque pointer to COO
* @return type erased array view of source vertex ids
*/
cugraph_type_erased_device_array_view_t* cugraph_coo_get_sources(cugraph_coo_t* coo);

/**
* @brief Get the destination vertex ids
*
* @param [in] coo Opaque pointer to COO
* @return type erased array view of destination vertex ids
*/
cugraph_type_erased_device_array_view_t* cugraph_coo_get_destinations(cugraph_coo_t* coo);

/**
* @brief Get the edge weights
*
* @param [in] coo Opaque pointer to COO
* @return type erased array view of edge weights, NULL if no edge weights in COO
*/
cugraph_type_erased_device_array_view_t* cugraph_coo_get_edge_weights(cugraph_coo_t* coo);

/**
* @brief Get the edge id
*
* @param [in] coo Opaque pointer to COO
* @return type erased array view of edge id, NULL if no edge ids in COO
*/
cugraph_type_erased_device_array_view_t* cugraph_coo_get_edge_id(cugraph_coo_t* coo);

/**
* @brief Get the edge type
*
* @param [in] coo Opaque pointer to COO
* @return type erased array view of edge type, NULL if no edge types in COO
*/
cugraph_type_erased_device_array_view_t* cugraph_coo_get_edge_type(cugraph_coo_t* coo);

/**
* @brief Get the number of COO objects in the list
*
* @param [in] coo_list Opaque pointer to COO list
* @return number of elements
*/
size_t cugraph_coo_list_size(const cugraph_coo_list_t* coo_list);

/**
* @brief Get a COO from the list
*
* @param [in] coo_list Opaque pointer to COO list
* @param [in] index Index of desired COO from list
* @return a cugraph_coo_t* object from the list
*/
cugraph_coo_t* cugraph_coo_list_element(cugraph_coo_list_t* coo_list, size_t index);

/**
* @brief Free coo object
*
* @param [in] coo Opaque pointer to COO
*/
void cugraph_coo_free(cugraph_coo_t* coo);

/**
* @brief Free coo list
*
* @param [in] coo_list Opaque pointer to list of COO objects
*/
void cugraph_coo_list_free(cugraph_coo_list_t* coo_list);

#ifdef __cplusplus
}
#endif
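
The header above exposes `cugraph_coo_t` as an opaque struct with a single `align_` member and accesses it only through functions. A minimal sketch of that opaque-handle pattern follows. Everything here is hypothetical (`demo_coo_t`, `demo_coo_impl_t`, and the `demo_coo_*` functions), and the use of `reinterpret_cast` between the public and internal types is an assumption about how such a C API is typically backed by C++, not a statement about cuGraph's source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Public opaque type: callers only ever hold pointers to it.
typedef struct {
  std::int32_t align_;
} demo_coo_t;

namespace detail {
// Hypothetical internal representation, hidden from C callers.
struct demo_coo_impl_t {
  std::vector<std::int32_t> src;
  std::vector<std::int32_t> dst;
};
}  // namespace detail

extern "C" {

// Allocates the internal object and hands back an opaque pointer to it.
demo_coo_t* demo_coo_create(const std::int32_t* src, const std::int32_t* dst, std::size_t n)
{
  auto* impl = new detail::demo_coo_impl_t{std::vector<std::int32_t>(src, src + n),
                                           std::vector<std::int32_t>(dst, dst + n)};
  return reinterpret_cast<demo_coo_t*>(impl);
}

// Accessors cast the opaque pointer back to the internal type.
std::size_t demo_coo_size(demo_coo_t* coo)
{
  return reinterpret_cast<detail::demo_coo_impl_t*>(coo)->src.size();
}

const std::int32_t* demo_coo_get_sources(demo_coo_t* coo)
{
  return reinterpret_cast<detail::demo_coo_impl_t*>(coo)->src.data();
}

const std::int32_t* demo_coo_get_destinations(demo_coo_t* coo)
{
  return reinterpret_cast<detail::demo_coo_impl_t*>(coo)->dst.data();
}

// The matching free function destroys the internal object.
void demo_coo_free(demo_coo_t* coo) { delete reinterpret_cast<detail::demo_coo_impl_t*>(coo); }

}  // extern "C"
```

Keeping the public struct free of real members lets the internal layout change without breaking C callers, which is presumably why the COO declarations could move to their own header without touching users of `graph_generators.h`.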
86 changes: 1 addition & 85 deletions cpp/include/cugraph_c/graph_generators.h
@@ -17,6 +17,7 @@
#pragma once

#include <cugraph_c/array.h>
#include <cugraph_c/coo.h>
#include <cugraph_c/graph.h>
#include <cugraph_c/random.h>
#include <cugraph_c/resource_handle.h>
@@ -27,91 +28,6 @@ extern "C" {

typedef enum { POWER_LAW = 0, UNIFORM } cugraph_generator_distribution_t;

/**
* @brief Opaque COO definition
*/
typedef struct {
int32_t align_;
} cugraph_coo_t;

/**
* @brief Opaque COO list definition
*/
typedef struct {
int32_t align_;
} cugraph_coo_list_t;

/**
* @brief Get the source vertex ids
*
* @param [in] coo Opaque pointer to COO
* @return type erased array view of source vertex ids
*/
cugraph_type_erased_device_array_view_t* cugraph_coo_get_sources(cugraph_coo_t* coo);

/**
* @brief Get the destination vertex ids
*
* @param [in] coo Opaque pointer to COO
* @return type erased array view of destination vertex ids
*/
cugraph_type_erased_device_array_view_t* cugraph_coo_get_destinations(cugraph_coo_t* coo);

/**
* @brief Get the edge weights
*
* @param [in] coo Opaque pointer to COO
* @return type erased array view of edge weights, NULL if no edge weights in COO
*/
cugraph_type_erased_device_array_view_t* cugraph_coo_get_edge_weights(cugraph_coo_t* coo);

/**
* @brief Get the edge id
*
* @param [in] coo Opaque pointer to COO
* @return type erased array view of edge id, NULL if no edge ids in COO
*/
cugraph_type_erased_device_array_view_t* cugraph_coo_get_edge_id(cugraph_coo_t* coo);

/**
* @brief Get the edge type
*
* @param [in] coo Opaque pointer to COO
* @return type erased array view of edge type, NULL if no edge types in COO
*/
cugraph_type_erased_device_array_view_t* cugraph_coo_get_edge_type(cugraph_coo_t* coo);

/**
* @brief Get the number of coo object in the list
*
* @param [in] coo_list Opaque pointer to COO list
* @return number of elements
*/
size_t cugraph_coo_list_size(const cugraph_coo_list_t* coo_list);

/**
* @brief Get a COO from the list
*
* @param [in] coo_list Opaque pointer to COO list
* @param [in] index Index of desired COO from list
* @return a cugraph_coo_t* object from the list
*/
cugraph_coo_t* cugraph_coo_list_element(cugraph_coo_list_t* coo_list, size_t index);

/**
* @brief Free coo object
*
* @param [in] coo Opaque pointer to COO
*/
void cugraph_coo_free(cugraph_coo_t* coo);

/**
* @brief Free coo list
*
* @param [in] coo_list Opaque pointer to list of COO objects
*/
void cugraph_coo_list_free(cugraph_coo_list_t* coo_list);

/**
* @brief Generate RMAT edge list
*