Define and Implement C++ API for negative sampling #4523
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -743,4 +743,60 @@ lookup_endpoints_from_edge_ids_and_types( | |
raft::device_span<edge_t const> edge_ids_to_lookup, | ||
raft::device_span<edge_type_t const> edge_types_to_lookup); | ||
|
||
/** | ||
* @brief Negative Sampling | ||
* | ||
* This function generates negative samples for a graph. | |
* | ||
* Negative sampling is done by generating a random graph according to the specified | ||
* parameters and optionally removing the false negatives. | ||
* | ||
* Sampling occurs by creating a list of source vertex ids from biased sampling | |
* of the source vertex space, and destination vertex ids from biased sampling of the | ||
* destination vertex space, and using this as the putative list of edges. We | ||
* then can optionally remove duplicates and remove false negatives to generate | ||
Review comment: Is |
Review comment: Changed to |
Review comment: Did you commit the change? I still see |
Review comment: Missed the change in the documentation... I'll search the documentation and push a fix once I resolve the testing issue on one of the build configurations. |
||
* the final list. If necessary we will repeat the process to end with a resulting | ||
* edge list of the appropriate size. | ||
* | ||
* @tparam vertex_t Type of vertex identifiers. Needs to be an integral type. | ||
* @tparam edge_t Type of edge identifiers. Needs to be an integral type. | ||
* @tparam store_transposed Flag indicating whether sources (if false) or destinations (if | ||
* true) are major indices | ||
* @tparam multi_gpu Flag indicating whether template instantiation should target single-GPU (false) | |
* or multi-GPU (true) | |
* | ||
* @param handle RAFT handle object to encapsulate resources (e.g. CUDA stream, communicator, and | ||
* handles to various CUDA libraries) to run graph algorithms. | ||
* @param graph_view Graph view object to generate negative samples for | |
* @param rng_state RNG state | ||
* @param num_samples Number of negative samples to generate | ||
* @param src_bias Optional bias for randomly selecting source vertices. If std::nullopt vertices | ||
* will be selected uniformly | ||
* @param dst_bias Optional bias for randomly selecting destination vertices. If std::nullopt | ||
* vertices will be selected uniformly | ||
Review comment: What are the ranges for multi-GPU? |
Review comment: Added comment on that. |
||
* @param remove_duplicates If true, remove duplicate samples | ||
* @param remove_false_negatives If true, remove false negatives (samples that are actually edges in | ||
* the graph) | |
Review comment: Are these false negatives? Should we better say something like False negatives can be misinterpreted as something that should be reported but missing. I guess here false negatives mean something that is reported as negative samples but should not be reported (as it is positive samples, this actually means false positive negative samples). |
Review comment: Changed to |
||
* @param exact_number_of_samples If true, repeat generation until we get the exact number of | ||
* negative samples | ||
* @param do_expensive_check A flag to run expensive checks for input arguments (if set to `true`). | ||
* | ||
* @return tuple containing source vertex ids and destination vertex ids for the negative samples | ||
*/ | ||
template <typename vertex_t, | ||
typename edge_t, | ||
typename weight_t, | ||
bool store_transposed, | ||
bool multi_gpu> | ||
std::tuple<rmm::device_uvector<vertex_t>, rmm::device_uvector<vertex_t>> negative_sampling( | ||
raft::handle_t const& handle, | ||
raft::random::RngState& rng_state, | ||
graph_view_t<vertex_t, edge_t, store_transposed, multi_gpu> const& graph_view, | ||
size_t num_samples, | ||
Review comment: Sort of our convention is to list scalar parameters at the end. We may better move |
Review comment: Moved in next push |
||
std::optional<raft::device_span<weight_t const>> src_bias, | ||
std::optional<raft::device_span<weight_t const>> dst_bias, | ||
Review comment: Sorry for nitpicking but better be |
Review comment: Changed to be plural. |
||
bool remove_duplicates, | ||
bool remove_false_negatives, | ||
bool exact_number_of_samples, | ||
bool do_expensive_check); | ||
|
||
} // namespace cugraph |
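
The doc comment above describes the sampling loop only in prose: draw biased source and destination ids, optionally drop duplicates and pairs that are real edges, and repeat until enough samples remain. The host-side sketch below illustrates that loop under stated assumptions; it is not the cuGraph implementation. The names `negative_sampling_sketch`, `edge_pair_t`, `remove_existing_edges` and `max_rounds` are hypothetical, the graph is stood in for by a `std::set` of edges, and an empty bias vector plays the role of `std::nullopt` (uniform selection).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <set>
#include <utility>
#include <vector>

using vertex_t    = std::int32_t;
using edge_pair_t = std::pair<vertex_t, vertex_t>;

// CPU illustration only (hypothetical helper, not part of cuGraph).
// An empty bias vector means "select vertices uniformly".
std::vector<edge_pair_t> negative_sampling_sketch(std::mt19937& rng,
                                                  std::set<edge_pair_t> const& existing_edges,
                                                  vertex_t num_vertices,
                                                  std::vector<double> src_bias,
                                                  std::vector<double> dst_bias,
                                                  std::size_t num_samples,
                                                  bool remove_duplicates,
                                                  bool remove_existing_edges,
                                                  bool exact_number_of_samples,
                                                  std::size_t max_rounds = 32)
{
  assert(num_vertices > 0);
  auto const n = static_cast<std::size_t>(num_vertices);
  if (src_bias.empty()) { src_bias.assign(n, 1.0); }
  if (dst_bias.empty()) { dst_bias.assign(n, 1.0); }
  assert(src_bias.size() == n && dst_bias.size() == n);

  // Biased selection over the source and destination vertex spaces.
  std::discrete_distribution<int> pick_src(src_bias.begin(), src_bias.end());
  std::discrete_distribution<int> pick_dst(dst_bias.begin(), dst_bias.end());

  std::vector<edge_pair_t> result;
  std::set<edge_pair_t> seen;  // only consulted when remove_duplicates is true

  // max_rounds guards against targets that cannot be reached, e.g. asking for
  // more unique non-edges than the graph has.
  for (std::size_t round = 0; round < max_rounds && result.size() < num_samples; ++round) {
    std::size_t const needed = num_samples - result.size();
    for (std::size_t i = 0; i < needed; ++i) {
      edge_pair_t candidate{static_cast<vertex_t>(pick_src(rng)),
                            static_cast<vertex_t>(pick_dst(rng))};
      if (remove_existing_edges && existing_edges.count(candidate) > 0) { continue; }
      if (remove_duplicates && !seen.insert(candidate).second) { continue; }
      result.push_back(candidate);
    }
    // Without the exact-count request, a single round is enough even if
    // filtering left fewer than num_samples pairs.
    if (!exact_number_of_samples) { break; }
  }
  return result;
}
```

With filtering enabled and `exact_number_of_samples` set to false, the sketch may return fewer than `num_samples` pairs, which is the behaviour the `exact_number_of_samples` flag in the PR is meant to control. A GPU implementation would replace the per-candidate set lookups with bulk sort, unique and edge-existence checks.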
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,115 @@ | ||
/* | ||
* Copyright (c) 2024, NVIDIA CORPORATION. | ||
* | ||
* Licensed under the Apache License, Version 2.0 (the "License"); | ||
* you may not use this file except in compliance with the License. | ||
* You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, software | ||
* distributed under the License is distributed on an "AS IS" BASIS, | ||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
*/ | ||
|
||
#pragma once | ||
|
||
#include <cugraph_c/array.h> | ||
#include <cugraph_c/graph.h> | ||
#include <cugraph_c/random.h> | ||
#include <cugraph_c/resource_handle.h> | ||
|
||
#ifdef __cplusplus | ||
extern "C" { | ||
#endif | ||
|
||
/** | ||
* @brief Opaque COO definition | ||
*/ | ||
typedef struct { | ||
int32_t align_; | ||
} cugraph_coo_t; | ||
|
||
/** | ||
* @brief Opaque COO list definition | ||
*/ | ||
typedef struct { | ||
int32_t align_; | ||
} cugraph_coo_list_t; | ||
|
||
/** | ||
* @brief Get the source vertex ids | ||
* | ||
* @param [in] coo Opaque pointer to COO | ||
* @return type erased array view of source vertex ids | ||
*/ | ||
cugraph_type_erased_device_array_view_t* cugraph_coo_get_sources(cugraph_coo_t* coo); | ||
|
||
/** | ||
* @brief Get the destination vertex ids | ||
* | ||
* @param [in] coo Opaque pointer to COO | ||
* @return type erased array view of destination vertex ids | ||
*/ | ||
cugraph_type_erased_device_array_view_t* cugraph_coo_get_destinations(cugraph_coo_t* coo); | ||
|
||
/** | ||
* @brief Get the edge weights | ||
* | ||
* @param [in] coo Opaque pointer to COO | ||
* @return type erased array view of edge weights, NULL if no edge weights in COO | ||
*/ | ||
cugraph_type_erased_device_array_view_t* cugraph_coo_get_edge_weights(cugraph_coo_t* coo); | ||
|
||
/** | ||
* @brief Get the edge id | ||
* | ||
* @param [in] coo Opaque pointer to COO | ||
* @return type erased array view of edge id, NULL if no edge ids in COO | ||
*/ | ||
cugraph_type_erased_device_array_view_t* cugraph_coo_get_edge_id(cugraph_coo_t* coo); | ||
|
||
/** | ||
* @brief Get the edge type | ||
* | ||
* @param [in] coo Opaque pointer to COO | ||
* @return type erased array view of edge type, NULL if no edge types in COO | ||
*/ | ||
cugraph_type_erased_device_array_view_t* cugraph_coo_get_edge_type(cugraph_coo_t* coo); | ||
|
||
/** | ||
* @brief Get the number of COO objects in the list | |
* | ||
* @param [in] coo_list Opaque pointer to COO list | ||
* @return number of elements | ||
*/ | ||
size_t cugraph_coo_list_size(const cugraph_coo_list_t* coo_list); | ||
|
||
/** | ||
* @brief Get a COO from the list | ||
* | ||
* @param [in] coo_list Opaque pointer to COO list | ||
* @param [in] index Index of desired COO from list | ||
* @return a cugraph_coo_t* object from the list | ||
*/ | ||
cugraph_coo_t* cugraph_coo_list_element(cugraph_coo_list_t* coo_list, size_t index); | ||
|
||
/** | ||
* @brief Free coo object | ||
* | ||
* @param [in] coo Opaque pointer to COO | ||
*/ | ||
void cugraph_coo_free(cugraph_coo_t* coo); | ||
|
||
/** | ||
* @brief Free coo list | ||
* | ||
* @param [in] coo_list Opaque pointer to list of COO objects | ||
*/ | ||
void cugraph_coo_list_free(cugraph_coo_list_t* coo_list); | ||
|
||
#ifdef __cplusplus | ||
} | ||
#endif |
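
The header above exposes `cugraph_coo_t` as an opaque struct holding only an `align_` member, with accessor and free functions operating on pointers to it. The sketch below shows one common way such an opaque handle can be backed by a C++ object behind a C interface. It is an assumption for illustration, not code taken from this PR: every `demo_*` name is hypothetical, and plain host vectors stand in for the type-erased device array views.

```cpp
#include <stddef.h>
#include <stdint.h>

#include <cassert>
#include <vector>

// Public, C-visible side: callers only ever hold a pointer to this placeholder.
extern "C" {
typedef struct {
  int32_t align_;
} demo_coo_t;
}

// Private, C++-only side: the object the opaque pointer really refers to.
namespace detail {
struct demo_coo_impl_t {
  std::vector<int32_t> sources;
  std::vector<int32_t> destinations;
};
}  // namespace detail

// Allocate the C++ object and hand it out as an opaque pointer.
extern "C" demo_coo_t* demo_coo_create(int32_t const* srcs, int32_t const* dsts, size_t num_edges)
{
  assert(num_edges == 0 || (srcs != nullptr && dsts != nullptr));
  auto* impl = new detail::demo_coo_impl_t{std::vector<int32_t>(srcs, srcs + num_edges),
                                            std::vector<int32_t>(dsts, dsts + num_edges)};
  return reinterpret_cast<demo_coo_t*>(impl);
}

// Accessors cast the opaque pointer back to the implementation type.
extern "C" size_t demo_coo_size(demo_coo_t const* coo)
{
  return reinterpret_cast<detail::demo_coo_impl_t const*>(coo)->sources.size();
}

extern "C" int32_t const* demo_coo_get_sources(demo_coo_t const* coo)
{
  return reinterpret_cast<detail::demo_coo_impl_t const*>(coo)->sources.data();
}

extern "C" int32_t const* demo_coo_get_destinations(demo_coo_t const* coo)
{
  return reinterpret_cast<detail::demo_coo_impl_t const*>(coo)->destinations.data();
}

// The library that allocated the object must also release it.
extern "C" void demo_coo_free(demo_coo_t* coo)
{
  delete reinterpret_cast<detail::demo_coo_impl_t*>(coo);
}
```

The placeholder struct is never dereferenced; it only gives C callers a distinct pointer type, which is why a dedicated free function such as `cugraph_coo_free` is part of the interface.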
Review comment: Something very minor, but should we place input vectors (e.g. `biases`) before output vectors (e.g. `output`)? AFAIK, that is a convention in the C++ API.
Review comment: I may delete this function. I implemented a different way. After I've refactored the implementation I'll revisit whether we need this function or not. If we keep it I'll make that change.