Moves more MG graph ETL to libcugraph and re-enables MG tests in CI #3941

Merged
Changes from 23 commits
Commits
58 commits
b59c94f
placeholder for re-enabling the mg testing
jnke2016 Oct 17, 2023
d07af7d
propose C API changes
ChuckHastings Nov 7, 2023
c66a50e
new graph creation implementation in C API
ChuckHastings Nov 13, 2023
37278d1
Merge branch 'branch-23.12' into new_graph_creation_methods
ChuckHastings Nov 13, 2023
d470bc6
add proper test for multiple input lists in MG
ChuckHastings Nov 14, 2023
0dbea1a
Merge branch 'branch-23.12' into new_graph_creation_methods
ChuckHastings Nov 14, 2023
67e7383
update branch with the latest changes
jnke2016 Nov 15, 2023
a56b30a
fetch and merge CAPI graph update
jnke2016 Nov 15, 2023
d0bf54c
support isolated vertices for sg graph
jnke2016 Nov 15, 2023
52b3162
add support for dropping self loops and removing multi edges to C API…
ChuckHastings Nov 16, 2023
0cca2cd
Merge branch 'branch-23.12' into new_graph_creation_methods
ChuckHastings Nov 16, 2023
d96ba62
refactor remove_self_loops and sort_and_remove_multi_edges to reduce …
ChuckHastings Nov 17, 2023
f1ab784
add support for isolated vertices, list of edges
jnke2016 Nov 17, 2023
9242207
Merge remote-tracking branch 'upstream/new_graph_creation_methods' in…
jnke2016 Nov 17, 2023
d1b104a
check weights before extracting its type
jnke2016 Nov 17, 2023
dfafeec
remove deprecated parameter 'num_edges'
jnke2016 Nov 17, 2023
e7c0cf9
support list of edges
jnke2016 Nov 17, 2023
a44a1ae
remove debug print
jnke2016 Nov 17, 2023
12bf121
fix style
jnke2016 Nov 17, 2023
0fc84bd
Merge remote-tracking branch 'upstream/branch-23.12' into branch-23.1…
jnke2016 Nov 17, 2023
76e5c15
pass list of edgelist to the plc graph creation
jnke2016 Nov 18, 2023
172ea07
update check
jnke2016 Nov 20, 2023
42913b8
update data persistence
jnke2016 Nov 20, 2023
3e233f1
cleanup code and fix bugs
jnke2016 Nov 20, 2023
a612a0a
fix style
jnke2016 Nov 20, 2023
4513d19
Merge remote-tracking branch 'upstream/branch-23.12' into branch-23.1…
jnke2016 Nov 20, 2023
cc113db
pass keyword argument to accommodate for the plc graph creation signa…
jnke2016 Nov 20, 2023
0ceeb4f
update doctests
jnke2016 Nov 20, 2023
e757ff9
update doctest examples
jnke2016 Nov 20, 2023
68e76b8
re-enable single gpu dask python tests
jnke2016 Nov 20, 2023
64bb371
fix style
jnke2016 Nov 20, 2023
9915497
update copyright
jnke2016 Nov 20, 2023
08c0d05
update copyright
jnke2016 Nov 20, 2023
6db1050
lower tolerance
jnke2016 Nov 20, 2023
39bded6
fix docstring examples
jnke2016 Nov 20, 2023
129a226
Merge branch 'branch-23.12' into branch-23.12_re-enable-mg-testing
naimnv Nov 20, 2023
e275714
Remove another persist and decrease memory footprint of drop_duplicates
VibhuJawa Nov 21, 2023
6069f3c
decrease memory footprint of drop_duplicates
VibhuJawa Nov 21, 2023
64ec881
decrease memory footprint of drop_duplicates
VibhuJawa Nov 21, 2023
a32fd3d
Revert bulk sampling changes
VibhuJawa Nov 21, 2023
5f7d4e5
Revert bulk sampling changes
VibhuJawa Nov 21, 2023
008e1d0
Merge remote-tracking branch 'upstream/branch-23.12' into branch-23.1…
rlratzel Nov 22, 2023
8b3b051
properly handle smaller graphs
jnke2016 Nov 22, 2023
51b5a90
remove extra persist
jnke2016 Nov 22, 2023
c8c0abe
Merge remote-tracking branch 'upstream/branch-23.12_re-enable-mg-test…
jnke2016 Nov 22, 2023
3b3d0c6
fix style
jnke2016 Nov 22, 2023
2d4e5ac
undo changes when resolving merge conflict
jnke2016 Nov 22, 2023
e4db01d
clean up code
jnke2016 Nov 22, 2023
d29c10d
update docstrings
jnke2016 Nov 22, 2023
3a4255d
update docstrings
jnke2016 Nov 22, 2023
c607a58
properly handle list of device arrays and clean up code
jnke2016 Nov 22, 2023
7f869b3
explicitly increase the timeout per worker
jnke2016 Nov 22, 2023
71fb5e5
temporarily lower the timeout value
jnke2016 Nov 22, 2023
9f8b131
fix style and add comment
jnke2016 Nov 22, 2023
912701a
Merge remote-tracking branch 'upstream/branch-23.12' into branch-23.1…
jnke2016 Nov 22, 2023
35a6feb
fix style
jnke2016 Nov 22, 2023
fdeaa57
refactor distribution of dask objects across workers
jnke2016 Nov 27, 2023
07bcd2e
fix style
jnke2016 Nov 27, 2023
2 changes: 1 addition & 1 deletion cpp/CMakeLists.txt
@@ -203,7 +203,7 @@ set(CUGRAPH_SOURCES
src/detail/utility_wrappers.cu
src/structure/graph_view_mg.cu
src/structure/remove_self_loops.cu
-src/structure/sort_and_remove_multi_edges.cu
+src/structure/remove_multi_edges.cu
src/utilities/path_retrieval.cu
src/structure/legacy/graph.cu
src/linear_assignment/legacy/hungarian.cu
16 changes: 9 additions & 7 deletions cpp/include/cugraph/graph_functions.hpp
@@ -1005,7 +1005,9 @@ remove_self_loops(raft::handle_t const& handle,
std::optional<rmm::device_uvector<edge_type_t>>&& edgelist_edge_types);

/**
-* @brief Sort the edges and remove all but one edge when a multi-edge exists
+* @brief Remove all but one edge when a multi-edge exists. Note that this function does not use
+* stable methods. When a multi-edge exists, one of the edges will remain, there is no
+* guarantee on which one will remain.
*
* In an MG context it is assumed that edges have been shuffled to the proper GPU,
* in which case any multi-edges will be on the same GPU.
@@ -1031,11 +1033,11 @@ std::tuple<rmm::device_uvector<vertex_t>,
std::optional<rmm::device_uvector<weight_t>>,
std::optional<rmm::device_uvector<edge_t>>,
std::optional<rmm::device_uvector<edge_type_t>>>
-sort_and_remove_multi_edges(raft::handle_t const& handle,
-rmm::device_uvector<vertex_t>&& edgelist_srcs,
-rmm::device_uvector<vertex_t>&& edgelist_dsts,
-std::optional<rmm::device_uvector<weight_t>>&& edgelist_weights,
-std::optional<rmm::device_uvector<edge_t>>&& edgelist_edge_ids,
-std::optional<rmm::device_uvector<edge_type_t>>&& edgelist_edge_types);
+remove_multi_edges(raft::handle_t const& handle,
+rmm::device_uvector<vertex_t>&& edgelist_srcs,
+rmm::device_uvector<vertex_t>&& edgelist_dsts,
+std::optional<rmm::device_uvector<weight_t>>&& edgelist_weights,
+std::optional<rmm::device_uvector<edge_t>>&& edgelist_edge_ids,
+std::optional<rmm::device_uvector<edge_type_t>>&& edgelist_edge_types);

} // namespace cugraph
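
The updated doc comment for `remove_multi_edges` says that one edge of each multi-edge survives, with no guarantee on which one. The following is a minimal host-side sketch of that behavior, not the libcugraph implementation. It assumes plain `std::vector` storage instead of `rmm::device_uvector` and a single optional weight property; `EdgeList` and `remove_multi_edges_host` are hypothetical names used only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <optional>
#include <tuple>
#include <vector>

// Hypothetical host-side stand-in for the device edge list (illustration only).
struct EdgeList {
  std::vector<int32_t> srcs;
  std::vector<int32_t> dsts;
  std::optional<std::vector<float>> weights;
};

// Keeps exactly one edge per (src, dst) pair. Because std::sort is not stable,
// which duplicate survives is unspecified, mirroring the documented caveat.
EdgeList remove_multi_edges_host(EdgeList edges)
{
  assert(edges.srcs.size() == edges.dsts.size());
  assert(!edges.weights || edges.weights->size() == edges.srcs.size());

  // Sort indices rather than the arrays so every property follows its edge.
  std::vector<std::size_t> order(edges.srcs.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
    return std::tie(edges.srcs[a], edges.dsts[a]) < std::tie(edges.srcs[b], edges.dsts[b]);
  });

  // Duplicates are now adjacent; keep the first index of each run.
  auto last = std::unique(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
    return edges.srcs[a] == edges.srcs[b] && edges.dsts[a] == edges.dsts[b];
  });
  order.erase(last, order.end());

  EdgeList result;
  result.srcs.reserve(order.size());
  result.dsts.reserve(order.size());
  if (edges.weights) { result.weights.emplace(); }
  for (std::size_t i : order) {
    result.srcs.push_back(edges.srcs[i]);
    result.dsts.push_back(edges.dsts[i]);
    if (edges.weights) { result.weights->push_back((*edges.weights)[i]); }
  }
  return result;
}
```

Callers that need a specific survivor (for example the edge with the largest weight) would have to resolve duplicates themselves before calling a routine like this.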
3 changes: 2 additions & 1 deletion cpp/include/cugraph_c/graph.h
@@ -103,7 +103,8 @@ cugraph_error_code_t cugraph_sg_graph_create(
* Note that setting this flag will arbitrarily select one instance of a multi edge to be the
* edge that survives. If the edges have properties that should be honored (e.g. sum the
weights,
-* or take the maximum weight), the caller should do that on not rely on this flag.
+* or take the maximum weight), the caller should remove specific edges themselves and not rely
+* on this flag.
* @param [in] do_expensive_check If true, do expensive checks to validate the input data
* is consistent with software assumptions. If false bypass these checks.
* @param [out] graph A pointer to the graph object
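
The comment above advises callers to resolve multi-edges themselves when edge properties matter (for example, by summing weights) instead of relying on the flag. A minimal host-side sketch of such a pre-aggregation step follows. It assumes the edge list is still in host memory before being handed to the C API; `WeightedEdges` and `sum_multi_edge_weights` are hypothetical helpers, not part of cugraph.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical host-side weighted edge list (illustration only).
struct WeightedEdges {
  std::vector<int32_t> srcs;
  std::vector<int32_t> dsts;
  std::vector<float> weights;
};

// Collapses every multi-edge into a single edge whose weight is the sum of
// the duplicates' weights. Output edges are ordered by (src, dst).
WeightedEdges sum_multi_edge_weights(WeightedEdges const& in)
{
  assert(in.srcs.size() == in.dsts.size());
  assert(in.srcs.size() == in.weights.size());

  std::map<std::pair<int32_t, int32_t>, float> accumulated;
  for (std::size_t i = 0; i < in.srcs.size(); ++i) {
    accumulated[{in.srcs[i], in.dsts[i]}] += in.weights[i];
  }

  WeightedEdges out;
  for (auto const& [endpoints, weight] : accumulated) {
    out.srcs.push_back(endpoints.first);
    out.dsts.push_back(endpoints.second);
    out.weights.push_back(weight);
  }
  return out;
}
```

After a step like this the edge list contains no multi-edges, so the arbitrary selection described in the comment no longer affects the result.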
14 changes: 8 additions & 6 deletions cpp/src/c_api/graph_mg.cpp
@@ -212,12 +212,12 @@ struct create_graph_functor : public cugraph::c_api::abstract_functor {
if (drop_multi_edges_) {
std::tie(
edgelist_srcs, edgelist_dsts, edgelist_weights, edgelist_edge_ids, edgelist_edge_types) =
-cugraph::sort_and_remove_multi_edges(handle_,
-std::move(edgelist_srcs),
-std::move(edgelist_dsts),
-std::move(edgelist_weights),
-std::move(edgelist_edge_ids),
-std::move(edgelist_edge_types));
+cugraph::remove_multi_edges(handle_,
+std::move(edgelist_srcs),
+std::move(edgelist_dsts),
+std::move(edgelist_weights),
+std::move(edgelist_edge_ids),
+std::move(edgelist_edge_types));
}

std::tie(*graph, new_edge_weights, new_edge_ids, new_edge_types, new_number_map) =
@@ -324,6 +324,7 @@ extern "C" cugraph_error_code_t cugraph_graph_create_mg(
//
cugraph_data_type_id_t vertex_type{cugraph_data_type_id_t::NTYPES};
cugraph_data_type_id_t weight_type{cugraph_data_type_id_t::NTYPES};
+
for (size_t i = 0; i < num_arrays; ++i) {
CAPI_EXPECTS(p_src[i]->size_ == p_dst[i]->size_,
CUGRAPH_INVALID_INPUT,
@@ -334,6 +335,7 @@ extern "C" cugraph_error_code_t cugraph_graph_create_mg(
CUGRAPH_INVALID_INPUT,
"Invalid input arguments: src type != dst type.",
*error);
+
CAPI_EXPECTS((p_vertices == nullptr) || (p_src[i]->type_ == p_vertices[i]->type_),
CUGRAPH_INVALID_INPUT,
"Invalid input arguments: src type != vertices type.",
12 changes: 6 additions & 6 deletions cpp/src/c_api/graph_sg.cpp
@@ -195,12 +195,12 @@ struct create_graph_functor : public cugraph::c_api::abstract_functor {
if (drop_multi_edges_) {
std::tie(
edgelist_srcs, edgelist_dsts, edgelist_weights, edgelist_edge_ids, edgelist_edge_types) =
-cugraph::sort_and_remove_multi_edges(handle_,
-std::move(edgelist_srcs),
-std::move(edgelist_dsts),
-std::move(edgelist_weights),
-std::move(edgelist_edge_ids),
-std::move(edgelist_edge_types));
+cugraph::remove_multi_edges(handle_,
+std::move(edgelist_srcs),
+std::move(edgelist_dsts),
+std::move(edgelist_weights),
+std::move(edgelist_edge_ids),
+std::move(edgelist_edge_types));
}

std::tie(*graph, new_edge_weights, new_edge_ids, new_edge_types, new_number_map) =
62 changes: 6 additions & 56 deletions cpp/src/structure/detail/structure_utils.cuh
@@ -498,63 +498,13 @@ void sort_adjacency_list(raft::handle_t const& handle,
}
}

-template <typename T>
-struct indirect_array_reference {
-T const* array_;
-
-T operator() __host__ __device__(size_t index) { return array_[index]; }
-};
-
-template <typename vertex_t, typename comparison_t>
-std::tuple<size_t, rmm::device_uvector<uint32_t>> mark_edges_for_removal(
-raft::handle_t const& handle,
-raft::device_span<vertex_t const> src,
-raft::device_span<vertex_t const> dst,
-comparison_t comparison)
-{
-rmm::device_uvector<uint32_t> remove_flags(packed_bool_size(src.size()), handle.get_stream());
-thrust::fill(handle.get_thrust_policy(),
-remove_flags.begin(),
-remove_flags.end(),
-cugraph::packed_bool_empty_mask());
-
-size_t remove_count = thrust::count_if(
-handle.get_thrust_policy(),
-thrust::make_counting_iterator(size_t{0}),
-thrust::make_counting_iterator(src.size()),
-[comparison, d_remove_flags = remove_flags.data()] __device__(size_t i) {
-if (comparison(i)) {
-atomicOr(d_remove_flags + cugraph::packed_bool_offset(i), cugraph::packed_bool_mask(i));
-return true;
-}
-return false;
-});
-
-return std::make_tuple(remove_count, std::move(remove_flags));
-}
-
-template <typename T>
-rmm::device_uvector<T> remove_flagged_elements(raft::handle_t const& handle,
-rmm::device_uvector<T>&& vector,
-rmm::device_uvector<uint32_t> const& remove_flags,
-size_t remove_count)
+template <typename comparison_t>
+std::tuple<size_t, rmm::device_uvector<uint32_t>> mark_entries(raft::handle_t const& handle,
+size_t num_entries,
+comparison_t comparison)
{
-rmm::device_uvector<T> result(vector.size() - remove_count, handle.get_stream());
-
-thrust::copy_if(
-handle.get_thrust_policy(),
-thrust::make_counting_iterator(size_t{0}),
-thrust::make_counting_iterator(vector.size()),
-thrust::make_transform_output_iterator(result.begin(),
-indirect_array_reference<T>{vector.data()}),
-[d_remove_flags = remove_flags.data()] __device__(size_t i) {
-return !(d_remove_flags[cugraph::packed_bool_offset(i)] & cugraph::packed_bool_mask(i));
-});
-
-return result;
-}
-
-} // namespace detail
+rmm::device_uvector<uint32_t> marked_entries(cugraph::packed_bool_size(num_entries),
+handle.get_stream());

thrust::tabulate(handle.get_thrust_policy(),
marked_entries.begin(),
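
The visible part of this hunk ends before the body of `mark_entries`, so the following is only a host-side sketch of the general technique the signature suggests: evaluate a predicate per entry, record the result in a bitmask packed into `uint32_t` words, and return the number of marked entries together with the mask. The bit layout (bit `i % 32` of word `i / 32`) and the names `mark_entries_host` and `is_marked` are assumptions made for illustration; the real code relies on the `cugraph::packed_bool_*` helpers and Thrust.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <utility>
#include <vector>

constexpr std::size_t bits_per_word = 32;  // assumed packing width

// Hypothetical host analogue: returns (number of marked entries, packed mask).
template <typename Predicate>
std::tuple<std::size_t, std::vector<uint32_t>> mark_entries_host(std::size_t num_entries,
                                                                 Predicate predicate)
{
  // One bit per entry, rounded up to whole 32-bit words.
  std::vector<uint32_t> marked((num_entries + bits_per_word - 1) / bits_per_word, 0u);
  std::size_t count = 0;
  for (std::size_t i = 0; i < num_entries; ++i) {
    if (predicate(i)) {
      marked[i / bits_per_word] |= uint32_t{1} << (i % bits_per_word);
      ++count;
    }
  }
  return std::make_tuple(count, std::move(marked));
}

// Reads a single flag back out of the packed mask.
inline bool is_marked(std::vector<uint32_t> const& marked, std::size_t i)
{
  assert(i / bits_per_word < marked.size());
  return (marked[i / bits_per_word] >> (i % bits_per_word)) & 1u;
}
```

Packing the flags this way uses one bit per entry instead of one byte, which matters when the entries are the edges of a large graph.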
43 changes: 28 additions & 15 deletions cpp/src/structure/remove_self_loops_impl.cuh
@@ -44,31 +44,44 @@ remove_self_loops(raft::handle_t const& handle,
std::optional<rmm::device_uvector<edge_t>>&& edgelist_edge_ids,
std::optional<rmm::device_uvector<edge_type_t>>&& edgelist_edge_types)
{
-auto [remove_count, remove_flags] = detail::mark_edges_for_removal(
-handle,
-raft::device_span<vertex_t const>{edgelist_srcs.data(), edgelist_srcs.size()},
-raft::device_span<vertex_t const>{edgelist_dsts.data(), edgelist_dsts.size()},
-[d_srcs = edgelist_srcs.data(), d_dsts = edgelist_dsts.data()] __device__(size_t i) {
-return d_srcs[i] == d_dsts[i];
-});
+auto [self_loop_count, self_loops_to_delete] =
+detail::mark_entries(handle,
+edgelist_srcs.size(),
+[d_srcs = edgelist_srcs.data(), d_dsts = edgelist_dsts.data()] __device__(
+size_t i) { return d_srcs[i] == d_dsts[i]; });

-if (remove_count > 0) {
-edgelist_srcs =
-detail::remove_flagged_elements(handle, std::move(edgelist_srcs), remove_flags, remove_count);
-edgelist_dsts =
-detail::remove_flagged_elements(handle, std::move(edgelist_dsts), remove_flags, remove_count);
+if (self_loop_count > 0) {
+edgelist_srcs = detail::remove_flagged_elements(
+handle,
+std::move(edgelist_srcs),
+raft::device_span<uint32_t const>{self_loops_to_delete.data(), self_loops_to_delete.size()},
+self_loop_count);
+edgelist_dsts = detail::remove_flagged_elements(
+handle,
+std::move(edgelist_dsts),
+raft::device_span<uint32_t const>{self_loops_to_delete.data(), self_loops_to_delete.size()},
+self_loop_count);

if (edgelist_weights)
edgelist_weights = detail::remove_flagged_elements(
-handle, std::move(*edgelist_weights), remove_flags, remove_count);
+handle,
+std::move(*edgelist_weights),
+raft::device_span<uint32_t const>{self_loops_to_delete.data(), self_loops_to_delete.size()},
+self_loop_count);

if (edgelist_edge_ids)
edgelist_edge_ids = detail::remove_flagged_elements(
-handle, std::move(*edgelist_edge_ids), remove_flags, remove_count);
+handle,
+std::move(*edgelist_edge_ids),
+raft::device_span<uint32_t const>{self_loops_to_delete.data(), self_loops_to_delete.size()},
+self_loop_count);

if (edgelist_edge_types)
edgelist_edge_types = detail::remove_flagged_elements(
-handle, std::move(*edgelist_edge_types), remove_flags, remove_count);
+handle,
+std::move(*edgelist_edge_types),
+raft::device_span<uint32_t const>{self_loops_to_delete.data(), self_loops_to_delete.size()},
+self_loop_count);
}

return std::make_tuple(std::move(edgelist_srcs),
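
The pattern in this hunk is to compute the self-loop flags once and then apply the same flags to every parallel array (sources, destinations, and each optional property), so all arrays stay aligned. A minimal host-side sketch of that pattern follows. It assumes `std::vector<bool>` in place of the packed `uint32_t` mask and only one optional property; `remove_flagged_elements_host` and `remove_self_loops_host` are hypothetical names, not the libcugraph functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical host analogue: copies every element whose flag is not set.
template <typename T>
std::vector<T> remove_flagged_elements_host(std::vector<T>&& values,
                                            std::vector<bool> const& remove_flags,
                                            std::size_t remove_count)
{
  assert(values.size() == remove_flags.size());
  assert(remove_count <= values.size());

  std::vector<T> result;
  result.reserve(values.size() - remove_count);
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (!remove_flags[i]) { result.push_back(std::move(values[i])); }
  }
  return result;
}

// Flags edges with src == dst once, then filters every array with the same flags.
std::tuple<std::vector<int32_t>, std::vector<int32_t>, std::optional<std::vector<float>>>
remove_self_loops_host(std::vector<int32_t>&& srcs,
                       std::vector<int32_t>&& dsts,
                       std::optional<std::vector<float>>&& weights)
{
  assert(srcs.size() == dsts.size());
  assert(!weights || weights->size() == srcs.size());

  std::vector<bool> is_self_loop(srcs.size(), false);
  std::size_t self_loop_count = 0;
  for (std::size_t i = 0; i < srcs.size(); ++i) {
    if (srcs[i] == dsts[i]) {
      is_self_loop[i] = true;
      ++self_loop_count;
    }
  }

  // Skip the copies entirely when there is nothing to remove.
  if (self_loop_count > 0) {
    srcs = remove_flagged_elements_host(std::move(srcs), is_self_loop, self_loop_count);
    dsts = remove_flagged_elements_host(std::move(dsts), is_self_loop, self_loop_count);
    if (weights) {
      weights = remove_flagged_elements_host(std::move(*weights), is_self_loop, self_loop_count);
    }
  }
  return std::make_tuple(std::move(srcs), std::move(dsts), std::move(weights));
}
```

Passing the removal count alongside the flags lets each output array be sized up front, which is presumably why the count is threaded through every call in the diff as well.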
92 changes: 0 additions & 92 deletions cpp/src/structure/sort_and_remove_multi_edges.cu

This file was deleted.
