Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Improve multi-GPU BFS performance #4619

Closed
wants to merge 137 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
137 commits
Select commit Hold shift + click to select a range
a0d1f01
add a create_graph_from_edgelist function that takes edge list in mul…
seunghwak Jul 15, 2024
55513ae
update R-mat graph generators to generate edge list in multiple chunks
seunghwak Jul 15, 2024
2163fd8
Merge branch 'branch-24.08' of https://github.com/rapidsai/cugraph in…
seunghwak Jul 15, 2024
a9dfb92
fix build error
seunghwak Jul 15, 2024
e7b33ca
delete unused functions
seunghwak Jul 15, 2024
27ea550
fix build errors
seunghwak Jul 16, 2024
e5e8257
add temporary performance measurement code
seunghwak Jul 19, 2024
7ec5b08
add code to broadcast frontier using a bitmap
seunghwak Jul 19, 2024
d6123ba
resolve merge conflicts
seunghwak Jul 19, 2024
81f51c1
fix build error
seunghwak Jul 19, 2024
69cb4f9
update dataframe buffer utilities
seunghwak Jul 21, 2024
6adcccb
reduce # resizes
seunghwak Jul 21, 2024
bfe21fc
remove debug statement
seunghwak Jul 22, 2024
446435b
rename VertexFrontierBucketType to KeyBucketType
seunghwak Jul 22, 2024
df463ce
update per_v_transform_reduce_incoming|outgoing_e to support reduce_o…
seunghwak Jul 25, 2024
222148d
update kernels to take KeyIterator key_first & key_last
seunghwak Jul 26, 2024
effc69c
update per_v_transform_reduce_incoming_outgoing_e to support key list
seunghwak Jul 28, 2024
d537290
remove pred_op.cuh
seunghwak Jul 29, 2024
cf92885
update per_v_transform_reduce_incoming|outgoing_e to take a predicate
seunghwak Jul 29, 2024
b8d846c
Merge branch 'branch-24.10' of https://github.com/rapidsai/cugraph in…
seunghwak Aug 9, 2024
50ffc33
Merge branch 'branch-24.10' of https://github.com/rapidsai/cugraph in…
seunghwak Aug 9, 2024
a6476b9
split per_v_transform_reduce_incoming_outgoing_e implementation to tw…
seunghwak Aug 9, 2024
ec24758
implement per_v_transform_reduce_if_incoming|outgoing_e
seunghwak Aug 10, 2024
df751e7
update BFS to use per_v_transform_reduce_if_outoging_e
seunghwak Aug 13, 2024
4661b9b
file rename
seunghwak Aug 13, 2024
0951741
remove transform_reduce_v_frontier_outgoing_e_by_src (this can be bet…
seunghwak Aug 13, 2024
796b928
Merge branch 'branch-24.10' of https://github.com/rapidsai/cugraph in…
seunghwak Aug 13, 2024
7b98e3a
code cleanup, add few FIXMEs to improve performance, and add performa…
seunghwak Aug 15, 2024
3f77ee1
performance tuning for BFS
seunghwak Aug 16, 2024
bb75771
add a utility to find iteator type in dataframe buffer
seunghwak Aug 17, 2024
cfce7bc
minor performance tuning
seunghwak Aug 17, 2024
75d6151
delete unused code
seunghwak Aug 18, 2024
0f88988
add an option to skip edge shuffling in R-mat edge list generation
seunghwak Aug 18, 2024
180ece1
Merge branch 'branch-24.10' of https://github.com/rapidsai/cugraph in…
seunghwak Aug 18, 2024
2efb51e
fix build error
seunghwak Aug 19, 2024
106a6ad
fix documentation error
seunghwak Aug 19, 2024
98419cb
add a query function
seunghwak Aug 19, 2024
8193a91
Merge branch 'branch-24.10' of https://github.com/rapidsai/cugraph in…
seunghwak Aug 20, 2024
9625e0c
bug fix
seunghwak Aug 20, 2024
03cfe0d
bug fix
seunghwak Aug 20, 2024
c15305f
bug fixes
seunghwak Aug 21, 2024
4a1f150
bug fix
seunghwak Aug 21, 2024
29b6834
replace offset vector communication with local computing
seunghwak Aug 21, 2024
fcc75e0
add tmp perf measurement code
seunghwak Aug 21, 2024
710eb88
map GPUs on minor_comm to consecutive GPUs
seunghwak Aug 21, 2024
d040110
additional performance tuning
seunghwak Aug 22, 2024
ca816dd
add a utility function
seunghwak Aug 22, 2024
7712c38
fix build error
seunghwak Aug 22, 2024
31a5955
fix build error
seunghwak Aug 22, 2024
ac33784
bug fix
seunghwak Aug 23, 2024
6d8c7ef
perf experiment
seunghwak Aug 23, 2024
6bcdbe7
perf measurement code update
seunghwak Aug 23, 2024
3a950a5
rename [vertex_first, vertex_last) in fill|update_edge_src|dst_proper…
seunghwak Aug 23, 2024
d27a5e3
update fill|update_edge_minor_property to optionally use bitmap to br…
seunghwak Aug 24, 2024
97022f5
add missing includes
seunghwak Aug 24, 2024
ecf76f8
specialization for bool
seunghwak Aug 24, 2024
350f17e
add asynchronous copy_if
seunghwak Aug 27, 2024
93f726f
fix implicit synchronization in multi-stream execution
seunghwak Aug 27, 2024
d022c30
fix implicit synchronizations for multi-stream execution
seunghwak Aug 28, 2024
e53b3b8
delete debug prints
seunghwak Aug 28, 2024
b6e4f28
fix erroneous comments (numbering)
seunghwak Aug 29, 2024
be504cc
reduce memory footprint in graph creation
seunghwak Aug 30, 2024
3b151e0
undo temporary change for benchmarking
seunghwak Aug 30, 2024
ad0c879
update comments
seunghwak Aug 30, 2024
ce4ea93
cosmetic updates
seunghwak Aug 30, 2024
743ebf1
resolve merge conflicts
seunghwak Aug 30, 2024
9445027
update renumbering to use binning in more places
seunghwak Sep 3, 2024
bea1498
Merge branch 'branch-24.10' of https://github.com/rapidsai/cugraph in…
seunghwak Sep 3, 2024
70b1108
Merge branch 'upstream_pr4642' into enh_bfs_mg
seunghwak Sep 3, 2024
28641a6
update detail::extract_transform_v_frontier_e to use multiple CUDA st…
seunghwak Sep 3, 2024
05df778
exec_policy=>exec_policy_nosync
seunghwak Sep 3, 2024
5c4e3bd
performance-tweak detail::extract_transform_v_frontier_e
seunghwak Sep 4, 2024
dc44a7d
update comments
seunghwak Sep 4, 2024
ebcbfb7
improve stream concurrency
seunghwak Sep 4, 2024
3652c33
update copy_if_nosync to take a pointer to store the counter
seunghwak Sep 5, 2024
c689e35
temporary paramter setting for benchmarking
seunghwak Sep 9, 2024
f7b061b
bug fix
seunghwak Sep 9, 2024
20e1c74
add sun_nosync for multi stream execution
seunghwak Sep 9, 2024
9fa4fb4
pre-filter keys
seunghwak Sep 9, 2024
b6a1fb0
multi-stream execution
seunghwak Sep 9, 2024
3f71304
more performance logs
seunghwak Sep 9, 2024
3577699
update logging
seunghwak Sep 10, 2024
42c4d0b
use global comm to shuffle in compute_renumber_map (to avoid P2P buff…
seunghwak Sep 10, 2024
2557668
reduce small memory allocations
seunghwak Sep 11, 2024
0381f22
bug fix
seunghwak Sep 11, 2024
eb822da
temporarily store vertex IDs in 48 bit to cut peak memory usage
seunghwak Sep 12, 2024
a067f08
update v_list bitmap bcast
seunghwak Sep 13, 2024
6c9118e
undo a flag
seunghwak Sep 15, 2024
20721e6
peak memory usage
seunghwak Sep 19, 2024
9d002c5
use approximation in swithcing between topdown & bottomup
seunghwak Sep 23, 2024
9e3574e
update logging
seunghwak Sep 23, 2024
07749f4
peak memory usage
seunghwak Sep 25, 2024
4ddd0a1
improve logging
seunghwak Sep 25, 2024
3bb6602
NCCL bug workaround
seunghwak Sep 25, 2024
8be2a3f
temporary parameter tweaks for testing
seunghwak Sep 25, 2024
7da6fe5
bug fix (when major_comm_size == 1 && minor_comm_size > 1)
seunghwak Sep 26, 2024
eb5354e
undo temporary workarounds
seunghwak Sep 26, 2024
376c028
optimize prep step in fill_edge_minor_property()
seunghwak Sep 27, 2024
453d8df
minor tweaks
seunghwak Sep 27, 2024
171c2b5
update fill_edge_minor_property
seunghwak Sep 28, 2024
4122652
update to use more than one block to process very high degree vertices
seunghwak Sep 29, 2024
146dbaf
additional primitive performance optimizations
seunghwak Sep 30, 2024
1ddb533
bug fix(when major_comm_size == 1)
seunghwak Oct 1, 2024
c741f2b
store hypersparse-segment degree offsets as graph metadata
seunghwak Oct 1, 2024
6474f84
update key list retrieval code
seunghwak Oct 2, 2024
547fb9b
reduce allreduce size
seunghwak Oct 5, 2024
969a943
performance-optimize shuffle&global-reduce in trasnform_reduce_v_fron…
seunghwak Oct 8, 2024
498e54e
pre-compute dcs range bitmap
seunghwak Oct 9, 2024
b911cdd
bug fix
seunghwak Oct 10, 2024
4687949
update unrenumber functions
seunghwak Oct 11, 2024
6c91fba
merge multiple allreduce opreations to a single allreduce operation
seunghwak Oct 12, 2024
0bd734f
update detail::per_v_transform_reudce_e
seunghwak Oct 13, 2024
f1fb13c
added temporary code to configure NCCL to SHARP accelerate minor_comm
seunghwak Oct 18, 2024
f6d2b25
reduce printouts
seunghwak Oct 18, 2024
ecb0c8e
bug fix in per_v_transform_reduce_e
seunghwak Oct 19, 2024
b419afb
update detail::extract_transform_v_frontier_e
seunghwak Oct 20, 2024
7e718b1
fine-tune direction optimizing alpha based on average vertex degree
seunghwak Oct 21, 2024
58f3ac6
performance otimize local computing part of fill_edge_minor_property
seunghwak Oct 21, 2024
b2dca9b
parameter tuning in graph creation
seunghwak Oct 21, 2024
ea0907d
reduce kernel launches in fill_edge_src_minor_property
seunghwak Oct 21, 2024
8a78131
fix a build error
seunghwak Oct 22, 2024
aa13925
minor performance tuning
seunghwak Oct 25, 2024
2db13e9
reduce comm. sync
seunghwak Oct 30, 2024
a51b708
kernel fusion
seunghwak Oct 30, 2024
1526dcf
kernel fusion
seunghwak Oct 31, 2024
a236533
fix regarding concurrent streams
seunghwak Oct 31, 2024
2ab3232
tune direction optimizing alpha
seunghwak Oct 31, 2024
65e36e1
update comments
seunghwak Oct 31, 2024
d690435
perf optimize direction optimizing aux info routines
seunghwak Nov 1, 2024
8ecc412
modify multi-stream kernel launching orders
seunghwak Nov 1, 2024
34c243c
perf. opt. compute_vertex_list_bitmap_info
seunghwak Nov 1, 2024
afa53fc
minor refactor
seunghwak Nov 4, 2024
6265f13
additional performance optimizations
seunghwak Nov 6, 2024
6961fcd
Graph 500 benchmark specific parameter tuning
seunghwak Nov 6, 2024
f570034
misc updates
seunghwak Nov 6, 2024
c22f95c
Graph 500 benchmark driver
seunghwak Nov 6, 2024
c96c72b
build only necessary files
seunghwak Nov 12, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
686 changes: 340 additions & 346 deletions cpp/CMakeLists.txt

Large diffs are not rendered by default.

139 changes: 127 additions & 12 deletions cpp/include/cugraph/edge_partition_device_view.cuh
Original file line number Diff line number Diff line change
Expand Up @@ -204,6 +204,7 @@ class edge_partition_device_view_t<vertex_t, edge_t, multi_gpu, std::enable_if_t
edge_partition_device_view_t(edge_partition_view_t<vertex_t, edge_t, multi_gpu> view)
: detail::edge_partition_device_view_base_t<vertex_t, edge_t>(view.offsets(), view.indices()),
dcs_nzd_vertices_(detail::to_thrust_optional(view.dcs_nzd_vertices())),
dcs_nzd_range_bitmap_(detail::to_thrust_optional(view.dcs_nzd_range_bitmap())),
major_hypersparse_first_(detail::to_thrust_optional(view.major_hypersparse_first())),
major_range_first_(view.major_range_first()),
major_range_last_(view.major_range_last()),
Expand All @@ -218,6 +219,7 @@ class edge_partition_device_view_t<vertex_t, edge_t, multi_gpu, std::enable_if_t
MajorIterator major_last,
rmm::cuda_stream_view stream) const
{
if (thrust::distance(major_first, major_last) == 0) return size_t{0};
return dcs_nzd_vertices_ ? thrust::transform_reduce(
rmm::exec_policy(stream),
major_first,
Expand Down Expand Up @@ -250,12 +252,72 @@ class edge_partition_device_view_t<vertex_t, edge_t, multi_gpu, std::enable_if_t
thrust::plus<size_t>());
}

template <typename MajorIterator>
__host__ void compute_number_of_edges_async(MajorIterator major_first,
MajorIterator major_last,
raft::device_span<size_t> count /* size = 1 */,
rmm::cuda_stream_view stream) const
{
if (thrust::distance(major_first, major_last) == 0) {
RAFT_CUDA_TRY(cudaMemsetAsync(count.data(), 0, sizeof(size_t), stream));
}

rmm::device_uvector<std::byte> d_tmp_storage(0, stream);
size_t tmp_storage_bytes{0};

if (dcs_nzd_vertices_) {
auto local_degree_first = thrust::make_transform_iterator(
major_first,
detail::local_degree_op_t<vertex_t,
edge_t,
size_t /* no limit on majors.size(), so edge_t can overflow */,
multi_gpu,
true>{
this->offsets_, major_range_first_, *dcs_nzd_vertices_, *major_hypersparse_first_});
cub::DeviceReduce::Sum(static_cast<void*>(nullptr),
tmp_storage_bytes,
local_degree_first,
count.data(),
thrust::distance(major_first, major_last),
stream);
d_tmp_storage.resize(tmp_storage_bytes, stream);
cub::DeviceReduce::Sum(d_tmp_storage.data(),
tmp_storage_bytes,
local_degree_first,
count.data(),
thrust::distance(major_first, major_last),
stream);
} else {
auto local_degree_first = thrust::make_transform_iterator(
major_first,
detail::local_degree_op_t<vertex_t,
edge_t,
size_t /* no limit on majors.size(), so edge_t can overflow */,
multi_gpu,
false>{
this->offsets_, major_range_first_, std::byte{0} /* dummy */, std::byte{0} /* dummy */});
cub::DeviceReduce::Sum(static_cast<void*>(nullptr),
tmp_storage_bytes,
local_degree_first,
count.data(),
thrust::distance(major_first, major_last),
stream);
d_tmp_storage.resize(tmp_storage_bytes, stream);
cub::DeviceReduce::Sum(d_tmp_storage.data(),
tmp_storage_bytes,
local_degree_first,
count.data(),
thrust::distance(major_first, major_last),
stream);
}
}

__host__ rmm::device_uvector<edge_t> compute_local_degrees(rmm::cuda_stream_view stream) const
{
rmm::device_uvector<edge_t> local_degrees(this->major_range_size(), stream);
if (dcs_nzd_vertices_) {
assert(major_hypersparse_first_);
thrust::transform(rmm::exec_policy(stream),
thrust::transform(rmm::exec_policy_nosync(stream),
thrust::make_counting_iterator(this->major_range_first()),
thrust::make_counting_iterator(this->major_range_last()),
local_degrees.begin(),
Expand All @@ -266,7 +328,7 @@ class edge_partition_device_view_t<vertex_t, edge_t, multi_gpu, std::enable_if_t
major_hypersparse_first_.value_or(vertex_t{0})});
} else {
thrust::transform(
rmm::exec_policy(stream),
rmm::exec_policy_nosync(stream),
thrust::make_counting_iterator(this->major_range_first()),
thrust::make_counting_iterator(this->major_range_last()),
local_degrees.begin(),
Expand All @@ -284,7 +346,7 @@ class edge_partition_device_view_t<vertex_t, edge_t, multi_gpu, std::enable_if_t
rmm::device_uvector<edge_t> local_degrees(thrust::distance(major_first, major_last), stream);
if (dcs_nzd_vertices_) {
assert(major_hypersparse_first_);
thrust::transform(rmm::exec_policy(stream),
thrust::transform(rmm::exec_policy_nosync(stream),
major_first,
major_last,
local_degrees.begin(),
Expand All @@ -295,7 +357,7 @@ class edge_partition_device_view_t<vertex_t, edge_t, multi_gpu, std::enable_if_t
major_hypersparse_first_.value_or(vertex_t{0})});
} else {
thrust::transform(
rmm::exec_policy(stream),
rmm::exec_policy_nosync(stream),
major_first,
major_last,
local_degrees.begin(),
Expand All @@ -311,6 +373,7 @@ class edge_partition_device_view_t<vertex_t, edge_t, multi_gpu, std::enable_if_t
MajorIterator major_last,
rmm::cuda_stream_view stream) const
{
if (thrust::distance(major_first, major_last) == 0) return size_t{0};
return dcs_nzd_vertices_ ? thrust::transform_reduce(
rmm::exec_policy(stream),
major_first,
Expand Down Expand Up @@ -355,7 +418,7 @@ class edge_partition_device_view_t<vertex_t, edge_t, multi_gpu, std::enable_if_t
if (dcs_nzd_vertices_) {
assert(major_hypersparse_first_);
thrust::transform(
rmm::exec_policy(stream),
rmm::exec_policy_nosync(stream),
thrust::make_counting_iterator(this->major_range_first()),
thrust::make_counting_iterator(this->major_range_last()),
local_degrees.begin(),
Expand All @@ -368,7 +431,7 @@ class edge_partition_device_view_t<vertex_t, edge_t, multi_gpu, std::enable_if_t
mask_first});
} else {
thrust::transform(
rmm::exec_policy(stream),
rmm::exec_policy_nosync(stream),
thrust::make_counting_iterator(this->major_range_first()),
thrust::make_counting_iterator(this->major_range_last()),
local_degrees.begin(),
Expand All @@ -394,7 +457,7 @@ class edge_partition_device_view_t<vertex_t, edge_t, multi_gpu, std::enable_if_t
if (dcs_nzd_vertices_) {
assert(major_hypersparse_first_);
thrust::transform(
rmm::exec_policy(stream),
rmm::exec_policy_nosync(stream),
major_first,
major_last,
local_degrees.begin(),
Expand All @@ -407,7 +470,7 @@ class edge_partition_device_view_t<vertex_t, edge_t, multi_gpu, std::enable_if_t
mask_first});
} else {
thrust::transform(
rmm::exec_policy(stream),
rmm::exec_policy_nosync(stream),
major_first,
major_last,
local_degrees.begin(),
Expand Down Expand Up @@ -515,6 +578,8 @@ class edge_partition_device_view_t<vertex_t, edge_t, multi_gpu, std::enable_if_t
return minor_range_first_ + minor_offset;
}

// FIxME: better return thrust::optional<raft::device_span<vertex_t const>> for consistency (see
// dcs_nzd_range_bitmap())
__host__ __device__ thrust::optional<vertex_t const*> dcs_nzd_vertices() const
{
return dcs_nzd_vertices_ ? thrust::optional<vertex_t const*>{(*dcs_nzd_vertices_).data()}
Expand All @@ -528,10 +593,20 @@ class edge_partition_device_view_t<vertex_t, edge_t, multi_gpu, std::enable_if_t
: thrust::nullopt;
}

__host__ __device__ thrust::optional<raft::device_span<uint32_t const>> dcs_nzd_range_bitmap()
const
{
return dcs_nzd_range_bitmap_
? thrust::make_optional<raft::device_span<uint32_t const>>(
(*dcs_nzd_range_bitmap_).data(), (*dcs_nzd_range_bitmap_).size())
: thrust::nullopt;
}

private:
// should be trivially copyable to device

thrust::optional<raft::device_span<vertex_t const>> dcs_nzd_vertices_{thrust::nullopt};
thrust::optional<raft::device_span<uint32_t const>> dcs_nzd_range_bitmap_{thrust::nullopt};
thrust::optional<vertex_t> major_hypersparse_first_{thrust::nullopt};

vertex_t major_range_first_{0};
Expand All @@ -558,6 +633,7 @@ class edge_partition_device_view_t<vertex_t, edge_t, multi_gpu, std::enable_if_t
MajorIterator major_last,
rmm::cuda_stream_view stream) const
{
if (thrust::distance(major_first, major_last) == 0) return size_t{0};
return thrust::transform_reduce(
rmm::exec_policy(stream),
major_first,
Expand All @@ -574,10 +650,48 @@ class edge_partition_device_view_t<vertex_t, edge_t, multi_gpu, std::enable_if_t
thrust::plus<size_t>());
}

template <typename MajorIterator>
__host__ void compute_number_of_edges_async(MajorIterator major_first,
MajorIterator major_last,
raft::device_span<size_t> count /* size = 1 */,
rmm::cuda_stream_view stream) const
{
if (thrust::distance(major_first, major_last) == 0) {
RAFT_CUDA_TRY(cudaMemsetAsync(count.data(), 0, sizeof(size_t), stream));
}

rmm::device_uvector<std::byte> d_tmp_storage(0, stream);
size_t tmp_storage_bytes{0};

auto local_degree_first = thrust::make_transform_iterator(
major_first,
detail::local_degree_op_t<vertex_t,
edge_t,
size_t /* no limit on majors.size(), so edge_t can overflow */,
multi_gpu,
false>{this->offsets_,
std::byte{0} /* dummy */,
std::byte{0} /* dummy */,
std::byte{0} /* dummy */});
cub::DeviceReduce::Sum(static_cast<void*>(nullptr),
tmp_storage_bytes,
local_degree_first,
count.data(),
thrust::distance(major_first, major_last),
stream);
d_tmp_storage.resize(tmp_storage_bytes, stream);
cub::DeviceReduce::Sum(d_tmp_storage.data(),
tmp_storage_bytes,
local_degree_first,
count.data(),
thrust::distance(major_first, major_last),
stream);
}

__host__ rmm::device_uvector<edge_t> compute_local_degrees(rmm::cuda_stream_view stream) const
{
rmm::device_uvector<edge_t> local_degrees(this->major_range_size(), stream);
thrust::transform(rmm::exec_policy(stream),
thrust::transform(rmm::exec_policy_nosync(stream),
thrust::make_counting_iterator(this->major_range_first()),
thrust::make_counting_iterator(this->major_range_last()),
local_degrees.begin(),
Expand All @@ -595,7 +709,7 @@ class edge_partition_device_view_t<vertex_t, edge_t, multi_gpu, std::enable_if_t
rmm::cuda_stream_view stream) const
{
rmm::device_uvector<edge_t> local_degrees(thrust::distance(major_first, major_last), stream);
thrust::transform(rmm::exec_policy(stream),
thrust::transform(rmm::exec_policy_nosync(stream),
major_first,
major_last,
local_degrees.begin(),
Expand All @@ -613,6 +727,7 @@ class edge_partition_device_view_t<vertex_t, edge_t, multi_gpu, std::enable_if_t
MajorIterator major_last,
rmm::cuda_stream_view stream) const
{
if (thrust::distance(major_first, major_last) == 0) return size_t{0};
return thrust::transform_reduce(
rmm::exec_policy(stream),
major_first,
Expand All @@ -638,7 +753,7 @@ class edge_partition_device_view_t<vertex_t, edge_t, multi_gpu, std::enable_if_t
{
rmm::device_uvector<edge_t> local_degrees(this->major_range_size(), stream);
thrust::transform(
rmm::exec_policy(stream),
rmm::exec_policy_nosync(stream),
thrust::make_counting_iterator(this->major_range_first()),
thrust::make_counting_iterator(this->major_range_last()),
local_degrees.begin(),
Expand All @@ -660,7 +775,7 @@ class edge_partition_device_view_t<vertex_t, edge_t, multi_gpu, std::enable_if_t
{
rmm::device_uvector<edge_t> local_degrees(thrust::distance(major_first, major_last), stream);
thrust::transform(
rmm::exec_policy(stream),
rmm::exec_policy_nosync(stream),
major_first,
major_last,
local_degrees.begin(),
Expand Down
8 changes: 8 additions & 0 deletions cpp/include/cugraph/edge_partition_view.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -56,6 +56,7 @@ class edge_partition_view_t<vertex_t, edge_t, multi_gpu, std::enable_if_t<multi_
edge_partition_view_t(raft::device_span<edge_t const> offsets,
raft::device_span<vertex_t const> indices,
std::optional<raft::device_span<vertex_t const>> dcs_nzd_vertices,
std::optional<raft::device_span<uint32_t const>> dcs_nzd_range_bitmap,
std::optional<vertex_t> major_hypersparse_first,
vertex_t major_range_first,
vertex_t major_range_last,
Expand All @@ -64,6 +65,7 @@ class edge_partition_view_t<vertex_t, edge_t, multi_gpu, std::enable_if_t<multi_
vertex_t major_value_start_offset)
: detail::edge_partition_view_base_t<vertex_t, edge_t>(offsets, indices),
dcs_nzd_vertices_(dcs_nzd_vertices),
dcs_nzd_range_bitmap_(dcs_nzd_range_bitmap),
major_hypersparse_first_(major_hypersparse_first),
major_range_first_(major_range_first),
major_range_last_(major_range_last),
Expand All @@ -78,6 +80,11 @@ class edge_partition_view_t<vertex_t, edge_t, multi_gpu, std::enable_if_t<multi_
return dcs_nzd_vertices_;
}

std::optional<raft::device_span<uint32_t const>> dcs_nzd_range_bitmap() const
{
return dcs_nzd_range_bitmap_;
}

std::optional<vertex_t> major_hypersparse_first() const { return major_hypersparse_first_; }

vertex_t major_range_first() const { return major_range_first_; }
Expand All @@ -90,6 +97,7 @@ class edge_partition_view_t<vertex_t, edge_t, multi_gpu, std::enable_if_t<multi_
private:
// relevant only if we use the CSR + DCSR (or CSC + DCSC) hybrid format
std::optional<raft::device_span<vertex_t const>> dcs_nzd_vertices_{std::nullopt};
std::optional<raft::device_span<uint32_t const>> dcs_nzd_range_bitmap_{std::nullopt};
std::optional<vertex_t> major_hypersparse_first_{std::nullopt};

vertex_t major_range_first_{0};
Expand Down
Loading
Loading