-
Notifications
You must be signed in to change notification settings - Fork 197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add MST optimization to improve the connectivity of CAGRA graphs. #2218
Conversation
…proximate MST graph as much as possible
4e787d4
to
440d97d
Compare
/ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Akira for this PR! Looks good overall, but it would be good to add some more comments and simplify at a few places, so that the code will be easier to maintain. I have some questions and suggestion along these lines.
@@ -224,9 +224,10 @@ template <typename IdxT = uint32_t, | |||
host_device_accessor<std::experimental::default_accessor<IdxT>, memory_type::host>> | |||
void optimize(raft::resources const& res, | |||
mdspan<IdxT, matrix_extent<int64_t>, row_major, g_accessor> knn_graph, | |||
raft::host_matrix_view<IdxT, int64_t, row_major> new_graph) | |||
raft::host_matrix_view<IdxT, int64_t, row_major> new_graph, | |||
const bool use_MST = false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not enable by default, is it costly to run MST, or can it lead to any issues?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Running an initial mst on a knn graph for the purposes of finding connected components should be really fast.
I also wonder if we should be exposing the fact that we are running an MST to guarantee connectivity- that's an implementation detail AFAIC. I suggest we instead name this argument something along the lines of "guarantee_connectivity".
@@ -90,12 +90,12 @@ void search_main_core( | |||
CagraSampleFilterT sample_filter = CagraSampleFilterT()) | |||
{ | |||
RAFT_LOG_DEBUG("# dataset size = %lu, dim = %lu\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RAFT_LOG_DEBUG("# dataset size = %lu, dim = %lu\n", | |
RAFT_LOG_DEBUG("# graph size = %lu, dim = %lu\n", |
@@ -316,12 +571,468 @@ void sort_knn_graph(raft::resources const& res, | |||
RAFT_LOG_DEBUG("# Sorting kNN graph time: %.1lf sec\n", time_sort_end - time_sort_start); | |||
} | |||
|
|||
template <typename IdxT = uint32_t> | |||
void mst_optimization(raft::resources const& res, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to add a short docstring that describes what this optimization step does, and what approximations we have. Something along the lines:
- We have a kNN graph as input
- We have an approximate MST as output. Approximate, because the number of outgoing edges are limited.
- Additionally the construction method is an approximation (seems similar to Kruskal's algorithm, but we solve in parallel which does not guarantee a global minimum).
- If the input kNN graph is disconnected then random connection is added to the largest cluster.
The last point is confusing: why do we need an MST, if we just add random edges to the main cluster? (After reading further down how the MST is used, it becomes clearer: if we want to keep a set of edges that guarantees connectivity, then it is better to keep the MST.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Side note- we have an mst implementation in RAFT already but it does not preserve the fixed degree constraint. MST is inherently coupled to neighborhood methods and we are planning to move the existing MST over to cuVS. It would be nice if we could also expose a public API for this MST implementation, since it's generally useful for cases like this where we want to guarantee (or increase) connectedness.
We do this trick in our single-linkage clustering algorithm also, with the only caveat that we need to guarantee the smallest edges are added by the MST because the single-linkage has an exact solution.
#pragma omp parallel for | ||
for (uint64_t i = 0; i < graph_size; i++) { | ||
for (uint64_t k = 0; k < mst_graph_degree; k++) { | ||
mst_graph_ptr[(mst_graph_degree * i) + k] = graph_size; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have an array access operator that makes this simpler
mst_graph_ptr[(mst_graph_degree * i) + k] = graph_size; | |
mst_graph(i, k) = graph_size; |
uint32_t flag_update = 1; | ||
while (flag_update) { | ||
flag_update = 0; | ||
if (use_gpu) { | ||
stats_ptr[0] = flag_update; | ||
raft::copy(d_stats_ptr, stats_ptr, 1, resource::get_cuda_stream(res)); | ||
|
||
constexpr uint64_t n_threads = 256; | ||
const dim3 threads(n_threads, 1, 1); | ||
const dim3 blocks((graph_size + n_threads - 1) / n_threads, 1, 1); | ||
kern_mst_opt_labeling<<<blocks, threads, 0, resource::get_cuda_stream(res)>>>( | ||
d_label_ptr, d_mst_graph_ptr, graph_size, mst_graph_degree, d_stats_ptr); | ||
|
||
raft::copy(stats_ptr, d_stats_ptr, 1, resource::get_cuda_stream(res)); | ||
resource::sync_stream(res); | ||
flag_update = stats_ptr[0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very similar to the existing weak_cc function
raft/cpp/include/raft/sparse/csr.hpp
Line 147 in e977d2e
* @brief Compute weakly connected components. Note that the resulting labels |
|
||
for (uint64_t k = 0; k < output_graph_degree; k++) { | ||
#pragma omp parallel for | ||
for (uint64_t i = 0; i < graph_size; i++) { | ||
dest_nodes.data_handle()[i] = output_graph_ptr[k + (output_graph_degree * i)]; | ||
dest_nodes_ptr[i] = pruned_graph_ptr[k + (output_graph_degree * i)]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of introducing extra pointers, one could use the array access operator (once compiler is finished, it will boil down to the same pointer arithmetic)
dest_nodes_ptr[i] = pruned_graph_ptr[k + (output_graph_degree * i)]; | |
dest_nodes(i) = pruned_graph(i, k); |
auto my_fwd_graph = pruned_graph_ptr + (output_graph_degree * i); | ||
auto my_rev_graph = rev_graph_ptr + (output_graph_degree * i); | ||
auto my_out_graph = output_graph_ptr + (output_graph_degree * i); | ||
uint32_t kf = 0; | ||
uint32_t kr = 0; | ||
uint32_t k = mst_graph_num_edges_ptr[i]; | ||
while (kf < output_graph_degree || kr < output_graph_degree) { | ||
if (kf < output_graph_degree) { | ||
if (my_fwd_graph[kf] < graph_size) { | ||
auto flag_match = false; | ||
for (uint32_t kk = 0; kk < k; kk++) { | ||
if (my_out_graph[kk] == my_fwd_graph[kf]) { | ||
flag_match = true; | ||
break; | ||
} | ||
} | ||
if (!flag_match) { | ||
my_out_graph[k] = my_fwd_graph[kf]; | ||
k += 1; | ||
} | ||
} | ||
kf += 1; | ||
if (k >= output_graph_degree) break; | ||
} | ||
if (kr < output_graph_degree) { | ||
if (my_rev_graph[kr] < graph_size) { | ||
auto flag_match = false; | ||
for (uint32_t kk = 0; kk < k; kk++) { | ||
if (my_out_graph[kk] == my_rev_graph[kr]) { | ||
flag_match = true; | ||
break; | ||
} | ||
} | ||
if (!flag_match) { | ||
my_out_graph[k] = my_rev_graph[kr]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be more concise with mdspan array access operator, and a helper for finding duplicate edges:
auto my_fwd_graph = pruned_graph_ptr + (output_graph_degree * i); | |
auto my_rev_graph = rev_graph_ptr + (output_graph_degree * i); | |
auto my_out_graph = output_graph_ptr + (output_graph_degree * i); | |
uint32_t kf = 0; | |
uint32_t kr = 0; | |
uint32_t k = mst_graph_num_edges_ptr[i]; | |
while (kf < output_graph_degree || kr < output_graph_degree) { | |
if (kf < output_graph_degree) { | |
if (my_fwd_graph[kf] < graph_size) { | |
auto flag_match = false; | |
for (uint32_t kk = 0; kk < k; kk++) { | |
if (my_out_graph[kk] == my_fwd_graph[kf]) { | |
flag_match = true; | |
break; | |
} | |
} | |
if (!flag_match) { | |
my_out_graph[k] = my_fwd_graph[kf]; | |
k += 1; | |
} | |
} | |
kf += 1; | |
if (k >= output_graph_degree) break; | |
} | |
if (kr < output_graph_degree) { | |
if (my_rev_graph[kr] < graph_size) { | |
auto flag_match = false; | |
for (uint32_t kk = 0; kk < k; kk++) { | |
if (my_out_graph[kk] == my_rev_graph[kr]) { | |
flag_match = true; | |
break; | |
} | |
} | |
if (!flag_match) { | |
my_out_graph[k] = my_rev_graph[kr]; | |
/* | |
* template <typename IdxT> | |
* RAFT_INLINE_FUNCTION bool check_duplicate_edge(IdxT* neighbors_ptr, uint32_t k, IdxT node_id) | |
* { | |
* for (uint32_t i = 0; i < k; i++) | |
* if (neighbors_ptr[i] == node_id) return true; | |
* return false; | |
* } | |
*/ | |
uint32_t kf = 0; | |
uint32_t kr = 0; | |
uint32_t k = mst_graph_num_edges_ptr[i]; | |
while (kf < output_graph_degree || kr < output_graph_degree) { | |
if (kf < output_graph_degree) { | |
if (pruned_graph(i, kf) < graph_size) { | |
if (!check_duplicate_edges(&output_graph(i,0), k, pruned_graph(i, kf)) { | |
output_graph(i, k) = pruned_graph(i, kf); | |
k += 1; | |
} | |
} | |
kf += 1; | |
if (k >= output_graph_degree) break; | |
} | |
if (kr < output_graph_degree) { | |
if (rev_graph(i, kr) < graph_size) { | |
if (!check_duplicate_edges(&output_graph(i,0), k, rev_graph(i, kr) { | |
my_out_graph[k] = my_rev_graph[kr]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do I understand correctly, that we leave the MST in output_graph
, and append edges from the pruned and reverse graph? Can we add this to the PR description, and possibly as a comment here as well.
auto my_out_graph = output_graph_ptr + (output_graph_degree * i); | ||
uint32_t kf = 0; | ||
uint32_t kr = 0; | ||
uint32_t k = mst_graph_num_edges_ptr[i]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code below keeps k
edges from the MST, and fills the rest from the forward and reverse graph. Typically how large is k
compared to output_graph_degree
?
@anaruse now that CAGRA has been migrated to cuVS, this PR will also have to be moved to cuVS when ready (no rush, obviously) |
Cleaining up stale PRs. Please feel free to reopen if you would prefer this isn't closed yet. |
Close #2217
Improve connectivity of CAGRA graphs using approximate degree constrained MST (Minimum Spanning Tree).