Address Leiden clustering generating too many clusters #4730

ChuckHastings · 2024-10-18T18:37:08Z

Our implementation of Leiden was generating too many clusters. This was not obvious on smaller graphs, but the problem became more noticeable as the graphs got larger.

The Leiden loop was terminating as soon as modularity stopped improving. However, the Leiden algorithm as defined in the paper allows the refinement phase to reduce modularity in order to improve the quality of the clusters. The convergence criterion defined in the paper is based on an iteration making no changes, rather than on strictly monitoring the change in modularity.

Updating this criterion results in the Leiden algorithm running more iterations and converging on better answers.
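To illustrate the difference between the two stopping rules, here is a minimal C++ sketch. It is not the cuGraph code: `PassResult`, `passes_until_modularity_stalls`, and `passes_until_no_changes` are hypothetical names, and the pass results are supplied as a pre-recorded trace rather than computed from a graph.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical summary of one Leiden pass (move + refine + aggregate).
struct PassResult {
  std::size_t vertices_moved;  // number of changes made during the pass
  double modularity;           // modularity measured after the pass
};

// Modularity-based rule: stop at the first pass that fails to improve
// modularity by more than `threshold`. Returns the number of passes run.
std::size_t passes_until_modularity_stalls(const std::vector<PassResult>& trace,
                                           double threshold) {
  double best = -1.0;  // modularity is never below -0.5, so -1 is a safe start
  std::size_t n = 0;
  for (const auto& pass : trace) {
    ++n;
    if (pass.modularity <= best + threshold) break;  // no improvement: stop
    best = pass.modularity;
  }
  return n;
}

// Change-based rule: stop at the first pass that makes no changes,
// regardless of how modularity moved along the way.
std::size_t passes_until_no_changes(const std::vector<PassResult>& trace) {
  std::size_t n = 0;
  for (const auto& pass : trace) {
    ++n;
    if (pass.vertices_moved == 0) break;  // nothing changed: converged
  }
  return n;
}
```

With a made-up trace in which modularity dips on the second pass and recovers on the third, the modularity-based rule stops at the dip while the change-based rule keeps going until a pass makes no changes.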

Closes #4529


naimnv

Looks good to me.
It simplifies the termination logic and computes the final modularity at the end.

jnke2016

Looks good to me

seunghwak

LGTM, any reason to defer addressing the newly added FIXME statement?

seunghwak · 2024-10-22T22:53:06Z

cpp/src/community/detail/refine_impl.cuh

@@ -230,6 +229,7 @@ refine_clustering(
    cugraph::reduce_op::plus<weight_t>{},
    weighted_cut_of_vertices_to_louvain.begin());

+  // FIXME: Consider using bit mask logic here.  Would reduce memory by 8x


Yes, and also a higher chance to fit in L2 cache.

We have multiple utility functions to support this.

https://github.com/rapidsai/cugraph/blob/branch-24.12/cpp/include/cugraph/utilities/packed_bool_utils.hpp
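For readers unfamiliar with the idea behind the FIXME, the following is a minimal host-side sketch of bit packing, assuming the flags are currently stored one byte each. It does not use the helpers in the header linked above; `PackedBools` is a hypothetical class written only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical packed boolean array: 32 flags per 32-bit word,
// so each flag costs one bit instead of one byte.
class PackedBools {
 public:
  explicit PackedBools(std::size_t n) : size_(n), words_((n + 31) / 32, 0u) {}

  void set(std::size_t i, bool value) {
    assert(i < size_);
    const std::uint32_t mask = std::uint32_t{1} << (i % 32);
    if (value) {
      words_[i / 32] |= mask;   // turn the bit on
    } else {
      words_[i / 32] &= ~mask;  // turn the bit off
    }
  }

  bool get(std::size_t i) const {
    assert(i < size_);
    return ((words_[i / 32] >> (i % 32)) & 1u) != 0;
  }

  std::size_t size() const { return size_; }

  // Bytes used by the packed storage.
  std::size_t bytes() const { return words_.size() * sizeof(std::uint32_t); }

 private:
  std::size_t size_;
  std::vector<std::uint32_t> words_;
};
```

For 1024 flags this layout uses 128 bytes, compared with 1024 bytes at one byte per flag, which is where the 8x figure comes from. On a GPU, concurrent writes to flags that share a word would need atomic operations; this sketch does not cover that.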

seunghwak · 2024-10-22T22:54:12Z

cpp/src/community/detail/refine_impl.cuh

+  //  a direct lookup in louvain_assignment_of_vertices using
+  //     leiden - graph_view.local_vertex_partition_range_first() as the
+  //     index?
+  // Changing this would save memory and time


Any reason to defer the update?

Trying to get a fix out for a user question. General Leiden improvements aren't a priority right now.
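The beginning of the FIXME is cut off in the diff above, so the following only illustrates the indexing idea it mentions: when IDs fall in a contiguous local range, subtracting the range's first ID gives a direct array index. `LocalAssignments` and its members are hypothetical names, and this host-side C++ sketch is not the code in refine_impl.cuh.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using vertex_t = std::int32_t;

// Hypothetical dense table for IDs in [range_first, range_first + size).
struct LocalAssignments {
  vertex_t range_first;              // first ID owned by this partition
  std::vector<vertex_t> cluster_of;  // cluster_of[id - range_first]

  // Direct lookup: one subtraction and one array access,
  // with no search or hash table involved.
  vertex_t lookup(vertex_t id) const {
    assert(id >= range_first);
    const auto idx = static_cast<std::size_t>(id - range_first);
    assert(idx < cluster_of.size());
    return cluster_of[idx];
  }
};
```

For example, with `range_first` set to 100 and three stored assignments, ID 102 maps to index 2.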

ChuckHastings · 2024-10-23T17:16:05Z

/merge

change convergence criteria for Leiden. Increasing modularity is not …

b479e31

…a requirement during the refinement phase

github-actions bot added the cuGraph label Oct 18, 2024

ChuckHastings marked this pull request as ready for review October 18, 2024 18:37

ChuckHastings requested a review from a team as a code owner October 18, 2024 18:37

ChuckHastings requested review from naimnv and seunghwak October 18, 2024 18:37

ChuckHastings self-assigned this Oct 18, 2024

ChuckHastings added the bug and non-breaking labels Oct 18, 2024

ChuckHastings added this to the 24.12 milestone Oct 18, 2024

naimnv approved these changes Oct 22, 2024


jnke2016 approved these changes Oct 22, 2024


seunghwak approved these changes Oct 22, 2024


rapids-bot bot merged commit 7390ae2 into rapidsai:branch-24.12 Oct 23, 2024
132 checks passed

beckernick mentioned this pull request Nov 5, 2024

Leiden clustering yielding too many clusters scverse/rapids_singlecell#286

Open

abs51295 mentioned this pull request Nov 27, 2024

[BUG]: Leiden clustering numbering is off #4791

Open

2 tasks
