Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Address Leiden clustering generating too many clusters #4730

Merged

Conversation

ChuckHastings
Copy link
Collaborator

@ChuckHastings ChuckHastings commented Oct 18, 2024

Our implementation of Leiden was generating too many clusters. This was not obvious in smaller graphs, but as the graphs get larger the problem became more noticeable.

The Leiden loop was terminating if the modularity stopped improving. But the Leiden algorithm as defined in the paper allows the refinement phase to reduce modularity in order to improve the quality of the clusters. The convergence criteria defined in the paper was based on making no changes on the iteration rather than strictly monitoring modularity change.

Updating this criteria results in the Leiden algorithm running more iterations and converging on better answers.

Closes #4529

@ChuckHastings ChuckHastings marked this pull request as ready for review October 18, 2024 18:37
@ChuckHastings ChuckHastings requested a review from a team as a code owner October 18, 2024 18:37
@ChuckHastings ChuckHastings self-assigned this Oct 18, 2024
@ChuckHastings ChuckHastings added bug Something isn't working non-breaking Non-breaking change labels Oct 18, 2024
@ChuckHastings ChuckHastings added this to the 24.12 milestone Oct 18, 2024
Copy link
Contributor

@naimnv naimnv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look good to me.
It simplify the termination logic and compute the final modularity ate the end.

Copy link
Contributor

@jnke2016 jnke2016 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

Copy link
Contributor

@seunghwak seunghwak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, any reason to defer addressing the newly added FIXME statement?

@@ -230,6 +229,7 @@ refine_clustering(
cugraph::reduce_op::plus<weight_t>{},
weighted_cut_of_vertices_to_louvain.begin());

// FIXME: Consider using bit mask logic here. Would reduce memory by 8x
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and also a higher chance to fit in L2 cache.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// a direct lookup in louvain_assignment_of_vertices using
// leiden - graph_view.local_vertex_partition_range_first() as the
// index?
// Changing this would save memory and time
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to defer the update?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to get a fix out to a user question. General Leiden improvements isn't a priority right now.

@ChuckHastings
Copy link
Collaborator Author

/merge

@rapids-bot rapids-bot bot merged commit 7390ae2 into rapidsai:branch-24.12 Oct 23, 2024
132 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working cuGraph non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[QST]: Leiden clustering before and after 23.02
4 participants