Address Leiden clustering generating too many clusters #4730
Conversation
…a requirement during the refinement phase
Looks good to me.
It simplifies the termination logic and computes the final modularity at the end.
Looks good to me
LGTM, any reason to defer addressing the newly added FIXME statement?
@@ -230,6 +229,7 @@ refine_clustering(
  cugraph::reduce_op::plus<weight_t>{},
  weighted_cut_of_vertices_to_louvain.begin());

// FIXME: Consider using bit mask logic here. Would reduce memory by 8x
Yes, and also a higher chance to fit in L2 cache.
We have multiple utility functions to support this.
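For illustration only, here is a minimal sketch of the bit-mask idea discussed above (this is not one of cugraph's actual utility functions): storing one flag per bit instead of one byte per flag cuts the footprint of the flag array by 8x, which also makes it more likely to fit in L2 cache.

```cpp
#include <cstdint>
#include <vector>

// Illustrative sketch only -- cugraph has its own device-side utilities for this.
// One flag per bit instead of one bool (1 byte) per vertex: 8x less memory.
class bit_flags {
 public:
  explicit bit_flags(std::size_t n) : words_((n + 31) / 32, 0u) {}

  void set(std::size_t i) { words_[i / 32] |= (1u << (i % 32)); }
  bool test(std::size_t i) const { return (words_[i / 32] >> (i % 32)) & 1u; }

 private:
  std::vector<std::uint32_t> words_;
};
```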
// a direct lookup in louvain_assignment_of_vertices using
// leiden - graph_view.local_vertex_partition_range_first() as the
// index?
// Changing this would save memory and time
Any reason to defer the update?
Trying to get a fix out for a user question. General Leiden improvements aren't a priority right now.
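For reference, a hedged host-side sketch of what the FIXME suggests (the function name and types here are hypothetical stand-ins, not cugraph's actual code): index louvain_assignment_of_vertices directly with the Leiden id minus the local partition start, instead of going through an intermediate lookup structure.

```cpp
#include <cstdint>
#include <vector>

using vertex_t = std::int64_t;

// Hypothetical illustration of the FIXME: a direct indexed lookup,
// avoiding any intermediate search or lookup structure.
std::vector<vertex_t> map_leiden_to_louvain(
  std::vector<vertex_t> const& leiden_ids,                      // one Leiden id per vertex
  std::vector<vertex_t> const& louvain_assignment_of_vertices,  // indexed by local offset
  vertex_t local_vertex_partition_range_first)
{
  std::vector<vertex_t> result(leiden_ids.size());
  for (std::size_t i = 0; i < leiden_ids.size(); ++i) {
    result[i] =
      louvain_assignment_of_vertices[leiden_ids[i] - local_vertex_partition_range_first];
  }
  return result;
}
```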
/merge
Our implementation of Leiden was generating too many clusters. This was not obvious in smaller graphs, but as graphs got larger the problem became more noticeable.
The Leiden loop was terminating when modularity stopped improving. But the Leiden algorithm as defined in the paper allows the refinement phase to reduce modularity in order to improve the quality of the clusters. The convergence criterion defined in the paper is based on making no changes during an iteration rather than strictly monitoring modularity change.
Updating this criterion results in the Leiden algorithm running more iterations and converging on better answers, as sketched below.
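Conceptually, the termination change looks like the following simplified sketch (the helper functions are placeholders, not the actual cugraph loop or API):

```cpp
#include <cstddef>

// Placeholders, not the actual cugraph API.
std::size_t run_one_level();      // returns number of cluster-assignment changes this iteration
double compute_modularity();      // final quality score

void leiden_loop(std::size_t max_level)
{
  // Before the fix: the loop exited as soon as modularity stopped improving,
  // but refinement may lower modularity temporarily, so the loop exited too
  // early and left too many clusters.
  // After the fix: the loop exits only when an iteration makes no changes,
  // and modularity is computed once at the end.
  for (std::size_t level = 0; level < max_level; ++level) {
    if (run_one_level() == 0) { break; }  // converged: nothing moved this iteration
  }
  double final_modularity = compute_modularity();
  (void)final_modularity;
}
```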
Closes #4529