This is a great start at reducing the memory footprint of Leiden :). I ran several tests using the europe_osm.mtx graph with the main branch. When using 16 threads, I see a reduction in memory (~16.23 GB for default Leiden vs ~9.71 GB for the low memory version). However, there is an implementation problem - I am seeing disconnected communities in the final output of community memberships ☹️.
If you add a print statement to calculate and display the number of disconnected communities, similar to what you did in your other repo, you should be able to reproduce the problem.
I validated this issue using two approaches:

1. I modified the code to save the community memberships to a file and read the results back in via Python, using networkx.is_connected to check the connectedness of each community identified by Leiden.
2. I added calls to your own function that checks for disconnected communities, and it reports a non-zero count.
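For reference, the first check I ran looks roughly like this. It is a sketch of my verification script, not code from the repo; the function name and the vertex-to-community dict are my own, and for europe_osm I built the graph with nx.Graph(scipy.io.mmread("europe_osm.mtx")) before calling it:

```python
# Hypothetical verification helper: given an undirected graph and a
# vertex -> community assignment (parsed from the memberships file that
# the modified Leiden run writes out), count communities whose induced
# subgraph is not internally connected.
from collections import defaultdict

import networkx as nx


def count_disconnected_communities(graph: nx.Graph, membership: dict) -> int:
    """Return the number of communities that are internally disconnected."""
    communities = defaultdict(list)
    for vertex, community in membership.items():
        communities[community].append(vertex)
    return sum(
        1 for members in communities.values()
        if not nx.is_connected(graph.subgraph(members))
    )
```

A well-formed Leiden result should make this return 0; the runs I flagged above return a non-zero count.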
The behavior I'm seeing is not reproducible in every experimental run of low-memory GVE-Leiden. The extra checks I performed above sometimes pass and sometimes fail. Running the same experiment multiple times, I have found disconnected communities in roughly 2 out of 5 runs.
What is the best way to set a random number generator seed to get reproducible output? (I see where the RNG is defined, but it doesn't appear to actually get used.)
Any thoughts on where the bug might be? Happy to share more information if there's anything I can provide.
P.S. If possible, I recommend updating the mtx code to read in .mtx data and treat vertex IDs as 0-based instead of 1-based. I've noticed other established graph tools convert mtx files to 0-based vertex IDs as a standard. It reduces the likelihood of an off-by-one error and allows for better integration with other tools in the OSS community.
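To illustrate the convention, here is a minimal sketch of the 1-based to 0-based shift when parsing an .mtx edge list by hand (the function name is mine, and header handling is simplified; a real reader should also honor the %%MatrixMarket banner's symmetry and field type, as scipy's mmread does):

```python
# Minimal sketch: parse Matrix Market coordinate lines, shifting the
# 1-based vertex IDs used on disk to the 0-based IDs used in memory.
def read_mtx_edges(lines):
    """Yield (u, v) edge tuples with 0-based vertex IDs from .mtx text lines."""
    seen_size_line = False
    for line in lines:
        if line.startswith("%"):        # skip comments and the banner
            continue
        if not seen_size_line:          # first non-comment line: rows cols nnz
            seen_size_line = True
            continue
        u, v = line.split()[:2]         # ignore an optional weight column
        yield int(u) - 1, int(v) - 1    # 1-based on disk -> 0-based in memory
```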
jgbradley1 changed the title from "Implementation issue - finding disconnected communities" to "Implementation issue - found disconnected communities" on Nov 26, 2024.
One possibility I can think of is race conditions in the refinement phase. If that is the case, then suitable atomic operations need to be used (at least in the refinement phase). Another possibility is that the input graph is not exactly undirected, i.e., the parallel symmetrization step may be incorrect.
@jgbradley1 Thanks for a detailed report on this. Do post an update here if you find the bug. I will keep looking as well.