Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation issue - found disconnected communities #2

Open
jgbradley1 opened this issue Nov 25, 2024 · 2 comments
Open

Implementation issue - found disconnected communities #2

jgbradley1 opened this issue Nov 25, 2024 · 2 comments

Comments

@jgbradley1
Copy link

jgbradley1 commented Nov 25, 2024

This is a great start at reducing the memory footprint of Leiden :). I ran several tests using the europe_osm.mtx graph with the main branch. When using 16 threads, I see a reduction in memory (~16.23 GB for default Leiden vs ~9.71 GB for the low memory version). However, there is an implementation problem - I am seeing disconnected communities in the final output of community memberships ☹️.

If you add a print statement to calculate and display the number of disconnected communities, similar to what you did in your other repo, you should be able to reproduce the problem.

I validated this issue using two approaches.

  1. I modified the code to save off the community memberships to a file and read the results back in via python. I use NetworkX.is_connected to check the connectedness of each community identified by Leiden.
  2. I added calls to your own function to checks for disconnected communities and it reports a non-zero number.

The behavior I'm seeing is not reproducible in every experimental run of low memory gve leiden. The extra checks I performed above pass sometimes but also fail randomly. If I run the same experiment multiple times, I have found disconnected communities in ~2/5 times.

What is the best way set a random number generator seed to have reproducible output? (I see where the RNG is defined but it doesn’t appear to actually get used).

Any thoughts on where the bug might be? Happy to share more information if there's anything I can provide.

P.S. If possible, I recommend updating the mtx code to read in .mtx data and treat vertex ID's as 0-based instead of 1-based. I've noticed other traditional graph tools convert mtx files to 0-based vertex ID's as a standard. It reduces the likelihood of an off-by-1 error and allows for better integration with other tools in the OSS community.

@jgbradley1 jgbradley1 changed the title Implementation issue - finding disconnected communities Implementation issue - found disconnected communities Nov 26, 2024
@wolfram77
Copy link
Member

One possibility I can think of is race conditions in the refinement phase. If this is the case, then suitable atomic operations need to be used (in the refinement phase at least). Another possibility is that the input graph is not exactly undirected - the parallel symmetrize may be incorrect.

@jgbradley1 Thanks for a detailed report on this. Do post an update here if you find the bug. I will keep looking as well.

@wolfram77
Copy link
Member

I am seeing the disconnected communities on road networks and protien k-mer graphs (see Logs).

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants