GitHub - puzzlef/rak-lowmem-communities-cuda: Design of memory-efficient CUDA-based Parallel Label Propagation Algorithm (LPA), aka RAK, for community detection.

Design of memory-efficient CUDA-based Parallel Label Propagation Algorithm (LPA), aka RAK, for community detection.

Community detection involves grouping nodes in a graph with dense connections within groups, than between them. We previously proposed efficient multicore (GVE-LPA) and GPU-based (ν-LPA) implementations of Label Propagation Algorithm (LPA) for community detection. However, these methods incur high memory overhead due to their per-thread/per-vertex hashtables. This makes it challenging to process large graphs on shared memory systems. In this report, we introduce memory-efficient GPU-based LPA implementations, using weighted Boyer-Moore (BM) and Misra-Gries (MG) sketches. Our new implementation, νMG8-LPA, using an 8-slot MG sketch, reduces memory usage by 98x and 44x compared to GVE-LPA and ν-LPA, respectively. It is also 2.4x faster than GVE-LPA and only 1.1x slower than ν-LPA, with minimal quality loss (4.7%/2.9% drop compared to GVE-LPA/ν-LPA).

Below we plot the memory usage of NetworKit LPA, GVE-LPA, ν-LPA, 𝜈MG8-LPA, and 𝜈BM-LPA on 13 different graphs. 𝜈MG8-LPA and 𝜈BM-LPA achieve, on average, 2.2x, 98x, and 44x lower memory usage than NetworKit LPA, GVE-LPA, and 𝜈-LPA.

Next, we plot the time taken by NetworKit LPA, GVE-LPA, ν-LPA, 𝜈MG8-LPA, and 𝜈BM-LPA on 13 different graphs. On average, 𝜈BM-LPA is 186x, 9.0x, 3.5x, and 3.7x faster than NetworKit LPA, GVE-LPA, 𝜈-LPA, and 𝜈MG8-LPA, respectively; while 𝜈MG8-LPA is 51x and 2.4x faster than NetworKit LPA and GVE-LPA, but 1.1x and 3.7x slower than 𝜈-LPA and 𝜈BM-LPA.

Further, we plot the speedup of 𝜈MG8-LPA over NetworKit LPA, GVE-LPA, and 𝜈-LPA.

Finally, we plot the modularity of communities identified by NetworKit LPA, GVE-LPA, 𝜈-LPA, 𝜈MG8-LPA, and 𝜈BM-LPA. 𝜈BM-LPA identifies communities that are of 27%, 24%, 23%, and 20% lower quality, respectively. However, the communities identified by 𝜈MG8-LPA are only 8.4%, 4.7%, and 2.9% lower in quality than NetworKit LPA, GVE-LPA, and 𝜈-LPA, and 25% higher than 𝜈BM-LPA.

Refer to our technical report for more details:
Memory Efficient GPU-based Label Propagation Algorithm (LPA) for Community Detection on Large Graphs.

Note

You can just copy main.sh to your system and run it.
For the code, refer to main.cxx.

Code structure

The code structure of νMG-LPA / νBM-LPA is as follows:

# Main files
- main.sh: Shell script for running experiments
- process.js: Node.js script for processing output logs
- main.cxx: Experimentation code

# Key algorithms
- rakLowmemCuda.hxx: Memory-efficient GPU-based LPA, i.e., νMG-LPA, νBM-LPA
- rakLowmem.hxx: Memory-efficient CPU-based LPA, i.e., MG-LPA, BM-LPA
- rakCuda.hxx: GPU-based LPA, i.e., ν-LPA
- rak.hxx: CPU-based LPA, i.e., GVE-LPA
- sketchCuda.hxx: Misra-Gries sketch for νMG-LPA
- hashtableCuda.hxx: Hashtable for ν-LPA

# Common graph operations
- inc/main.hxx: Main header
- inc/Graph.hxx: Graph data structure functions
- inc/mtx.hxx: Graph file reading functions
- inc/update.hxx: Update functions
- inc/csr.hxx: Compressed Sparse Row (CSR) data structure functions
- inc/bfs.hxx: Breadth-first search algorithms
- inc/dfs.hxx: Depth-first search algorithms
- inc/duplicate.hxx: Graph duplicating functions
- inc/symmetrize.hxx: Graph Symmetrization functions
- inc/transpose.hxx: Graph transpose functions
- inc/selfLoop.hxx: Graph Self-looping functions
- inc/properties.hxx: Graph Property functions
- inc/batch.hxx: Batch update generation functions

# Support headers
- inc/_algorithm.hxx: Algorithm utility functions
- inc/_bitset.hxx: Bitset manipulation functions
- inc/_cmath.hxx: Math functions
- inc/_ctypes.hxx: Data type utility functions
- inc/_cuda.hxx: CUDA utility functions
- inc/_debug.hxx: Debugging macros (LOG, ASSERT, ...)
- inc/_iostream.hxx: Input/output stream functions
- inc/_iterator.hxx: Iterator utility functions
- inc/_main.hxx: Main program header
- inc/_mpi.hxx: MPI (Message Passing Interface) utility functions
- inc/_openmp.hxx: OpenMP utility functions
- inc/_queue.hxx: Queue utility functions
- inc/_random.hxx: Random number generation functions
- inc/_string.hxx: String utility functions
- inc/_utility.hxx: Runtime measurement functions
- inc/_vector.hxx: Vector utility functions

Note that each branch in this repository contains code for a specific experiment. The main branch contains code for the final experiment. If the intention of a branch in unclear, or if you have comments on our technical report, feel free to open an issue.

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
inc		inc
.gitignore		.gitignore
CITATION.bib		CITATION.bib
CITATION.cff		CITATION.cff
LICENSE		LICENSE
README.md		README.md
main.cxx		main.cxx
main.sh		main.sh
process.js		process.js
system.txt		system.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Code structure

References

About

Languages

License

puzzlef/rak-lowmem-communities-cuda

Folders and files

Latest commit

History

Repository files navigation

Code structure

References

About

Resources

License

Stars

Watchers

Forks

Languages