Skip to content

Design of memory-efficient CUDA-based Parallel Label Propagation Algorithm (LPA), aka RAK, for community detection.

License

Notifications You must be signed in to change notification settings

puzzlef/rak-lowmem-communities-cuda

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

20 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Design of memory-efficient CUDA-based Parallel Label Propagation Algorithm (LPA), aka RAK, for community detection.

Community detection involves grouping nodes in a graph with dense connections within groups, than between them. We previously proposed efficient multicore (GVE-LPA) and GPU-based (ν-LPA) implementations of Label Propagation Algorithm (LPA) for community detection. However, these methods incur high memory overhead due to their per-thread/per-vertex hashtables. This makes it challenging to process large graphs on shared memory systems. In this report, we introduce memory-efficient GPU-based LPA implementations, using weighted Boyer-Moore (BM) and Misra-Gries (MG) sketches. Our new implementation, νMG8-LPA, using an 8-slot MG sketch, reduces memory usage by 98x and 44x compared to GVE-LPA and ν-LPA, respectively. It is also 2.4x faster than GVE-LPA and only 1.1x slower than ν-LPA, with minimal quality loss (4.7%/2.9% drop compared to GVE-LPA/ν-LPA).

Below we plot the memory usage of NetworKit LPA, GVE-LPA, ν-LPA, 𝜈MG8-LPA, and 𝜈BM-LPA on 13 different graphs. 𝜈MG8-LPA and 𝜈BM-LPA achieve, on average, 2.2x, 98x, and 44x lower memory usage than NetworKit LPA, GVE-LPA, and 𝜈-LPA.

Next, we plot the time taken by NetworKit LPA, GVE-LPA, ν-LPA, 𝜈MG8-LPA, and 𝜈BM-LPA on 13 different graphs. On average, 𝜈BM-LPA is 186x, 9.0x, 3.5x, and 3.7x faster than NetworKit LPA, GVE-LPA, 𝜈-LPA, and 𝜈MG8-LPA, respectively; while 𝜈MG8-LPA is 51x and 2.4x faster than NetworKit LPA and GVE-LPA, but 1.1x and 3.7x slower than 𝜈-LPA and 𝜈BM-LPA.

Further, we plot the speedup of 𝜈MG8-LPA over NetworKit LPA, GVE-LPA, and 𝜈-LPA.

Finally, we plot the modularity of communities identified by NetworKit LPA, GVE-LPA, 𝜈-LPA, 𝜈MG8-LPA, and 𝜈BM-LPA. 𝜈BM-LPA identifies communities that are of 27%, 24%, 23%, and 20% lower quality, respectively. However, the communities identified by 𝜈MG8-LPA are only 8.4%, 4.7%, and 2.9% lower in quality than NetworKit LPA, GVE-LPA, and 𝜈-LPA, and 25% higher than 𝜈BM-LPA.

Refer to our technical report for more details:
Memory Efficient GPU-based Label Propagation Algorithm (LPA) for Community Detection on Large Graphs.


Note

You can just copy main.sh to your system and run it.
For the code, refer to main.cxx.



Code structure

The code structure of νMG-LPA / νBM-LPA is as follows:

# Main files
- main.sh: Shell script for running experiments
- process.js: Node.js script for processing output logs
- main.cxx: Experimentation code

# Key algorithms
- rakLowmemCuda.hxx: Memory-efficient GPU-based LPA, i.e., νMG-LPA, νBM-LPA
- rakLowmem.hxx: Memory-efficient CPU-based LPA, i.e., MG-LPA, BM-LPA
- rakCuda.hxx: GPU-based LPA, i.e., ν-LPA
- rak.hxx: CPU-based LPA, i.e., GVE-LPA
- sketchCuda.hxx: Misra-Gries sketch for νMG-LPA
- hashtableCuda.hxx: Hashtable for ν-LPA

# Common graph operations
- inc/main.hxx: Main header
- inc/Graph.hxx: Graph data structure functions
- inc/mtx.hxx: Graph file reading functions
- inc/update.hxx: Update functions
- inc/csr.hxx: Compressed Sparse Row (CSR) data structure functions
- inc/bfs.hxx: Breadth-first search algorithms
- inc/dfs.hxx: Depth-first search algorithms
- inc/duplicate.hxx: Graph duplicating functions
- inc/symmetrize.hxx: Graph Symmetrization functions
- inc/transpose.hxx: Graph transpose functions
- inc/selfLoop.hxx: Graph Self-looping functions
- inc/properties.hxx: Graph Property functions
- inc/batch.hxx: Batch update generation functions

# Support headers
- inc/_algorithm.hxx: Algorithm utility functions
- inc/_bitset.hxx: Bitset manipulation functions
- inc/_cmath.hxx: Math functions
- inc/_ctypes.hxx: Data type utility functions
- inc/_cuda.hxx: CUDA utility functions
- inc/_debug.hxx: Debugging macros (LOG, ASSERT, ...)
- inc/_iostream.hxx: Input/output stream functions
- inc/_iterator.hxx: Iterator utility functions
- inc/_main.hxx: Main program header
- inc/_mpi.hxx: MPI (Message Passing Interface) utility functions
- inc/_openmp.hxx: OpenMP utility functions
- inc/_queue.hxx: Queue utility functions
- inc/_random.hxx: Random number generation functions
- inc/_string.hxx: String utility functions
- inc/_utility.hxx: Runtime measurement functions
- inc/_vector.hxx: Vector utility functions

Note that each branch in this repository contains code for a specific experiment. The main branch contains code for the final experiment. If the intention of a branch in unclear, or if you have comments on our technical report, feel free to open an issue.



References




ORG

About

Design of memory-efficient CUDA-based Parallel Label Propagation Algorithm (LPA), aka RAK, for community detection.

Resources

License

Stars

Watchers

Forks

Languages