[WIP] Memory footprint reduction in graph creation #4642
Closed
When we create a graph from edge lists alone (no vertex lists), we need to find the unique vertices in the given edge lists. Under the current implementation, this step can significantly increase the peak memory requirement.
We currently find unique vertices locally in each GPU: we create a local copy of the edges (for each partition) and use sort & unique to find the unique vertices. We can cut the memory footprint here with hash-based binning, copying only the edges that belong to the current bin when finding the unique vertices of that bin.
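A minimal single-process sketch of the binning idea, in standard C++ (the hash function, names, and `num_bins` parameter are illustrative assumptions, not the cuGraph implementation, which operates on device memory):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Map a vertex ID to one of num_bins bins with a simple multiplicative hash.
// (Illustrative only; cuGraph's actual hash/partitioning scheme may differ.)
int hash_to_bin(std::int64_t v, int num_bins) {
  std::uint64_t h = static_cast<std::uint64_t>(v) * 0x9E3779B97F4A7C15ULL;
  return static_cast<int>(h % static_cast<std::uint64_t>(num_bins));
}

// Find unique endpoints bin by bin. Only the endpoints of the current bin are
// copied and sorted at any one time, so the temporary buffer holds roughly
// 1/num_bins of the endpoints instead of all of them at once.
std::vector<std::int64_t> unique_vertices_binned(
    std::vector<std::int64_t> const& srcs,
    std::vector<std::int64_t> const& dsts,
    int num_bins) {
  std::vector<std::int64_t> result;
  for (int bin = 0; bin < num_bins; ++bin) {
    std::vector<std::int64_t> tmp;  // holds only this bin's endpoints
    for (auto v : srcs) {
      if (hash_to_bin(v, num_bins) == bin) { tmp.push_back(v); }
    }
    for (auto v : dsts) {
      if (hash_to_bin(v, num_bins) == bin) { tmp.push_back(v); }
    }
    std::sort(tmp.begin(), tmp.end());
    tmp.erase(std::unique(tmp.begin(), tmp.end()), tmp.end());
    result.insert(result.end(), tmp.begin(), tmp.end());
  }
  return result;
}
```

The cost of this scheme is that the edge list is scanned `num_bins` times, while the temporary buffer shrinks roughly in proportion to the bin count.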
Once we find locally unique vertices, we need to shuffle them to different GPUs. In each GPU, multiple locally unique vertex vectors are gathered, and there can be many duplicate vertices across the vectors. These duplicates can significantly increase the peak memory usage, especially when the average vertex degree is low and the number of GPUs is large. This PR performs the shuffling separately for each bin, keeping the temporary memory usage at approximately

edge list size * max(1 / # bins, 2 / # local edge partitions).

We can control # bins to trade off computation (we need to scan the edge list # bins times) against peak memory usage.
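For concreteness, a quick back-of-the-envelope check of this bound (plain C++ with illustrative sizes, not cuGraph code):

```cpp
#include <algorithm>
#include <cstdio>

// Peak temporary memory for bin-wise shuffling, per the bound above:
// edge_list_size * max(1 / # bins, 2 / # local edge partitions).
double peak_temp_gb(double edge_list_gb, int num_bins,
                    int num_local_edge_partitions) {
  return edge_list_gb *
         std::max(1.0 / num_bins, 2.0 / num_local_edge_partitions);
}

int main() {
  // A 64 GB edge list with 8 bins and 4 local edge partitions:
  // max(1/8, 2/4) = 1/2, so ~32 GB of temporary memory. With 16 local
  // edge partitions the bin count dominates: max(1/8, 2/16) = 1/8, ~8 GB.
  std::printf("%.1f GB\n", peak_temp_gb(64.0, 8, 4));
  std::printf("%.1f GB\n", peak_temp_gb(64.0, 8, 16));
}
```

Note that once the bin count exceeds half the number of local edge partitions, the second term dominates and adding more bins only adds scanning cost without reducing peak memory further.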