[WIP] Memory footprint reduction in graph creation #4642

Closed
wants to merge 6 commits

Conversation

seunghwak
Contributor

When we create a graph from just edge lists (no vertex lists), we need to find the unique vertices in the given edge lists. Under the current implementation, this can significantly increase the peak memory requirement.

We currently find unique vertices locally on each GPU: we create a local copy of the edges (for each partition) and use sort & unique to find the unique vertices. We can cut the memory footprint here by hash-based binning, copying only the edge list entries that belong to the current bin when finding the unique vertices of that bin (see the sketch below).
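Below is a minimal, single-process C++ sketch of the binning idea. The function name, hash choice, and data layout are illustrative assumptions, not the actual GPU implementation; it only shows how scanning the edge list once per bin bounds the size of the temporary buffer that gets sorted and de-duplicated.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <vector>

// Find unique vertices one hash bin at a time: only the endpoints whose hash
// falls in the current bin are copied, sorted, and de-duplicated, so the
// temporary buffer holds roughly (|srcs| + |dsts|) / num_bins elements.
std::vector<int64_t> unique_vertices_by_bin(std::vector<int64_t> const& srcs,
                                            std::vector<int64_t> const& dsts,
                                            int num_bins)
{
  std::vector<int64_t> result;
  std::hash<int64_t> hasher{};
  for (int bin = 0; bin < num_bins; ++bin) {
    std::vector<int64_t> in_bin;  // temporary buffer for the current bin only
    auto collect = [&](std::vector<int64_t> const& vs) {
      for (auto v : vs) {
        if (static_cast<int>(hasher(v) % num_bins) == bin) { in_bin.push_back(v); }
      }
    };
    collect(srcs);
    collect(dsts);
    std::sort(in_bin.begin(), in_bin.end());
    in_bin.erase(std::unique(in_bin.begin(), in_bin.end()), in_bin.end());
    result.insert(result.end(), in_bin.begin(), in_bin.end());
  }
  return result;  // unique vertices, grouped by bin
}
```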

Once we find the locally unique vertices, we need to shuffle them to different GPUs. On each GPU, multiple locally unique vertex vectors are gathered, and there can be many duplicate vertices across these vectors. These duplicates can significantly increase the peak memory usage, especially when the average vertex degree is low and the number of GPUs is large. This PR performs the shuffling separately for each bin to keep the temporary memory usage at edge list size * max(1 / # bins, 2 / # local edge partitions).

We can control # bins to trade off compute (we need to scan the edge list # bins times) against peak memory usage.
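As a rough illustration of the bound above (hypothetical numbers, not measurements from this PR): with an edge list occupying 16 GB, 8 bins, and 16 local edge partitions, the temporary usage is bounded by 16 GB * max(1/8, 2/16) = 2 GB, at the cost of scanning the edge list 8 times; with 2 bins the bound grows to 16 GB * max(1/2, 2/16) = 8 GB, but the edge list is scanned only twice.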

@seunghwak
Contributor Author

This PR has been merged into PR #4619.

@seunghwak seunghwak closed this Sep 23, 2024