
Fixes for OOM during CAGRA benchmarks #1832

Merged: 4 commits merged on Sep 25, 2023


@benfred (Member) commented Sep 19, 2023

When running the CAGRA benchmarks on large datasets, there can be GPU out-of-memory (OOM) errors. These are caused by holding multiple copies of the dataset in GPU memory. Fix by:

  • Free the existing memory for the dataset/graph before allocating new memory in update_dataset/update_graph
  • During deserialization, if the serialized index doesn't contain the dataset, don't allocate GPU memory for it
  • Don't call update_dataset repeatedly in the benchmarking code with the same dataset
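The three fixes above can be sketched in C++ as a hypothetical, host-only illustration; this is not RAFT's CAGRA code. `MemoryTracker`, `SketchIndex`, `SketchBenchmark`, and `peak_for_replace` are invented names, GPU memory is simulated by a byte counter, and `update_dataset_alloc_first` is only an assumed shape of the pre-fix ordering. The sketch shows why releasing the old buffer before allocating the new one lowers the peak from two dataset copies to one, how deserialization can skip the dataset allocation, and how the benchmark can skip a repeated `update_dataset` call for the same dataset.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for GPU memory: a byte counter that records current
// and peak usage so the sketch runs on the host. Not RMM or RAFT API.
struct MemoryTracker {
  std::size_t current = 0;
  std::size_t peak = 0;

  void allocate(std::size_t bytes) {
    current += bytes;
    if (current > peak) peak = current;
  }
  void release(std::size_t bytes) {
    assert(bytes <= current);  // never release more than is allocated
    current -= bytes;
  }
};

// Hypothetical index that owns one "device" copy of the dataset.
class SketchIndex {
 public:
  explicit SketchIndex(MemoryTracker& mem) : mem_(mem) {}
  ~SketchIndex() { mem_.release(dataset_bytes_); }
  SketchIndex(const SketchIndex&) = delete;
  SketchIndex& operator=(const SketchIndex&) = delete;

  // Assumed pre-fix ordering: the new buffer is allocated while the old one
  // is still alive, so both copies exist at the same time.
  void update_dataset_alloc_first(std::size_t bytes) {
    mem_.allocate(bytes);
    mem_.release(dataset_bytes_);
    dataset_bytes_ = bytes;
  }

  // Fixed ordering: free the existing buffer first, then allocate.
  void update_dataset(std::size_t bytes) {
    mem_.release(dataset_bytes_);
    dataset_bytes_ = 0;
    mem_.allocate(bytes);
    dataset_bytes_ = bytes;
  }

  // Only reserve dataset memory if the serialized index contains a dataset.
  void deserialize(bool has_dataset, std::size_t dataset_bytes) {
    if (has_dataset) update_dataset(dataset_bytes);
  }

 private:
  MemoryTracker& mem_;
  std::size_t dataset_bytes_ = 0;
};

// Hypothetical benchmark wrapper that skips update_dataset when it is handed
// the same dataset pointer again.
class SketchBenchmark {
 public:
  explicit SketchBenchmark(SketchIndex& index) : index_(index) {}

  void set_search_dataset(const float* data, std::size_t n_floats) {
    if (data == last_data_) return;  // same dataset: skip the copy
    index_.update_dataset(n_floats * sizeof(float));
    last_data_ = data;
    ++updates_;
  }
  int updates() const { return updates_; }

 private:
  SketchIndex& index_;
  const float* last_data_ = nullptr;
  int updates_ = 0;
};

// Peak bytes when a dataset of `bytes` is replaced by one of the same size.
std::size_t peak_for_replace(bool release_first, std::size_t bytes) {
  MemoryTracker mem;
  {
    SketchIndex index(mem);
    index.update_dataset(bytes);
    if (release_first) {
      index.update_dataset(bytes);
    } else {
      index.update_dataset_alloc_first(bytes);
    }
  }  // index destroyed here, releasing its buffer
  return mem.peak;
}
```

For example, `peak_for_replace(false, 100)` returns 200 while `peak_for_replace(true, 100)` returns 100: with allocate-then-release ordering, both copies briefly coexist. Comparing datasets by pointer in `SketchBenchmark` is an assumption of this sketch; it would treat a reused buffer whose contents changed as unchanged.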

@benfred requested a review from a team as a code owner September 19, 2023 04:28
@benfred added the improvement (Improvement / enhancement to an existing function) and non-breaking (Non-breaking change) labels Sep 19, 2023
@github-actions bot added the cpp label Sep 19, 2023
cpp/bench/ann/src/common/benchmark.hpp (outdated; conversation resolved)
@cjnolet dismissed achirkin’s stale review September 25, 2023 17:01

Important for the release, and Artem is out of office (OOO). Creating an issue to follow up and explore the best solution.

@cjnolet (Member) commented Sep 25, 2023

/merge

@rapids-bot merged commit dfde3b4 into rapidsai:branch-23.10 Sep 25, 2023
54 checks passed
@benfred (Member, Author) commented Sep 25, 2023

Created an issue to track the dataset copies here: #1848

@benfred deleted the cagra_benchmarks_oom_fix branch September 25, 2023 17:29