-
Notifications
You must be signed in to change notification settings - Fork 920
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Intentionally leak thread_local CUDA resources to avoid crash (part 1) #16787
Intentionally leak thread_local CUDA resources to avoid crash (part 1) #16787
Conversation
@@ -130,7 +130,6 @@ rmm::cuda_device_id get_current_cuda_device() | |||
*/ | |||
struct cuda_event { | |||
cuda_event() { CUDF_CUDA_TRY(cudaEventCreateWithFlags(&e_, cudaEventDisableTiming)); } | |||
virtual ~cuda_event() { CUDF_ASSERT_CUDA_SUCCESS(cudaEventDestroy(e_)); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (maybe): This means that every usage of cuda_event
in the code base will leak the underlying event. Is this really what we want? Perhaps we only want to leak the thread_local
ones below.
That is, should we change event_for_thread
to do (approximately):
thread_local std::vector<cuda_event *> thread_events(get_num_cuda_devices());
...
thread_events[device_id.value()] = new cuda_event();
?
Alternately, if the only usage of cuda_event
is in event_for_thread
, we could just delete this class completely and implement event_for_thread
as:
thread_local std::vector<cudaEvent_t> thread_events(...);
if (!thread_events[device_id.value()]) {
CUDF_CUDA_TRY(cudaEventCreateWithFlags(&thread_events[device_id.value()], ...);
}
return thread_events[device_id.value()];
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cuda_event
is only used as the RAII wrapper by event_for_thread
, so deleting the user-defined destructor and allocating cuda_event
on the heap happen to be equivalent. I agree that the leak makes the cuda_event
wrapper entirely superfluous, so I will use the alternative approach you suggested to make the code cleaner. Thank you @wence- !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Completed
cpp/src/utilities/stream_pool.cpp
Outdated
@@ -147,12 +136,13 @@ struct cuda_event { | |||
*/ | |||
cudaEvent_t event_for_thread() | |||
{ | |||
thread_local std::vector<std::unique_ptr<cuda_event>> thread_events(get_num_cuda_devices()); | |||
thread_local std::vector<cudaEvent_t> thread_events(get_num_cuda_devices()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO we should keep cuda_event
and only change thread_events to std::vector<cuda_event*>
. cuda_event
is a useful class and there are other places in libcudf where we can utilize it. I'd rather just disable RAII in this specific place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Should cuda_event
be moved to the header file include/cudf/detail/utilities/stream_pool.hpp
? It's currently defined in .cpp
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we should move it at some point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On a second thought, this event RAII wrapper should be moved to a separate header file if found useful. I have some doubt on its necessity after a brief search in the cudf repo. Anway, for this PR, I'm keeping its location unchanged, and have applied the change you suggested. @vuule
00b0549
to
442db74
Compare
/ok to test |
The NVbench application `PARQUET_READER_NVBENCH` in libcudf currently crashes with the segmentation fault. To reproduce: ``` ./PARQUET_READER_NVBENCH -d 0 -b 1 --run-once -a io_type=FILEPATH -a compression_type=SNAPPY -a cardinality=0 -a run_length=1 ``` The root cause is that some (1) `thread_local` objects on the main thread in `libcudf` and (2) `static` objects in `kvikio` are destroyed after `cudaDeviceReset()` in NVbench and upon program termination. These objects should simply be leaked, since their destructors making CUDA calls upon program termination constitutes UB in CUDA. This simple PR is the kvikIO side of the fix. The other part is done here rapidsai/cudf#16787. Authors: - Tianyu Liu (https://github.com/kingcrimsontianyu) Approvers: - Mads R. B. Kristensen (https://github.com/madsbk) URL: #462
/ok to test |
442db74
to
eb8bcdf
Compare
1a1731b
to
5c28023
Compare
/ok to test |
/ok to test |
/merge |
rapidsai#16787) The NVbench application `PARQUET_READER_NVBENCH` in libcudf currently crashes with the segmentation fault. To reproduce: ``` ./PARQUET_READER_NVBENCH -d 0 -b 1 --run-once -a io_type=FILEPATH -a compression_type=SNAPPY -a cardinality=0 -a run_length=1 ``` The root cause is that some (1) `thread_local` objects on the main thread in `libcudf` and (2) `static` objects in `kvikio` are destroyed after `cudaDeviceReset()` in NVbench and upon program termination. These objects should simply be leaked, since their destructors making CUDA calls upon program termination constitutes UB in CUDA. This simple PR is the cuDF side of the fix. The other part is done here rapidsai/kvikio#462. closes rapidsai#13229 Authors: - Tianyu Liu (https://github.com/kingcrimsontianyu) - Vukasin Milovanovic (https://github.com/vuule) Approvers: - Vukasin Milovanovic (https://github.com/vuule) - Nghia Truong (https://github.com/ttnghia) URL: rapidsai#16787
Description
The NVbench application
PARQUET_READER_NVBENCH
in libcudf currently crashes with the segmentation fault. To reproduce:The root cause is that some (1)
thread_local
objects on the main thread inlibcudf
and (2)static
objects inkvikio
are destroyed aftercudaDeviceReset()
in NVbench and upon program termination. These objects should simply be leaked, since their destructors making CUDA calls upon program termination constitutes UB in CUDA.This simple PR is the cuDF side of the fix. The other part is done here rapidsai/kvikio#462.
closes #13229
Checklist