Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Intentionally leak thread_local CUDA resources to avoid crash (part 1) #16787

16 changes: 12 additions & 4 deletions cpp/src/utilities/stream_pool.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -132,6 +132,13 @@ struct cuda_event {
cuda_event() { CUDF_CUDA_TRY(cudaEventCreateWithFlags(&e_, cudaEventDisableTiming)); }
virtual ~cuda_event() { CUDF_ASSERT_CUDA_SUCCESS(cudaEventDestroy(e_)); }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (maybe): This means that every usage of cuda_event in the code base will leak the underlying event. Is this really what we want? Perhaps we only want to leak the thread_local ones below.

That is, should we change event_for_thread to do (approximately):

thread_local std::vector<cuda_event *> thread_events(get_num_cuda_devices());
...
thread_events[device_id.value()] = new cuda_event();

?

Alternately, if the only usage of cuda_event is in event_for_thread, we could just delete this class completely and implement event_for_thread as:

thread_local std::vector<cudaEvent_t> thread_events(...);
if (!thread_events[device_id.value()]) {
    CUDF_CUDA_TRY(cudaEventCreateWithFlags(&thread_events[device_id.value()], ...);
}
return thread_events[device_id.value()];

WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cuda_event is only used as the RAII wrapper by event_for_thread, so deleting the user-defined destructor and allocating cuda_event on the heap happen to be equivalent. I agree that the leak makes the cuda_event wrapper entirely superfluous, so I will use the alternative approach you suggested to make the code cleaner. Thank you @wence- !

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Completed


// Moveable but not copyable.
cuda_event(const cuda_event&) = delete;
cuda_event& operator=(const cuda_event&) = delete;

cuda_event(cuda_event&&) = default;
cuda_event& operator=(cuda_event&&) = default;

operator cudaEvent_t() { return e_; }

private:
Expand All @@ -147,11 +154,12 @@ struct cuda_event {
*/
cudaEvent_t event_for_thread()
{
thread_local std::vector<std::unique_ptr<cuda_event>> thread_events(get_num_cuda_devices());
// The program may crash if this function is called from the main thread and user application
// subsequently calls cudaDeviceReset().
// As a workaround, here we intentionally disable RAII and leak cudaEvent_t.
thread_local std::vector<cuda_event*> thread_events(get_num_cuda_devices());
auto const device_id = get_current_cuda_device();
if (not thread_events[device_id.value()]) {
thread_events[device_id.value()] = std::make_unique<cuda_event>();
}
if (not thread_events[device_id.value()]) { thread_events[device_id.value()] = new cuda_event(); }
lamarrr marked this conversation as resolved.
Show resolved Hide resolved
return *thread_events[device_id.value()];
}

Expand Down
Loading