Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace direct cudaMemcpyAsync calls with utility functions (within /src) #17550

Open
wants to merge 27 commits into
base: branch-25.02
Choose a base branch
from

Conversation

vuule
Copy link
Contributor

@vuule vuule commented Dec 9, 2024

Description

Replaced the calls to cudaMemcpyAsync with the new cuda_memcpy/cuda_memcpy_async utility, which optionally avoids using the copy engine.

Also took the opportunity to use cudf::detail::host_vector and its factories to enable wider pinned memory use.

Remaining instances are either not viable (e.g. copying h_needs_fallback, interop) or D2D copies.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@vuule vuule added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Dec 9, 2024
@vuule vuule self-assigned this Dec 9, 2024
Copy link

copy-pr-bot bot commented Dec 9, 2024

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Dec 9, 2024
@@ -78,7 +78,7 @@ class device_scalar : public rmm::device_scalar<T> {
[[nodiscard]] T value(rmm::cuda_stream_view stream) const
{
cuda_memcpy<T>(bounce_buffer, device_span<T const>{this->data(), 1}, stream);
return bounce_buffer[0];
return std::move(bounce_buffer[0]);
Copy link
Contributor Author

@vuule vuule Dec 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just in case we want to make a device_scalar of a non-copyable type.

@vuule vuule changed the title Replace direct cudaMemcpyAsync calls with utility functions (limited to /src) Replace direct cudaMemcpyAsync calls with utility functions (within /src) Dec 10, 2024
@vuule vuule marked this pull request as ready for review December 10, 2024 19:21
@vuule vuule requested a review from a team as a code owner December 10, 2024 19:21
cpp/src/bitmask/is_element_valid.cpp Outdated Show resolved Hide resolved
cpp/src/column/column_device_view.cu Outdated Show resolved Hide resolved
cpp/src/scalar/scalar.cpp Show resolved Hide resolved
@vuule vuule requested a review from davidwendt December 10, 2024 22:45
Copy link
Contributor

@davidwendt davidwendt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

@vuule vuule requested a review from ttnghia December 14, 2024 00:12
@vuule vuule added the 5 - Ready to Merge Testing and reviews complete, ready to merge label Dec 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants