Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add to_arrow_device() functions that accept views #15465

Merged
merged 20 commits into from
Apr 22, 2024

Conversation

davidwendt
Copy link
Contributor

@davidwendt davidwendt commented Apr 4, 2024

Description

Adds the following new interop functions

unique_device_array_t to_arrow_device(cudf::table_view const& table, 
                                      rmm::cuda_stream_view stream,
                                      rmm::device_async_resource_ref mr);
unique_device_array_t to_arrow_device(cudf::column_view const& col, 
                                      rmm::cuda_stream_view stream,
                                      rmm::device_async_resource_ref mr);

Also refactors some common code with the ownership transfer version of these APIs.
And moves the to_arrow_schema() functions to a separate .cpp file.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@davidwendt davidwendt added 2 - In Progress Currently a work in progress libcudf Affects libcudf (C++/CUDA) code. improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Apr 4, 2024
@davidwendt davidwendt self-assigned this Apr 4, 2024
@github-actions github-actions bot added the CMake CMake build issue label Apr 4, 2024
Copy link
Contributor

@vyasr vyasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking a peek while it's in draft (and rebased #15370 on these changes). Overall looks great here. I was originally hoping to unify the code paths a bit more, but the view path doesn't look all that verbose and it seems like it wouldn't improve readability much to unify. We could try implementing column equivalents of set_null_mask and set_view_to_buffer as class methods on the original to_arrow_device functor so that the operator specializations could be reused, but I don't really know how well that would work. WDYT?

cpp/include/cudf/interop.hpp Show resolved Hide resolved
cpp/src/interop/to_arrow_device.cu Show resolved Hide resolved
Comment on lines 504 to 505
// TODO: Can't we just not set anything here? The arrow spec says that
// you can leave the pointer void if the size is 0.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No longer relevant, see #15047 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the comment. Let me know if I should just delete it.

@davidwendt davidwendt added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Apr 10, 2024
@davidwendt davidwendt marked this pull request as ready for review April 10, 2024 20:42
@davidwendt davidwendt requested review from a team as code owners April 10, 2024 20:42
@@ -0,0 +1,231 @@
/*
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved from to_arrow_device.cu

}
}
};

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to to_arrow_schema.cpp

@davidwendt davidwendt requested a review from vyasr April 11, 2024 12:41
Comment on lines 382 to 384
cudf::column_view column;
rmm::cuda_stream_view stream;
rmm::mr::device_memory_resource* mr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a particular reason to do this vs passing them as parameters to the functors?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just thought it was a bit cleaner since they are the same for every dispatch function.
Adding/removing parameters to every function got tedious.

Copy link
Contributor

@zeroshade zeroshade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than my nitpick and question, this seems good to me

Copy link
Contributor

@vyasr vyasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small feedback, but nothing worth blocking over except perhaps the release callback question. I worry that we're creating potential UB. I don't know if it's really specified explicitly anywhere in the spec, but throwing a C++ exception in a callback that may be called in C is definitely going to be implementation-defined behavior. Perhaps the best we can do is printing or logging the exception then returning safely? This feels like a hole in the spec that we should perhaps start a discussion about upstream.

cpp/include/cudf/interop.hpp Outdated Show resolved Hide resolved
cpp/src/interop/to_arrow_device.cu Outdated Show resolved Hide resolved
cpp/src/interop/to_arrow_device.cu Outdated Show resolved Hide resolved
cpp/src/interop/to_arrow_device.cu Outdated Show resolved Hide resolved
@davidwendt davidwendt requested a review from vyasr April 18, 2024 12:10
@davidwendt
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 9fa247f into rapidsai:branch-24.06 Apr 22, 2024
70 checks passed
@davidwendt davidwendt deleted the to-arrow-from-view branch April 22, 2024 12:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team CMake CMake build issue improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants