-
Notifications
You must be signed in to change notification settings - Fork 90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mark all cuco kernels as static so they have hidden visibility #422
Mark all cuco kernels as static so they have hidden visibility #422
Conversation
/ok to test |
https://github.com/NVIDIA/cuCollections/blob/dev/include/cuco/detail/utility/cuda.hpp probably the best place for such a macro |
a8e41b5
to
b520b69
Compare
Updated to introduce a CUCO_KERNEL macro to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since cuco is header-only and may be included in projects that are building with rdc=true
, I think it is more important to preserve the __attribute__ ((visibility ("hidden"))
functionality when __CUDA_RDC__
is defined.
As you mentioned, you will need to universally silence the -Wattributes
warning as seen here: https://github.com/NVIDIA/cccl/blob/2771c61545eff4ec3cede24ba7963c4eebc4bbaf/cub/cub/util_macro.cuh#L128-L140
The issue I have with CCCL approach is that it has side effects on all functions inside the TU and not just the CCCL functions since the suppressions aren't push/popped. This could hide real issues in user code, and why I don't recommend universally silencing the warnings. But I will implement what ever approach the CuCo team wants, which by the sound of it is universal suppression |
I'm trying to wrap my head around this. So Using the |
Correct.
It would help with binary size only in case of
|
Ah, ok, I thought I would prefer the Most of the kernels in cuco are already isolated in separate header files, so we would need to add the push/pop pragmas to 8 files in total (ignoring any kernels in test/benchmark directories):
|
I will go forward with that approach 👍 |
@sleeepyjack |
404006a
to
c277ab7
Compare
/ok to test |
So this means we globally disable this warning in user land? Not ideal. @jrhemstad any ideas? |
That's precisely what we did in every CCCL header. It's a pretty innocuous warning to silence, and the benefits far outweigh any disadvantages. |
Patch both CCCL and CUCO to have only internal linkage. For cuco I am working on upstreaming these changes ( NVIDIA/cuCollections#422 ). Once that is accepted and we have validated that moving cuco is stable ( e.g. changes around `cuco::experimental::static_set` ) we can drop this patch set. For cccl the long term fix is to move to CCCL 2.3+, but due to issues ( NVIDIA/cccl#1249, maybe others ) that isn't viable for the 24.02 timeframe. Since the CCCL changes mean C++ and CUDA sources have non compatible ABI's, we need to specify `THRUST_DISABLE_ABI_NAMESPACE` and `THRUST_IGNORE_ABI_NAMESPACE_ERROR` so that we don't change ABI in rapids-cmake consumers since they expect 2.2 behavior. Authors: - Robert Maynard (https://github.com/robertmaynard) Approvers: - Bradley Dice (https://github.com/bdice) URL: #523
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checked with @sleeepyjack earlier today, this PR is ready to ship once we revert changes in the example. Examples are considered as user code thus no to need change it.
Co-authored-by: Yunsong Wang <[email protected]>
Co-authored-by: Yunsong Wang <[email protected]>
I have integrated the changes to examples. So you can merge whenever you are ready. |
/ok to test |
Patch both CCCL and CUCO to have only internal linkage. For cuco I am working on upstreaming these changes ( NVIDIA/cuCollections#422 ). Once that is accepted and we have validated that moving cuco is stable ( e.g. changes around `cuco::experimental::static_set` ) we can drop this patch set. For cccl the long term fix is to move to CCCL 2.3+, but due to issues ( NVIDIA/cccl#1249, maybe others ) that isn't viable for the 24.02 timeframe. Since the CCCL changes mean C++ and CUDA sources have non compatible ABI's, we need to specify `THRUST_DISABLE_ABI_NAMESPACE` and `THRUST_IGNORE_ABI_NAMESPACE_ERROR` so that we don't change ABI in rapids-cmake consumers since they expect 2.2 behavior. Authors: - Robert Maynard (https://github.com/robertmaynard) Approvers: - Bradley Dice (https://github.com/bdice) URL: rapidsai#523
This is to remove the row conversion code from libcudf. It was move from spark-rapids-jni (by #14664) to temporarily workaround the issue due to conflict of kernel names that causes invalid memory access when calling to `thrust::in(ex)clusive_scan` (NVIDIA/spark-rapids-jni#1567). Now we have fixes for the namespace visibility issue (by marking all libcudf kenels private in rapidsai/rapids-cmake#523 and NVIDIA/cuCollections#422) and need to move back the code. Closes #14853. Authors: - Nghia Truong (https://github.com/ttnghia) Approvers: - David Wendt (https://github.com/davidwendt) - Bradley Dice (https://github.com/bdice) URL: #15234
This marks all kernels in CUCO as
static
so that they have internal linkage and won't conflict when used by multiple DSOs.I didn't see a single shared/common header in cuco where I could place a
CUCO_KERNEL
macro so I modified each instance instead.While
cccl
went with a__attribute__ ((visibility ("hidden")))
approach to help reduce RDC size, this approach seemed very invasive for cuco. This is due to the fact that we would need to pragma push and pop both gcc warnings and nvcc warnings in each cuco header so that we don't introduce any warnings. This is needed as the compiler incorrectly state that the__attribute__ ((visibility ("hidden")))
has no side-effect.Context:
rapidsai/cudf#14726
NVIDIA/cccl#166
rapidsai/raft#1722