Mark all cuco kernels as static so they have hidden visibility #422

robertmaynard · 2024-01-09T20:59:57Z

This marks all kernels in CUCO as static so that they have internal linkage and won't conflict when used by multiple DSOs.

I didn't see a single shared/common header in cuco where I could place a CUCO_KERNEL macro so I modified each instance instead.
While CCCL went with an __attribute__ ((visibility ("hidden"))) approach to help reduce RDC size, that approach seemed very invasive for cuco: we would need to push and pop both GCC and nvcc warning pragmas in each cuco header so that we don't introduce any warnings. This is needed because the compiler incorrectly states that the __attribute__ ((visibility ("hidden"))) has no side effect.
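
To make the two options concrete, here is a minimal sketch of both annotations on a templated kernel. The names `set_value_static` and `set_value_hidden` are hypothetical and not taken from cuco, and the `__global__` fallback is an assumption added only so the snippet also builds as ordinary C++ with GCC or Clang.

```cpp
// nvcc defines __global__ itself; this fallback only lets the sketch build as plain C++.
#ifndef __CUDACC__
#define __global__
#endif

// Option 1: `static` gives the kernel internal linkage, so every translation unit
// (and therefore every DSO) gets its own private copy.
template <typename T>
static __global__ void set_value_static(T* out, T value)
{
  *out = value;
}

// Option 2 (the CCCL-style approach): keep external linkage but mark the symbol
// hidden. As discussed above, compilers may emit an attribute warning here.
template <typename T>
__attribute__((visibility("hidden"))) __global__ void set_value_hidden(T* out, T value)
{
  *out = value;
}
```

Under nvcc these would be launched with the usual `<<<grid, block>>>` syntax; in the plain C++ fallback they are ordinary function templates.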

Context:
rapidsai/cudf#14726
NVIDIA/cccl#166
rapidsai/raft#1722

copy-pr-bot · 2024-01-09T21:02:42Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

PointKernel · 2024-01-09T21:05:56Z

/ok to test

PointKernel · 2024-01-09T21:07:42Z

> I didn't see a single shared/common header in cuco where I could place a CUCO_KERNEL macro so I modified each instance instead.

https://github.com/NVIDIA/cuCollections/blob/dev/include/cuco/detail/utility/cuda.hpp probably the best place for such a macro

robertmaynard · 2024-01-10T15:27:20Z

> > I didn't see a single shared/common header in cuco where I could place a CUCO_KERNEL macro so I modified each instance instead.
>
> https://github.com/NVIDIA/cuCollections/blob/dev/include/cuco/detail/utility/cuda.hpp probably the best place for such a macro

Updated to introduce a CUCO_KERNEL macro to cuda.hpp.
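
A minimal sketch of what centralizing the annotation behind one macro could look like. `EXAMPLE_KERNEL` and `fill_kernel` are hypothetical names used for illustration; the actual CUCO_KERNEL definition in cuda.hpp is not reproduced here and may differ.

```cpp
// nvcc defines __global__ itself; this fallback only lets the sketch build as plain C++.
#ifndef __CUDACC__
#define __global__
#endif

// Single place to decide how every kernel is annotated (hypothetical macro name).
#ifndef EXAMPLE_KERNEL
#define EXAMPLE_KERNEL static __global__
#endif

// Kernel definitions use the macro instead of spelling out the annotation,
// so a later switch to another annotation touches only the macro.
template <typename T>
EXAMPLE_KERNEL void fill_kernel(T* out, T value)
{
  *out = value;
}
```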

jrhemstad

Since cuco is header-only and may be included in projects that build with rdc=true, I think it is more important to preserve the __attribute__ ((visibility ("hidden"))) functionality when __CUDA_RDC__ is defined.

As you mentioned, you will need to universally silence the -Wattributes warning as seen here: https://github.com/NVIDIA/cccl/blob/2771c61545eff4ec3cede24ba7963c4eebc4bbaf/cub/cub/util_macro.cuh#L128-L140
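
One way to read this suggestion, as a sketch only: use the hidden attribute when `__CUDA_RDC__` is defined and silence `-Wattributes` without a matching pop. The macro name `EXAMPLE_KERNEL`, the kernel `copy_kernel`, and the `static` fallback for non-RDC builds are assumptions made for this illustration; the linked CCCL header may differ in detail.

```cpp
// nvcc defines __global__ itself; this fallback only lets the sketch build as plain C++.
#ifndef __CUDACC__
#define __global__
#endif

#if defined(__GNUC__) || defined(__clang__)
// No push/pop on purpose: the suppression stays active for the rest of the
// translation unit, including user code that comes after this header.
#pragma GCC diagnostic ignored "-Wattributes"
#endif

#if defined(__CUDA_RDC__)
// Relocatable device code: keep external linkage, but hide the symbol.
#define EXAMPLE_KERNEL __attribute__((visibility("hidden"))) __global__
#else
// Assumption for this sketch: without RDC, internal linkage via `static` suffices.
#define EXAMPLE_KERNEL static __global__
#endif

template <typename T>
EXAMPLE_KERNEL void copy_kernel(T const* in, T* out)
{
  *out = *in;
}
```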

robertmaynard · 2024-01-10T17:23:49Z

The issue I have with the CCCL approach is that it has side effects on all functions inside the TU, not just the CCCL functions, since the suppressions aren't pushed and popped.

This could hide real issues in user code, which is why I don't recommend universally silencing the warnings. But I will implement whatever approach the cuco team wants, which by the sound of it is universal suppression.

sleeepyjack · 2024-01-10T21:41:54Z

I'm trying to wrap my head around this. So static is still correct, but we have an increased binary size in case rdc=true?

Using the hidden attribute would also help with the binary size, but adding pragma push/pop to every header (does it need to be every header or just those that directly emit the warning?) is very invasive. Can we prefix the CUCO_KERNEL macro with #pragma warning ( suppress: FooWarning) instead?

robertmaynard · 2024-01-11T14:10:45Z

> I'm trying to wrap my head around this. So static is still correct, but we have an increased binary size in case rdc=true?

Correct.

> Using the hidden attribute would also help with the binary size, but adding pragma push/pop to every header (does it need to be every header or just those that directly emit the warning?) is very invasive. Can we prefix the CUCO_KERNEL macro with #pragma warning ( suppress: FooWarning) instead?

It would help with binary size only in the rdc=true case. You are correct that only the headers that contain kernels would need it.

> Can we prefix the CUCO_KERNEL macro with #pragma warning ( suppress: FooWarning) instead?

You can, but that will affect all following code, which would include user code inside any translation unit that included the header.

sleeepyjack · 2024-01-11T14:31:12Z

> You can, but that will affect all following code, which would include user code inside any translation unit that included the header.

Ah, ok, I thought #pragma warning ( suppress: would only affect the line that directly follows, not the entire TU.

I would prefer the hidden annotation solution over static, but if it's too much work I'm fine with the latter too.

Most of the kernels in cuco are already isolated in separate header files, so we would need to add the push/pop pragmas to 8 files in total (ignoring any kernels in test/benchmark directories):

include/cuco/detail/open_addressing/kernels.cuh
include/cuco/detail/static_map/kernels.cuh
include/cuco/detail/static_multimap/kernels.cuh
include/cuco/detail/static_set/kernels.cuh
include/cuco/detail/trie/dynamic_bitset/kernels.cuh
include/cuco/detail/storage/kernels.cuh
include/cuco/detail/dynamic_map_kernels.cuh
include/cuco/detail/static_map_kernels.cuh (legacy map impl)
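
For reference, a scoped suppression in one of these headers could be sketched as below, using GCC/Clang diagnostic pragmas. `EXAMPLE_KERNEL` and `scale_kernel` are hypothetical names; nvcc's own diagnostics would need separate pragmas, which are omitted here.

```cpp
// nvcc defines __global__ itself; this fallback only lets the sketch build as plain C++.
#ifndef __CUDACC__
#define __global__
#endif

// Hypothetical macro name, standing in for the hidden-visibility annotation.
#define EXAMPLE_KERNEL __attribute__((visibility("hidden"))) __global__

#if defined(__GNUC__) || defined(__clang__)
#pragma GCC diagnostic push                    // save the includer's warning state
#pragma GCC diagnostic ignored "-Wattributes"  // silence attribute warnings for the kernels below
#endif

template <typename T>
EXAMPLE_KERNEL void scale_kernel(T* value, T factor)
{
  *value *= factor;
}

#if defined(__GNUC__) || defined(__clang__)
#pragma GCC diagnostic pop  // restore the includer's warning state
#endif
```

The push/pop pair only covers code that is textually between the two pragmas, so the block would have to be repeated in each of the files listed above.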

robertmaynard · 2024-01-11T14:33:34Z

> I would prefer the hidden annotation solution over static

I will go forward with that approach 👍

robertmaynard · 2024-01-11T16:23:46Z

@sleeepyjack
After testing with the cuco examples and libcudf, we can't pop the pragmas as desired.
This is due to how the __attribute__ ((visibility ("hidden"))) is applied during nvcc's internal code generation. Consuming TUs will also generate attribute warnings, and those can only be captured by not popping the pragmas.

PointKernel · 2024-01-11T19:40:47Z

/ok to test

sleeepyjack · 2024-01-11T22:50:12Z

> Consuming TUs will also generate attribute warnings and those can only be captured by not popping the pragmas.

So this means we globally disable this warning in user land? Not ideal. @jrhemstad any ideas?

jrhemstad · 2024-01-12T17:39:36Z

> So this means we globally disable this warning in user land? Not ideal. @jrhemstad any ideas?

That's precisely what we did in every CCCL header. It's a pretty innocuous warning to silence, and the benefits far outweigh any disadvantages.

PointKernel

Checked with @sleeepyjack earlier today: this PR is ready to ship once we revert the changes in the example. Examples are considered user code, so there is no need to change them.

examples/static_set/device_subsets_example.cu

robertmaynard · 2024-01-19T13:54:16Z

> Checked with @sleeepyjack earlier today: this PR is ready to ship once we revert the changes in the example. Examples are considered user code, so there is no need to change them.

I have integrated the changes to examples. So you can merge whenever you are ready.
Thanks!

sleeepyjack · 2024-01-19T15:40:38Z

/ok to test

Patch both CCCL and cuco so that their kernels have only internal linkage. For cuco I am working on upstreaming these changes (NVIDIA/cuCollections#422). Once that is accepted and we have validated that moving cuco is stable (e.g. changes around `cuco::experimental::static_set`), we can drop this patch set. For CCCL the long-term fix is to move to CCCL 2.3+, but due to issues (NVIDIA/cccl#1249, maybe others) that isn't viable for the 24.02 timeframe. Since the CCCL changes mean C++ and CUDA sources have incompatible ABIs, we need to specify `THRUST_DISABLE_ABI_NAMESPACE` and `THRUST_IGNORE_ABI_NAMESPACE_ERROR` so that we don't change the ABI in rapids-cmake consumers, since they expect 2.2 behavior.

Authors:
- Robert Maynard (https://github.com/robertmaynard)

Approvers:
- Bradley Dice (https://github.com/bdice)

URL: rapidsai#523

This removes the row conversion code from libcudf. It was moved from spark-rapids-jni (by #14664) as a temporary workaround for a kernel name conflict that caused invalid memory access when calling `thrust::in(ex)clusive_scan` (NVIDIA/spark-rapids-jni#1567). Now that we have fixes for the namespace visibility issue (marking all libcudf kernels private in rapidsai/rapids-cmake#523 and NVIDIA/cuCollections#422), the code needs to move back. Closes #14853.

Authors:
- Nghia Truong (https://github.com/ttnghia)

Approvers:
- David Wendt (https://github.com/davidwendt)
- Bradley Dice (https://github.com/bdice)

URL: #15234

PointKernel added the type: improvement Improvement / enhancement to an existing function label Jan 9, 2024

PointKernel requested a review from sleeepyjack January 9, 2024 21:05

robertmaynard force-pushed the bug/mark_kernels_as_hidden branch from a8e41b5 to b520b69 Compare January 10, 2024 15:26

jrhemstad reviewed Jan 10, 2024

robertmaynard mentioned this pull request Jan 10, 2024

Patch cccl and cuco so that all CUDA kernels they provide have internal linkage rapidsai/cudf#14735

Closed

3 tasks

Mark all cuco kernels as static so they have hidden visibility

c277ab7

robertmaynard force-pushed the bug/mark_kernels_as_hidden branch from 404006a to c277ab7 Compare January 11, 2024 17:05

[pre-commit.ci] auto code formatting

07f9e4b

robertmaynard requested a review from jrhemstad January 11, 2024 17:05

PointKernel approved these changes Jan 12, 2024

This was referenced Jan 16, 2024

Mark all cccl and cuco kernels with hidden visibility rapidsai/rapids-cmake#523

Merged

Update RAPIDS to mark all CUDA kernels with internal linkage rapidsai/build-planning#12

Closed

PointKernel reviewed Jan 19, 2024

robertmaynard and others added 2 commits January 19, 2024 08:53

Update examples/static_set/device_subsets_example.cu

6c9d92f

Co-authored-by: Yunsong Wang

Update examples/static_set/device_subsets_example.cu

890727d

Co-authored-by: Yunsong Wang

PointKernel merged commit 75c9613 into NVIDIA:dev Jan 19, 2024
15 checks passed

This was referenced Jan 23, 2024

[BUG] Investigate inclusive_scan issue in column-row conversion code NVIDIA/spark-rapids-jni#1579

Closed

[FEA] Remove row coversion code since we have fixed the kernel visibility issue rapidsai/cudf#14853

Closed

This was referenced Mar 5, 2024

Remove row conversion code from libcudf rapidsai/cudf#15234

Merged

Move row conversion code from cudf NVIDIA/spark-rapids-jni#1838

Merged
