[BUG] Sanitizer reports misaligned error when doing reduction on short type values in cuda12 ENV #14192
Comments
I'm not able to reproduce the error on my local libcudf build.
We have a few reduction tests in libcudf that use a min-aggregation followed by a max-aggregation. Here is one: cudf/cpp/tests/reductions/reduction_tests.cpp, lines 152 to 157 at commit 3196f6c.
All of the tests are run with compute-sanitizer in our nightly builds. I'm curious whether these tests also fail in your environment.
I suspect that this is the same as my previously reported issue: #13685
Yes, they also fail.
This seems specific to your test environment, since our nightly compute-sanitizer run does not fail on these tests. Perhaps you can provide some details on the environment. I see a Docker image mentioned in the description. Does the error occur only on CentOS 7?
This is a custom build of libcudf for the RAPIDS Accelerator, where we are compiling libcudf as a PIC static library that is ultimately linked into a shared library and used by the JVM.
The Docker image is used to produce this build; see https://github.com/NVIDIA/spark-rapids-jni/blob/branch-23.10/CONTRIBUTING.md#building-in-the-docker-container. After cloning spark-rapids-jni and executing the build-in-docker script, the libcudf install will be in spark-rapids-jni/target/libcudf-install/. If desired, you can use the run-in-docker script to get an interactive shell in the same environment as the build. The result of the build can be run on any supported OS (e.g., Ubuntu). I don't know whether the error has been reproduced on other operating systems.
No, it only occurs when running under compute-sanitizer, and specifically when compiling with CUDA 12. I agree that at this point it appears to be a compute-sanitizer bug specific to CUDA 12.
I'm curious whether this is resolved by the fixes for NVIDIA/spark-rapids-jni#1567.
This is likely the same issue.
Describe the bug
The sanitizer reports a misaligned error when performing a reduction on short-typed values in a CUDA 12 environment.
Steps/Code to reproduce bug
Code:
Compile and run with the sanitizer:
Sanitizer log:
The main errors are:
Others:
Expected behavior
The sanitizer should not report an error.
Environment overview
Environment details
Docker image: urm.nvidia.com/sw-spark-docker/plugin-jni:centos7-cuda12.0.1-blossom
CUDA 12; for more details, refer to NVIDIA/spark-rapids-jni#1349
Additional context
Refer to NVIDIA/spark-rapids-jni#1349