Issue with --local command line parameter. #49
Comments
Hi, is there a solution for the issue described above? Thanks.
I cannot reproduce the issue with hipSYCL. Your output indicates that somehow a block size of 0 enters the benchmark. This value is derived from the local size. I had a quick look at the code paths, and I don't understand how this could happen - it does not for me.
Hi, we are not working with hipSYCL. The issue occurs at runtime: the test case fails when we do not pass the local parameter (that is, when it should take the default value of 256).
Command used to execute - ./blocked_transform --device=gpu
However, it works fine when we explicitly set the local parameter to 256 at runtime.
Command used to execute - ./blocked_transform --device=gpu --local=256
We are not sure why this is happening. Thanks.
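One way to narrow down a "default versus explicit" difference like this is to log the value the program ends up with and whether it came from the command line. The following is only a sketch: `LocalSizeArg` and `parse_local_size` are hypothetical names, and this is not sycl-bench's actual option parser. The default is passed in as a parameter, since the thread itself disagrees on whether it is 256 or 1024.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper for illustration only; not sycl-bench's parser.
struct LocalSizeArg {
  std::size_t value;    // effective local size after parsing
  bool explicitly_set;  // true only if --local=<n> was present
};

// Scans the arguments for "--local=<n>"; falls back to default_value
// when the option is absent. The last occurrence wins.
LocalSizeArg parse_local_size(const std::vector<std::string>& args,
                              std::size_t default_value) {
  const std::string prefix = "--local=";
  LocalSizeArg result{default_value, false};
  for (const auto& arg : args) {
    if (arg.compare(0, prefix.size(), prefix) == 0) {
      result.value = std::stoul(arg.substr(prefix.size()));
      result.explicitly_set = true;
    }
  }
  return result;
}
```

Printing both fields at startup for the two command lines above would show whether the default path and the explicit path really deliver the same local size to the benchmark.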
I'm aware of this. But I don't have an installation of the DPC++ SYCL implementation with CUDA backend here. I'm just saying I cannot reproduce this with my setup. And I don't understand why DPC++ or hipSYCL would behave differently here anyway. The error does not seem to be related to SYCL specific functionality.
I understood this. As I've said I cannot reproduce here. Command line option handling is the same for DPC++ and hipSYCL. For further investigation into the issue, I asked you the following:
i.e. make sure that the
Hi, as suggested, I added the following to blocked_transform.cpp and rebuilt it:
#include<assert.h>
It seems that by default the local size is taken as 1024 (please see the attached screenshot below). However, when I explicitly set '--local' to either 256 (the default value) or 1024, it works fine.
Command being used: ./blocked_transform --device=gpu --local=256
Could this be a bug in the code?
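A check along these lines could make the failure show up before the kernel launch rather than inside the SYCL runtime. This is a minimal sketch with a hypothetical function name and parameters, not code from blocked_transform.cpp. It assumes the problem size is used as the global range, in which case SYCL requires it to be a multiple of the local size.

```cpp
#include <cstddef>
#include <string>

// Hypothetical pre-launch sanity check; names are illustrative.
// Returns an empty string when the sizes look usable, otherwise a
// short description of the first problem found.
std::string check_launch_sizes(std::size_t problem_size,
                               std::size_t local_size,
                               std::size_t block_size) {
  if (local_size == 0) return "local size is 0";
  if (block_size == 0) return "block size is 0";
  // Assumption: problem_size is the global range of an nd_range launch.
  if (problem_size % local_size != 0)
    return "problem size is not a multiple of local size";
  return "";
}
```

With the values from the failing run (problem size 3072, local size 1024, block size 0 as shown in the benchmark name), this check reports the zero block size, which matches the maintainer's reading of the output.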
Hi, is there any update regarding this issue? Thanks.
While executing the blocked_transform test case, which is located under runtime (https://github.com/bcosenza/sycl-bench/blob/master/runtime/blocked_transform.cpp), we get a core dump.
Command used to execute - ./blocked_transform --device=gpu
Output -
********** Results for Runtime_BlockedTransform_iter_64_blocksize_0**********
problem-size: 3072
local-size: 1024
device-name: NVIDIA RTX A4000
sycl-implementation: LLVM CUDA (Codeplay)
blocked_transform: /tmp/llvm-sycl-nightly-20220222/sycl/source/detail/scheduler/commands.cpp:1826: void cl::sycl::detail::adjustNDRangePerKernel(cl::sycl::detail::NDRDescT&, cl::sycl::detail::pi::PiKernel, const cl::sycl::detail::device_impl&): Assertion `NDR.NumWorkGroups[0] != 0 && NDR.LocalSize[0] == 0' failed.
Aborted (core dumped)
However, when we explicitly set the --local parameter to 256 (which is the default value) at runtime, it executes without any errors.
Command used to execute - ./blocked_transform --device=gpu --local=256
Is there a fix for this issue? If so, where can we get the revised code?