[BUG] test_hash_reduction_collect_set_on_nested_array_type failed in a distributed environment #10133
Comments
It looks like this happens when a batch has one row containing an empty List inside a List, where the data type is supposed to be List[List[Something]]. It reproduces very often for me on a single-node standalone cluster with 1 worker (my box has 64 cores). The test does a group-by collect_set reduction, and I believe the error above occurs when the inner type "Something" is a complex type such as a struct. If it is a primitive type like INT32, you get a slightly different error:
So the data type going into the reduction is List[List[INT32]] and the inner list is empty, and after the reduction we get back a List[INT32], which doesn't match the expected type. I'm still trying to narrow down exactly where this happens and where it should be handled.
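To make the expected semantics concrete, here is a minimal pure-Python model of `collect_set` over a List[List[INT32]] column. It is not the spark-rapids or cudf implementation, and the helpers `freeze` and `collect_set` are hypothetical names for this sketch. It shows that a row holding an empty inner list (or a NULL inner list) is just another distinct element of the result, so every element must keep the input type List[List[INT32]] rather than collapsing to List[INT32].

```python
from typing import Any, List, Optional, Sequence


def freeze(value: Any) -> Any:
    """Recursively turn lists into tuples so nested values can be hashed."""
    if isinstance(value, list):
        return tuple(freeze(v) for v in value)
    return value


def collect_set(rows: Sequence[Optional[list]]) -> List[list]:
    """Model of collect_set: the distinct non-NULL input values.

    Spark does not guarantee result order; first-seen order is kept here
    only so the output is deterministic.
    """
    seen = set()
    result = []
    for row in rows:
        if row is None:  # NULL rows are skipped, as in Spark's collect_set
            continue
        key = freeze(row)
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


# Each row is a List[List[INT32]] value; None is a NULL row.
rows = [
    [[]],      # one empty inner list: the problematic case
    [[1, 2]],
    [[]],      # duplicate of the first row
    [None],    # List[null]: the inner list itself is NULL
    None,      # NULL row, ignored
]
print(collect_set(rows))  # [[[]], [[1, 2]], [None]]
```

Each element of the result, including `[[]]` and `[None]`, is still a List[List[INT32]] value, which is the type the reduction output should carry.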
Looking at java/src/test/java/ai/rapids/cudf/ReductionTest.java in cudf, it doesn't look like it has any tests for nested complex columns.
OK, so this happens when you have a column type like List[List[INT32]] and the data comes in as List[null]. Smaller manual repro steps with pyspark against a standalone cluster with 1 worker that has 64 cores and 1 GPU:
CPU:
Another case with some valid values; here the GPU also fails.
GPU:
Valid case without the array being null:
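One way to see why these inputs are tricky: an empty list, or a list whose only element is NULL, fits more than one nesting depth, so its type cannot be recovered from the data alone and has to come from the declared schema. The sketch below uses hypothetical type descriptors and a hypothetical `conforms` helper; it illustrates the ambiguity only and is not how cudf represents or checks types.

```python
from typing import Any, Union

# Hypothetical type descriptors: "INT32" is a leaf, ("List", child) is a list.
TypeDesc = Union[str, tuple]

LIST_INT32 = ("List", "INT32")
LIST_LIST_INT32 = ("List", ("List", "INT32"))


def conforms(value: Any, type_desc: TypeDesc) -> bool:
    """Return True if value fits type_desc; None (NULL) fits any type."""
    if value is None:
        return True
    if type_desc == "INT32":
        return isinstance(value, int) and not isinstance(value, bool)
    kind, child = type_desc
    if kind != "List" or not isinstance(value, list):
        return False
    return all(conforms(v, child) for v in value)


# A List[null] value fits both depths, so its depth is ambiguous from data alone.
print(conforms([None], LIST_INT32), conforms([None], LIST_LIST_INT32))  # True True
# A List[INT32]-shaped value does not fit the declared List[List[INT32]].
print(conforms([1, 2], LIST_LIST_INT32))  # False
```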
For the integration test, it's again easy to reproduce in standalone mode with 1 worker that has 64 cores and 1 GPU:
Note that the exception stack traces differ between the two examples I gave above.
Manual pyspark repro:
Integration test test_hash_reduction_collect_set_on_nested_array_type failure:
Filed rapidsai/cudf#14924.
I can reproduce the bug using the example in #10133 (comment) and verify that it is fixed by rapidsai/cudf#15243.
This should be closed by rapidsai/cudf#15243.
No workaround or test disable was added for this. I verified that the recent EGX nightly tests that always failed with this are now passing.
Describe the bug
test_hash_reduction_collect_set_on_nested_array_type failed in a distributed environment with the error "Type converstion is not allowed..."
Detailed output
Steps/Code to reproduce bug
Run integration tests in a distributed environment
Expected behavior
Tests pass
Environment details (please complete the following information)
Additional context