[BUG] GpuExplode single row split to fit cuDF limits #10088
Comments
PoC: abellina@8bced26. Note that in the PoC I pick an arbitrary 100 splits for the list we are exploding; this is not what we ultimately want. We should compute the number of splits so that the output fits within 2B entries in a cuDF column.
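As a rough illustration of the computation described above, the sketch below estimates how many pieces the list would need to be cut into so that the repeated carry-along data stays under a size limit. The helper name, its parameters, and the limit of 2**31 - 1 (the largest signed 32-bit value) are assumptions made for illustration; this is not code from the PoC.

```python
import math

# Assumed limit for illustration: the largest signed 32-bit integer value.
ASSUMED_COLUMN_LIMIT = 2**31 - 1


def num_splits_needed(list_len, carried_bytes_per_row, limit=ASSUMED_COLUMN_LIMIT):
    """Hypothetical helper: number of pieces to cut the exploded list into."""
    if list_len == 0:
        return 1
    # Explode repeats the carried columns once per list element, so the
    # number of elements per piece is bounded by how many repetitions fit.
    max_elems = min(list_len, limit // max(carried_bytes_per_row, 1))
    if max_elems == 0:
        raise ValueError("a single repetition of the row already exceeds the limit")
    return math.ceil(list_len / max_elems)
```

With a 1 MiB string (1,048,576 bytes) and a 10,000-element list, at most 2,047 repetitions fit under the assumed limit, so this sketch returns 5 pieces rather than a fixed 100.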
The PoC is fairly close to what we want. I'll polish it up and put up a PR.
I am seeing a problem where the output isn't quite right with more columns in the project, and none of the tests catch this. I am investigating why this is happening. Specifically, the first column is being used for all columns except the exploding column, so if we have 2 carry-along columns, the second column isn't correct.
We have seen a case where a single row with strings (a 1MB string) and other columns could have a list to explode with many elements (10K elements, for example). When we try to handle such an explode, we currently do not split the input because it is a single row, but because of the amount of repetition we can go over cuDF column size limits.

The proposal is to at least do this in the withRetry case, where splitInHalfByRows for the explode case could have a special clause: for an input of 1 row, it can still split the list we are exploding and replicate the row accordingly.

For example:
Would become:
or more rows, with fewer items in the list. We should be able to calculate the size of the row that would fit within cuDF limits, and then split the list accordingly and replicate the rest of the columns to go along.
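A minimal sketch of the idea in plain Python, not the plugin's actual code: one input row is modeled as a tuple of carry-along values plus the list to explode, and the list is cut into chunks of at most `max_elems` items while every carry-along value is replicated for each chunk. The names `split_single_row`, `explode`, and `max_elems` are hypothetical.

```python
def split_single_row(carried, items, max_elems):
    """Split one (carried columns, list) row into rows with shorter lists."""
    if max_elems <= 0:
        raise ValueError("max_elems must be positive")
    if not items:
        return [(carried, [])]
    # Each chunk keeps all carry-along values, not just the first one.
    return [(carried, items[i:i + max_elems])
            for i in range(0, len(items), max_elems)]


def explode(rows):
    """Reference explode: one output row per list element."""
    return [carried + (item,) for carried, items in rows for item in items]


carried = ("big string", 42)   # two carry-along columns
items = [1, 2, 3, 4, 5]        # the list column being exploded
pieces = split_single_row(carried, items, max_elems=2)
```

Exploding the pieces one at a time and concatenating the results should give the same rows as exploding the original row in a single pass, which is the property a split like this would need to preserve.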