Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Followup for null count fixup in row_conversion.cu. #1155

Merged
merged 4 commits into from
May 23, 2023

Conversation

mythrocks
Copy link
Collaborator

This is a followup to #1148.

row_conversion.cu was modified in rapidsai/cudf#13372 to explicitly calculate null-counts for output columns.

This commit replicates the changes in rapidsai/cudf/pull/13372, and adds explicit null-count calculation for the string offsets column.

@mythrocks mythrocks requested a review from hyperbolic2346 May 19, 2023 23:00
@mythrocks
Copy link
Collaborator Author

Build

mythrocks added 2 commits May 22, 2023 13:09
This is a followup to NVIDIA#1148.

`row_conversion.cu` was modified in rapidsai/cudf#13372
to explicitly calculate null-counts for output columns.

This commit replicates the changes in cudf/pull/13372, and adds explicit
null-count calculation for the string offsets column.

Signed-off-by: MithunR <[email protected]>
@mythrocks mythrocks force-pushed the column-null-count-followup branch from 5d3351b to ac31cce Compare May 22, 2023 20:11
@mythrocks
Copy link
Collaborator Author

Build

Copy link
Collaborator

@hyperbolic2346 hyperbolic2346 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is dense, but I believe this to be correct. The computed columns have null masks if incoming data has null masks, but there is no null count, so doing this work seems appropriate.

@mythrocks mythrocks requested a review from ttnghia May 23, 2023 21:02
@mythrocks
Copy link
Collaborator Author

mythrocks commented May 23, 2023

Thanks for reviewing, @hyperbolic2346. I'm ready to merge this.

Edit: @ttnghia, the only pending issue is static_cast vs reinterpret_cast. I'd be happy to do a follow-up if required, but it looks like static_cast should do the trick.

@mythrocks mythrocks merged commit 7134023 into NVIDIA:branch-23.06 May 23, 2023
@mythrocks mythrocks deleted the column-null-count-followup branch May 25, 2023 00:37
mythrocks added a commit to mythrocks/spark-rapids-jni that referenced this pull request May 25, 2023
Fixes NVIDIA#1169.

This commit fixes the null counts in row/column conversion, for all output
columns.
The prior fix in NVIDIA#1155 only fixed the count for STRING outputs.

There are 2 additional/tangential fixes in this change:
  1. `cudf::detail::null_count()` is used in place of `cudf::null_count()`,
     thus running on the current stream.
  2. The Java `RowConversion.convertFromRowsFixedWidthOptimized()` now uses
     the `convertFromRowsFixedWidthOptimized()` native function instead of
     `convertFromRows()`. This should prove faster.
mythrocks added a commit to mythrocks/spark-rapids-jni that referenced this pull request May 25, 2023
Fixes NVIDIA#1169.

This commit fixes the null counts in row/column conversion, for all output
columns.
The prior fix in NVIDIA#1155 only fixed the count for STRING outputs.

There are 2 additional/tangential fixes in this change:
  1. `cudf::detail::null_count()` is used in place of `cudf::null_count()`,
     thus running on the current stream.
  2. The Java `RowConversion.convertFromRowsFixedWidthOptimized()` now uses
     the `convertFromRowsFixedWidthOptimized()` native function instead of
     `convertFromRows()`. This should prove faster.

Signed-off-by: MithunR <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants