-
Notifications
You must be signed in to change notification settings - Fork 912
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add decoder for DELTA_BYTE_ARRAY to Parquet reader #14101
Add decoder for DELTA_BYTE_ARRAY to Parquet reader #14101
Conversation
This one cannot be tested with python in same way the DELTA_BINARY_PACKED decoder is because pyarrow doesn't yet support writing DELTA_BYTE_ARRAY (as of version 13.0, it will be in 14.0). The only option at this point is to add yet another small parquet file to the python test data directory, but that's not a very thorough test. |
…decode_delta_byte_arr
/ok to test |
} else { // warp 3 | ||
target_pos = min(s->nz_count, src_pos + batch_size); | ||
} | ||
__syncthreads(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this sync necessary? There's one already at the bottom of the loop and there's nothing here that seems to require synchronization. I just looked at the related loop in page_data.cu and we're doing it there as well. Hmm. This feels very vestigial.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never thought about it, but I think you're right. Maybe back in the day target_pos was shared or something. I'll take a look at all the decoders.
Edit: Did some archaeology and that sync has been there since the decoder was reworked in 2019, and near as I can tell it wasn't needed then either. I've removed them from the 4 current decoders.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, let's leave them in for now and remove them in a separate PR. I'd rather let it soak for a full milestone (until 24.02) to make sure there's not something subtle here that we're missing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I'll put them back
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done...and added some TODOs
Co-authored-by: nvdbaranec <[email protected]>
Co-authored-by: nvdbaranec <[email protected]>
…decode_delta_byte_arr
/ok to test |
/ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, left an optional comment here: https://github.com/rapidsai/cudf/pull/14101/files#r1396338749
/merge |
Description
Part of #13501. Adds ability to decode DELTA_BYTE_ARRAY encoded pages.
Checklist