Binary / row helpers #6096
Currently these are accessible via `AsRef`, but that trait only gives you the bytes with the lifetime of the `Row` struct and not the lifetime of the backing data.
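To illustrate the lifetime difference described above, here is a minimal sketch. It does not use the real `arrow-row` types: `RowView`, its `data` method, and `first_row` are hypothetical stand-ins, assumed only for illustration. The point is that `AsRef::as_ref` ties the returned slice to the borrow of the wrapper value, while an inherent accessor can return the slice with the lifetime of the backing buffer.

```rust
/// Hypothetical stand-in for a row type: a thin view over bytes owned elsewhere.
struct RowView<'a> {
    data: &'a [u8],
}

impl<'a> RowView<'a> {
    /// Returns the bytes with the lifetime of the backing buffer (`'a`),
    /// not the lifetime of this `RowView` value.
    fn data(&self) -> &'a [u8] {
        self.data
    }
}

impl<'a> AsRef<[u8]> for RowView<'a> {
    /// The trait signature ties the returned slice to the borrow of `self`.
    fn as_ref(&self) -> &[u8] {
        self.data
    }
}

/// Builds a temporary view and returns its bytes. This compiles only because
/// `data()` returns `&'a [u8]`; the `RowView` itself is dropped on return.
fn first_row(buffer: &[u8], len: usize) -> &[u8] {
    let row = RowView { data: &buffer[..len] };
    // Returning `row.as_ref()` instead would be rejected by the borrow
    // checker, because that slice borrows from the local `row`.
    row.data()
}

fn main() {
    let buffer = vec![1u8, 2, 3, 4];
    let bytes = first_row(&buffer, 2);
    assert_eq!(bytes, &[1u8, 2][..]);
}
```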
Thanks @bkirwi and @XiangpengHao
I think this PR needs some additional negative tests and error testing but otherwise I think it is looking good to me
cc @tustvold in case you have time to comment on the safety of the design
Marking as draft so it is clear this PR isn't waiting on feedback anymore (at least I don't think it is). Please mark it as ready for review when it is ready for another look.
Thanks for the review! I think I've addressed all comments, though there were a couple things I wasn't certain of - addressed inline.
(Looks like there was some merge skew in the tests; I've merged the main branch in here which ought to fix it.)
I am depressed about the large review backlog in this crate. We are looking for more help from the community reviewing PRs -- see #6418 for more.
A few minor comments but these seem like straightforward changes to me.
buffer: array.values().to_vec(),
offsets: array.offsets().iter().map(|&i| i.as_usize()).collect(),
config: RowConfig {
fields: Arc::clone(&self.fields),
More for my curiosity than anything, but why `Arc::clone(&self.fields)` instead of `self.fields.clone()`?
Some people prefer this form because it makes it more explicit that we're just incrementing an arc and not cloning the underlying data. See the clippy lint docs for more: https://rust-lang.github.io/rust-clippy/master/index.html#/clone_on_ref_ptr
I've gotten used to this style, though I do not personally care deeply about it! This codebase seems to use a mix of both.
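As a small sketch of the point above, using only the standard library: both spellings increment the reference count and share one allocation, and a deep copy has to be requested on the inner value. The helper name `shared_handles` and the `Vec<String>` payload are illustrative assumptions, not code from this PR.

```rust
use std::sync::Arc;

/// Hypothetical helper returning two new handles to the same allocation,
/// one created with each spelling.
fn shared_handles(fields: &Arc<Vec<String>>) -> (Arc<Vec<String>>, Arc<Vec<String>>) {
    // Explicit form: clearly only a reference-count increment.
    let explicit = Arc::clone(fields);
    // Method form: same effect, but reads like it might copy the data.
    let method = fields.clone();
    (explicit, method)
}

fn main() {
    let fields = Arc::new(vec!["a".to_string(), "b".to_string()]);
    let (explicit, method) = shared_handles(&fields);

    // All three handles point at the same Vec.
    assert!(Arc::ptr_eq(&fields, &explicit));
    assert!(Arc::ptr_eq(&fields, &method));
    assert_eq!(Arc::strong_count(&fields), 3);

    // A deep copy of the underlying data must be asked for explicitly,
    // and it leaves the reference count unchanged.
    let deep: Vec<String> = (*fields).clone();
    assert_eq!(deep, *fields);
    assert_eq!(Arc::strong_count(&fields), 3);
}
```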
arrow-row/src/lib.rs
///
/// // We can convert rows into binary format and back in batch.
/// let values: Vec<OwnedRow> = rows.iter().map(|r| r.owned()).collect();
/// let binary = rows.try_into_binary().expect("small");
I got a little confused by `.expect("small")`. What does "small" mean in this context? Why not just `.unwrap()`?
`try_into_binary` fails when the data is too large to be indexed with a 32-bit integer, so this was meant to suggest that it was fine to unwrap here because the data is known to be small. I'll expand the message a bit!
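A rough sketch of that failure mode, assuming offsets are tracked as `usize` and must be narrowed to `i32`. The helper `to_i32_offsets` is hypothetical and is not the `arrow-row` implementation; it only shows why the conversion is fallible and what a more descriptive `expect` message might look like.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: narrows `usize` byte offsets to `i32` offsets,
/// returning an error if any offset does not fit.
fn to_i32_offsets(offsets: &[usize]) -> Result<Vec<i32>, String> {
    offsets
        .iter()
        .map(|&o| i32::try_from(o).map_err(|_| format!("offset {} does not fit in i32", o)))
        .collect()
}

fn main() {
    // Known-small data: the message explains why unwrapping is safe here.
    let small = to_i32_offsets(&[0, 3, 7]).expect("offsets are tiny, so they fit in i32");
    assert_eq!(small, vec![0, 3, 7]);

    // An offset past i32::MAX is rejected rather than silently truncated.
    let too_big = i32::MAX as usize + 1;
    assert!(to_i32_offsets(&[0, too_big]).is_err());
}
```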
(Ah, I see rustfmt is failing; we probably need CI passing before merge.)
@bkirwi can you please fix the CI tests so we can merge this PR? Thank you @westonpace for the review.
Thanks for the review! I should be able to get to the follow-up later this week.
Alright, I've fixed the lint and applied a few more suggestions. Thanks all for the review!
Which issue does this PR close?
Closes #6063.
(Potentially - still under discussion at the linked issue!)
Rationale for this change
I've added the optional `from_binary` method discussed in the associated issue also.
What changes are included in this PR?
`data`, `into_binary` and `from_binary` functions, and an extension to the fuzz test that checks the data survives the roundtrip.
Are there any user-facing changes?
Yes, though I suspect the rustdoc covers them enough?