-
Notifications
You must be signed in to change notification settings - Fork 824
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add GenericListViewArray
for ListView
& LargeListView
#5723
Conversation
@tustvold If you have time, I would appreciate it if you could help me review this PR. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay, been swamped and then sick, but this is looking almost there
Thank you for taking the time to review my code despite being busy and unwell. |
For future reference, it is helpful for reviewers if you don't force push branches once reviews have been made, this makes it much easier to see what has changed. I'll try to find time to review this later today, but I now have to do a full review again which is quite time consuming on such a large PR |
@tustvold Thank you for your help. I will be more careful about the force push issue in my future commits. I am very sorry. |
Hi @tustvold , perhaps when you're free this week, you could help me review this. This way, we can continue to move forward with this work. Thank you. |
I'm afraid I don't have capacity to assist with this initiative further, but perhaps @alamb may do. I am reducing my involvement in arrow-rs as I no longer use it in my day job |
@alamb Perhaps you can help advance this PR when you have time. 🥸 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some suggestions but overall this matches my understanding of list view arrays. Thanks for the work!
impl<OffsetSize: OffsetSizeTrait> Clone for GenericListViewArray<OffsetSize> { | ||
fn clone(&self) -> Self { | ||
Self { | ||
data_type: self.data_type.clone(), | ||
nulls: self.nulls.clone(), | ||
values: self.values.clone(), | ||
value_offsets: self.value_offsets.clone(), | ||
value_sizes: self.value_sizes.clone(), | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you derive the Clone
impl?
for i in 0..offsets.len() { | ||
let length = values.len(); | ||
let offset = offsets[i].as_usize(); | ||
let size = sizes[i].as_usize(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit, is it faster to use offsets.iter().zip(sizes)
(maybe the optimize down to the same thing?)
/// | ||
/// * `offsets.len() != sizes.len()` | ||
/// * `offsets.len() != nulls.len()` | ||
/// * `offsets.last() > values.len()` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implies you only check the last offset but you validate all offsets
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed the code again and realized that I had already validated all the offsets, so I changed it to offsets[i] > values.len().
length | ||
))); | ||
} | ||
if offset.saturating_add(size) > values.len() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need the above length comparison if you also have this comparison?
let mut sizes = Vec::with_capacity(iter.size_hint().0); | ||
for size in iter { | ||
offsets.push(OffsetSize::usize_as(acc)); | ||
acc = acc.checked_add(size).expect("usize overflow"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor corner case nit: Imagine we are working with i32
offsets and we have 1Gi - 1
fixed size lists of size 2. Will this panic even thought it shouldn't.
acc = acc.checked_add(size).expect("usize overflow"); | ||
sizes.push(OffsetSize::usize_as(size)); | ||
} | ||
OffsetSize::from_usize(acc).expect("offset overflow"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this final check? It seems redundant.
// SAFETY: | ||
// ArrayData is valid, and verified type above |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why SAFETY
? We aren't doing anything unsafe here that I can see.
let sizes = list.value_sizes(); | ||
assert_eq!(sizes, &[3, 3, 3]); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some other tests might be:
- Test where the lists overlap (e.g. sizes =
[5, 5]
and offsets =[0, 3]
andvalues
has length8
. - Test where offsets does not cover the entire range of values (e.g. values has 50 things and offsets only go up to 10)
- List of empty lists (with an empty values)
hi @westonpace please help me review again |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few last small things.
for (offset, size) in offsets.iter().zip(sizes.iter()) { | ||
let offset = offset.as_usize(); | ||
let size = size.as_usize(); | ||
if offset.saturating_add(size) > values.len() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be a checked_add
.
for size in iter { | ||
offsets.push(OffsetSize::usize_as(acc)); | ||
acc = acc.saturating_add(size); | ||
sizes.push(OffsetSize::usize_as(size)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did like that you were checking for overflow here, my only concern was that it was in the wrong spot. What if, before the for loop, you checked to ensure that value.len()
is less than or equal to the max offset? Then you can keep using saturating_add
here. Or even better, use the basic add since we've verified it will never overflow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've reconsidered, and it seems that no additional validation is needed here. 🤔
I am depressed about the large review backlog in this crate. We are looking for more help from the community reviewing PRs -- see #6418 for more |
@westonpace 😭 Apologies for the delay in processing this pull request due to work reasons. I hope you can help review it when you have time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One minor thing but I think this is good to go. Thanks for addressing the earlier reviews :)
//check if the size is usize | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not entirely sure I understand the meaning of this comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this comment is no longer needed, so I've removed it.
self.values.slice(offset, length) | ||
} | ||
|
||
/// Returns ith value of this list view array. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this should get a # Panics
note?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mbrobbel Thank you for your review.
Co-authored-by: Weston Pace <[email protected]>
Co-authored-by: Weston Pace <[email protected]>
Co-authored-by: Matthijs Brobbel <[email protected]>
Co-authored-by: Matthijs Brobbel <[email protected]>
Co-authored-by: Matthijs Brobbel <[email protected]>
Co-authored-by: Matthijs Brobbel <[email protected]>
Co-authored-by: Matthijs Brobbel <[email protected]>
Co-authored-by: Matthijs Brobbel <[email protected]>
Co-authored-by: Matthijs Brobbel <[email protected]>
Co-authored-by: Matthijs Brobbel <[email protected]>
Co-authored-by: Matthijs Brobbel <[email protected]>
I took a quick skim of this PR and the code, tests and documentation are looking quite nice I think -- thank you @Kikkon @westonpace and @mbrobbel 🚀 |
Which issue does this PR close?
A part of #5501
Rationale for this change
What changes are included in this PR?
Add GenericListViewArray implement
Are there any user-facing changes?