feat(arrow-ord): support boolean in rank and sorting list of booleans #6912
base: main
Rather than sorting booleans directly, it would be orders of magnitude faster to count the number of set bits and use the counts to compute the result. Nulls can be handled by first computing the bitwise AND of the null mask with the values.
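A minimal sketch of the counting idea above, using only the standard library. It assumes bit-packed `u64` words for the values and for a validity mask (a set bit meaning "not null"), with unused trailing validity bits cleared. The names `BoolCounts`, `count_bools`, and `sorted_bools` are hypothetical and are not part of arrow-rs.

```rust
/// Number of nulls, falses, and trues in a nullable boolean column.
#[derive(Debug, PartialEq, Eq)]
struct BoolCounts {
    nulls: usize,
    falses: usize,
    trues: usize,
}

/// Count each category with popcounts instead of comparing elements.
/// Assumption: `values` and `validity` are bit-packed, a set validity bit
/// means "not null", and validity bits at positions >= `len` are zero.
fn count_bools(values: &[u64], validity: &[u64], len: usize) -> BoolCounts {
    let mut valid = 0usize;
    let mut trues = 0usize;
    for (v, m) in values.iter().zip(validity) {
        valid += m.count_ones() as usize;
        // AND with the validity mask so bits under null slots are ignored.
        trues += (v & m).count_ones() as usize;
    }
    BoolCounts {
        nulls: len - valid,
        falses: valid - trues,
        trues,
    }
}

/// The sorted output is just up to three runs whose lengths are the counts.
fn sorted_bools(c: &BoolCounts, descending: bool, nulls_first: bool) -> Vec<Option<bool>> {
    let mut out = Vec::with_capacity(c.nulls + c.falses + c.trues);
    if nulls_first {
        out.extend(std::iter::repeat(None).take(c.nulls));
    }
    let (first, first_n, second, second_n) = if descending {
        (true, c.trues, false, c.falses)
    } else {
        (false, c.falses, true, c.trues)
    };
    out.extend(std::iter::repeat(Some(first)).take(first_n));
    out.extend(std::iter::repeat(Some(second)).take(second_n));
    if !nulls_first {
        out.extend(std::iter::repeat(None).take(c.nulls));
    }
    out
}
```

For the column `[true, true, null, false, false]`, the validity word is `0b11011`; whatever bit sits under the null slot in the values word is masked out before counting.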
Great idea, I did not think of that (I was just trying to get it implemented before trying to make it fast, as you said).
Fair, I'll try to find some time over the next few weeks to review this, but I'm afraid I have very little time to review PRs, especially of this size.
@tustvold Updated to improve performance, please let me know if that's what you meant.
Updated to improve performance, please let me know if that's what you meant
I think he was referring to the sort logic for booleans, which doesn't actually seem to be implemented in this PR? It might be worth splitting this PR, as it's doing two separate things and might cause confusion.
let a = BooleanArray::from(vec![Some(true), Some(true), None, Some(false), Some(false)]);
let res = rank(&a, None).unwrap();
assert_eq!(res, &[2, 2, 0, 1, 1]);

let res = rank(&a, Some(descending)).unwrap();
assert_eq!(res, &[1, 1, 0, 2, 2]);

let res = rank(&a, Some(nulls_last)).unwrap();
assert_eq!(res, &[1, 1, 2, 0, 0]);

let res = rank(&a, Some(nulls_last_descending)).unwrap();
assert_eq!(res, &[0, 0, 2, 1, 1]);
This seems to compute a dense_rank rather than a rank, which takes ties into account.
See the doctest example, where you can imagine "foo" as true and "bar" as false (or vice-versa):
arrow-rs/arrow-ord/src/rank.rs
Lines 42 to 46 in b77d38d
/// # use arrow_array::StringArray;
/// # use arrow_ord::rank::rank;
/// let array = StringArray::from(vec![Some("foo"), None, Some("foo"), None, Some("bar")]);
/// let ranks = rank(&array, None).unwrap();
/// assert_eq!(ranks, &[5, 2, 5, 2, 3]);
I think the logic might need to be revisited for the boolean rank function?
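To make the difference concrete, here is a rough sketch of a tie-aware boolean rank computed purely from counts. It assumes the semantics suggested by the doctest above (every member of a tie group receives the 1-based position of the group's last element in sorted order) and uses a plain slice of `Option<bool>` rather than a `BooleanArray`. The function name `rank_bools` and its flag parameters are hypothetical, not the arrow-rs API.

```rust
/// Tie-aware rank for nullable booleans, derived from counts only.
/// Assumption: a value's rank is the number of elements that sort at or
/// before it, which is what the `StringArray` doctest appears to show.
fn rank_bools(values: &[Option<bool>], descending: bool, nulls_first: bool) -> Vec<u32> {
    let len = values.len() as u32;
    let nulls = values.iter().filter(|v| v.is_none()).count() as u32;
    let trues = values.iter().filter(|v| **v == Some(true)).count() as u32;
    let falses = len - nulls - trues;

    // Size of whichever non-null group sorts first.
    let first_group = if descending { trues } else { falses };
    // Non-null ranks are shifted by the null count only when nulls sort first.
    let offset = if nulls_first { nulls } else { 0 };
    let first_rank = offset + first_group;
    let second_rank = offset + falses + trues;
    let null_rank = if nulls_first { nulls } else { len };

    values
        .iter()
        .map(|v| match v {
            None => null_rank,
            // `true` sorts first when descending, `false` when ascending.
            Some(b) if *b == descending => first_rank,
            Some(_) => second_rank,
        })
        .collect()
}
```

Under these assumptions, the boolean analogue of the doctest input, `[Some(true), None, Some(true), None, Some(false)]`, ranks as `[5, 2, 5, 2, 3]` with ascending order and nulls first, whereas a dense rank would produce only the values 0 to 2 (or 1 to 3).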
Which issue does this PR close?
Part of #6911
Rationale for this change
I want to sort lists of booleans.
What changes are included in this PR?
Added a rank function for booleans, plus tests for sorting lists of booleans and for ranking booleans.
Are there any user-facing changes?
Yes, the rank function now supports BooleanArray, and sort supports sorting lists of booleans.
This builds on top of:
can_rank to the rank file #6910