Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(arrow-ord): support boolean in rank and sorting list of booleans #6912

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

rluvaton
Copy link
Contributor

Which issue does this PR close?

Part of #6911

Rationale for this change

I want to sort list of of booleans

What changes are included in this PR?

added rank function for boolean and added tests for sorting list of booleans and ranking booleans

Are there any user-facing changes?

Yes, the rank function now support BooleanArray and sort support sorting list of booleans

this build on top of:

@rluvaton rluvaton marked this pull request as draft December 26, 2024 13:25
@github-actions github-actions bot added the arrow Changes to the arrow crate label Dec 26, 2024
@rluvaton rluvaton marked this pull request as ready for review December 26, 2024 14:13
@tustvold
Copy link
Contributor

Rather than sorting booleans directly, it would be orders of magnitude faster to count the number of bits and use this to compute the result. Nulls can be handled by first computing the bitwise and with the values.

@rluvaton
Copy link
Contributor Author

Rather than sorting booleans directly, it would be orders of magnitude faster to count the number of bits and use this to compute the result. Nulls can be handled by first computing the bitwise and with the values.

Great idea, did not think about it (I was just trying to implement before trying to make it fast as you said)

@tustvold
Copy link
Contributor

Fair, I'll try to find some time over the next few weeks to review this, but I'm afraid I have very little time to review PRs, especially of this size

@rluvaton
Copy link
Contributor Author

@tustvold Updated to improve performance, please let me know if that's what you meant

Copy link
Contributor

@Jefffrey Jefffrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to improve performance, please let me know if that's what you meant

I think he was referring to the sort logic for booleans, which doesn't actually seem to be implemented in this PR? It might be worth splitting this PR as its doing two separate things and might cause confusion.

Comment on lines +254 to +265
let a = BooleanArray::from(vec![Some(true), Some(true), None, Some(false), Some(false)]);
let res = rank(&a, None).unwrap();
assert_eq!(res, &[2, 2, 0, 1, 1]);

let res = rank(&a, Some(descending)).unwrap();
assert_eq!(res, &[1, 1, 0, 2, 2]);

let res = rank(&a, Some(nulls_last)).unwrap();
assert_eq!(res, &[1, 1, 2, 0, 0]);

let res = rank(&a, Some(nulls_last_descending)).unwrap();
assert_eq!(res, &[0, 0, 2, 1, 1]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be a dense_rank instead of rank which takes into account ties.

See the doctest example, where you can imagine "foo" as true and "bar" as false (or vice-versa):

/// # use arrow_array::StringArray;
/// # use arrow_ord::rank::rank;
/// let array = StringArray::from(vec![Some("foo"), None, Some("foo"), None, Some("bar")]);
/// let ranks = rank(&array, None).unwrap();
/// assert_eq!(ranks, &[5, 2, 5, 2, 3]);

I think the logic might need to be revisited for the boolean rank function?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants