Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port PendingRow to arrow-rs #8617

Merged
merged 11 commits into from
Jan 9, 2025
Merged

Port PendingRow to arrow-rs #8617

merged 11 commits into from
Jan 9, 2025

Conversation

emilk
Copy link
Member

@emilk emilk commented Jan 8, 2025

NOTE: this will regress on memory use, because we lack .shrink_to_fit, which will arrive in

@emilk emilk force-pushed the emilk/pending-row branch from 3b09fb8 to 55df031 Compare January 8, 2025 19:38
Copy link

github-actions bot commented Jan 8, 2025

Web viewer built successfully. If applicable, you should also test it:

  • I have tested the web viewer
Result Commit Link Manifest
21d76fd https://rerun.io/viewer/pr/8617 +nightly +main

Note: This comment is updated whenever you push a commit.

@emilk emilk added 🏹 arrow concerning arrow do-not-merge Do not merge this PR 🚜 refactor Change the code, not the functionality exclude from changelog PRs with this won't show up in CHANGELOG.md labels Jan 8, 2025
/// * The list only contains null entries, or empty arrays, or a mix of both.
#[inline]
pub fn is_list_array_semantically_empty(list_array: &ListArray) -> bool {
list_array.values().is_empty()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took the freedom to simplify this function quite a bit from the original:

pub fn is_list_array_semantically_empty(list_array: &ArrowListArray<i32>) -> bool {
let is_physically_empty = || list_array.is_empty();
let is_all_nulls = || {
list_array
.validity()
.map_or(false, |bitmap| bitmap.unset_bits() == list_array.len())
};
let is_all_empties = || list_array.offsets().lengths().all(|len| len == 0);
let is_a_mix_of_nulls_and_empties =
|| list_array.iter().flatten().all(|array| array.is_empty());
is_physically_empty() || is_all_nulls() || is_all_empties() || is_a_mix_of_nulls_and_empties()
}

Was the old complexity necessary for some reason I don't see, or an oversight?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All I can say is that this logic was very important for the UX of the dataframe views and APIs.

Whether the two implementations are equivalent, I'm tempted to say yes but I might be missing something too.

@@ -0,0 +1,317 @@
use arrow::{
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pub fn concat_arrays(arrays: &[&dyn Array]) -> arrow::error::Result<ArrayRef> {
#[allow(clippy::disallowed_methods)] // that's the whole point
arrow::compute::concat(arrays)
// TODO: call .shrink_to_fit on the result
Copy link
Member Author

@emilk emilk Jan 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

…but maybe it's fine to regress on this temporarily

@emilk emilk marked this pull request as ready for review January 8, 2025 19:59
@emilk emilk changed the title WIP: arrow2 shenanigans Port PendingRow to arrow-rs Jan 8, 2025
@teh-cmc teh-cmc self-requested a review January 9, 2025 09:06
@emilk
Copy link
Member Author

emilk commented Jan 9, 2025

@rerun-bot full-check

Copy link

github-actions bot commented Jan 9, 2025

Started a full build: https://github.com/rerun-io/rerun/actions/runs/12687112287

Copy link
Member

@teh-cmc teh-cmc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gave it a very shallow look but it all seems fairly mechanical so there shouldn't too many surprises.

A full-check will definitely tell us if something went wrong in any case.

/// * The list only contains null entries, or empty arrays, or a mix of both.
#[inline]
pub fn is_list_array_semantically_empty(list_array: &ListArray) -> bool {
list_array.values().is_empty()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All I can say is that this logic was very important for the UX of the dataframe views and APIs.

Whether the two implementations are equivalent, I'm tempted to say yes but I might be missing something too.

@emilk
Copy link
Member Author

emilk commented Jan 9, 2025

@rerun-bot full-check

@emilk emilk removed the do-not-merge Do not merge this PR label Jan 9, 2025
Copy link

github-actions bot commented Jan 9, 2025

Started a full build: https://github.com/rerun-io/rerun/actions/runs/12687313927

@emilk emilk mentioned this pull request Jan 8, 2025
16 tasks
@emilk
Copy link
Member Author

emilk commented Jan 9, 2025

Deploying the Python docs failed the first run, but worked the second go

@emilk emilk merged commit 3a50bc2 into main Jan 9, 2025
66 of 67 checks passed
@emilk emilk deleted the emilk/pending-row branch January 9, 2025 12:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏹 arrow concerning arrow exclude from changelog PRs with this won't show up in CHANGELOG.md 🚜 refactor Change the code, not the functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants