Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Range<u64> as argument for the RangeQueryHandle::get() #7395

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
24 changes: 12 additions & 12 deletions crates/store/re_dataframe/examples/range_paginated.rs
Original file line number Diff line number Diff line change
Expand Up @@ -61,21 +61,21 @@ fn main() -> anyhow::Result<()> {
query_handle.num_rows()
);

let (offset, len) = (0, 4);
println!("offset:{offset} len:{len}");
concat_and_print(query_handle.get(offset, len));
let row_range = 0..4;
println!("range: {row_range:?}");
concat_and_print(query_handle.get(row_range));

let (offset, len) = (2, 4);
println!("offset:{offset} len:{len}");
concat_and_print(query_handle.get(offset, len));
let row_range = 2..6;
println!("range: {row_range:?}");
concat_and_print(query_handle.get(row_range));

let (offset, len) = (10, 5);
println!("offset:{offset} len:{len}");
concat_and_print(query_handle.get(offset, len));
let row_range = 10..15;
println!("range: {row_range:?}");
concat_and_print(query_handle.get(row_range));

let (offset, len) = (0, 15);
println!("offset:{offset} len:{len}");
concat_and_print(query_handle.get(offset, len));
let row_range = 0..15;
println!("range: {row_range:?}");
concat_and_print(query_handle.get(row_range));
}

Ok(())
Expand Down
17 changes: 10 additions & 7 deletions crates/store/re_dataframe/src/range.rs
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
use std::ops::Range;
use std::sync::{atomic::AtomicU64, OnceLock};

use ahash::HashMap;
Expand Down Expand Up @@ -219,7 +220,9 @@ impl RangeQueryHandle<'_> {
/// This is the most performant way to iterate over the dataset.
//
// TODO(cmc): This could be turned into an actual lazy iterator at some point.
pub fn get(&self, offset: u64, mut len: u64) -> Vec<RecordBatch> {
pub fn get(&self, row_range: Range<u64>) -> Vec<RecordBatch> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could use a

Suggested change
pub fn get(&self, row_range: Range<u64>) -> Vec<RecordBatch> {
pub fn get(&self, row_range: Range<u64>) -> Vec<RecordBatch> {
debug_assert!(row_range.start <= row_range.end, "Negative range: {row_range}");

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure about this one. This is user input (via the SDK), so I think it should be validated a bit less... drastically? Here I opted for (silently) returning an empty dataframe if the range length is <= 0. We probably will have more explicit error handling in the future, but that seemed overkill for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably will have more explicit error handling in the future

Yeah, an error-result would have been my expectation here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH, this very API change is what spawned this error condition in the first place. (offset: u64, len: u64) encodes better than Range<u64> the fact that "inverted ranges" are invalid.

With the fact that Range is not Copy and is occasionally annoying, this makes me question this entire PR. Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, interesting point.

I actually do prefer the offset/len structure as it is closer to offset / limit that shows up in SQL or other datafusion integrations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Let's scrap this.

let offset = row_range.start;
let mut len = row_range.end.saturating_sub(row_range.start);
re_tracing::profile_function!(format!("get({offset}, {len}, {})", self.query));

let state = self.init();
Expand Down Expand Up @@ -533,7 +536,7 @@ mod tests {

// Paginated API
{
let batch = handle.get(0, 0).pop().unwrap();
let batch = handle.get(0..0).pop().unwrap();
// The output should be an empty recordbatch with the right schema and empty arrays.
assert_eq!(0, batch.num_rows());
assert!(
Expand All @@ -543,9 +546,9 @@ mod tests {
assert!(itertools::izip!(handle.schema(), batch.data.iter())
.all(|(descr, array)| descr.datatype() == array.data_type()));

let _batch = handle.get(0, 1).pop().unwrap();
let _batch = handle.get(0..1).pop().unwrap();

let batch = handle.get(1, 1).pop();
let batch = handle.get(1..2).pop();
assert!(batch.is_none());
}
}
Expand Down Expand Up @@ -633,7 +636,7 @@ mod tests {

// Paginated API
{
let batch = handle.get(0, 1).pop().unwrap();
let batch = handle.get(0..1).pop().unwrap();
// The output should be an empty recordbatch with the right schema and empty arrays.
assert_eq!(1, batch.num_rows());
assert_eq!(
Expand All @@ -654,9 +657,9 @@ mod tests {
.all(|(descr, field)| descr.to_arrow_field() == *field)
);

let _batch = handle.get(1, 1).pop().unwrap();
let _batch = handle.get(1..2).pop().unwrap();

let batch = handle.get(2, 1).pop();
let batch = handle.get(2..3).pop();
assert!(batch.is_none());
}
}
Expand Down
11 changes: 4 additions & 7 deletions crates/viewer/re_space_view_dataframe/src/dataframe_ui.rs
Original file line number Diff line number Diff line change
Expand Up @@ -44,15 +44,15 @@ impl QueryHandle<'_> {
}
}

fn get(&self, start: u64, num_rows: u64) -> Vec<RecordBatch> {
fn get(&self, row_range: Range<u64>) -> Vec<RecordBatch> {
match self {
QueryHandle::LatestAt(query_handle) => {
// latest-at queries only have one row
debug_assert_eq!((start, num_rows), (0, 1));
debug_assert_eq!(row_range, 0..1);

vec![query_handle.get()]
}
QueryHandle::Range(query_handle) => query_handle.get(start, num_rows),
QueryHandle::Range(query_handle) => query_handle.get(row_range),
}
}

Expand Down Expand Up @@ -178,10 +178,7 @@ impl<'a> egui_table::TableDelegate for DataframeTableDelegate<'a> {

let data = RowsDisplayData::try_new(
&info.visible_rows,
self.query_handle.get(
info.visible_rows.start,
info.visible_rows.end - info.visible_rows.start,
),
self.query_handle.get(info.visible_rows.clone()),
self.schema,
&self.query_handle.timeline(),
);
Expand Down
Loading