Filtering items in list inside ListArray #6846
Comments
What is your filter logic? If you want a customized filter logic like |
I'm using DataFusion and am actually creating an array filter physical plan. |
But it should still be possible to create a ListArray from an iterator, no? |
Oh, I see the question you have. I think we can't build the ListArray from

```rust
use arrow::array::{Array, AsArray, BooleanArray, ListArray};
use arrow::datatypes::Int32Type;
use arrow::error::ArrowError;

let data = vec![
    Some(vec![Some(0), Some(1), Some(2)]),
    Some(vec![Some(3), Some(4), Some(5)]),
    Some(vec![Some(6), Some(7)]),
];
let list_array = ListArray::from_iter_primitive::<Int32Type, _, _>(data);
```

I agree we could make this easier to use:

```rust
// Runs inside a function that returns Result<_, ArrowError>.
let arrays = list_array
    .iter()
    .map(|x| {
        if let Some(x) = x {
            // Example predicate: keep the elements at even positions.
            let some_predicate =
                BooleanArray::from((0..x.len()).map(|i| i % 2 == 0).collect::<Vec<_>>());
            let val = arrow::compute::filter(x.as_ref(), &some_predicate)?;
            let i32_val = val.as_primitive::<Int32Type>();
            // Iterating the array yields Option<i32>, so null elements are preserved.
            let val = i32_val.iter().collect::<Vec<_>>();
            Ok(Some(val))
        } else {
            Ok(None)
        }
    })
    .collect::<Result<Vec<_>, ArrowError>>()?;
let new_list = ListArray::from_iter_primitive::<Int32Type, _, _>(arrays);
```
|
The values are not necessarily primitive; the function is just an example, as you can see from my comment. |
I think we can create a utility function to build the list easily (and, ideally, efficiently). |
If the types are known statically, one can use the builder APIs: https://docs.rs/arrow-array/latest/arrow_array/builder/index.html#nested-usage. If the types are not known statically, it gets much more complicated: you need to evaluate the filter predicate on `List::values` and then use the result together with the selection kernels to construct a new ListArray. Because a ListArray isn't stored as a list of arrays, there isn't an efficient way to "collect" an iterator of arrays into one. |
Doing any operation on a list of lists is a pain. If I have:

```rust
use std::sync::Arc;

use arrow::array::{Array, GenericListArray, Int32Builder, ListArray, ListBuilder};
use arrow::datatypes::{DataType, Field};

#[test]
fn should_run() {
    let from: Arc<ListArray> = create_test_list();
    let mut to = ListBuilder::new(ListBuilder::new(Int32Builder::new()));
    let indices: &[usize] = &[0, 1, 2];
    // The intended type of `to`: List<List<Int32>> (not used further here).
    let data_type = DataType::List(Arc::new(Field::new(
        "item",
        DataType::List(Arc::new(Field::new("item", DataType::Int32, false))),
        false,
    )));
    for &i in indices {
        if from.is_valid(i) {
            // Bind the ArrayRef first so the downcast reference does not borrow a temporary.
            let inner = from.value(i);
            let inner_list = inner
                .as_any()
                .downcast_ref::<GenericListArray<i32>>()
                .unwrap();
            to.append_value(inner_list); // <- THIS WILL FAIL TO COMPILE
        } else {
            to.append_null();
        }
    }
}

fn create_test_list() -> Arc<ListArray> {
    let primitive_builder = Int32Builder::with_capacity(10);
    let values_builder = ListBuilder::new(primitive_builder);
    let mut builder = ListBuilder::new(values_builder);
    // [[[1, 2], [3, 4]], [[5, 6, 7], null, [8]], null, [[9, 10]]]
    builder.values().values().append_value(1);
    builder.values().values().append_value(2);
    builder.values().append(true);
    builder.values().values().append_value(3);
    builder.values().values().append_value(4);
    builder.values().append(true);
    builder.append(true);
    builder.values().values().append_value(5);
    builder.values().values().append_value(6);
    builder.values().values().append_value(7);
    builder.values().append(true);
    builder.values().append(false);
    builder.values().values().append_value(8);
    builder.values().append(true);
    builder.append(true);
    builder.append(false);
    builder.values().values().append_value(9);
    builder.values().values().append_value(10);
    builder.values().append(true);
    builder.append(true);
    Arc::new(builder.finish())
}
```

It's really a pain. |
I see that GreptimeDB implemented their own ListVector, based on GenericListBuilder, that allows them to append list items when they moved from arrow2 to this repo. |
Same problem: if the types are not known statically, it's really complicated, even though it shouldn't be from an API perspective. |
PRs are always welcome if you have ideas on how to improve things |
Already ahead of you :) |
I agree, complex nested types (list of lists, list of structs, struct of structs, etc.) are not well supported or easy to use for now. Contributions are welcome! |
Can you guys take a look at #6863? |
Which part is this question about
Library API
Describe your question
I have a ListArray and I want to filter each item in the list, but I couldn't find a way to do it.
I tried this:
But got: