Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Best Practice: ArrayView vs Pointer #1059

Open
JP-Ellis opened this issue Sep 1, 2021 · 6 comments · May be fixed by #1440
Open

Best Practice: ArrayView vs Pointer #1059

JP-Ellis opened this issue Sep 1, 2021 · 6 comments · May be fixed by #1440

Comments

@JP-Ellis
Copy link
Contributor

JP-Ellis commented Sep 1, 2021

I was wondering whether there was a best practive when writing functions that expect arrays, specifically when they are not taking ownership of the array. I couldn't see this being addressed in the documentation or in an existing issue so here's a few things I think could be worth thinking about:

  1. Should functions prefer to take ArrayView over the equivalent Array, and what about using a pointer vs by value? For example, which of the following would be 'best'?

    fn sum(a: ArrayView1<f64>) -> f64;
    fn sum(a: &ArrayView1<f64>) -> f64;
    fn sum(a: &Array1<f64>) -> f64;
  2. When muteable data is expected, is there a preference to using ArrayViewMut vs &mut Array?

    fn increment(mut a: ArrayViewMut1<f64>) -> f64;
    fn increment(a: &mut Array1<f64>) -> f64;

I would expect that for most applications, the ArrayView and ArrayViewMut are preferred as they are more general.

Is the fact that they are views into an array mean that they behave more like pointers and thus &ArrayView is redundant?

@multimeric
Copy link

multimeric commented Sep 1, 2021

I'm just a user, not a dev, but here's my take.

Firstly you should understand that an ArrayView is not just a pointer to an array. It's a pointer to some data (which may or may not be a full Array), but possibly using a subset of that data, and possibly with a different stride length, meaning that it could be a view over say every second element of the array. This information is stored within the ArrayBase struct.

My second general point is that, since all array subtypes are just ArrayBase with different type signatures, you can and should make your functions generic using these and almost never require any particular array type.

Now to your questions:

Should functions prefer to take ArrayView over the equivalent Array

Neither. As mentioned above, if your function doesn't care what type of array it takes, it should be made generic over ArrayBase<S, D>. I can't think of any functions that should require an ArrayView, especially because I believe all array types implement view() to convert them in a recommended way to an ArrayView if you somehow need one (e.g. the split_at() function is only implemented for ArrayView for some reason).

what about using a pointer vs by value

Follow Rust best practices here, and use the lowest level of ownership you can. If your function doesn't mutate the array, just accept a borrowed ArrayBase. In your example this would be:

fn sum<A, S, D>(a: &ArrayBase<S, D>) -> A
where
    S: Data<Elem = A>,
    D: Dimension;

And indeed this is basically how it's implemented in ndarray, albeit inside an impl block:

impl<A, S, D> ArrayBase<S, D>
where
S: Data<Elem = A>,
D: Dimension,
{
/// Return the sum of all elements in the array.
///
/// ```
/// use ndarray::arr2;
///
/// let a = arr2(&[[1., 2.],
/// [3., 4.]]);
/// assert_eq!(a.sum(), 10.);
/// ```
pub fn sum(&self) -> A
where
A: Clone + Add<Output = A> + num_traits::Zero,
{
if let Some(slc) = self.as_slice_memory_order() {
return numeric_util::unrolled_fold(slc, A::zero, A::add);
}
let mut sum = A::zero();
for row in self.rows() {
if let Some(slc) = row.as_slice() {
sum = sum + numeric_util::unrolled_fold(slc, A::zero, A::add);
} else {
sum = sum + row.iter().fold(A::zero(), |acc, elt| acc + elt.clone());
}
}
sum
}

When muteable data is expected, is there a preference to using ArrayViewMut vs &mut Array?

Remember, if your function takes an ArrayViewMut, the caller has to transfer ownership of the whole View struct to your function, and can no longer use it. This means throwing away all the subset/stride information. Again, make your functions generic over &ArrayBase. If only the contents of the array need to be mutated, make it generic over ArrayBase<S, D> where S: DataMut. Only use a mutable reference if you need to edit the array itself, which I can't think of many use cases for. Maybe if you want to reshape it?

I would expect that for most applications, the ArrayView and ArrayViewMut are preferred as they are more general.

As explained, no, making it generic over ArrayBase with the appropriate trait is preferred as it's more general again.

Is the fact that they are views into an array mean that they behave more like pointers and thus &ArrayView is redundant?

No. See my first paragraph.

@JP-Ellis
Copy link
Contributor Author

JP-Ellis commented Sep 1, 2021

Thanks for your detailed answer 😄 I will refactor some of my code to be a bit more general and use ArrayBase.

Perhaps there should be some part of the documentation that addresses this to make it clearer? Or if that exists somewhere, I didn't see it.

Note that whether I pass in ArrayViewMut into a function or pass a mutable pointer to it, Rust should prevent other parts of the code from accessing it elsewhere (even for reading). So that's why I thought it wouldn't matter too much. I agree that the mutable reference would only ever be needed if I wanted to chnage the underlying array and generally modifying data should only require a mutable view into the array (through DataMut).

@multimeric
Copy link

Note that whether I pass in ArrayViewMut into a function or pass a mutable pointer to it, Rust should prevent other parts of the code from accessing it elsewhere (even for reading). So that's why I thought it wouldn't matter too much.

This is true, it's not the safety that makes this unadvisable, it's the fact that you're requesting more ownership in your function than you need, and users will be annoyed that they have to re-create the view in their code since your function consumes it.

@jturner314
Copy link
Member

#879 has some discussion about this question and a proposal to improve ndarray's API in this respect. (See "Good way to express function parameters" under "Motivation".) With ndarray's current API, I pretty much agree with @multimeric, with the caveat that monomorphization bloat is also a concern, as mentioned in #879. To avoid this issue, it's probably best to use &ArrayBase<S, D> as the parameter type for the public function, but have the body of the public function immediately call a private, inner function which operates on ArrayView<A, D>. That gets pretty tedious to write, though. As a result, for my own private code, I often have functions take ArrayView/Mut parameters just to keep things simple.

@JP-Ellis
Copy link
Contributor Author

JP-Ellis commented Sep 2, 2021

@jturner314 Thanks for the link to the RFC! This looks like a very promising change for the next major release of ndarray.

@akern40
Copy link
Collaborator

akern40 commented Oct 26, 2024

This is a great question and, as jturner314 had pointed out, will be addressed by resolving #879. As a result, this will be closed by #1440 when it lands.

@akern40 akern40 linked a pull request Oct 26, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants