PyArrayLike type introduced #383
Conversation
> Do you want to take over at this point for finalizing the branch in your own style?

I would like to encourage you to continue working on this until it is ready to be merged. Projects like this benefit from more people with different styles working on them.

I added some smaller comments inline, but the main point I am wondering about, now that I have looked at the pull request, is whether the converted case should be a plain `ndarray::Array` or rather a NumPy array as well. This would have the benefit of a more uniform API and would allow the result to be passed to functions expecting a NumPy array. I am not sure if the cost of creating a NumPy array versus an `ndarray::Array` is prohibitive though.
I think spelled out this would look something like

pub struct PyArrayLike<'py, T, D>(PyReadonlyArray<'py, T, D>);

impl<'py, T, D> Deref for PyArrayLike<'py, T, D> {
    type Target = PyReadonlyArray<'py, T, D>;
    ...
}

impl<'py, T, D> FromPyObject<'py> for PyArrayLike<'py, T, D> {
    // ... as before, just constructing `PyArray` ...
}
What do you think?
I like this idea, and I just published a commit implementing it. There might be one downside however: if the array is created by the …

pub struct PyArrayLike<T, D>(Cow<PyArray<T, D>>)

(Note that even constructors like …)
src/array_like.rs (outdated):

let numpy_module = get_array_module(py)?;
I think it would be nice to cache the method reference using `GILOnceCell`.

Also, do we need to set the `dtype` keyword argument? I think this would try harder than your previous code would?
> I think it would be nice to cache the method reference using `GILOnceCell`.

That's nice! I was not aware of that Cell type.

> Also, do we need to set the `dtype` keyword argument? I think this would try harder than your previous code would?
I think it is essential to set the `dtype`. E.g. if the input array is an ndarray of dtype `int32`, then `np.asarray(...)` wouldn't do anything, so we wouldn't be able to extract a `PyReadonlyArray<f64, D>` from it.
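A minimal Python check of the behaviour described above (the dtypes are the ones from this discussion, not code from the PR):

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.int32)

# Without an explicit dtype, np.asarray passes an int32 array
# through unchanged, so it "wouldn't do anything" here:
assert np.asarray(a).dtype == np.int32

# With dtype set, asarray performs the conversion that extracting
# a PyReadonlyArray<f64, D> afterwards would require:
assert np.asarray(a, dtype=np.float64).dtype == np.float64
```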
> E.g. if the input array is an ndarray of dtype int32 then np.asarray(...) wouldn't do anything, so we wouldn't be able to extract a PyReadonlyArray<f64, D> from it.

But this is exactly what I meant: your direct-line code using `.extract::<Vec<T>>` will also not convert `i32` to `f64` if `T` is `f64`. Hence I would like to avoid making `PyArrayLike` do two different things: turning non-array storage into array storage and converting element types.
> > E.g. if the input array is an ndarray of dtype int32 then np.asarray(...) wouldn't do anything, so we wouldn't be able to extract a PyReadonlyArray<f64, D> from it.
>
> But this is exactly what I meant: your direct-line code using `.extract::<Vec<T>>` will also not convert `i32` to `f64` if `T` is `f64`. Hence I would like to avoid making `PyArrayLike` do two different things: turning non-array storage into array storage and converting element types.
Actually, `.extract::<Vec<f64>>` does convert `i32` to `f64` and everything else which is meaningful (it will not convert `f64` to `i32`, e.g.).

For my personal use case, the whole point of the `PyArrayLike` type is to obviate the need for calling `np.asarray` on the Python side before calling a PyO3 function. So it should be able to match against anything which can reasonably be converted into the target type.
> Actually .extract::<Vec<f64>> does convert i32 to f64 and everything else which is meaningful (it will not convert f64 to i32 e.g.).

This particular case is an idiosyncrasy of `PyFloat_AsDouble` falling back to `__index__`, but `FromPyObject` will generally not do lossy conversions and coerce types, whereas

>>> np.asarray([1.5, 2.5, 3.5], dtype=np.int32)
array([1, 2, 3], dtype=int32)

happily throws away the fractional parts. As it is written now, passing `[1.5, 2.5, 3.5]` as `PyArrayLike1<i32>` will fall through the `Vec` extraction just to be made to work by reaching the call to `asarray` with `dtype` set.
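The asymmetry can be seen from the Python side (a small demonstration, not code from the PR; `Five` is a made-up class):

```python
import numpy as np

# The __index__ fallback: since Python 3.8, float() (and the underlying
# PyFloat_AsDouble) losslessly accepts integer-like objects ...
class Five:
    def __index__(self):
        return 5

assert float(Five()) == 5.0

# ... whereas asarray with an integer dtype is lossy for float input,
# silently discarding the fractional parts:
assert np.asarray([1.5, 2.5, 3.5], dtype=np.int32).tolist() == [1, 2, 3]
```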
I am not saying that coercing the data type has no use, but the above seems inconsistent to me. If you really need lists of floating-point numbers as arrays of integers, I would propose separate types for the coercing and non-coercing variants, leaving out the `.extract`-based code path for the coercing one.

Not sure about reasonable names though? Since our MSRV allows const generics by now, we could even use a `const COERCE: bool` generic parameter?
> (note that the (Python) users of my library cannot even see/manipulate the COERCE flag)

Having the element type as a generic parameter is a core design principle for the current `rust-numpy`, and it definitely does chafe against Python API conventions, but it allows the Rust code to be statically checked and optimized. In practice, this leads to entry points which dispatch into different instantiations of one or more generic functions, which is also how a `COERCE` generic parameter could be reified as a function parameter on the Python side.
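Sketched from the Python side, that reification could look as follows (all names here are hypothetical; the two private functions stand in for the two generic instantiations a `COERCE` parameter would produce on the Rust side):

```python
import numpy as np

def _mean_strict(x):
    # Stand-in for the non-coercing instantiation: reject input
    # that is not already floating point instead of converting it.
    a = np.asarray(x)
    if not np.issubdtype(a.dtype, np.floating):
        raise TypeError("expected floating-point data")
    return float(a.mean())

def _mean_coerce(x):
    # Stand-in for the coercing instantiation: always convert.
    return float(np.asarray(x, dtype=np.float64).mean())

def mean(x, coerce=False):
    # The COERCE generic parameter, reified as a Python keyword
    # argument dispatching between the two instantiations.
    return _mean_coerce(x) if coerce else _mean_strict(x)
```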
> Having the element type as a generic parameter is a core design principle for the current `rust-numpy`, and it definitely does chafe against Python API conventions, but it allows the Rust code to be statically checked and optimized.

I was not criticizing that design at all. I just wanted to point out that the `COERCE` flag is opaque to my users, so I would rather not put potentially "dangerous" code behind it.

> I can only re-iterate: I do not want to maintain `rust-numpy`-specific semantics for `PyArrayLike`.
Ok, thanks for clarifying. But please understand that I don't see the point in developing a feature that I would probably never use. I would rather create my personal crate (I am not using internals of `rust-numpy` after all), and maybe publish it some day.

I am still very grateful for your patience and all your help: I certainly learned a lot from our discussions. Thanks a lot!
> I was not criticizing that design at all. I just wanted to point out that the COERCE-flag is opaque to my users, so I would rather not put potentially "dangerous" code behind it.

I did not interpret your statements that way. I made that reference to point out that the `T` and `D` in a `#[pyfunction]` taking a `PyArray<T, D>` argument are similarly not visible to Python callers. If you want to provide a Python API which does handle them dynamically, you need multiple instantiations (or the `IxDyn` overhead). `COERCE` is no different in that regard.

> I am still very grateful for your patience and all your help: I certainly learned a lot from our discussions. Thanks a lot!

Fair enough. Do you mind if I try to bring this into a shape I am happy to merge and maintain, even if you do not use it yourself?
That's perfectly fine for me. Please go ahead as you like.
Ok, I pushed what I think I will go with. Still needs docs and examples though.
Just out of curiosity: you mentioned `ob.downcast::<PyArray<T, D>>()?.readonly()` being faster than `ob.extract::<PyReadonlyArray<T, D>>()`. But is there any downside? Is it less safe? Is there any situation where you would prefer the second over the first?
Functionally, they are equivalent. There is one performance paper cut: if you do not use the error returned by `extract`, e.g. in

if let Ok(array) = ob.extract::<PyReadonlyArray<T, D>>() {
    return Ok(Self(array));
}

then this is measurably slower than the `downcast`-based version. This is a general PyO3 issue which we have not been able to resolve thus far. (You can also see Lines 135 to 137 in 3843fa9.)
The review request was a misclick, sorry.

@124C41p Thank you for this PR, and @adamreichold thank you for your encouraging tutoring.
@kngwyu To my understanding, having a struct is necessary for conveniently using it inside the header of a pyfunction. Also, since the struct has only two fields, one being a phantom type, it should get optimized away by the compiler anyway. So extracting a …

I see, you're right that we definitely need them to use as function arguments.

BTW, I checked the asarray and _array_fromobject_generic implementations, finding that it just tries some faster paths (e.g., …

I agree in principle and also started looking into which FFI code paths this ends up in eventually. But as you say, it is somewhat convoluted and not at all obvious how to call …
I agree, but let me confirm one thing... We already have it, right? Do you mean that it's not obvious how to use it?

Yes, I do not refer to the signature, just how to use it correctly: which code paths to handle ourselves, which parameters to pass, and so on.

@kngwyu I added docs and examples and think this is good to go now. Could you have another look? Thanks!
Thanks, only one comment about the documentation:

Extracts a read-only reference if the correct NumPy array type is given. Tries to convert the input into the correct type using `numpy.asarray` otherwise.
Summary
Extracts a read-only reference if the correct NumPy array type is given. Tries to convert the input into the correct type otherwise.
Resolves #382
Disclaimer
This implementation seems to work as intended and covers my personal use case. However, I am not sure how well it performs (you certainly want it to be not significantly slower than calling `np.asarray(...)` on the Python side and then extracting `PyReadonlyArray<...>` on the Rust side). Also, there are several tasks remaining, like documentation and proper error handling.

Do you want to take over at this point for finalizing the branch in your own style?
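One data point relevant to that performance question (my own observation, not a benchmark from this PR): on the Python side, `np.asarray` is essentially free when the input is already an ndarray of the requested dtype, since it returns the same object without copying.

```python
import numpy as np

a = np.arange(3, dtype=np.float64)

# asarray returns its argument unchanged when no conversion is needed,
# so the Python-side round trip costs only the function call:
assert np.asarray(a) is a
assert np.asarray(a, dtype=np.float64) is a

# A mismatching dtype, by contrast, forces a new array to be allocated:
assert np.asarray(a, dtype=np.int32) is not a
```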