PyArrayLike type introduced #383
Conversation
> Do you want to take over at this point for finalizing the branch in your own style?

I would like to encourage you to continue working on this until it is ready to be merged. Projects like this benefit from more people with different styles working on them.

I added some smaller comments inline, but the main point I am wondering about, now that I have looked at the pull request, is whether the converted case should be a plain `ndarray::Array` or rather a NumPy array as well. This would have the benefit of a more uniform API and would allow the result to be passed to functions expecting a NumPy array. I am not sure if the cost of creating a NumPy array versus an `ndarray::Array` is prohibitive though.
I think spelled out this would look something like

pub struct PyArrayLike<'py, T, D>(PyReadonlyArray<'py, T, D>);

impl<'py, T, D> Deref for PyArrayLike<'py, T, D> {
    type Target = PyReadonlyArray<'py, T, D>;
    ...
}

impl<'py, T, D> FromPyObject<'py> for PyArrayLike<'py, T, D> {
    // ... as before, just constructing `PyArray` ...
}
What do you think?
I like this idea, and I just published a commit implementing it. There might be one downside however: if the array is created by the …

pub struct PyArrayLike<T, D>(Cow<PyArray<T, D>>)

(Note that even constructors like …)
src/array_like.rs (outdated):

let numpy_module = get_array_module(py)?;
I think it would be nice to cache the method reference using `GILOnceCell`.

Also, do we need to set the `dtype` keyword argument? I think this would try harder than your previous code would?
> I think it would be nice to cache the method reference using `GILOnceCell`.

That's nice! I was not aware of that Cell type.

> Also, do we need to set the `dtype` keyword argument? I think this would try harder than your previous code would?
I think it is essential to set the `dtype`. E.g. if the input array is an ndarray of dtype `int32`, then `np.asarray(...)` wouldn't do anything, so we wouldn't be able to extract a `PyReadonlyArray<f64, D>` from it.
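A minimal Python check of the behaviour described above (the dtypes are the ones from this discussion, not code from the PR):

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.int32)

# Without an explicit dtype, np.asarray passes an int32 array
# through unchanged, so it "wouldn't do anything" here:
assert np.asarray(a).dtype == np.int32

# With dtype set, asarray performs the conversion that extracting
# a PyReadonlyArray<f64, D> afterwards would require:
assert np.asarray(a, dtype=np.float64).dtype == np.float64
```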
> E.g. if the input array is an ndarray of dtype int32 then np.asarray(...) wouldn't do anything, so we wouldn't be able to extract a PyReadonlyArray<f64, D> from it.

But this is exactly what I meant: your direct-line code using `.extract::<Vec<T>>` will also not convert `i32` to `f64` if `T` is `f64`. Hence I would like to avoid making `PyArrayLike` do two different things: turning non-array storage into array storage and converting element types.
> > E.g. if the input array is an ndarray of dtype int32 then np.asarray(...) wouldn't do anything, so we wouldn't be able to extract a PyReadonlyArray<f64, D> from it.
>
> But this is exactly what I meant: your direct-line code using `.extract::<Vec<T>>` will also not convert `i32` to `f64` if `T` is `f64`. Hence I would like to avoid making `PyArrayLike` do two different things: turning non-array storage into array storage and converting element types.
Actually, `.extract::<Vec<f64>>` does convert `i32` to `f64` and everything else which is meaningful (it will not convert `f64` to `i32`, e.g.).

For my personal use case, the whole point of the `PyArrayLike` type is to obviate the need for calling `np.asarray` on the Python side before calling a PyO3 function. So it should be able to match against anything which can reasonably be converted into the target type.
> Actually .extract::<Vec<f64>> does convert i32 to f64 and everything else which is meaningful (it will not convert f64 to i32 e.g.).

This particular case is an idiosyncrasy of `PyFloat_AsDouble` falling back to `__index__`, but `FromPyObject` will generally not do lossy conversions and coerce types, whereas

>>> np.asarray([1.5, 2.5, 3.5], dtype=np.int32)
array([1, 2, 3], dtype=int32)

happily throws away the fractional parts. As it is written now, passing `[1.5, 2.5, 3.5]` as `PyArrayLike1<i32>` will fall through the `Vec` extraction just to be made to work by reaching the call to `asarray` with `dtype` set.
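The asymmetry can be seen from the Python side (a small demonstration, not code from the PR; `Five` is a made-up class):

```python
import numpy as np

# The __index__ fallback: since Python 3.8, float() (and the underlying
# PyFloat_AsDouble) losslessly accepts integer-like objects ...
class Five:
    def __index__(self):
        return 5

assert float(Five()) == 5.0

# ... whereas asarray with an integer dtype is lossy for float input,
# silently discarding the fractional parts:
assert np.asarray([1.5, 2.5, 3.5], dtype=np.int32).tolist() == [1, 2, 3]
```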
I am not saying that coercing the data type has no use, but the above seems inconsistent to me. If you really need lists of floating-point numbers as arrays of integers, I would propose separate types for the coercing and non-coercing variants, leaving out the `.extract`-based code path for the coercing one.

Not sure about reasonable names though? Since our MSRV allows const generics by now, we could even use a `const COERCE: bool` generic parameter?
> (note that the (Python) users of my library cannot even see/manipulate the COERCE flag)

Having the element type as a generic parameter is a core design principle for the current `rust-numpy`, and it definitely does chafe against Python API conventions, but it allows the Rust code to be statically checked and optimized. In practice, this leads to entry points which dispatch into different instantiations of one or more generic functions, which is also how a `COERCE` generic parameter could be reified as a function parameter on the Python side.
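Sketched from the Python side, that reification could look as follows (all names here are hypothetical; the two private functions stand in for the two generic instantiations a `COERCE` parameter would produce on the Rust side):

```python
import numpy as np

def _mean_strict(x):
    # Stand-in for the non-coercing instantiation: reject input
    # that is not already floating point instead of converting it.
    a = np.asarray(x)
    if not np.issubdtype(a.dtype, np.floating):
        raise TypeError("expected floating-point data")
    return float(a.mean())

def _mean_coerce(x):
    # Stand-in for the coercing instantiation: always convert.
    return float(np.asarray(x, dtype=np.float64).mean())

def mean(x, coerce=False):
    # The COERCE generic parameter, reified as a Python keyword
    # argument dispatching between the two instantiations.
    return _mean_coerce(x) if coerce else _mean_strict(x)
```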
> Having the element type as a generic parameter is a core design principle for the current `rust-numpy`, and it definitely does chafe against Python API conventions, but it allows the Rust code to be statically checked and optimized.

I was not criticizing that design at all. I just wanted to point out that the `COERCE` flag is opaque to my users, so I would rather not put potentially "dangerous" code behind it.

> I can only re-iterate: I do not want to maintain `rust-numpy`-specific semantics for `PyArrayLike`.
Ok, thanks for clarifying. But please understand that I don't see the point in developing a feature that I would probably never use. I would rather create my personal crate (I am not using internals of `rust-numpy` after all), and maybe publish it some day.

I am still very grateful for your patience and all your help: I certainly learned a lot from our discussions. Thanks a lot!
> I was not criticizing that design at all. I just wanted to point out that the COERCE-flag is opaque to my users, so I would rather not put potentially "dangerous" code behind it.

I did not interpret your statements that way. I made that reference to point out that the `T` and `D` in a `#[pyfunction]` taking a `PyArray<T, D>` argument are similarly not visible to Python callers. If you want to provide a Python API which does handle them dynamically, you need multiple instantiations (or the `IxDyn` overhead). `COERCE` is no different in that regard.

> I am still very grateful for your patience and all your help: I certainly learned a lot from our discussions. Thanks a lot!

Fair enough. Do you mind if I try to bring this into a shape I am happy to merge and maintain, even if you do not use it yourself?
That's perfectly fine for me. Please go ahead as you like.
Ok, I pushed what I think I will go with. Still needs docs and examples though.
Just out of curiosity: you mentioned `ob.downcast::<PyArray<T, D>>()?.readonly()` being faster than `ob.extract::<PyReadonlyArray<T, D>>()`. But is there any downside? Is it less safe? Is there any situation where you would prefer the second over the first?
Functionally, they are equivalent. There is one performance paper cut: if you do not use the error returned by `extract`, e.g. in

if let Ok(array) = ob.extract::<PyReadonlyArray<T, D>>() {
    return Ok(Self(array));
}

then this is measurably slower than the `downcast`-based version. This is a general PyO3 issue which we have not been able to resolve thus far. (You can also see Lines 135 to 137 in 3843fa9.)
The review request was a misclick, sorry.

@124C41p Thank you for this PR, and @adamreichold thank you for your encouraging tutoring.
@kngwyu To my understanding, having a struct is necessary for conveniently using it inside the header of a pyfunction. Also, since the struct has only two fields, one being a phantom type, it should get optimized away by the compiler anyway. So extracting a …

I see, you're right that we definitely need them to use as function arguments.

BTW, I checked the asarray and _array_fromobject_generic implementations, finding that it just tries some faster paths (e.g., …

I agree in principle and also started looking into which FFI code paths this ends up in eventually. But as you say, it is somewhat convoluted and not at all obvious how to call …
I agree, but let me confirm one thing... We already have it, right? Do you mean that it's not obvious how to use it?

Yes, I do not refer to the signature, just how to use it correctly: which code paths to handle ourselves, which parameters to pass, and so on.

@kngwyu I added docs and examples and think this is good to go now. Could you have another look? Thanks!
Thanks, only one comment about the documentation:

Extracts a read-only reference if the correct NumPy array type is given. Tries to convert the input into the correct type using `numpy.asarray` otherwise.
Summary
Extracts a read-only reference if the correct NumPy array type is given. Tries to convert the input into the correct type otherwise.
Resolves #382
Disclaimer
This implementation seems to work as intended and covers my personal use case. However, I am not sure how well it performs (you certainly want it to be not significantly slower than calling `np.asarray(...)` on the Python side and then extracting `PyReadonlyArray<...>` on the Rust side). Also, there are several tasks remaining, like documentation and proper error handling.

Do you want to take over at this point for finalizing the branch in your own style?
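One data point relevant to that performance question (my own observation, not a benchmark from this PR): on the Python side, `np.asarray` is essentially free when the input is already an ndarray of the requested dtype, since it returns the same object without copying.

```python
import numpy as np

a = np.arange(3, dtype=np.float64)

# asarray returns its argument unchanged when no conversion is needed,
# so the Python-side round trip costs only the function call:
assert np.asarray(a) is a
assert np.asarray(a, dtype=np.float64) is a

# A mismatching dtype, by contrast, forces a new array to be allocated:
assert np.asarray(a, dtype=np.int32) is not a
```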