Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

array api-related upstream-dev failures #8854

Merged
merged 79 commits into from
May 22, 2024
Merged

Conversation

keewis
Copy link
Collaborator

@keewis keewis commented Mar 19, 2024

This "fixes" the upstream-dev failures related to the removal of numpy.array_api. There are a couple of open questions, though:

  • array-api-strict is not installed by default, so namedarray would get a new dependency. Not sure how to deal with that – as far as I can tell, numpy.array_api was not supposed to be used that way, so maybe we need to use array-api-compat instead? What do you think, @andersy005, @Illviljan?
  • array-api-strict does not define Array.nbytes (causing a funny exception that wrongly claims DataArray does not define nbytes)
  • array-api-strict has a different DType class, which makes it tricky to work with both numpy dtypes and said dtype class in the same code. In particular, if I understand correctly we're supposed to check dtypes using isdtype, but numpy.isdtype will only exist in numpy>=2, array-api-strict's version does not define datetime / string / object dtypes, and numpy.issubdtype does not work with the non-numpy dtype class). So maybe we need to use array-api-compat internally?

keewis added 2 commits March 19, 2024 11:37
This would make it a dependency of `namedarray`, and not allow
behavior that is allowed but not required by the array API standard. Otherwise we can:
- use the main `numpy` namespace
- use `array_api_compat` (would also be a new dependency) to allow
optional behavior
@keewis keewis marked this pull request as draft March 19, 2024 13:17
@TomNicholas TomNicholas added topic-arrays related to flexible array support array API standard Support for the Python array API standard labels Mar 19, 2024
@Illviljan
Copy link
Contributor

Illviljan commented Mar 21, 2024

My intention with the tests in namedarray were intended to be using the strict array, so I think array-api-strict makes sense there. So if I failed to use the strict version properly we should fix those.

It should not be required for anything else than testing though.

>       arrayapi_a = nxp.asarray([2.1, 4], dtype=np.dtype(np.int64))

/home/runner/work/xarray/xarray/xarray/tests/test_namedarray.py:364: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/home/runner/micromamba/envs/xarray-tests/lib/python3.12/site-packages/array_api_strict/_creation_functions.py:55: in asarray
    _check_valid_dtype(dtype)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

dtype = dtype('int64')

    def _check_valid_dtype(dtype):
        # Note: Only spelling dtypes as the dtype objects is supported.
        if dtype not in (None,) + _all_dtypes:
>           raise ValueError("dtype must be one of the supported dtypes")
E           ValueError: dtype must be one of the supported dtypes

These should probably just be nxp.asarray([2.1, 4], dtype=nxp.int64) ?
https://github.com/data-apis/array-api-strict/blob/b52232c13b39d7b081e47b7bb1a0ef3dcbd1a6ff/array_api_strict/_dtypes.py#L45

@keewis
Copy link
Collaborator Author

keewis commented Mar 21, 2024

my question was about the import of numpy.array_api in xarray.namedarray._array_api. I removed it, but I didn't check whether nxp is used (as far as I can tell, only the doctests fail?)

Edit: I'll apply your suggestion tomorrow

@Illviljan
Copy link
Contributor

my question was about the import of numpy.array_api in xarray.namedarray._array_api. I removed it, but I didn't check whether nxp is used (as far as I can tell, only the doctests fail?)

Ah the doctests, the idea was to use an array api compliant array in the array api module. Previously numpy couldn't sometimes pass through the __array_namespace__ path so it couldn't be tested properly.
But now numpy should be array api compliant so I think it is safe to replace nxp with np in the doctests.
numpy/numpy#25911

@keewis keewis added the run-upstream Run upstream CI label Mar 22, 2024
Copy link
Collaborator Author

@keewis keewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should probably just be nxp.asarray([2.1, 4], dtype=nxp.int64) ?

that would have worked if np.array reprs would include the default dtype, so we might need to use a different dtype (int16 or int8, maybe? int32 has the same issue on a 32-bit OS)

In any case I'm unlikely to be able to continue working on this until the weekend next week, so if anyone wants to pick this up feel free to!

xarray/core/duck_array_ops.py Outdated Show resolved Hide resolved
@keewis
Copy link
Collaborator Author

keewis commented Apr 10, 2024

unless I missed any (which is definitely possible), I've got it down to 3 unique errors:

  1. we're using dtype.kind to check for "nullable" dtypes (float, complex, object). The array api says we should use xp.isdtype for this, but that does not exist in numpy<2. So we'd have to dispatch to either np.issubdtype or xp.isdtype depending on availability. However, the Array API does not have object, string or datetime / timedelta dtypes (maybe more), so we can't really search for these using the Array API, which makes compatibility code tricky.
  2. the Array API has neither array.nbytes nor dtype.itemsize. We've been using the latter to compute the number of bytes in the array in the absence of nbytes. As mentioned above, the AttributeError for itemsize makes the error message claim that DataArray (or Dataset) don't define bytes.
  3. result_type currently does not at all dispatch to the Array API at all. This should be easy to fix in comparison, so I'll get to that later this week.

Edit: I'd appreciate input on how to best resolve 1 and 2

@Illviljan
Copy link
Contributor

For 2 you can calculate bytes from bits with xp.iinfo(dtype).bits // 8

@keewis
Copy link
Collaborator Author

keewis commented Apr 12, 2024

For 2 you can calculate bytes from bits with xp.iinfo(dtype).bits // 8

that works, although for floating dtypes there's xp.finfo. I've pushed an implementation, which should probably be refactored (I couldn't find examples on how to do this, so I'm accessing __array_namespace__ directly).

@Illviljan
Copy link
Contributor

Illviljan commented Apr 12, 2024

yeah, here's my version I was playing around with:

import array_api_strict as xp

a = xp.asarray([1, 2, 3], dtype=xp.int8)


def itemsize(x) -> int:
    """
    Length of one array element in bytes.

    Parameters
    ----------
    x :
        array to get the itemsize of.

    Examples
    --------
    >>> import array_api_strict as xp

    Bool

    >>> x = xp.asarray([0,1,2], dtype=xp.bool)
    >>> itemsize(xp.astype(x, xp.bool))
    1

    Integer

    >>> itemsize(xp.astype(x, xp.int8))
    1
    >>> itemsize(xp.astype(x, xp.int16))
    2
    >>> itemsize(xp.astype(x, xp.int32))
    4
    >>> itemsize(xp.astype(x, xp.int64))
    8

    Floating

    >>> itemsize(xp.astype(x, xp.float32))
    4
    >>> itemsize(xp.astype(x, xp.float64))
    8

    Floating complex

    >>> itemsize(xp.astype(x, xp.complex64))
    4
    >>> itemsize(xp.astype(x, xp.complex128))
    8
    """
    xp = x.__array_namespace__() if hasattr(x, "__array_namespace__") else np
    dtype = x.dtype

    if xp.isdtype(dtype, kind="bool"):
        return 1
    elif xp.isdtype(dtype, kind="integral"):
        return xp.iinfo(dtype).bits // 8
    else:
        return xp.finfo(dtype).bits // 8

To get xp you can use:

def _get_data_namespace(x: NamedArray[Any, Any]) -> ModuleType:

@keewis keewis requested a review from shoyer May 16, 2024 14:23
@keewis
Copy link
Collaborator Author

keewis commented May 16, 2024

@shoyer, could I ask for another review? I think I changed the code to what we discussed yesterday, but I might have missed something.

Copy link
Member

@shoyer shoyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thank you!

I made a few suggestions of code comments for future readers.

xarray/core/dtypes.py Outdated Show resolved Hide resolved
xarray/core/dtypes.py Outdated Show resolved Hide resolved
@keewis
Copy link
Collaborator Author

keewis commented May 16, 2024

the remaining two errors on the upstream-dev CI are due to a bug in numpy.isdtype (numpy/numpy#26441). This appears to only have been an issue on numpy's main, not their 2.0 branch / the rc, so this should work as soon as the fix is uploaded to the nightly wheels.

@keewis
Copy link
Collaborator Author

keewis commented May 21, 2024

looks like this should finally be ready for merging?

@keewis keewis added the plan to merge Final call for comments label May 21, 2024
@dcherian
Copy link
Contributor

👏 🫡

image

@dcherian dcherian merged commit 026aa7c into pydata:main May 22, 2024
26 of 28 checks passed
@keewis keewis deleted the numpy2-array-api branch May 22, 2024 21:05
andersy005 added a commit that referenced this pull request May 23, 2024
* main:
  array api-related upstream-dev failures (#8854)
  (fix): equality check against singleton `PandasExtensionArray` (#9032)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
array API standard Support for the Python array API standard plan to merge Final call for comments run-upstream Run upstream CI topic-arrays related to flexible array support
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

5 participants