[pull] main from pydata:main #565

Merged
merged 26 commits into from
Jul 20, 2024
Commits
879b06b
Cleanup test_coding_times.py (#9223)
Illviljan Jul 10, 2024
7ff5d8d
Use reshape and ravel from duck_array_ops in coding/times.py (#9225)
Illviljan Jul 11, 2024
eb0fbd7
Use duckarray assertions in test_coding_times (#9226)
Illviljan Jul 11, 2024
ff15a08
Fix time indexing regression in `convert_calendar` (#9192)
hmaarrfk Jul 11, 2024
7087ca4
`numpy` 2 compatibility in the `netcdf4` and `h5netcdf` backends (#9136)
keewis Jul 11, 2024
e12aa44
`numpy` 2 compatibility in the iris code paths (#9156)
keewis Jul 11, 2024
c1965c2
switch the documentation to run with `numpy>=2` (#9177)
keewis Jul 11, 2024
a69815f
exclude the bots from the release notes (#9235)
keewis Jul 11, 2024
b8aaa53
Add a `.drop_attrs` method (#8258)
max-sixty Jul 11, 2024
42fd510
Update _typing.py
Illviljan Jul 11, 2024
78c8d46
Revert "Update _typing.py"
Illviljan Jul 11, 2024
f85da8a
Test main push
Illviljan Jul 11, 2024
5edd249
Revert "Test main push"
Illviljan Jul 11, 2024
0eac740
Allow mypy to run in vscode (#9239)
max-sixty Jul 13, 2024
d8b7644
Fix typing for test_plot.py (#9234)
Illviljan Jul 13, 2024
9c26ca7
Added a space to the documentation (#9247)
ChrisCleaner Jul 15, 2024
076c0c2
test push
keewis Jul 15, 2024
7477fd1
Per-variable specification of boolean parameters in open_dataset (#9218)
Ostheer Jul 16, 2024
71fce9b
Enable pandas type checking (#9213)
headtr1ck Jul 17, 2024
5d9d984
fix fallback isdtype method (#9250)
headtr1ck Jul 17, 2024
c2aebd8
Fix mypy on main (#9252)
max-sixty Jul 17, 2024
b55c783
Grouper, Resampler as public api (#8840)
dcherian Jul 18, 2024
3013fb4
Update dropna docstring (#9257)
TomNicholas Jul 18, 2024
39d5b39
Delete ``base`` and ``loffset`` parameters to resample (#9233)
dcherian Jul 19, 2024
07307b4
groupby, resample: Deprecate some positional args (#9236)
dcherian Jul 19, 2024
91a52dc
Add encode_cf_datetime benchmark (#9262)
spencerkclark Jul 20, 2024
5 changes: 5 additions & 0 deletions .github/release.yml
@@ -0,0 +1,5 @@
+changelog:
+  exclude:
+    authors:
+      - dependabot
+      - pre-commit-ci
18 changes: 18 additions & 0 deletions asv_bench/benchmarks/coding.py
@@ -0,0 +1,18 @@
+import numpy as np
+
+import xarray as xr
+
+from . import parameterized
+
+
+@parameterized(["calendar"], [("standard", "noleap")])
+class EncodeCFDatetime:
+    def setup(self, calendar):
+        self.units = "days since 2000-01-01"
+        self.dtype = np.dtype("int64")
+        self.times = xr.date_range(
+            "2000", freq="D", periods=10000, calendar=calendar
+        ).values
+
+    def time_encode_cf_datetime(self, calendar):
+        xr.coding.times.encode_cf_datetime(self.times, self.units, calendar, self.dtype)
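The call timed by this benchmark can also be tried outside of asv. The following is a minimal sketch, assuming xarray and NumPy are installed; the five-element date range and the variable names are illustrative and not part of the benchmark. `xr.coding.times.encode_cf_datetime` is called with the same positional arguments as in the file above, and the sketch assumes it returns the encoded numbers together with the units and calendar.

```python
import numpy as np
import xarray as xr

# Small stand-in for the 10000-element array used in the benchmark (assumption).
times = xr.date_range("2000", freq="D", periods=5, calendar="standard").values

# Same positional arguments as EncodeCFDatetime.time_encode_cf_datetime:
# dates, units, calendar, dtype.
num, units, calendar = xr.coding.times.encode_cf_datetime(
    times, "days since 2000-01-01", "standard", np.dtype("int64")
)

# `num` holds the day offsets from the reference date.
print(num, units, calendar)
```

Because the function lives under `xr.coding`, it is best treated as an internal helper whose signature may change between releases.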
5 changes: 2 additions & 3 deletions ci/install-upstream-wheels.sh
@@ -13,7 +13,7 @@ $conda remove -y numba numbagg sparse
 # temporarily remove numexpr
 $conda remove -y numexpr
 # temporarily remove backends
-$conda remove -y cf_units hdf5 h5py netcdf4 pydap
+$conda remove -y pydap
 # forcibly remove packages to avoid artifacts
 $conda remove -y --force \
     numpy \
@@ -37,8 +37,7 @@ python -m pip install \
     numpy \
     scipy \
     matplotlib \
-    pandas \
-    h5py
+    pandas
 # for some reason pandas depends on pyarrow already.
 # Remove once a `pyarrow` version compiled with `numpy>=2.0` is on `conda-forge`
 python -m pip install \
2 changes: 1 addition & 1 deletion ci/requirements/all-but-dask.yml
@@ -22,7 +22,7 @@ dependencies:
   - netcdf4
   - numba
   - numbagg
-  - numpy<2
+  - numpy
   - packaging
   - pandas
   - pint>=0.22
2 changes: 1 addition & 1 deletion ci/requirements/doc.yml
@@ -21,7 +21,7 @@ dependencies:
   - nbsphinx
   - netcdf4>=1.5
   - numba
-  - numpy>=1.21,<2
+  - numpy>=2
   - packaging>=21.3
   - pandas>=1.4,!=2.1.0
   - pooch
2 changes: 1 addition & 1 deletion ci/requirements/environment-windows.yml
@@ -23,7 +23,7 @@ dependencies:
   - netcdf4
   - numba
   - numbagg
-  - numpy<2
+  - numpy
   - packaging
   - pandas
   # - pint>=0.22
2 changes: 1 addition & 1 deletion ci/requirements/environment.yml
@@ -26,7 +26,7 @@ dependencies:
   - numba
   - numbagg
   - numexpr
-  - numpy<2
+  - numpy
   - opt_einsum
   - packaging
   - pandas
4 changes: 4 additions & 0 deletions doc/api-hidden.rst
@@ -693,3 +693,7 @@

    coding.times.CFTimedeltaCoder
    coding.times.CFDatetimeCoder
+
+   core.groupers.Grouper
+   core.groupers.Resampler
+   core.groupers.EncodedGroups
25 changes: 21 additions & 4 deletions doc/api.rst
@@ -111,6 +111,7 @@ Dataset contents
    Dataset.drop_duplicates
    Dataset.drop_dims
    Dataset.drop_encoding
+   Dataset.drop_attrs
    Dataset.set_coords
    Dataset.reset_coords
    Dataset.convert_calendar
@@ -306,6 +307,7 @@ DataArray contents
    DataArray.drop_indexes
    DataArray.drop_duplicates
    DataArray.drop_encoding
+   DataArray.drop_attrs
    DataArray.reset_coords
    DataArray.copy
    DataArray.convert_calendar
@@ -801,6 +803,18 @@ DataArray
    DataArrayGroupBy.dims
    DataArrayGroupBy.groups

+Grouper Objects
+---------------
+
+.. currentmodule:: xarray.core
+
+.. autosummary::
+   :toctree: generated/
+
+   groupers.BinGrouper
+   groupers.UniqueGrouper
+   groupers.TimeResampler
+

 Rolling objects
 ===============
@@ -1026,17 +1040,20 @@ DataArray
 Accessors
 =========

-.. currentmodule:: xarray
+.. currentmodule:: xarray.core

 .. autosummary::
    :toctree: generated/

-   core.accessor_dt.DatetimeAccessor
-   core.accessor_dt.TimedeltaAccessor
-   core.accessor_str.StringAccessor
+   accessor_dt.DatetimeAccessor
+   accessor_dt.TimedeltaAccessor
+   accessor_str.StringAccessor
+

 Custom Indexes
 ==============
+.. currentmodule:: xarray
+
 .. autosummary::
    :toctree: generated/

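The `drop_attrs` entries added to the API listing above correspond to the new method from #8258. The following is a minimal sketch of how it might be used, assuming an xarray release that includes that change; the dataset contents and names are made up for illustration.

```python
import xarray as xr

# Illustrative dataset with attributes on both the dataset and a variable.
ds = xr.Dataset(
    {"temperature": ("x", [1.0, 2.0], {"units": "K"})},
    attrs={"title": "example"},
)

# drop_attrs returns a new object; by default it also clears variable attrs.
stripped = ds.drop_attrs()

print(stripped.attrs)
print(stripped["temperature"].attrs)
```

The original `ds` keeps its attributes, so the call is safe to use in a method chain.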
3 changes: 3 additions & 0 deletions doc/conf.py
@@ -158,6 +158,8 @@
     "Variable": "~xarray.Variable",
     "DatasetGroupBy": "~xarray.core.groupby.DatasetGroupBy",
     "DataArrayGroupBy": "~xarray.core.groupby.DataArrayGroupBy",
+    "Grouper": "~xarray.core.groupers.Grouper",
+    "Resampler": "~xarray.core.groupers.Resampler",
     # objects without namespace: numpy
     "ndarray": "~numpy.ndarray",
     "MaskedArray": "~numpy.ma.MaskedArray",
@@ -169,6 +171,7 @@
     "CategoricalIndex": "~pandas.CategoricalIndex",
     "TimedeltaIndex": "~pandas.TimedeltaIndex",
     "DatetimeIndex": "~pandas.DatetimeIndex",
+    "IntervalIndex": "~pandas.IntervalIndex",
     "Series": "~pandas.Series",
     "DataFrame": "~pandas.DataFrame",
     "Categorical": "~pandas.Categorical",
4 changes: 2 additions & 2 deletions doc/getting-started-guide/faq.rst
@@ -352,9 +352,9 @@ Some packages may have additional functionality beyond what is shown here. You c
 How does xarray handle missing values?
 --------------------------------------

-**xarray can handle missing values using ``np.NaN``**
+**xarray can handle missing values using ``np.nan``**

-- ``np.NaN`` is used to represent missing values in labeled arrays and datasets. It is a commonly used standard for representing missing or undefined numerical data in scientific computing. ``np.NaN`` is a constant value in NumPy that represents "Not a Number" or missing values.
+- ``np.nan`` is used to represent missing values in labeled arrays and datasets. It is a commonly used standard for representing missing or undefined numerical data in scientific computing. ``np.nan`` is a constant value in NumPy that represents "Not a Number" or missing values.

 - Most of xarray's computation methods are designed to automatically handle missing values appropriately.

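The switch from `np.NaN` to `np.nan` in this FAQ entry matters because the `np.NaN` alias was removed in NumPy 2.0, while `np.nan` works in both 1.x and 2.x. The following is a small sketch of the missing-value handling the entry describes, assuming xarray and NumPy are installed; the array values are illustrative.

```python
import numpy as np
import xarray as xr

da = xr.DataArray([1.0, np.nan, 3.0], dims="x")

# Boolean mask of the missing entries.
print(da.isnull().values)

# For float data, reductions skip NaN by default ...
print(float(da.mean()))

# ... unless skipping is turned off explicitly.
print(float(da.mean(skipna=False)))
```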
4 changes: 2 additions & 2 deletions doc/user-guide/computation.rst
@@ -426,7 +426,7 @@ However, the functions also take missing values in the data into account:

 .. ipython:: python

-    data = xr.DataArray([np.NaN, 2, 4])
+    data = xr.DataArray([np.nan, 2, 4])
     weights = xr.DataArray([8, 1, 1])

     data.weighted(weights).mean()
@@ -444,7 +444,7 @@ If the weights add up to to 0, ``sum`` returns 0:

     data.weighted(weights).sum()

-and ``mean``, ``std`` and ``var`` return ``NaN``:
+and ``mean``, ``std`` and ``var`` return ``nan``:

 .. ipython:: python

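The two weighted-reduction behaviours touched by these hunks can be checked with a short script. This is a sketch assuming xarray and NumPy are installed; the first array pair is taken from the hunk above, while the zero-sum weights are illustrative values chosen for this note.

```python
import numpy as np
import xarray as xr

data = xr.DataArray([np.nan, 2, 4])
weights = xr.DataArray([8, 1, 1])

# The NaN entry and its weight (8) are ignored: (2*1 + 4*1) / (1 + 1).
print(float(data.weighted(weights).mean()))

# Weights that add up to zero (illustrative values).
zero = xr.DataArray([1.0, 1.0]).weighted(xr.DataArray([-1.0, 1.0]))
print(float(zero.sum()))
print(float(zero.mean()))
```

The first result is 3.0; with zero-sum weights, `sum` gives 0.0 and `mean` gives `nan`, matching the prose in the hunk.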
93 changes: 82 additions & 11 deletions doc/user-guide/groupby.rst
@@ -1,3 +1,5 @@
+.. currentmodule:: xarray
+
 .. _groupby:

 GroupBy: Group and Bin Data
@@ -15,19 +17,20 @@ __ https://www.jstatsoft.org/v40/i01/paper
 - Apply some function to each group.
 - Combine your groups back into a single data object.

-Group by operations work on both :py:class:`~xarray.Dataset` and
-:py:class:`~xarray.DataArray` objects. Most of the examples focus on grouping by
+Group by operations work on both :py:class:`Dataset` and
+:py:class:`DataArray` objects. Most of the examples focus on grouping by
 a single one-dimensional variable, although support for grouping
 over a multi-dimensional variable has recently been implemented. Note that for
 one-dimensional data, it is usually faster to rely on pandas' implementation of
 the same pipeline.

 .. tip::

-   To substantially improve the performance of GroupBy operations, particularly
-   with dask `install the flox package <https://flox.readthedocs.io>`_. flox
+   `Install the flox package <https://flox.readthedocs.io>`_ to substantially improve the performance
+   of GroupBy operations, particularly with dask. flox
    `extends Xarray's in-built GroupBy capabilities <https://flox.readthedocs.io/en/latest/xarray.html>`_
-   by allowing grouping by multiple variables, and lazy grouping by dask arrays. If installed, Xarray will automatically use flox by default.
+   by allowing grouping by multiple variables, and lazy grouping by dask arrays.
+   If installed, Xarray will automatically use flox by default.

 Split
 ~~~~~
@@ -87,7 +90,7 @@ Binning
 Sometimes you don't want to use all the unique values to determine the groups
 but instead want to "bin" the data into coarser groups. You could always create
 a customized coordinate, but xarray facilitates this via the
-:py:meth:`~xarray.Dataset.groupby_bins` method.
+:py:meth:`Dataset.groupby_bins` method.

 .. ipython:: python

@@ -110,7 +113,7 @@ Apply
 ~~~~~

 To apply a function to each group, you can use the flexible
-:py:meth:`~xarray.core.groupby.DatasetGroupBy.map` method. The resulting objects are automatically
+:py:meth:`core.groupby.DatasetGroupBy.map` method. The resulting objects are automatically
 concatenated back together along the group axis:

 .. ipython:: python
@@ -121,8 +124,8 @@ concatenated back together along the group axis:

     arr.groupby("letters").map(standardize)

-GroupBy objects also have a :py:meth:`~xarray.core.groupby.DatasetGroupBy.reduce` method and
-methods like :py:meth:`~xarray.core.groupby.DatasetGroupBy.mean` as shortcuts for applying an
+GroupBy objects also have a :py:meth:`core.groupby.DatasetGroupBy.reduce` method and
+methods like :py:meth:`core.groupby.DatasetGroupBy.mean` as shortcuts for applying an
 aggregation function:

 .. ipython:: python
@@ -183,7 +186,7 @@ Iterating and Squeezing
 Previously, Xarray defaulted to squeezing out dimensions of size one when iterating over
 a GroupBy object. This behaviour is being removed.
 You can always squeeze explicitly later with the Dataset or DataArray
-:py:meth:`~xarray.DataArray.squeeze` methods.
+:py:meth:`DataArray.squeeze` methods.

 .. ipython:: python

@@ -217,7 +220,7 @@ __ https://cfconventions.org/cf-conventions/v1.6.0/cf-conventions.html#_two_dime
     da.groupby("lon").map(lambda x: x - x.mean(), shortcut=False)

 Because multidimensional groups have the ability to generate a very large
-number of bins, coarse-binning via :py:meth:`~xarray.Dataset.groupby_bins`
+number of bins, coarse-binning via :py:meth:`Dataset.groupby_bins`
 may be desirable:

 .. ipython:: python
@@ -232,3 +235,71 @@ applying your function, and then unstacking the result:

     stacked = da.stack(gridcell=["ny", "nx"])
     stacked.groupby("gridcell").sum(...).unstack("gridcell")
+
+.. _groupby.groupers:
+
+Grouper Objects
+~~~~~~~~~~~~~~~
+
+Both ``groupby_bins`` and ``resample`` are specializations of the core ``groupby`` operation for binning,
+and time resampling. Many problems demand more complex GroupBy application: for example, grouping by multiple
+variables with a combination of categorical grouping, binning, and resampling; or more specializations like
+spatial resampling; or more complex time grouping like special handling of seasons, or the ability to specify
+custom seasons. To handle these use-cases and more, Xarray is evolving to providing an
+extension point using ``Grouper`` objects.
+
+.. tip::
+
+   See the `grouper design`_ doc for more detail on the motivation and design ideas behind
+   Grouper objects.
+
+.. _grouper design: https://github.com/pydata/xarray/blob/main/design_notes/grouper_objects.md
+
+For now Xarray provides three specialized Grouper objects:
+
+1. :py:class:`groupers.UniqueGrouper` for categorical grouping
+2. :py:class:`groupers.BinGrouper` for binned grouping
+3. :py:class:`groupers.TimeResampler` for resampling along a datetime coordinate
+
+These provide functionality identical to the existing ``groupby``, ``groupby_bins``, and ``resample`` methods.
+That is,
+
+.. code-block:: python
+
+    ds.groupby("x")
+
+is identical to
+
+.. code-block:: python
+
+    from xarray.groupers import UniqueGrouper
+
+    ds.groupby(x=UniqueGrouper())
+
+; and
+
+.. code-block:: python
+
+    ds.groupby_bins("x", bins=bins)
+
+is identical to
+
+.. code-block:: python
+
+    from xarray.groupers import BinGrouper
+
+    ds.groupby(x=BinGrouper(bins))
+
+and
+
+.. code-block:: python
+
+    ds.resample(time="ME")
+
+is identical to
+
+.. code-block:: python
+
+    from xarray.groupers import TimeResampler
+
+    ds.resample(time=TimeResampler("ME"))
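The equivalences claimed in the new "Grouper Objects" section can be checked on a toy dataset. The following is a minimal sketch, assuming an xarray release in which `xarray.groupers` is importable (the API listing in this same PR still refers to `xarray.core.groupers`, so the import path is an assumption about the released layout). The dataset, the `letters` coordinate and the bin edges are invented for illustration.

```python
import xarray as xr
from xarray.groupers import BinGrouper, UniqueGrouper

# Toy dataset: one variable, a dimension coordinate and a categorical coordinate.
ds = xr.Dataset(
    {"foo": ("x", [1.0, 2.0, 3.0, 4.0])},
    coords={"x": [10, 20, 30, 40], "letters": ("x", ["a", "b", "a", "b"])},
)

# Categorical grouping: by name versus with an explicit UniqueGrouper.
by_name = ds.groupby("letters").mean()
by_grouper = ds.groupby(letters=UniqueGrouper()).mean()
print(by_name.identical(by_grouper))

# Binned grouping: groupby_bins versus an explicit BinGrouper.
bins = [0, 25, 50]
binned_old = ds.groupby_bins("x", bins=bins).mean()
binned_new = ds.groupby(x=BinGrouper(bins=bins)).mean()
print(binned_old.identical(binned_new))
```

Both comparisons are expected to print `True` if the two spellings really are interchangeable, as the section states.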
4 changes: 2 additions & 2 deletions doc/user-guide/interpolation.rst
@@ -292,8 +292,8 @@ Let's see how :py:meth:`~xarray.DataArray.interp` works on real data.
     axes[0].set_title("Raw data")

     # Interpolated data
-    new_lon = np.linspace(ds.lon[0], ds.lon[-1], ds.sizes["lon"] * 4)
-    new_lat = np.linspace(ds.lat[0], ds.lat[-1], ds.sizes["lat"] * 4)
+    new_lon = np.linspace(ds.lon[0].item(), ds.lon[-1].item(), ds.sizes["lon"] * 4)
+    new_lat = np.linspace(ds.lat[0].item(), ds.lat[-1].item(), ds.sizes["lat"] * 4)
     dsi = ds.interp(lat=new_lat, lon=new_lon)
     dsi.air.plot(ax=axes[1])
     @savefig interpolation_sample3.png width=8in
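The `.item()` calls added here turn the 0-d `DataArray` endpoints into plain Python scalars before they reach `np.linspace`. The sketch below shows the pattern on a small stand-in coordinate; the longitude values are made up, and the motivation (NumPy 2 compatibility of the documentation build) is inferred from the commit titles rather than stated in the diff.

```python
import numpy as np
import xarray as xr

# Stand-in for ds.lon from the tutorial dataset (illustrative values).
lon = xr.DataArray(np.array([200.0, 202.5, 205.0]), dims="lon", name="lon")

# lon[0] is a 0-d DataArray; .item() extracts the underlying Python float.
start, stop = lon[0].item(), lon[-1].item()

# Four times as many points as the original coordinate, as in the docs example.
new_lon = np.linspace(start, stop, lon.sizes["lon"] * 4)

print(type(start), new_lon.shape)
```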
2 changes: 1 addition & 1 deletion doc/user-guide/terminology.rst
@@ -221,7 +221,7 @@ complete examples, please consult the relevant documentation.*
             combined_ds

     lazy
-        Lazily-evaluated operations do not load data into memory until necessary.Instead of doing calculations
+        Lazily-evaluated operations do not load data into memory until necessary. Instead of doing calculations
         right away, xarray lets you plan what calculations you want to do, like finding the
         average temperature in a dataset.This planning is called "lazy evaluation." Later, when
         you're ready to see the final result, you tell xarray, "Okay, go ahead and do those calculations now!"
5 changes: 3 additions & 2 deletions doc/user-guide/testing.rst
@@ -239,9 +239,10 @@ If the array type you want to generate has an array API-compliant top-level name
 you can use this neat trick:

 .. ipython:: python
-    :okwarning:

-    from numpy import array_api as xp  # available in numpy 1.26.0
+    import numpy as xp  # compatible in numpy 2.0
+
+    # use `import numpy.array_api as xp` in numpy>=1.23,<2.0

     from hypothesis.extra.array_api import make_strategies_namespace

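Code that has to run on both sides of the NumPy 2.0 boundary could select the namespace at import time instead of hard-coding one of the two imports shown in this hunk. This is only a sketch of that idea, not something the PR does; the version check is a simple assumption-laden heuristic based on the major version number.

```python
import numpy as np

# NumPy 2 exposes an array API-compatible namespace at top level; on NumPy 1.x
# the experimental numpy.array_api submodule is the fallback (assumed present).
if int(np.__version__.split(".")[0]) >= 2:
    import numpy as xp
else:
    from numpy import array_api as xp

x = xp.asarray([1.0, 2.0, 3.0])
print(xp.sum(x))
```

On NumPy 1.x the fallback import emits a warning about the submodule being experimental, which is presumably why the docs directive previously carried `:okwarning:`.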