Merge branch 'main' into test_coding_times_cleanup
Illviljan committed Aug 4, 2024
2 parents 1d6077e + 1ac19c4 commit 4db0c4e
Showing 92 changed files with 2,296 additions and 1,867 deletions.
5 changes: 5 additions & 0 deletions .github/release.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
changelog:
exclude:
authors:
- dependabot
- pre-commit-ci
4 changes: 2 additions & 2 deletions .pre-commit-config.yaml
@@ -13,7 +13,7 @@ repos:
- id: mixed-line-ending
- repo: https://github.com/astral-sh/ruff-pre-commit
# Ruff version.
rev: 'v0.4.7'
rev: 'v0.5.0'
hooks:
- id: ruff
args: ["--fix", "--show-fixes"]
@@ -30,7 +30,7 @@ repos:
additional_dependencies: ["black==24.4.2"]
- id: blackdoc-autoupdate-black
- repo: https://github.com/pre-commit/mirrors-mypy
rev: v1.10.0
rev: v1.10.1
hooks:
- id: mypy
# Copied from setup.cfg
11 changes: 5 additions & 6 deletions HOW_TO_RELEASE.md
@@ -23,14 +23,13 @@ upstream https://github.com/pydata/xarray (push)
```sh
git fetch upstream --tags
```
This will return a list of all the contributors since the last release:
Then run
```sh
git log "$(git tag --sort=v:refname | tail -1).." --format=%aN | sort -u | perl -pe 's/\n/$1, /'
```
This will return the total number of contributors:
```sh
git log "$(git tag --sort=v:refname | tail -1).." --format=%aN | sort -u | wc -l
python ci/release_contributors.py
```
(needs `gitpython` and `toolz` / `cytoolz`)

and copy the output.
3. Write a release summary: ~50 words describing the high level features. This
will be used in the release emails, tweets, GitHub release notes, etc.
4. Look over whats-new.rst and the docs. Make sure "What's New" is complete
18 changes: 18 additions & 0 deletions asv_bench/benchmarks/coding.py
@@ -0,0 +1,18 @@
import numpy as np

import xarray as xr

from . import parameterized


@parameterized(["calendar"], [("standard", "noleap")])
class EncodeCFDatetime:
def setup(self, calendar):
self.units = "days since 2000-01-01"
self.dtype = np.dtype("int64")
self.times = xr.date_range(
"2000", freq="D", periods=10000, calendar=calendar
).values

def time_encode_cf_datetime(self, calendar):
xr.coding.times.encode_cf_datetime(self.times, self.units, calendar, self.dtype)
5 changes: 2 additions & 3 deletions ci/install-upstream-wheels.sh
@@ -13,7 +13,7 @@ $conda remove -y numba numbagg sparse
# temporarily remove numexpr
$conda remove -y numexpr
# temporarily remove backends
$conda remove -y cf_units hdf5 h5py netcdf4 pydap
$conda remove -y pydap
# forcibly remove packages to avoid artifacts
$conda remove -y --force \
numpy \
@@ -37,8 +37,7 @@ python -m pip install \
numpy \
scipy \
matplotlib \
pandas \
h5py
pandas
# for some reason pandas depends on pyarrow already.
# Remove once a `pyarrow` version compiled with `numpy>=2.0` is on `conda-forge`
python -m pip install \
49 changes: 49 additions & 0 deletions ci/release_contributors.py
@@ -0,0 +1,49 @@
import re
import textwrap

import git
from tlz.itertoolz import last, unique

co_author_re = re.compile(r"Co-authored-by: (?P<name>[^<]+?) <(?P<email>.+)>")


def main():
repo = git.Repo(".")

most_recent_release = last(repo.tags)

# extract information from commits
contributors = {}
for commit in repo.iter_commits(f"{most_recent_release.name}.."):
matches = co_author_re.findall(commit.message)
if matches:
contributors.update({email: name for name, email in matches})
contributors[commit.author.email] = commit.author.name

# deduplicate and ignore
# TODO: extract ignores from .github/release.yml
ignored = ["dependabot", "pre-commit-ci"]
unique_contributors = unique(
contributor
for contributor in contributors.values()
if contributor.removesuffix("[bot]") not in ignored
)

sorted_ = sorted(unique_contributors)
if len(sorted_) > 1:
names = f"{', '.join(sorted_[:-1])} and {sorted_[-1]}"
else:
names = "".join(sorted_)

statement = textwrap.dedent(
f"""\
Thanks to the {len(sorted_)} contributors to this release:
{names}
""".rstrip()
)

print(statement)


if __name__ == "__main__":
main()
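The co-author parsing and the "A, B and C" name-joining used by the new `ci/release_contributors.py` script can be exercised in isolation with a stdlib-only sketch; the commit message below is made up for illustration:

```python
import re

# Same pattern as in ci/release_contributors.py
co_author_re = re.compile(r"Co-authored-by: (?P<name>[^<]+?) <(?P<email>.+)>")

# Hypothetical commit message for illustration
message = (
    "Fix decoding of CF times\n\n"
    "Co-authored-by: Jane Doe <jane@example.com>\n"
    "Co-authored-by: John Roe <john@example.com>\n"
)

# findall returns (name, email) tuples, one per trailer line
matches = co_author_re.findall(message)
print(matches)  # [('Jane Doe', 'jane@example.com'), ('John Roe', 'john@example.com')]

# Same joining rule the script uses for the thank-you statement
names = sorted(name for name, _ in matches)
joined = f"{', '.join(names[:-1])} and {names[-1]}" if len(names) > 1 else "".join(names)
print(joined)  # Jane Doe and John Roe
```

Because the pattern's `.` does not match newlines, each `Co-authored-by:` trailer is matched on its own line.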
2 changes: 1 addition & 1 deletion ci/requirements/all-but-dask.yml
@@ -22,7 +22,7 @@ dependencies:
- netcdf4
- numba
- numbagg
- numpy<2
- numpy
- packaging
- pandas
- pint>=0.22
3 changes: 2 additions & 1 deletion ci/requirements/doc.yml
@@ -8,6 +8,7 @@ dependencies:
- bottleneck
- cartopy
- cfgrib
- kerchunk
- dask-core>=2022.1
- dask-expr
- hypothesis>=6.75.8
@@ -21,7 +22,7 @@ dependencies:
- nbsphinx
- netcdf4>=1.5
- numba
- numpy>=1.21,<2
- numpy>=2
- packaging>=21.3
- pandas>=1.4,!=2.1.0
- pooch
2 changes: 1 addition & 1 deletion ci/requirements/environment-windows.yml
@@ -23,7 +23,7 @@ dependencies:
- netcdf4
- numba
- numbagg
- numpy<2
- numpy
- packaging
- pandas
# - pint>=0.22
2 changes: 1 addition & 1 deletion ci/requirements/environment.yml
@@ -26,7 +26,7 @@ dependencies:
- numba
- numbagg
- numexpr
- numpy<2
- numpy
- opt_einsum
- packaging
- pandas
4 changes: 4 additions & 0 deletions doc/api-hidden.rst
@@ -693,3 +693,7 @@

coding.times.CFTimedeltaCoder
coding.times.CFDatetimeCoder

groupers.Grouper
groupers.Resampler
groupers.EncodedGroups
25 changes: 21 additions & 4 deletions doc/api.rst
@@ -111,6 +111,7 @@ Dataset contents
Dataset.drop_duplicates
Dataset.drop_dims
Dataset.drop_encoding
Dataset.drop_attrs
Dataset.set_coords
Dataset.reset_coords
Dataset.convert_calendar
@@ -306,6 +307,7 @@ DataArray contents
DataArray.drop_indexes
DataArray.drop_duplicates
DataArray.drop_encoding
DataArray.drop_attrs
DataArray.reset_coords
DataArray.copy
DataArray.convert_calendar
@@ -801,6 +803,18 @@ DataArray
DataArrayGroupBy.dims
DataArrayGroupBy.groups

Grouper Objects
---------------

.. currentmodule:: xarray

.. autosummary::
:toctree: generated/

groupers.BinGrouper
groupers.UniqueGrouper
groupers.TimeResampler


Rolling objects
===============
@@ -1026,17 +1040,20 @@ DataArray
Accessors
=========

.. currentmodule:: xarray
.. currentmodule:: xarray.core

.. autosummary::
:toctree: generated/

core.accessor_dt.DatetimeAccessor
core.accessor_dt.TimedeltaAccessor
core.accessor_str.StringAccessor
accessor_dt.DatetimeAccessor
accessor_dt.TimedeltaAccessor
accessor_str.StringAccessor


Custom Indexes
==============
.. currentmodule:: xarray

.. autosummary::
:toctree: generated/

30 changes: 30 additions & 0 deletions doc/combined.json
@@ -0,0 +1,30 @@
{
"version": 1,
"refs": {
".zgroup": "{\"zarr_format\":2}",
"foo/.zarray": "{\"chunks\":[4,5],\"compressor\":null,\"dtype\":\"<f8\",\"fill_value\":\"NaN\",\"filters\":null,\"order\":\"C\",\"shape\":[4,5],\"zarr_format\":2}",
"foo/.zattrs": "{\"_ARRAY_DIMENSIONS\":[\"x\",\"y\"],\"coordinates\":\"z\"}",
"foo/0.0": [
"saved_on_disk.h5",
8192,
160
],
"x/.zarray": "{\"chunks\":[4],\"compressor\":null,\"dtype\":\"<i8\",\"fill_value\":null,\"filters\":null,\"order\":\"C\",\"shape\":[4],\"zarr_format\":2}",
"x/.zattrs": "{\"_ARRAY_DIMENSIONS\":[\"x\"]}",
"x/0": [
"saved_on_disk.h5",
8352,
32
],
"y/.zarray": "{\"chunks\":[5],\"compressor\":null,\"dtype\":\"<i8\",\"fill_value\":null,\"filters\":null,\"order\":\"C\",\"shape\":[5],\"zarr_format\":2}",
"y/.zattrs": "{\"_ARRAY_DIMENSIONS\":[\"y\"],\"calendar\":\"proleptic_gregorian\",\"units\":\"days since 2000-01-01 00:00:00\"}",
"y/0": [
"saved_on_disk.h5",
8384,
40
],
"z/.zarray": "{\"chunks\":[4],\"compressor\":null,\"dtype\":\"|O\",\"fill_value\":null,\"filters\":[{\"allow_nan\":true,\"check_circular\":true,\"encoding\":\"utf-8\",\"ensure_ascii\":true,\"id\":\"json2\",\"indent\":null,\"separators\":[\",\",\":\"],\"skipkeys\":false,\"sort_keys\":true,\"strict\":true}],\"order\":\"C\",\"shape\":[4],\"zarr_format\":2}",
"z/0": "[\"a\",\"b\",\"c\",\"d\",\"|O\",[4]]",
"z/.zattrs": "{\"_ARRAY_DIMENSIONS\":[\"x\"]}"
}
}
6 changes: 5 additions & 1 deletion doc/conf.py
@@ -94,10 +94,11 @@
extlinks = {
"issue": ("https://github.com/pydata/xarray/issues/%s", "GH%s"),
"pull": ("https://github.com/pydata/xarray/pull/%s", "PR%s"),
"discussion": ("https://github.com/pydata/xarray/discussions/%s", "D%s"),
}

# sphinx-copybutton configurations
copybutton_prompt_text = r">>> |\.\.\. |\$ |In \[\d*\]: | {2,5}\.\.\.: | {5,8}: "
copybutton_prompt_text = r">>> |\.\.\. |\$ |In \[\d*\]: | {2,5}\.{3,}: | {5,8}: "
copybutton_prompt_is_regexp = True

# nbsphinx configurations
@@ -158,6 +159,8 @@
"Variable": "~xarray.Variable",
"DatasetGroupBy": "~xarray.core.groupby.DatasetGroupBy",
"DataArrayGroupBy": "~xarray.core.groupby.DataArrayGroupBy",
"Grouper": "~xarray.groupers.Grouper",
"Resampler": "~xarray.groupers.Resampler",
# objects without namespace: numpy
"ndarray": "~numpy.ndarray",
"MaskedArray": "~numpy.ma.MaskedArray",
@@ -169,6 +172,7 @@
"CategoricalIndex": "~pandas.CategoricalIndex",
"TimedeltaIndex": "~pandas.TimedeltaIndex",
"DatetimeIndex": "~pandas.DatetimeIndex",
"IntervalIndex": "~pandas.IntervalIndex",
"Series": "~pandas.Series",
"DataFrame": "~pandas.DataFrame",
"Categorical": "~pandas.Categorical",
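The widened sphinx-copybutton prompt regex in `doc/conf.py` (now `\.{3,}` instead of exactly three dots, so longer IPython continuation prompts such as `....:` are also stripped) can be sanity-checked directly:

```python
import re

# Updated pattern from doc/conf.py
copybutton_prompt_text = r">>> |\.\.\. |\$ |In \[\d*\]: | {2,5}\.{3,}: | {5,8}: "
prompt_re = re.compile(copybutton_prompt_text)

# Prompts the copy button should recognise and strip
prompts = [
    ">>> import xarray as xr",
    "$ pip install xarray",
    "In [1]: xr.DataArray([1, 2])",
    "   ...: more = 1",   # three-dot continuation (already matched before)
    "   ....: more = 1",  # four-dot continuation (newly matched)
]
for line in prompts:
    assert prompt_re.match(line), line

# Ordinary output lines are left alone
assert prompt_re.match("array([1., 2.])") is None
```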
1 change: 1 addition & 0 deletions doc/ecosystem.rst
@@ -17,6 +17,7 @@ Geosciences
- `climpred <https://climpred.readthedocs.io>`_: Analysis of ensemble forecast models for climate prediction.
- `geocube <https://corteva.github.io/geocube>`_: Tool to convert geopandas vector data into rasterized xarray data.
- `GeoWombat <https://github.com/jgrss/geowombat>`_: Utilities for analysis of remotely sensed and gridded raster data at scale (easily tame Landsat, Sentinel, Quickbird, and PlanetScope).
- `grib2io <https://github.com/NOAA-MDL/grib2io>`_: Utility to work with GRIB2 files including an xarray backend, DASK support for parallel reading in open_mfdataset, lazy loading of data, editing of GRIB2 attributes and GRIB2IO DataArray attrs, and spatial interpolation and reprojection of GRIB2 messages and GRIB2IO Datasets/DataArrays for both grid to grid and grid to stations.
- `gsw-xarray <https://github.com/DocOtak/gsw-xarray>`_: a wrapper around `gsw <https://teos-10.github.io/GSW-Python>`_ that adds CF compliant attributes when possible, units, name.
- `infinite-diff <https://github.com/spencerahill/infinite-diff>`_: xarray-based finite-differencing, focused on gridded climate/meteorology data
- `marc_analysis <https://github.com/darothen/marc_analysis>`_: Analysis package for CESM/MARC experiments and output.
4 changes: 2 additions & 2 deletions doc/getting-started-guide/faq.rst
@@ -352,9 +352,9 @@ Some packages may have additional functionality beyond what is shown here. You c
How does xarray handle missing values?
--------------------------------------

**xarray can handle missing values using ``np.NaN``**
**xarray can handle missing values using ``np.nan``**

- ``np.NaN`` is used to represent missing values in labeled arrays and datasets. It is a commonly used standard for representing missing or undefined numerical data in scientific computing. ``np.NaN`` is a constant value in NumPy that represents "Not a Number" or missing values.
- ``np.nan`` is used to represent missing values in labeled arrays and datasets. It is a commonly used standard for representing missing or undefined numerical data in scientific computing. ``np.nan`` is a constant value in NumPy that represents "Not a Number" or missing values.

- Most of xarray's computation methods are designed to automatically handle missing values appropriately.

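The FAQ edit above tracks NumPy 2.0, which removed the `np.NaN` alias in favour of lowercase `np.nan`. The value itself is the standard IEEE-754 NaN; a stdlib-only sketch (using `math.nan`, the same value as `np.nan`) of the semantics that make it a workable missing-value marker:

```python
import math

nan = math.nan  # same IEEE-754 value as np.nan

# NaN compares unequal to everything, including itself ...
assert nan != nan

# ... so detecting it needs an explicit check, not ==
assert math.isnan(nan)

# NaN propagates through arithmetic, which is why reductions
# over missing data must skip it explicitly (as xarray's do)
assert math.isnan(nan + 1.0)
```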
4 changes: 2 additions & 2 deletions doc/getting-started-guide/installing.rst
@@ -8,8 +8,8 @@ Required dependencies

- Python (3.9 or later)
- `numpy <https://www.numpy.org/>`__ (1.23 or later)
- `packaging <https://packaging.pypa.io/en/latest/#>`__ (22 or later)
- `pandas <https://pandas.pydata.org/>`__ (1.5 or later)
- `packaging <https://packaging.pypa.io/en/latest/#>`__ (23.1 or later)
- `pandas <https://pandas.pydata.org/>`__ (2.0 or later)

.. _optional-dependencies:

2 changes: 1 addition & 1 deletion doc/internals/how-to-add-new-backend.rst
@@ -4,7 +4,7 @@ How to add a new backend
------------------------

Adding a new backend for read support to Xarray does not require
to integrate any code in Xarray; all you need to do is:
one to integrate any code in Xarray; all you need to do is:

- Create a class that inherits from Xarray :py:class:`~xarray.backends.BackendEntrypoint`
and implements the method ``open_dataset`` see :ref:`RST backend_entrypoint`
4 changes: 2 additions & 2 deletions doc/user-guide/computation.rst
@@ -426,7 +426,7 @@ However, the functions also take missing values in the data into account:

.. ipython:: python
data = xr.DataArray([np.NaN, 2, 4])
data = xr.DataArray([np.nan, 2, 4])
weights = xr.DataArray([8, 1, 1])
data.weighted(weights).mean()
@@ -444,7 +444,7 @@ If the weights add up to 0, ``sum`` returns 0:
data.weighted(weights).sum()
and ``mean``, ``std`` and ``var`` return ``NaN``:
and ``mean``, ``std`` and ``var`` return ``nan``:

.. ipython:: python
16 changes: 16 additions & 0 deletions doc/user-guide/dask.rst
@@ -296,6 +296,12 @@ loaded into Dask or not:
Automatic parallelization with ``apply_ufunc`` and ``map_blocks``
-----------------------------------------------------------------

.. tip::

Some problems can become embarrassingly parallel and thus easy to parallelize
automatically by rechunking to a frequency, e.g. ``ds.chunk(time=TimeResampler("YE"))``.
See :py:meth:`Dataset.chunk` for more.

Almost all of xarray's built-in operations work on Dask arrays. If you want to
use a function that isn't wrapped by xarray, and have it applied in parallel on
each block of your xarray object, you have three options:
@@ -551,6 +557,16 @@ larger chunksizes.

Check out the `dask documentation on chunks <https://docs.dask.org/en/latest/array-chunks.html>`_.

.. tip::

Many time-domain problems become amenable to an embarrassingly parallel or blockwise solution
(e.g. using :py:func:`xarray.map_blocks`, :py:func:`dask.array.map_blocks`, or
:py:func:`dask.array.blockwise`) by rechunking to a frequency along the time dimension.
Provide :py:class:`xarray.groupers.TimeResampler` objects to :py:meth:`Dataset.chunk` to do so.
For example ``ds.chunk(time=TimeResampler("MS"))`` will set the chunks so that a month of
data is contained in one chunk. The resulting chunk sizes need not be uniform, depending on
the frequency of the data, and the calendar.
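Why the resulting chunk sizes need not be uniform can be illustrated with a stdlib-only sketch that mimics the effect of ``ds.chunk(time=TimeResampler("MS"))`` on a daily time axis (illustrative only, not the xarray implementation):

```python
from datetime import date, timedelta
from itertools import groupby

# Daily time axis covering the first three months of 2000 (a leap year)
start = date(2000, 1, 1)
times = [start + timedelta(days=i) for i in range(91)]

# Month-start ("MS") chunking: group consecutive dates by (year, month)
chunks = [
    len(list(group))
    for _, group in groupby(times, key=lambda d: (d.year, d.month))
]
print(chunks)  # [31, 29, 31] -- chunk sizes follow the calendar, not a fixed size
```

Each chunk holds exactly one month of data, so the sizes vary with month length and calendar (note the leap-year February).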


Optimization Tips
-----------------
