Use pooch (#1889)
<!--Please ensure the PR fulfills the following requirements! -->
<!-- If this is your first PR, make sure to add your details to the
AUTHORS.rst! -->
### Pull Request Checklist:
- [x] This PR addresses an already opened issue (for bug fixes /
features)
- This PR relies on changes to be merged in
Ouranosinc/xclim-testdata#29
- [x] Tests for the changes have been added (for bug fixes / features)
- [x] (If applicable) Documentation has been added / updated (for bug
fixes / features)
- [x] CHANGELOG.rst has been updated (with summary of main changes)
- [x] Link to issue (:issue:`number`) and pull request (:pull:`number`)
has been added

### What kind of change does this PR introduce?

* Replaces the in-house logic for gathering and caching files with
`pooch`.
  * Testing data can now be fetched as follows:
  ```python
  from xclim.testing.utils import nimbus

  n = nimbus()
  # or, from a fork of xclim-testdata:
  # n = nimbus(repo="https://github.com/Me/My_Repo", branch="my_test_branch")
  file = n.fetch("some_folder/some_data.nc")
  ```
* Removes the remote GitHub calls made for every file request (which were
performed by `_get()`).
* Exports most of the file request and cache handling to `pooch`, while
maintaining a relatively unchanged API for users.
* (To be confirmed) Speeds up the delivery of test data to tests by
reducing the number of redundant calls to fixtures and relying on a
single `pooch` instance to prevent multiple setup stages.

### Does this PR introduce a breaking change?

Absolutely. `get_file` and `open_dataset` no longer fetch remote files
from GitHub. Instead, a locally stored `registry.txt` file lists the
checksums of all files needed to run the tests, and the appropriate file
is returned from a locally held cache. If a file's checksum does not
match the expected value, an attempt is made to replace it from the
remote storage.
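
The checksum lookup described above can be illustrated with a minimal, standard-library-only sketch. It is not how `pooch` or `xclim` implement it: `parse_registry` and `needs_download` are hypothetical helpers, and the registry is assumed to hold one `relative/path sha256-hex-digest` pair per line.

```python
import hashlib
from pathlib import Path


def parse_registry(text: str) -> dict[str, str]:
    """Parse 'relative/path checksum' lines into a mapping (assumed format)."""
    registry = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        name, checksum = line.split()[:2]
        registry[name] = checksum
    return registry


def needs_download(cache_dir: Path, name: str, registry: dict[str, str]) -> bool:
    """Return True when the cached file is missing or its SHA-256 differs."""
    target = cache_dir / name
    if not target.is_file():
        return True
    digest = hashlib.sha256(target.read_bytes()).hexdigest()
    return digest != registry[name]
```

A file is served straight from the cache when `needs_download` returns `False`; only a missing or mismatched file would trigger a request to the remote storage.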

Additionally, the `md5` files that accompanied all testing data files
are now obsolete thanks to the use of the registry. The testing data is
now versioned according to the `xclim-testdata` version/tag.

All the `prefetch` logic baked into the `pytest` calls has been removed,
making the setup code much easier to follow. There is no longer a need
to run `$ xclim prefetch_testing_data` unless users are running on
Windows (for the very first run of `pytest` only).

There are now three environment variables to help developers:
- XCLIM_TESTDATA_BRANCH
    - Controls the branch name of `xclim-testdata`.
- XCLIM_TESTDATA_CACHE_DIR
    - Controls the local folder to be used when fetching the test data.
- XCLIM_TESTDATA_REPO_URL
    - Controls the repository URL for `xclim-testdata` (for forks).
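
As a rough sketch of how these three variables might be resolved, the snippet below reads them from a mapping and falls back to defaults. The function name `testdata_settings` and all three fallback values are assumptions made for illustration; the defaults actually used by `xclim` may differ.

```python
import os
from collections.abc import Mapping
from pathlib import Path

# Hypothetical fallbacks, for illustration only.
DEFAULT_REPO_URL = "https://github.com/Ouranosinc/xclim-testdata"
DEFAULT_BRANCH = "main"
DEFAULT_CACHE_DIR = Path.home() / ".cache" / "xclim-testdata"


def testdata_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Resolve the test data settings, preferring environment variables."""
    return {
        "repo": env.get("XCLIM_TESTDATA_REPO_URL", DEFAULT_REPO_URL),
        "branch": env.get("XCLIM_TESTDATA_BRANCH", DEFAULT_BRANCH),
        "cache_dir": env.get("XCLIM_TESTDATA_CACHE_DIR", str(DEFAULT_CACHE_DIR)),
    }
```

Passing the environment in as a mapping keeps the lookup easy to test without modifying the real process environment.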

`platformdirs` is no longer a hard dependency. The default cache
directory will only be determined if `pooch` is installed.

### Other information:

There is still a lot of potential here to tighten this up; I'd like to
land on a design that is clean and easily portable to other projects.

What is unchanged is that `pytest` still does the following on every
run:
1. Checks that a locally stored copy of the test data exists in a
platform-dependent default location and, if none is found, fetches a
copy.
2. Each `pytest` worker creates its own copy of the test data, which
is delivered by its own `pooch` instance and written to a threadsafe
temporary directory.
3. The equivalent of the `get_file()` fixture is now `nimbus.fetch()`,
which provides absolute paths to files appropriate to the platform and
worker.
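
The per-worker copy in step 2 could look roughly like the sketch below. It is an illustration under assumptions, not the fixture code from `tests/conftest.py`: `copy_for_worker` is a hypothetical helper, and the worker identifier is assumed to be a string such as `gw0`.

```python
import shutil
from pathlib import Path


def copy_for_worker(cache_dir: Path, worker_id: str, tmp_root: Path) -> Path:
    """Copy the shared cache into a directory private to one worker."""
    destination = tmp_root / worker_id / "testdata"
    # dirs_exist_ok lets a repeated setup reuse the same destination.
    shutil.copytree(cache_dir, destination, dirs_exist_ok=True)
    return destination
```

Because each worker reads from its own directory, no worker can observe a file that another worker is still writing.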

Many tests related to testing the file accessors have also been removed
(as these are now out of scope).
Zeitsperre authored Aug 28, 2024
2 parents d434dac + 5b589f5 commit c3045b1
Showing 26 changed files with 773 additions and 838 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/main.yml
@@ -19,7 +19,7 @@ on:
- submitted

env:
XCLIM_TESTDATA_BRANCH: v2023.12.14
XCLIM_TESTDATA_BRANCH: v2024.8.23

concurrency:
# For a given workflow, if we push to the same branch, cancel all previous builds on that branch except on main.
19 changes: 17 additions & 2 deletions CHANGELOG.rst
@@ -2,14 +2,29 @@
Changelog
=========

v0.53.0
v0.53.0 (unreleased)
--------------------
Contributors to this version: Adrien Lamarche (:user:`LamAdr`).
Contributors to this version: Adrien Lamarche (:user:`LamAdr`), Trevor James Smith (:user:`Zeitsperre`).

Bug fixes
^^^^^^^^^
* Fixed a small inefficiency in ``_otc_adjust`` (:pull:`1890`).

Breaking changes
^^^^^^^^^^^^^^^^
* `platformdirs` is no longer a direct dependency of `xclim`, but `pooch` is required to use many of the new testing functions (installable via `pip install pooch` or `pip install 'xclim[dev]'`). (:pull:`1889`).

Internal changes
^^^^^^^^^^^^^^^^
* The `Ouranosinc/xclim-testdata` repository has been restructured for better organization and to make better use of `pooch` and data registries for testing data fetching (see: `xclim-testdata PR/29 <https://github.com/Ouranosinc/xclim-testdata/pull/29>`_). (:pull:`1889`).
* The ``xclim.testing`` module has been refactored to make use of `pooch` with file registries. Several testing functions have been removed as a result: (:pull:`1889`)
* ``xclim.testing.utils.open_dataset`` now uses a `pooch` instance to deliver locally-stored datasets. Its call signature has also changed.
* ``xclim`` now accepts more environment variables to control the behaviour of the testing setup functions. These include ``XCLIM_TESTDATA_BRANCH``, ``XCLIM_TESTDATA_REPO_URL``, and ``XCLIM_TESTDATA_CACHE_DIR``.
* ``xclim.testing.utils.get_file``, ``xclim.testing.utils.get_local_testdata``, ``xclim.testing.utils.list_datasets``, and ``xclim.testing.utils.file_md5_checksum`` have been removed.
* ``xclim.testing.utils.nimbus`` replaces much of this functionality. See the `xclim` documentation for more information.
* Many tests focused on evaluating the normal operation of remote file access tools under ``xclim.testing`` have been removed. (:pull:`1889`).
* Setup and teardown functions that were found under ``tests/conftest.py`` have been optimized to reduce redundant calls when running ``pytest xclim``. Some obsolete `pytest` fixtures have also been removed.(:pull:`1889`).

v0.52.0 (2024-08-08)
--------------------
Contributors to this version: David Huard (:user:`huard`), Trevor James Smith (:user:`Zeitsperre`), Hui-Min Wang (:user:`Hem-W`), Éric Dupuis (:user:`coxipi`), Sarah Gammon (:user:`SarahG-579462`), Pascal Bourgault (:user:`aulemahal`), Juliette Lavoie (:user:`juliettelavoie`), Adrien Lamarche (:user:`LamAdr`).
17 changes: 9 additions & 8 deletions CONTRIBUTING.rst
@@ -269,21 +269,22 @@ Updating Testing Data

If your code changes require changes to the testing data of `xclim` (i.e.: modifications to existing datasets or new datasets), these changes must be made via a Pull Request at the `xclim-testdata repository`_.

`xclim` allows for developers to test specific branches/versions of `xclim-testdata` via the `XCLIM_TESTDATA_BRANCH` environment variable, either through export, e.g.::
`xclim` allows for developers to test specific branches/versions or forks of the `xclim-testdata` repository via the `XCLIM_TESTDATA_BRANCH` and `XCLIM_TESTDATA_REPO` environment variables, respectively, either through export, e.g.::

$ export XCLIM_TESTDATA_BRANCH="my_new_branch_of_testing_data"
$ export XCLIM_TESTDATA_REPO="https://github.com/my_username/xclim-testdata"

$ pytest
# or, alternatively:
$ tox

or by setting the variable at runtime::

$ env XCLIM_TESTDATA_BRANCH="my_new_branch_of_testing_data" pytest
$ env XCLIM_TESTDATA_BRANCH="my_new_branch_of_testing_data" XCLIM_TESTDATA_REPO="https://github.com/my_username/xclim-testdata" pytest
# or, alternatively:
$ env XCLIM_TESTDATA_BRANCH="my_new_branch_of_testing_data" tox
$ env XCLIM_TESTDATA_BRANCH="my_new_branch_of_testing_data" XCLIM_TESTDATA_REPO="https://github.com/my_username/xclim-testdata" tox

This will ensure that tests load the testing data from this branch before running.
This will ensure that tests load the appropriate testing data from this branch or repository before running.

If you anticipate not having internet access, we suggest prefetching the testing data from `xclim-testdata repository`_ and storing it in your local cache. This can be done by running the following console command::

@@ -296,7 +297,7 @@ If your development branch relies on a specific branch of `Ouranosinc/xclim-test

or, alternatively, with the `--branch` option::

$ xclim prefetch_testing_data --branch my_new_branch_of_testing_data
$ xclim prefetch_testing_data --branch my_new_branch_of_testing_data --repo "https://github.com/my_username/xclim-testdata"

If you wish to test a specific branch using GitHub CI, this can be set in `.github/workflows/main.yml`:

@@ -306,7 +307,7 @@ If you wish to test a specific branch using GitHub CI, this can be set in `.gith
XCLIM_TESTDATA_BRANCH: my_new_branch_of_testing_data
.. warning::
In order for a Pull Request to be allowed to merge to main development branch, this variable must match the latest tagged commit name on `xclim-testdata repository`_.
In order for a Pull Request to be allowed to merge to the `main` development branch, this variable must match the latest tagged commit name on `xclim-testdata repository`_.
We suggest merging changed testing data first, tagging a new version of `xclim-testdata`, then re-running tests on your Pull Request at `Ouranosinc/xclim` with the newest tag.

Running Tests in Offline Mode
@@ -323,8 +324,8 @@ or, alternatively, using `tox` ::

$ tox -e offline

These options will disable all network calls and skip tests marked with the `requires_internet` marker.
The `--allow-unix-socket` option is required to allow the `pytest-xdist`_ plugin to function properly.
These options will disable all network calls and skip tests marked with the ``requires_internet`` marker.
The ``--allow-unix-socket`` option is required to allow the `pytest-xdist`_ plugin to function properly.

Tips
----
6 changes: 3 additions & 3 deletions docs/notebooks/analogs.ipynb
@@ -24,9 +24,9 @@
"from __future__ import annotations\n",
"\n",
"import matplotlib.pyplot as plt\n",
"from xarray.coding.calendar_ops import convert_calendar\n",
"\n",
"from xclim import analog\n",
"from xclim.core.calendar import convert_calendar\n",
"from xclim.testing import open_dataset"
]
},
@@ -105,7 +105,7 @@
"outputs": [],
"source": [
"fig, axs = plt.subplots(nrows=3, figsize=(6, 6), sharex=True)\n",
"sim_std = convert_calendar(sim, \"default\")\n",
"sim_std = convert_calendar(sim, \"standard\")\n",
"obs_chibou = obs.sel(lat=sim.lat, lon=sim.lon, method=\"nearest\")\n",
"\n",
"for ax, var in zip(axs, obs_chibou.data_vars.keys()):\n",
@@ -258,7 +258,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
"version": "3.12.5"
}
},
"nbformat": 4,
2 changes: 0 additions & 2 deletions docs/notebooks/cli.ipynb
@@ -90,8 +90,6 @@
"metadata": {},
"outputs": [],
"source": [
"from __future__ import annotations\n",
"\n",
"import warnings\n",
"\n",
"import numpy as np\n",
3 changes: 1 addition & 2 deletions docs/notebooks/customize.ipynb
@@ -19,8 +19,7 @@
"\n",
"import xarray as xr\n",
"\n",
"import xclim\n",
"from xclim.testing import open_dataset"
"import xclim"
]
},
{
6 changes: 2 additions & 4 deletions docs/notebooks/ensembles.ipynb
@@ -155,8 +155,6 @@
},
"outputs": [],
"source": [
"from pathlib import Path\n",
"\n",
"import xarray as xr\n",
"\n",
"# Set display to HTML style (for fancy output)\n",
@@ -165,10 +163,10 @@
"import matplotlib as mpl\n",
"import matplotlib.pyplot as plt\n",
"\n",
"%matplotlib inline\n",
"\n",
"from xclim import ensembles\n",
"\n",
"%matplotlib inline\n",
"\n",
"ens = ensembles.create_ensemble(data_folder.glob(\"ens_tas_m*.nc\")).load()\n",
"ens.close()"
]
2 changes: 1 addition & 1 deletion docs/notebooks/extendxclim.ipynb
@@ -599,7 +599,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
"version": "3.12.5"
},
"toc": {
"base_numbering": 1,
7 changes: 3 additions & 4 deletions docs/notebooks/sdba-advanced.ipynb
@@ -54,7 +54,6 @@
"from __future__ import annotations\n",
"\n",
"import matplotlib.pyplot as plt\n",
"import nc_time_axis\n",
"import numpy as np\n",
"import xarray as xr\n",
"\n",
@@ -429,8 +428,9 @@
"metadata": {},
"outputs": [],
"source": [
"from xarray.coding.calendar_ops import convert_calendar\n",
"\n",
"import xclim.sdba as sdba\n",
"from xclim.core.calendar import convert_calendar\n",
"from xclim.core.units import convert_units_to\n",
"from xclim.testing import open_dataset\n",
"\n",
@@ -751,7 +751,6 @@
"source": [
"from matplotlib import pyplot as plt\n",
"\n",
"import xclim as xc\n",
"from xclim import sdba\n",
"from xclim.testing import open_dataset\n",
"\n",
@@ -880,7 +879,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.4"
"version": "3.12.5"
},
"toc": {
"base_numbering": 1,
2 changes: 1 addition & 1 deletion docs/notebooks/sdba.ipynb
@@ -808,7 +808,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.4"
"version": "3.12.5"
},
"toc": {
"base_numbering": 1,
1 change: 0 additions & 1 deletion docs/notebooks/units.ipynb
@@ -18,7 +18,6 @@
"import xarray as xr\n",
"\n",
"import xclim\n",
"from xclim import indices\n",
"from xclim.core import units\n",
"from xclim.testing import open_dataset\n",
"\n",
11 changes: 3 additions & 8 deletions docs/notebooks/usage.ipynb
@@ -26,7 +26,7 @@
"import xarray as xr\n",
"\n",
"import xclim.indices\n",
"from xclim import testing"
"from xclim.testing import open_dataset"
]
},
{
@@ -48,7 +48,7 @@
"# ds = xr.open_dataset(\"your_file.nc\")\n",
"\n",
"# For this example, let's use a test dataset from xclim:\n",
"ds = testing.open_dataset(\"ERA5/daily_surface_cancities_1990-1993.nc\")\n",
"ds = open_dataset(\"ERA5/daily_surface_cancities_1990-1993.nc\")\n",
"ds.tas"
]
},
@@ -164,11 +164,6 @@
"Resampling to a daily frequency and running the same indicator succeeds, but we will still get warnings from the CF metadata checks."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
},
{
"cell_type": "code",
"execution_count": null,
@@ -387,7 +382,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.7"
"version": "3.12.5"
}
},
"nbformat": 4,
7 changes: 3 additions & 4 deletions environment.yml
@@ -28,9 +28,9 @@ dependencies:
- flox
- lmoments3 # Required for some Jupyter notebooks
# Testing and development dependencies
- black ==24.4.2
- black ==24.8.0
- blackdoc ==0.3.9
- bump-my-version >=0.24.3
- bump-my-version >=0.25.4
- cairosvg
- codespell ==2.3.0
- coverage >=7.5.0
@@ -54,7 +54,6 @@ dependencies:
- nc-time-axis >=1.4.1
- notebook
- pandas-stubs >=2.2
- platformdirs >=3.2
- pooch >=1.8.0
- pre-commit >=3.7
- pybtex >=0.24.0
@@ -74,7 +73,7 @@ dependencies:
- tokenize-rt >=5.2.0
- tox >=4.16.0
# - tox-conda # Will be added when a tox@v4.0+ compatible plugin is released.
- vulture # ==2.11 # The conda-forge version is out of date.
- vulture ==2.11
- xdoctest >=1.1.5
- yamllint >=1.35.1
- pip >=24.0
2 changes: 0 additions & 2 deletions pyproject.toml
@@ -48,7 +48,6 @@ dependencies = [
"packaging >=24.0",
"pandas >=2.2",
"pint >=0.18",
"platformdirs >=3.2",
"pyarrow >=15.0.0", # Strongly encouraged for pandas v2.2.0+
"pyyaml >=6.0.1",
"scikit-learn >=0.21.3",
@@ -79,7 +78,6 @@ dev = [
"nbval >=0.11.0",
"pandas-stubs >=2.2",
"pip >=24.0",
"platformdirs >=3.2",
"pooch >=1.8.0",
"pre-commit >=3.7",
"pylint >=3.2.4",