Merge pull request #248 from Ouranosinc/packaging-fixes
Packaging fixes
RondeauG authored Sep 13, 2023
2 parents d2d261a + 7d92329 commit 6e893a4
Showing 37 changed files with 694 additions and 34,795 deletions.
2 changes: 1 addition & 1 deletion .readthedocs.yml
@@ -10,7 +10,7 @@ formats:
 build:
   os: ubuntu-22.04
   tools:
-    python: "mambaforge-4.10"
+    python: "mambaforge-22.9"
   jobs:
     post_create_environment:
       - pip install . --no-deps
6 changes: 6 additions & 0 deletions HISTORY.rst
@@ -26,6 +26,12 @@ Internal changes
 ^^^^^^^^^^^^^^^^
 * Fixed pre-commit's pretty-format-json so it ignores notebooks. (:pull:`254`).
 * Fixed the labeler so docs/CI isn't automatically added for contributions by new collaborators. (:pull:`254`).
+* Made it so that `tests` are no longer treated as an installable package. (:pull:`248`).
+* Renamed the pytest marker from `requires_docs` to `requires_netcdf`. (:pull:`248`).
+* Included the documentation in the source distribution, while excluding the NetCDF files. (:pull:`248`).
+* Reduced the size of the files in /docs/notebooks/samples and changed the Notebooks and tests accordingly. (:issue:`247`, :pull:`248`).
+* Added a new `xscen.testing` module with the `datablock_3d` function previously located in `/tests/conftest.py`. (:pull:`248`).
+* New function `xscen.testing.fake_data` to generate fake data for testing. (:pull:`248`).

 v0.7.1 (2023-08-23)
 -------------------
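The changelog above introduces the `xscen.testing` module; a minimal usage sketch follows. The keyword names are assumptions inferred from the changelog entries (the helper previously lived in `/tests/conftest.py`), not a verified API reference.

```python
import numpy as np

from xscen.testing import datablock_3d  # new module introduced by this PR

# Build a small time x lat x lon cube of fake "tas" data; every keyword below
# is an assumption based on the changelog, not the released signature.
values = np.random.rand(30, 10, 10)
da = datablock_3d(
    values,
    variable="tas",
    x="lon",
    x_start=-75.0,
    y="lat",
    y_start=45.0,
    x_step=0.1,
    y_step=0.1,
    start="2000-01-01",
    freq="D",
    as_dataset=False,
)
# The new xscen.testing.fake_data could generate `values` instead of numpy's
# random generator; its signature is not shown in this diff.
```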
16 changes: 4 additions & 12 deletions MANIFEST.in
@@ -3,27 +3,19 @@ include CONTRIBUTING.rst
 include HISTORY.rst
 include LICENSE
 include README.rst
-include requirements_dev.txt
-include requirements_docs.txt

 recursive-include xscen *.json *.yml *.py *.csv
 recursive-include tests *
+recursive-include docs notebooks *.rst conf.py Makefile make.bat *.jpg *.png *.gif *.ipynb *.csv *.json *.yml *.md
+recursive-include docs notebooks samples *.csv *.json

 recursive-exclude * __pycache__
 recursive-exclude * *.py[co]
-recursive-exclude docs notebooks *.rst conf.py Makefile make.bat *.jpg *.png *.gif *.ipynb *.csv *.json *.yml *.md
-recursive-exclude docs notebooks samples *.csv *.json
 recursive-exclude conda *.yml
 recursive-exclude templates *.csv *.json *.py *.yml
+recursive-exclude docs notebooks samples tutorial *.nc

-exclude .cruft.json
-exclude .editorconfig
-exclude .gitlab-ci.yml
-exclude .gitmodules
-exclude .pre-commit-config.yaml
-exclude .readthedocs.yml
-exclude .secrets.baseline
-exclude .yamllint.yaml
+exclude .*
 exclude Makefile
 exclude environment.yml
 exclude environment-dev.yml
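One way to confirm the new MANIFEST rules is to build a source distribution and inspect its contents. A minimal sketch, assuming the sdist was produced with `python -m build --sdist` and the hypothetical archive name below:

```python
import tarfile

# Hypothetical archive name; substitute the version actually built.
with tarfile.open("dist/xscen-0.7.1.tar.gz") as tar:
    names = tar.getnames()

# Documentation (including the sample CSV/JSON files) should now ship in the sdist...
assert any("docs/notebooks/samples" in name for name in names)
# ...while NetCDF files remain excluded.
assert not any(name.endswith(".nc") for name in names)
```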
6 changes: 3 additions & 3 deletions README.rst
@@ -70,6 +70,6 @@ This package was created with Cookiecutter_ and the `Ouranosinc/cookiecutter-pyp
     :target: https://pypi.python.org/pypi/xscen
     :alt: Supported Python Versions

-.. |status| image:: https://www.repostatus.org/badges/latest/wip.svg
-    :target: https://www.repostatus.org/#wip
-    :alt: Project Status: WIP – Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.
+.. |status| image:: https://www.repostatus.org/badges/latest/active.svg
+    :target: https://www.repostatus.org/#active
+    :alt: Project Status: Active – The project has reached a stable, usable state and is being actively developed.
112 changes: 84 additions & 28 deletions docs/notebooks/1_catalog.ipynb
@@ -71,12 +71,11 @@
"\n",
"from xscen import DataCatalog, ProjectCatalog\n",
"\n",
"# Prepare a dummy folder where data will be put\n",
"output_folder = Path().absolute() / \"_data\"\n",
"output_folder.mkdir(exist_ok=True)\n",
"\n",
"\n",
"DC = DataCatalog(f\"{Path().absolute()}/samples/pangeo-cmip6.json\")\n",
"\n",
"DC"
]
},
@@ -96,7 +95,7 @@
"outputs": [],
"source": [
"# Access the catalog\n",
"DC.df"
"DC.df[0:3]"
]
},
{
@@ -173,7 +172,7 @@
"metadata": {},
"outputs": [],
"source": [
"# Regex: Find all entries that start with \"rcp\"\n",
"# Regex: Find all entries that start with \"ssp\"\n",
"print(DC.search(experiment=\"^ssp\").unique(\"experiment\"))"
]
},
@@ -195,8 +194,8 @@
"metadata": {},
"outputs": [],
"source": [
"# Regex: Find all experiments except the exact string \"ssp245\"\n",
"print(DC.search(experiment=\"^(?!ssp245$).*$\").unique(\"experiment\"))"
"# Regex: Find all experiments except the exact string \"ssp126\"\n",
"print(DC.search(experiment=\"^(?!ssp126$).*$\").unique(\"experiment\"))"
]
},
{
@@ -257,12 +256,12 @@
"- `restrict_members` is used to limit the results to a maximum number of realizations for each source.\n",
"- `restrict_warming_level` is used to limit the results to only datasets that are present in the csv used for calculating warming levels.\n",
"\n",
"Note that compared to `search`, the result of `search_data_catalog` is a dictionary with one entry per unique ID. A given unique ID might contain multiple datasets as per `intake-esm`'s definition, because it groups catalog lines per *id - domain - processing_level - xrfreq*. Thus, it separates model data that exists at different frequencies.\n",
"Note that compared to `search`, the result of `search_data_catalog` is a dictionary with one entry per unique ID. A given unique ID might contain multiple datasets as per `intake-esm`'s definition, because it groups catalog lines per *id - domain - processing_level - xrfreq*. Thus, it would separate model data that exists at different frequencies.\n",
"\n",
"\n",
"#### Example 1: Simple dataset\n",
"#### Example 1: Multiple variables and frequencies + Historical and future\n",
"\n",
"Let's start by searching for CMIP6 data that has subdaily precipitation, daily temperature and the land fraction data. The main difference compared to searching for reference datasets is that in most cases, `match_hist_and_fut` will be required to match *historical* simulations to their future counterparts. This works for both CMIP5 and CMIP6 nomenclatures."
"Let's start by searching for CMIP6 data that has subdaily precipitation, daily minimum temperature and the land fraction data. The main difference compared to searching for reference datasets is that in most cases, `match_hist_and_fut` will be required to match *historical* simulations to their future counterparts. This works for both CMIP5 and CMIP6 nomenclatures."
]
},
{
@@ -276,8 +275,8 @@
"source": [
"import xscen as xs\n",
"\n",
"variables_and_freqs = {\"tas\": \"D\", \"pr\": \"3H\", \"sftlf\": \"fx\"}\n",
"other_search_criteria = {\"institution\": [\"NOAA-GFDL\"], \"experiment\": [\"ssp585\"]}\n",
"variables_and_freqs = {\"tasmin\": \"D\", \"pr\": \"3H\", \"sftlf\": \"fx\"}\n",
"other_search_criteria = {\"institution\": [\"NOAA-GFDL\"]}\n",
"\n",
"cat_sim = xs.search_data_catalogs(\n",
" data_catalogs=[f\"{Path().absolute()}/samples/pangeo-cmip6.json\"],\n",
@@ -289,12 +288,34 @@
"cat_sim"
]
},
{
"cell_type": "markdown",
"id": "82535e6c",
"metadata": {},
"source": [
"If required, at this stage, a dataset can be looked at in more details. If we examine the results (look at the 'date_start' and 'date_end' columns), we'll see that it successfully found historical simulations in the *CMIP* activity and renamed both their *activity* and *experiment* to match the future simulations."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a6e5bd7e",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"cat_sim[\"ScenarioMIP_NOAA-GFDL_GFDL-CM4_ssp585_r1i1p1f1_gr1\"].df"
]
},
{
"cell_type": "markdown",
"id": "85ee34fe",
"metadata": {},
"source": [
"Two simulations correspond to the search criteria, but as can be seen from the results, it is the same simulation on 2 different grids (`gr1` and `gr2`). If desired, `restrict_resolution` can be called to choose the finest or coarsest grid in such cases."
"#### Example 2: Restricting results\n",
"\n",
"The two previous search results were the same simulation, but on 2 different grids (`gr1` and `gr2`). If desired, `restrict_resolution` can be called to choose the finest or coarsest grid."
]
},
{
@@ -306,7 +327,7 @@
},
"outputs": [],
"source": [
"variables_and_freqs = {\"tas\": \"D\", \"pr\": \"3H\", \"sftlf\": \"fx\"}\n",
"variables_and_freqs = {\"tasmin\": \"D\", \"pr\": \"3H\", \"sftlf\": \"fx\"}\n",
"other_search_criteria = {\"institution\": [\"NOAA-GFDL\"], \"experiment\": [\"ssp585\"]}\n",
"\n",
"cat_sim = xs.search_data_catalogs(\n",
Expand All @@ -322,30 +343,65 @@
},
{
"cell_type": "markdown",
"id": "82535e6c",
"id": "fcd847c0-0ea8-46ad-bc28-9b73edd627bc",
"metadata": {},
"source": [
"If required, at this stage a dataset can be looked at in more details. If we examine the results (look at the 'date_start' and 'date_end' columns), we'll see that it successfully found historical simulations in the *CMIP* activity and renamed both their *activity* and *experiment* to match the future simulations."
"Similarly, if we search for historical NorESM2-MM data, we'll find that it has 3 members. If desired, `restrict_members` can be called to choose a maximum number of realization per model."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a6e5bd7e",
"metadata": {
"tags": []
},
"id": "6c5ce0fc-2f25-4b55-bf65-5f140f07e331",
"metadata": {},
"outputs": [],
"source": [
"cat_sim[\"ScenarioMIP_NOAA-GFDL_GFDL-CM4_ssp585_r1i1p1f1_gr1\"].df"
"variables_and_freqs = {\"tasmin\": \"D\"}\n",
"other_search_criteria = {\"source\": [\"NorESM2-MM\"], \"experiment\": [\"historical\"]}\n",
"\n",
"cat_sim = xs.search_data_catalogs(\n",
" data_catalogs=[f\"{Path().absolute()}/samples/pangeo-cmip6.json\"],\n",
" variables_and_freqs=variables_and_freqs,\n",
" other_search_criteria=other_search_criteria,\n",
" restrict_members={\"ordered\": 2},\n",
")\n",
"\n",
"cat_sim"
]
},
{
"cell_type": "markdown",
"id": "4fd28f58-5ab7-4d65-8906-2197592c8c94",
"metadata": {},
"source": [
"Finally, `restrict_warming_level` can be used to be sure that the results either exist in `xscen`'s warming level database (if a boolean), or reach a given warming level."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3300b1d7-e37b-4aa4-991e-99609bb1adea",
"metadata": {},
"outputs": [],
"source": [
"variables_and_freqs = {\"tasmin\": \"D\"}\n",
"\n",
"cat_sim = xs.search_data_catalogs(\n",
" data_catalogs=[f\"{Path().absolute()}/samples/pangeo-cmip6.json\"],\n",
" variables_and_freqs=variables_and_freqs,\n",
" match_hist_and_fut=True,\n",
" restrict_warming_level=True, # In this case all models exist in our database, so nothing gets eliminated.\n",
")\n",
"\n",
"cat_sim"
]
},
{
"cell_type": "markdown",
"id": "6bddc58d",
"metadata": {},
"source": [
"#### Example 2: Advanced search\n",
"#### Example 3: Search for data that can be computed from what's available\n",
"\n",
"`allow_resampling` and `allow_conversion` are powerful search tools to find data that doesn't explicitely exist in the catalog, but that can easily be computed."
]
@@ -375,7 +431,7 @@
"id": "b33b2ad7",
"metadata": {},
"source": [
"If we examine the SSP5-8.5 results, we'll see that while it failed to find *evspsblpot*, it successfully understood that *tasmin* and *tasmax* can be used to compute it. It also understood that daily *tas* is a valid search result for `{tas: YS}`, since it can be aggregated."
"If we examine the SSP5-8.5 results, we'll see that while it failed to find *evspsblpot*, it successfully understood that *tasmin* and *tasmax* can be used to compute it. It also understood that daily *tasmin* and *tasmax* is a valid search result for `{tas: YS}`, since it can be computed first, then aggregated to a yearly frequency."
]
},
{
@@ -413,15 +469,15 @@
" other_search_criteria={\n",
" \"source\": [\"NorESM2-MM\"],\n",
" \"processing_level\": [\"raw\"],\n",
" \"experiment\": [\"ssp370\"],\n",
" \"experiment\": [\"ssp585\"],\n",
" },\n",
" match_hist_and_fut=True,\n",
" allow_resampling=True,\n",
" allow_conversion=True,\n",
")\n",
"print(\n",
" cat_sim_adv_multifreq[\n",
" \"ScenarioMIP_NCC_NorESM2-MM_ssp370_r1i1p1f1_gn\"\n",
" \"ScenarioMIP_NCC_NorESM2-MM_ssp585_r1i1p1f1_gn\"\n",
" ]._requested_variable_freqs\n",
")"
]
@@ -435,7 +491,7 @@
"\n",
"The `allow_conversion` argument is built upon `xclim`'s virtual indicators module and `intake-esm`'s [DerivedVariableRegistry](https://ncar.github.io/esds/posts/2021/intake-esm-derived-variables/) in a way that should be seamless to the user. It works by using the methods defined in `xscen/xclim_modules/conversions.yml` to add a registry of *derived* variables that exist virtually through computation methods.\n",
"\n",
"In the example above, we can see that the search failed to find *evspsblpot* within *NORESM2-MM*, but understood that *tasmin* and *tasmax* could be used to estimate it using `xclim`'s `potential_evapotranspiration`.\n",
"In the example above, we can see that the search failed to find *evspsblpot* within *NorESM2-MM*, but understood that *tasmin* and *tasmax* could be used to estimate it using `xclim`'s `potential_evapotranspiration`.\n",
"\n",
"Most use cases should already be covered by the aforementioned file. The preferred way to add new methods is to [submit a new indicator to xclim](https://xclim.readthedocs.io/en/stable/contributing.html), and then to add a call to that indicator in `conversions.yml`. In the case where this is not possible or where the transformation would be out of scope for `xclim`, the calculation can be implemented into `xscen/xclim_modules/conversions.py` instead.\n",
"\n",
Expand Down Expand Up @@ -678,7 +734,7 @@
"outputs": [],
"source": [
"# Create fake files for the example:\n",
"root = Path(\".\").absolute() / \"parser_examples\"\n",
"root = Path(\".\").absolute() / \"_data\" / \"parser_examples\"\n",
"root.mkdir(exist_ok=True)\n",
"\n",
"paths = [\n",
@@ -1006,7 +1062,7 @@
"import shutil as sh\n",
"\n",
"# Create the destination folder\n",
"root = Path(\".\").absolute() / \"path_builder_examples\"\n",
"root = Path(\".\").absolute() / \"_data\" / \"path_builder_examples\"\n",
"root.mkdir(exist_ok=True)\n",
"\n",
"# Get new names:\n",
@@ -1044,7 +1100,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
"version": "3.10.12"
}
},
"nbformat": 4,
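Example 2 in the notebook above mentions `restrict_resolution`, but the actual call sits in a collapsed hunk. A minimal sketch of how it plugs into the same search, assuming the `"finest"`/`"coarsest"` string values suggested by the notebook text:

```python
from pathlib import Path

import xscen as xs

# Same search as Example 2; keep only the finest grid when a simulation exists
# on several (e.g. gr1 vs gr2). The accepted values ("finest"/"coarsest") are
# taken from the notebook text and should be treated as an assumption.
cat_sim = xs.search_data_catalogs(
    data_catalogs=[f"{Path().absolute()}/samples/pangeo-cmip6.json"],
    variables_and_freqs={"tasmin": "D", "pr": "3H", "sftlf": "fx"},
    other_search_criteria={"institution": ["NOAA-GFDL"], "experiment": ["ssp585"]},
    match_hist_and_fut=True,
    restrict_resolution="finest",
)
```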
