Merge pull request #248 from Ouranosinc/packaging-fixes
Packaging fixes
RondeauG authored Sep 13, 2023
2 parents d2d261a + 7d92329 commit 6e893a4
Showing 37 changed files with 694 additions and 34,795 deletions.
2 changes: 1 addition & 1 deletion .readthedocs.yml
@@ -10,7 +10,7 @@ formats:
 build:
   os: ubuntu-22.04
   tools:
-    python: "mambaforge-4.10"
+    python: "mambaforge-22.9"
   jobs:
     post_create_environment:
       - pip install . --no-deps
6 changes: 6 additions & 0 deletions HISTORY.rst
@@ -26,6 +26,12 @@ Internal changes
 ^^^^^^^^^^^^^^^^
 * Fixed pre-commit's pretty-format-json so it ignores notebooks. (:pull:`254`).
 * Fixed the labeler so docs/CI isn't automatically added for contributions by new collaborators. (:pull:`254`).
+* Made it so that `tests` are no longer treated as an installable package. (:pull:`248`).
+* Renamed the pytest marker from `requires_docs` to `requires_netcdf`. (:pull:`248`).
+* Included the documentation in the source distribution, while excluding the NetCDF files. (:pull:`248`).
+* Reduced the size of the files in /docs/notebooks/samples and changed the Notebooks and tests accordingly. (:issue:`247`, :pull:`248`).
+* Added a new `xscen.testing` module with the `datablock_3d` function previously located in `/tests/conftest.py`. (:pull:`248`).
+* New function `xscen.testing.fake_data` to generate fake data for testing. (:pull:`248`).

 v0.7.1 (2023-08-23)
 -------------------
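The changelog above introduces the `xscen.testing` module; a minimal usage sketch follows. The keyword names are assumptions inferred from the changelog entries (the helper previously lived in `/tests/conftest.py`), not a verified API reference.

```python
import numpy as np

from xscen.testing import datablock_3d  # new module introduced by this PR

# Build a small time x lat x lon cube of fake "tas" data; every keyword below
# is an assumption based on the changelog, not the released signature.
values = np.random.rand(30, 10, 10)
da = datablock_3d(
    values,
    variable="tas",
    x="lon",
    x_start=-75.0,
    y="lat",
    y_start=45.0,
    x_step=0.1,
    y_step=0.1,
    start="2000-01-01",
    freq="D",
    as_dataset=False,
)
# The new xscen.testing.fake_data could generate `values` instead of numpy's
# random generator; its signature is not shown in this diff.
```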
16 changes: 4 additions & 12 deletions MANIFEST.in
@@ -3,27 +3,19 @@ include CONTRIBUTING.rst
 include HISTORY.rst
 include LICENSE
 include README.rst
-include requirements_dev.txt
-include requirements_docs.txt

 recursive-include xscen *.json *.yml *.py *.csv
 recursive-include tests *
+recursive-include docs notebooks *.rst conf.py Makefile make.bat *.jpg *.png *.gif *.ipynb *.csv *.json *.yml *.md
+recursive-include docs notebooks samples *.csv *.json

 recursive-exclude * __pycache__
 recursive-exclude * *.py[co]
-recursive-exclude docs notebooks *.rst conf.py Makefile make.bat *.jpg *.png *.gif *.ipynb *.csv *.json *.yml *.md
-recursive-exclude docs notebooks samples *.csv *.json
 recursive-exclude conda *.yml
 recursive-exclude templates *.csv *.json *.py *.yml
+recursive-exclude docs notebooks samples tutorial *.nc

-exclude .cruft.json
-exclude .editorconfig
-exclude .gitlab-ci.yml
-exclude .gitmodules
-exclude .pre-commit-config.yaml
-exclude .readthedocs.yml
-exclude .secrets.baseline
-exclude .yamllint.yaml
+exclude .*
 exclude Makefile
 exclude environment.yml
 exclude environment-dev.yml
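One way to confirm the new MANIFEST rules is to build a source distribution and inspect its contents. A minimal sketch, assuming the sdist was produced with `python -m build --sdist` and the hypothetical archive name below:

```python
import tarfile

# Hypothetical archive name; substitute the version actually built.
with tarfile.open("dist/xscen-0.7.1.tar.gz") as tar:
    names = tar.getnames()

# Documentation (including the sample CSV/JSON files) should now ship in the sdist...
assert any("docs/notebooks/samples" in name for name in names)
# ...while NetCDF files remain excluded.
assert not any(name.endswith(".nc") for name in names)
```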
6 changes: 3 additions & 3 deletions README.rst
@@ -70,6 +70,6 @@ This package was created with Cookiecutter_ and the `Ouranosinc/cookiecutter-pyp
     :target: https://pypi.python.org/pypi/xscen
     :alt: Supported Python Versions

-.. |status| image:: https://www.repostatus.org/badges/latest/wip.svg
-    :target: https://www.repostatus.org/#wip
-    :alt: Project Status: WIP – Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.
+.. |status| image:: https://www.repostatus.org/badges/latest/active.svg
+    :target: https://www.repostatus.org/#active
+    :alt: Project Status: Active – The project has reached a stable, usable state and is being actively developed.
112 changes: 84 additions & 28 deletions docs/notebooks/1_catalog.ipynb
@@ -71,12 +71,11 @@
"\n",
"from xscen import DataCatalog, ProjectCatalog\n",
"\n",
"# Prepare a dummy folder where data will be put\n",
"output_folder = Path().absolute() / \"_data\"\n",
"output_folder.mkdir(exist_ok=True)\n",
"\n",
"\n",
"DC = DataCatalog(f\"{Path().absolute()}/samples/pangeo-cmip6.json\")\n",
"\n",
"DC"
]
},
@@ -96,7 +95,7 @@
"outputs": [],
"source": [
"# Access the catalog\n",
"DC.df"
"DC.df[0:3]"
]
},
{
@@ -173,7 +172,7 @@
"metadata": {},
"outputs": [],
"source": [
"# Regex: Find all entries that start with \"rcp\"\n",
"# Regex: Find all entries that start with \"ssp\"\n",
"print(DC.search(experiment=\"^ssp\").unique(\"experiment\"))"
]
},
@@ -195,8 +194,8 @@
"metadata": {},
"outputs": [],
"source": [
"# Regex: Find all experiments except the exact string \"ssp245\"\n",
"print(DC.search(experiment=\"^(?!ssp245$).*$\").unique(\"experiment\"))"
"# Regex: Find all experiments except the exact string \"ssp126\"\n",
"print(DC.search(experiment=\"^(?!ssp126$).*$\").unique(\"experiment\"))"
]
},
{
@@ -257,12 +256,12 @@
"- `restrict_members` is used to limit the results to a maximum number of realizations for each source.\n",
"- `restrict_warming_level` is used to limit the results to only datasets that are present in the csv used for calculating warming levels.\n",
"\n",
"Note that compared to `search`, the result of `search_data_catalog` is a dictionary with one entry per unique ID. A given unique ID might contain multiple datasets as per `intake-esm`'s definition, because it groups catalog lines per *id - domain - processing_level - xrfreq*. Thus, it separates model data that exists at different frequencies.\n",
"Note that compared to `search`, the result of `search_data_catalog` is a dictionary with one entry per unique ID. A given unique ID might contain multiple datasets as per `intake-esm`'s definition, because it groups catalog lines per *id - domain - processing_level - xrfreq*. Thus, it would separate model data that exists at different frequencies.\n",
"\n",
"\n",
"#### Example 1: Simple dataset\n",
"#### Example 1: Multiple variables and frequencies + Historical and future\n",
"\n",
"Let's start by searching for CMIP6 data that has subdaily precipitation, daily temperature and the land fraction data. The main difference compared to searching for reference datasets is that in most cases, `match_hist_and_fut` will be required to match *historical* simulations to their future counterparts. This works for both CMIP5 and CMIP6 nomenclatures."
"Let's start by searching for CMIP6 data that has subdaily precipitation, daily minimum temperature and the land fraction data. The main difference compared to searching for reference datasets is that in most cases, `match_hist_and_fut` will be required to match *historical* simulations to their future counterparts. This works for both CMIP5 and CMIP6 nomenclatures."
]
},
{
@@ -276,8 +275,8 @@
"source": [
"import xscen as xs\n",
"\n",
"variables_and_freqs = {\"tas\": \"D\", \"pr\": \"3H\", \"sftlf\": \"fx\"}\n",
"other_search_criteria = {\"institution\": [\"NOAA-GFDL\"], \"experiment\": [\"ssp585\"]}\n",
"variables_and_freqs = {\"tasmin\": \"D\", \"pr\": \"3H\", \"sftlf\": \"fx\"}\n",
"other_search_criteria = {\"institution\": [\"NOAA-GFDL\"]}\n",
"\n",
"cat_sim = xs.search_data_catalogs(\n",
" data_catalogs=[f\"{Path().absolute()}/samples/pangeo-cmip6.json\"],\n",
@@ -289,12 +288,34 @@
"cat_sim"
]
},
{
"cell_type": "markdown",
"id": "82535e6c",
"metadata": {},
"source": [
"If required, at this stage, a dataset can be looked at in more details. If we examine the results (look at the 'date_start' and 'date_end' columns), we'll see that it successfully found historical simulations in the *CMIP* activity and renamed both their *activity* and *experiment* to match the future simulations."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a6e5bd7e",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"cat_sim[\"ScenarioMIP_NOAA-GFDL_GFDL-CM4_ssp585_r1i1p1f1_gr1\"].df"
]
},
{
"cell_type": "markdown",
"id": "85ee34fe",
"metadata": {},
"source": [
"Two simulations correspond to the search criteria, but as can be seen from the results, it is the same simulation on 2 different grids (`gr1` and `gr2`). If desired, `restrict_resolution` can be called to choose the finest or coarsest grid in such cases."
"#### Example 2: Restricting results\n",
"\n",
"The two previous search results were the same simulation, but on 2 different grids (`gr1` and `gr2`). If desired, `restrict_resolution` can be called to choose the finest or coarsest grid."
]
},
{
@@ -306,7 +327,7 @@
},
"outputs": [],
"source": [
"variables_and_freqs = {\"tas\": \"D\", \"pr\": \"3H\", \"sftlf\": \"fx\"}\n",
"variables_and_freqs = {\"tasmin\": \"D\", \"pr\": \"3H\", \"sftlf\": \"fx\"}\n",
"other_search_criteria = {\"institution\": [\"NOAA-GFDL\"], \"experiment\": [\"ssp585\"]}\n",
"\n",
"cat_sim = xs.search_data_catalogs(\n",
Expand All @@ -322,30 +343,65 @@
},
{
"cell_type": "markdown",
"id": "82535e6c",
"id": "fcd847c0-0ea8-46ad-bc28-9b73edd627bc",
"metadata": {},
"source": [
"If required, at this stage a dataset can be looked at in more details. If we examine the results (look at the 'date_start' and 'date_end' columns), we'll see that it successfully found historical simulations in the *CMIP* activity and renamed both their *activity* and *experiment* to match the future simulations."
"Similarly, if we search for historical NorESM2-MM data, we'll find that it has 3 members. If desired, `restrict_members` can be called to choose a maximum number of realization per model."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a6e5bd7e",
"metadata": {
"tags": []
},
"id": "6c5ce0fc-2f25-4b55-bf65-5f140f07e331",
"metadata": {},
"outputs": [],
"source": [
"cat_sim[\"ScenarioMIP_NOAA-GFDL_GFDL-CM4_ssp585_r1i1p1f1_gr1\"].df"
"variables_and_freqs = {\"tasmin\": \"D\"}\n",
"other_search_criteria = {\"source\": [\"NorESM2-MM\"], \"experiment\": [\"historical\"]}\n",
"\n",
"cat_sim = xs.search_data_catalogs(\n",
" data_catalogs=[f\"{Path().absolute()}/samples/pangeo-cmip6.json\"],\n",
" variables_and_freqs=variables_and_freqs,\n",
" other_search_criteria=other_search_criteria,\n",
" restrict_members={\"ordered\": 2},\n",
")\n",
"\n",
"cat_sim"
]
},
{
"cell_type": "markdown",
"id": "4fd28f58-5ab7-4d65-8906-2197592c8c94",
"metadata": {},
"source": [
"Finally, `restrict_warming_level` can be used to be sure that the results either exist in `xscen`'s warming level database (if a boolean), or reach a given warming level."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3300b1d7-e37b-4aa4-991e-99609bb1adea",
"metadata": {},
"outputs": [],
"source": [
"variables_and_freqs = {\"tasmin\": \"D\"}\n",
"\n",
"cat_sim = xs.search_data_catalogs(\n",
" data_catalogs=[f\"{Path().absolute()}/samples/pangeo-cmip6.json\"],\n",
" variables_and_freqs=variables_and_freqs,\n",
" match_hist_and_fut=True,\n",
" restrict_warming_level=True, # In this case all models exist in our database, so nothing gets eliminated.\n",
")\n",
"\n",
"cat_sim"
]
},
{
"cell_type": "markdown",
"id": "6bddc58d",
"metadata": {},
"source": [
"#### Example 2: Advanced search\n",
"#### Example 3: Search for data that can be computed from what's available\n",
"\n",
"`allow_resampling` and `allow_conversion` are powerful search tools to find data that doesn't explicitely exist in the catalog, but that can easily be computed."
]
@@ -375,7 +431,7 @@
"id": "b33b2ad7",
"metadata": {},
"source": [
"If we examine the SSP5-8.5 results, we'll see that while it failed to find *evspsblpot*, it successfully understood that *tasmin* and *tasmax* can be used to compute it. It also understood that daily *tas* is a valid search result for `{tas: YS}`, since it can be aggregated."
"If we examine the SSP5-8.5 results, we'll see that while it failed to find *evspsblpot*, it successfully understood that *tasmin* and *tasmax* can be used to compute it. It also understood that daily *tasmin* and *tasmax* is a valid search result for `{tas: YS}`, since it can be computed first, then aggregated to a yearly frequency."
]
},
{
@@ -413,15 +469,15 @@
" other_search_criteria={\n",
" \"source\": [\"NorESM2-MM\"],\n",
" \"processing_level\": [\"raw\"],\n",
" \"experiment\": [\"ssp370\"],\n",
" \"experiment\": [\"ssp585\"],\n",
" },\n",
" match_hist_and_fut=True,\n",
" allow_resampling=True,\n",
" allow_conversion=True,\n",
")\n",
"print(\n",
" cat_sim_adv_multifreq[\n",
" \"ScenarioMIP_NCC_NorESM2-MM_ssp370_r1i1p1f1_gn\"\n",
" \"ScenarioMIP_NCC_NorESM2-MM_ssp585_r1i1p1f1_gn\"\n",
" ]._requested_variable_freqs\n",
")"
]
@@ -435,7 +491,7 @@
"\n",
"The `allow_conversion` argument is built upon `xclim`'s virtual indicators module and `intake-esm`'s [DerivedVariableRegistry](https://ncar.github.io/esds/posts/2021/intake-esm-derived-variables/) in a way that should be seamless to the user. It works by using the methods defined in `xscen/xclim_modules/conversions.yml` to add a registry of *derived* variables that exist virtually through computation methods.\n",
"\n",
"In the example above, we can see that the search failed to find *evspsblpot* within *NORESM2-MM*, but understood that *tasmin* and *tasmax* could be used to estimate it using `xclim`'s `potential_evapotranspiration`.\n",
"In the example above, we can see that the search failed to find *evspsblpot* within *NorESM2-MM*, but understood that *tasmin* and *tasmax* could be used to estimate it using `xclim`'s `potential_evapotranspiration`.\n",
"\n",
"Most use cases should already be covered by the aforementioned file. The preferred way to add new methods is to [submit a new indicator to xclim](https://xclim.readthedocs.io/en/stable/contributing.html), and then to add a call to that indicator in `conversions.yml`. In the case where this is not possible or where the transformation would be out of scope for `xclim`, the calculation can be implemented into `xscen/xclim_modules/conversions.py` instead.\n",
"\n",
Expand Down Expand Up @@ -678,7 +734,7 @@
"outputs": [],
"source": [
"# Create fake files for the example:\n",
"root = Path(\".\").absolute() / \"parser_examples\"\n",
"root = Path(\".\").absolute() / \"_data\" / \"parser_examples\"\n",
"root.mkdir(exist_ok=True)\n",
"\n",
"paths = [\n",
@@ -1006,7 +1062,7 @@
"import shutil as sh\n",
"\n",
"# Create the destination folder\n",
"root = Path(\".\").absolute() / \"path_builder_examples\"\n",
"root = Path(\".\").absolute() / \"_data\" / \"path_builder_examples\"\n",
"root.mkdir(exist_ok=True)\n",
"\n",
"# Get new names:\n",
@@ -1044,7 +1100,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
"version": "3.10.12"
}
},
"nbformat": 4,
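Example 2 in the notebook above mentions `restrict_resolution`, but the actual call sits in a collapsed hunk. A minimal sketch of how it plugs into the same search, assuming the `"finest"`/`"coarsest"` string values suggested by the notebook text:

```python
from pathlib import Path

import xscen as xs

# Same search as Example 2; keep only the finest grid when a simulation exists
# on several (e.g. gr1 vs gr2). The accepted values ("finest"/"coarsest") are
# taken from the notebook text and should be treated as an assumption.
cat_sim = xs.search_data_catalogs(
    data_catalogs=[f"{Path().absolute()}/samples/pangeo-cmip6.json"],
    variables_and_freqs={"tasmin": "D", "pr": "3H", "sftlf": "fx"},
    other_search_criteria={"institution": ["NOAA-GFDL"], "experiment": ["ssp585"]},
    match_hist_and_fut=True,
    restrict_resolution="finest",
)
```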
