
Commit

Address a few FutureWarnings (#380)
### Pull Request Checklist:
- [x] This PR addresses an already opened issue (for bug fixes / features)
    - This PR fixes #xyz
- [x] (If applicable) Documentation has been added / updated (for bug fixes / features).
- [x] (If applicable) Tests have been added.
- [x] This PR does not seem to break the templates.
- [x] CHANGES.rst has been updated (with summary of main changes).
- [x] Link to issue (:issue:`number`) and pull request (:pull:`number`) has been added.

### What kind of change does this PR introduce?

* Addresses a few FutureWarnings that I encountered recently:
  * `groupby` will change its default to `observed=True`. I think that our implementation here does not care about `observed`, even when we use categoricals, but I'm not 100% sure. We could use `observed=False` to ensure no breaking change (see the sketch after this list).
    * https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html
    * https://towardsdatascience.com/be-careful-when-using-pandas-groupby-with-categorical-data-type-a1d31f66b162
  * Changed a few of the old `pandas` frequency codes that had been missed.
  * Changed `pd.unique` to `np.unique`.
    * `pd.unique with argument that is not not a Series, Index, ExtensionArray, or np.ndarray is deprecated and will raise in a future version.`
  * `intake_esm` no longer spams the `applymap` warning, so our fix for it was removed. It still emits the "observed=True" spam, however.
  * Changed an implementation of inplace modifications to a DataFrame.
    * `FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.`
  * Added a temporary fix for the `flox` spam in the documentation.
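
For reference, a minimal standalone sketch of the `observed` behaviour (not taken from the xscen codebase; the data is made up):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "cat": pd.Categorical(["a", "a", "b"], categories=["a", "b", "c"]),
        "val": [1, 2, 3],
    }
)

# pandas 2.x warns that the default of `observed` will flip from False to
# True in pandas 3.0 when grouping on a categorical column.
df.groupby("cat")["val"].sum()

# Passing the argument explicitly silences the warning: observed=True drops
# the unobserved category "c", while observed=False keeps it (with a sum of 0).
df.groupby("cat", observed=True)["val"].sum()
```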

### Does this PR introduce a breaking change?

- No: to avoid breaking changes, 'Y' and 'M' are still allowed in `date_parser` alongside the new 'YE' and 'ME'.

### Other information:

RondeauG authored Apr 11, 2024
2 parents fc0ae6e + 3d38b8c commit 32b360f
Showing 10 changed files with 100 additions and 25 deletions.
1 change: 1 addition & 0 deletions CHANGES.rst
@@ -21,6 +21,7 @@ Internal changes
* Added more tests. (:pull:`366`, :pull:`367`, :pull:`372`).
* Refactored ``xs.spatial.subset`` into smaller functions. (:pull:`367`).
* An `encoding` argument was added to ``xs.config.load_config``. (:pull:`370`).
+ * Various small fixes to the code to address FutureWarnings. (:pull:`380`).

Bug fixes
^^^^^^^^^
21 changes: 20 additions & 1 deletion docs/notebooks/2_getting_started.ipynb
@@ -1,5 +1,24 @@
{
"cells": [
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "eb10a72a-9ea1-4414-922b-0ea1aaea0648",
+ "metadata": {
+ "nbsphinx": "hidden"
+ },
+ "outputs": [],
+ "source": [
+ "# Remove flox spam\n",
+ "\n",
+ "import logging\n",
+ "\n",
+ "# Get the logger for the 'flox' package\n",
+ "logger = logging.getLogger(\"flox\")\n",
+ "# Set the logging level to WARNING\n",
+ "logger.setLevel(logging.WARNING)"
+ ]
+ },
{
"cell_type": "markdown",
"id": "4f220a85",
@@ -1481,7 +1500,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
"version": "3.12.2"
}
},
"nbformat": 4,
21 changes: 20 additions & 1 deletion docs/notebooks/3_diagnostics.ipynb
@@ -1,5 +1,24 @@
{
"cells": [
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "d513b8c4-0cb4-429b-b169-e0d8d40c795f",
+ "metadata": {
+ "nbsphinx": "hidden"
+ },
+ "outputs": [],
+ "source": [
+ "# Remove flox spam\n",
+ "\n",
+ "import logging\n",
+ "\n",
+ "# Get the logger for the 'flox' package\n",
+ "logger = logging.getLogger(\"flox\")\n",
+ "# Set the logging level to WARNING\n",
+ "logger.setLevel(logging.WARNING)"
+ ]
+ },
{
"cell_type": "code",
"execution_count": null,
@@ -484,7 +503,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
"version": "3.12.2"
}
},
"nbformat": 4,
24 changes: 22 additions & 2 deletions docs/notebooks/4_ensembles.ipynb
@@ -1,5 +1,23 @@
{
"cells": [
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "nbsphinx": "hidden"
+ },
+ "outputs": [],
+ "source": [
+ "# Remove flox spam\n",
+ "\n",
+ "import logging\n",
+ "\n",
+ "# Get the logger for the 'flox' package\n",
+ "logger = logging.getLogger(\"flox\")\n",
+ "# Set the logging level to WARNING\n",
+ "logger.setLevel(logging.WARNING)"
+ ]
+ },
{
"cell_type": "markdown",
"metadata": {},
@@ -36,7 +54,9 @@
"\n",
"for d in datasets:\n",
" ds = open_dataset(datasets[d]).isel(lon=slice(0, 4), lat=slice(0, 4))\n",
" ds = xs.climatological_mean(ds, window=30, periods=[[1981, 2010], [2021, 2050]])\n",
" ds = xs.climatological_op(\n",
" ds, op=\"mean\", window=30, periods=[[1981, 2010], [2021, 2050]]\n",
" )\n",
" datasets[d] = xs.compute_deltas(ds, reference_horizon=\"1981-2010\")\n",
" datasets[d].attrs[\"cat:id\"] = d # Required by build_reduction_data\n",
" datasets[d].attrs[\"cat:xrfreq\"] = \"AS-JAN\""
@@ -270,7 +290,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
"version": "3.12.2"
}
},
"nbformat": 4,
21 changes: 20 additions & 1 deletion docs/notebooks/5_warminglevels.ipynb
@@ -1,5 +1,24 @@
{
"cells": [
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "f1899896-70a1-4efb-80e6-8765b95f4388",
+ "metadata": {
+ "nbsphinx": "hidden"
+ },
+ "outputs": [],
+ "source": [
+ "# Remove flox spam\n",
+ "\n",
+ "import logging\n",
+ "\n",
+ "# Get the logger for the 'flox' package\n",
+ "logger = logging.getLogger(\"flox\")\n",
+ "# Set the logging level to WARNING\n",
+ "logger.setLevel(logging.WARNING)"
+ ]
+ },
{
"cell_type": "markdown",
"id": "3e311475",
@@ -483,7 +502,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
"version": "3.12.2"
}
},
"nbformat": 4,
4 changes: 2 additions & 2 deletions docs/notebooks/6_config.ipynb
Original file line number Diff line number Diff line change
@@ -277,7 +277,7 @@
"import xarray as xr\n",
"\n",
"# Create a dummy dataset\n",
"time = pd.date_range(\"1951-01-01\", \"2100-01-01\", freq=\"AS-JAN\")\n",
"time = pd.date_range(\"1951-01-01\", \"2100-01-01\", freq=\"YS-JAN\")\n",
"da = xr.DataArray([0] * len(time), coords={\"time\": time})\n",
"da.name = \"test\"\n",
"ds = da.to_dataset()\n",
@@ -378,7 +378,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
"version": "3.12.2"
}
},
"nbformat": 4,
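
The `freq` change above follows the pandas 2.2 rename of the year-start alias; a hedged one-line comparison (plain pandas, nothing xscen-specific):

```python
import pandas as pd

time = pd.date_range("1951-01-01", "2100-01-01", freq="YS-JAN")  # no warning
# pd.date_range("1951-01-01", "2100-01-01", freq="AS-JAN")  # FutureWarning on pandas >= 2.2
```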
6 changes: 0 additions & 6 deletions xscen/__init__.py
@@ -75,9 +75,3 @@ def warning_on_one_line(
"Pass observed=False to retain current behavior or observed=True to adopt the future default "
"and silence this warning.",
)
- warnings.filterwarnings(
- "ignore",
- category=FutureWarning,
- module="intake_esm",
- message="DataFrame.applymap has been deprecated. Use DataFrame.map instead.",
- )
12 changes: 7 additions & 5 deletions xscen/catutils.py
@@ -634,11 +634,13 @@ def parse_directory( # noqa: C901

# translate xrfreq into frequencies and vice-versa
if {"xrfreq", "frequency"}.issubset(df.columns):
df["xrfreq"].fillna(
df["frequency"].apply(CV.frequency_to_xrfreq, default=pd.NA), inplace=True
df.fillna(
{"xrfreq": df["frequency"].apply(CV.frequency_to_xrfreq, default=pd.NA)},
inplace=True,
)
df["frequency"].fillna(
df["xrfreq"].apply(CV.xrfreq_to_frequency, default=pd.NA), inplace=True
df.fillna(
{"frequency": df["xrfreq"].apply(CV.xrfreq_to_frequency, default=pd.NA)},
inplace=True,
)

# Parse dates
@@ -757,7 +759,7 @@ def parse_from_ds( # noqa: C901
attrs["variable"] = tuple(sorted(variables))
elif name in ("frequency", "xrfreq") and time is not None and time.size > 3:
# round to the minute to catch floating point imprecision
freq = xr.infer_freq(time.round("T"))
freq = xr.infer_freq(time.round("min"))
if freq:
if "xrfreq" in names:
attrs["xrfreq"] = freq
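
A standalone sketch of the chained-assignment fix applied above, with made-up values:

```python
import pandas as pd

df = pd.DataFrame({"xrfreq": [None, "D"], "frequency": ["day", None]})

# Warns in pandas 2.x and will silently do nothing in pandas 3.0, because
# df["xrfreq"] is an intermediate object that behaves as a copy:
# df["xrfreq"].fillna("D", inplace=True)

# Equivalent call that operates on the frame itself, as in the hunk above:
df.fillna({"xrfreq": "D"}, inplace=True)
```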
5 changes: 3 additions & 2 deletions xscen/extract.py
@@ -175,7 +175,7 @@ def extract_dataset( # noqa: C901
)

out_dict = {}
- for xrfreq in pd.unique([x for y in variables_and_freqs.values() for x in y]):
+ for xrfreq in np.unique([x for y in variables_and_freqs.values() for x in y]):
ds = xr.Dataset()
attrs = {}
# iterate on the datasets, in reverse timedelta order
@@ -814,7 +814,8 @@ def search_data_catalogs( # noqa: C901
valid_tp = []
for var, group in varcat.df.groupby(
varcat.esmcat.aggregation_control.groupby_attrs
+ ["variable"]
+ ["variable"],
observed=True,
):
valid_tp.append(
subset_file_coverage(
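
A minimal illustration of the `pd.unique` deprecation and its replacement (made-up data; note that `np.unique` sorts its result, which `pd.unique` does not):

```python
import numpy as np
import pandas as pd

variables_and_freqs = {"tas": ["D", "MS"], "pr": ["D"]}
flat = [x for y in variables_and_freqs.values() for x in y]

# pd.unique(flat)  # FutureWarning in pandas 2.x: a plain list argument
#                  # is deprecated and will raise in a future version.
np.unique(flat)  # np.unique accepts any array-like
```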
10 changes: 5 additions & 5 deletions xscen/utils.py
@@ -172,7 +172,7 @@ def date_parser( # noqa: C901
date : str, cftime.datetime, pd.Timestamp, datetime.datetime, pd.Period
Date to be converted
end_of_period : bool or str
- If 'Y' or 'M', the returned date will be the end of the year or month that contains the received date.
+ If 'YE' or 'ME', the returned date will be the end of the year or month that contains the received date.
If True, the period is inferred from the date's precision, but `date` must be a string, otherwise nothing is done.
out_dtype : str
Choices are 'datetime', 'period' or 'str'
@@ -245,12 +245,12 @@ def _parse_date(date, fmts):

if isinstance(end_of_period, str) or (end_of_period is True and fmt):
quasiday = (pd.Timedelta(1, "d") - pd.Timedelta(1, "s")).as_unit(date.unit)
if end_of_period == "Y" or "m" not in fmt:
if end_of_period in ["Y", "YE"] or "m" not in fmt:
date = (
pd.tseries.frequencies.to_offset("A-DEC").rollforward(date) + quasiday
pd.tseries.frequencies.to_offset("YE-DEC").rollforward(date) + quasiday
)
elif end_of_period == "M" or "d" not in fmt:
date = pd.tseries.frequencies.to_offset("M").rollforward(date) + quasiday
elif end_of_period in ["M", "ME"] or "d" not in fmt:
date = pd.tseries.frequencies.to_offset("ME").rollforward(date) + quasiday
# TODO: Implement subdaily ?

if out_dtype == "str":
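
A hedged sketch of the renamed offset aliases used above ('A-DEC' -> 'YE-DEC', 'M' -> 'ME'), assuming pandas >= 2.2:

```python
import pandas as pd

date = pd.Timestamp("2001-06-15")
quasiday = pd.Timedelta(1, "d") - pd.Timedelta(1, "s")

# End of the year containing `date`: 2001-12-31 23:59:59
pd.tseries.frequencies.to_offset("YE-DEC").rollforward(date) + quasiday
# End of the month containing `date`: 2001-06-30 23:59:59
pd.tseries.frequencies.to_offset("ME").rollforward(date) + quasiday
```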
