From 71c78fc3b353fc72b82ae6bfc2426b42fcc10359 Mon Sep 17 00:00:00 2001
From: Tony Bagnall <ajb@uea.ac.uk>
Date: Wed, 5 Jul 2023 08:25:14 +0100
Subject: [PATCH 01/14] datatypes notebook

---
 examples/AA_datatypes_and_datasets.ipynb |  18 +-
 examples/datasets/data_conversions.ipynb | 238 +++++++++++++++++++++++
 2 files changed, 252 insertions(+), 4 deletions(-)
 create mode 100644 examples/datasets/data_conversions.ipynb
diff --git a/examples/AA_datatypes_and_datasets.ipynb b/examples/AA_datatypes_and_datasets.ipynb
index 4a82197063..0bbec22f95 100644
--- a/examples/AA_datatypes_and_datasets.ipynb
+++ b/examples/AA_datatypes_and_datasets.ipynb
@@ -748,13 +748,23 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": "              dask_series  np.ndarray  pd.DataFrame  pd.Series  xr.DataArray\ndask_series             1           1             1          1             1\nnp.ndarray              1           1             1          1             1\npd.DataFrame            1           1             1          1             1\npd.Series               1           1             1          1             1\nxr.DataArray            1           1             1          1             1",
+      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>dask_series</th>\n      <th>np.ndarray</th>\n      <th>pd.DataFrame</th>\n      <th>pd.Series</th>\n      <th>xr.DataArray</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>dask_series</th>\n      <td>1</td>\n      <td>1</td>\n      <td>1</td>\n      <td>1</td>\n      <td>1</td>\n    </tr>\n    <tr>\n      <th>np.ndarray</th>\n      <td>1</td>\n      <td>1</td>\n      <td>1</td>\n      <td>1</td>\n      <td>1</td>\n    </tr>\n    <tr>\n      <th>pd.DataFrame</th>\n      <td>1</td>\n      <td>1</td>\n      <td>1</td>\n      <td>1</td>\n      <td>1</td>\n    </tr>\n    <tr>\n      <th>pd.Series</th>\n      <td>1</td>\n      <td>1</td>\n      <td>1</td>\n      <td>1</td>\n      <td>1</td>\n    </tr>\n    <tr>\n      <th>xr.DataArray</th>\n      <td>1</td>\n      <td>1</td>\n      <td>1</td>\n      <td>1</td>\n      <td>1</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
+     },
+     "execution_count": 2,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
    "source": [
     "from aeon.datatypes._convert import _conversions_defined\n",
     "\n",
-    "_conversions_defined(scitype=\"Panel\")"
+    "_conversions_defined(scitype=\"Series\")"
    ]
   },
   {
diff --git a/examples/datasets/data_conversions.ipynb b/examples/datasets/data_conversions.ipynb
new file mode 100644
index 0000000000..225eaf9d47
--- /dev/null
+++ b/examples/datasets/data_conversions.ipynb
@@ -0,0 +1,238 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "source": [
+    "# Data conversions in aeon\n",
+    "\n",
+    "We recommend you follow the data storage described in the [data storage notebook](examples/datasets/data_storage.ipynb),\n",
+    "which can be summarised as follows: use `pd.Series` or `pd.DataFrame` for forecasting.\n",
+    "For classification, clustering and regression, use a 3D numpy array of shape\n",
+    "`(n_cases, n_channels, n_timepoints)` if your time series are all of equal length, or a\n",
+    "list of `n_cases` 2D numpy arrays if they are not. All our\n",
+    "[data loaders](examples/datasets/data_loading.ipynb) use this format.\n",
+    "\n",
+    "However, `aeon` provides a range of converters in the `datatypes` package. These are\n",
+    "grouped into converters for single series and converters for collections of series."
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "# Series Converters\n",
+    "\n",
+    "Single time series can be stored in the following data structures\n",
+    "\n",
+    "pd.Series: a univariate time series\n",
+    "pd.DataFrame: a univariate or multivariate time series\n",
+    "np.ndarray: 2D numpy.ndarray of shape `(n_timepoints, n_channels)`.\n",
+    "xr.DataArray: a univariate or multivariate time series\n",
+    "dask_series: Dask DataFrame: a univariate or multivariate time series\n",
+    "\n",
+    "NOTE the 2D numpy array representation is not consistent with that used in\n",
+    "collections. This is an unfortunate difference that is a result of legacy design and\n",
+    "norms in different research fields. We recommend not using numpy arrays with\n",
+    "forecasting.\n",
+    "\n",
+    "Conversion to and from these data structures is fairly straightforward. `aeon` contains\n",
+    "converters that are part of the legacy code base. There is a wrapper to hide all this\n",
+    " code, but we also show under the hood. This code is not likely to be maintained."
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "outputs": [
+    {
+     "data": {
+      "text/plain": "xarray.core.dataarray.DataArray"
+     },
+     "execution_count": 8,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "import numpy as np\n",
+    "\n",
+    "from aeon.datatypes import convert\n",
+    "\n",
+    "numpyarray = np.random.random(size=(100, 1))\n",
+    "series = convert(numpyarray, from_type=\"np.ndarray\", to_type=\"xr.DataArray\")\n",
+    "type(series)"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "All the actual converter functions for series are in the following file `aeon.datatypes._series._convert`. We stress,\n",
+    "this is legacy code. `aeon` thinks it better the user is responsible for getting the\n",
+    "data into the best format for the estimators."
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "outputs": [
+    {
+     "data": {
+      "text/plain": "pandas.core.frame.DataFrame"
+     },
+     "execution_count": 9,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "from aeon.datatypes._series._convert import (\n",
+    "    convert_mvs_to_dask_as_series,\n",
+    "    convert_Mvs_to_xrdatarray_as_Series,\n",
+    "    convert_np_to_MvS_as_Series,\n",
+    ")\n",
+    "\n",
+    "pd_dataframe = convert_np_to_MvS_as_Series(numpyarray)\n",
+    "type(pd_dataframe)"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "outputs": [
+    {
+     "data": {
+      "text/plain": "dask.dataframe.core.DataFrame"
+     },
+     "execution_count": 10,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "dask_dataframe = convert_mvs_to_dask_as_series(pd_dataframe)\n",
+    "type(dask_dataframe)"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "outputs": [
+    {
+     "data": {
+      "text/plain": "xarray.core.dataarray.DataArray"
+     },
+     "execution_count": 11,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "xrarray = convert_Mvs_to_xrdatarray_as_Series(pd_dataframe)\n",
+    "type(xrarray)"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "outputs": [],
+   "source": [],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "# Collections Converters\n",
+    "\n",
+    "Previously, collections of time series were called panels (a term from econometrics,\n",
+    "not machine learning), and there are still references to panel. Collections can be\n",
+    "stored as follows\n",
+    "\n",
+    "numpy3D: 3D np.array of format (n_instances, n_channels, n_timepoints)\n",
+    "np-list:\n",
+    "\n",
+    "\n",
+    "MTYPE_REGISTER_PANEL = [\n",
+    "    (\n",
+    "        \"nested_univ\",\n",
+    "        \"Panel\",\n",
+    "        \"pd.DataFrame with one column per channel, pd.Series in cells\",\n",
+    "    ),\n",
+    "    (\n",
+    "        \"numpy3D\",\n",
+    "        \"Panel\",\n",
+    "        \"3D np.array of format (n_instances, n_channels, n_timepoints)\",\n",
+    "    ),\n",
+    "    (\n",
+    "        \"numpyflat\",\n",
+    "        \"Panel\",\n",
+    "        \"2D np.array of format (n_instances, n_columns*n_timepoints)\",\n",
+    "    ),\n",
+    "    (\"pd-multiindex\", \"Panel\", \"pd.DataFrame with multi-index (instances, timepoints)\"),\n",
+    "    (\"pd-wide\", \"Panel\", \"pd.DataFrame in wide format, cols = (instance*timepoints)\"),\n",
+    "    (\n",
+    "        \"pd-long\",\n",
+    "        \"Panel\",\n",
+    "        \"pd.DataFrame in long format, cols = (index, time_index, column)\",\n",
+    "    ),\n",
+    "    (\"df-list\", \"Panel\", \"list of pd.DataFrame\"),\n",
+    "    (\n",
+    "        \"dask_panel\",\n",
+    "        \"Panel\",\n",
+    "        \"dask frame with one instance and one time index, as per dask_to_pd convention\",\n",
+    "    ),\n",
+    "    (\n",
+    "        \"np-list\",\n",
+    "        \"Panel\",\n",
+    "        \"list of n_cases, each case a 2D np.array of shape (n_channels, series_length)\",\n",
+    "    ),\n",
+    "]\n"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 2
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython2",
+   "version": "2.7.6"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}

From 6817086a3af64f4fdcccbf9339b532118998ebf8 Mon Sep 17 00:00:00 2001
From: Tony Bagnall <ajb@uea.ac.uk>
Date: Wed, 5 Jul 2023 09:33:28 +0100
Subject: [PATCH 02/14] datatypes notebook

---
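As a quick illustration of the collection layouts documented in
data_conversions.ipynb, the sketch below reproduces the shape changes with
plain numpy. The helper names `numpy3d_to_np_list` and `numpy3d_to_numpyflat`
are hypothetical and are not aeon functions; aeon's own `convert` may order or
copy the data differently.

```python
import numpy as np


# Hypothetical stand-ins for the conversions described in the notebook.
# They use plain numpy only and are NOT aeon's converter functions.
def numpy3d_to_np_list(X):
    """Split a (n_cases, n_channels, n_timepoints) array into a list of 2D arrays."""
    return [case for case in X]


def numpy3d_to_numpyflat(X):
    """Flatten each case into one row of length n_channels * n_timepoints."""
    n_cases = X.shape[0]
    return X.reshape(n_cases, -1)


rng = np.random.default_rng(0)
multi = rng.random(size=(10, 3, 100))  # 10 cases, 3 channels, 100 time points

np_list = numpy3d_to_np_list(multi)
flat = numpy3d_to_numpyflat(multi)

# A single series in the "np.ndarray" series layout is (n_timepoints, n_channels),
# i.e. the transpose of one case taken from a collection.
series_layout = np_list[0].T
```

Each element of `np_list` keeps the `(n_channels, n_timepoints)` layout used in
collections, while `series_layout` shows the transposed layout used for a
single series.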
 aeon/classification/tests/test_base.py        |   2 +-
 aeon/datasets/_dataframe_loaders.py           |   2 +-
 aeon/datatypes/_check.py                      |   2 +-
 .../{_panel => _collection}/__init__.py       |  12 +-
 .../{_panel => _collection}/_check.py         |   0
 .../{_panel => _collection}/_convert.py       |   2 +-
 .../{_panel => _collection}/_examples.py      |   0
 .../{_panel => _collection}/_registry.py      |   9 +-
 aeon/datatypes/_convert.py                    |   2 +-
 aeon/datatypes/_examples.py                   |  10 +-
 aeon/datatypes/_hierarchical/_check.py        |   2 +-
 aeon/datatypes/_registry.py                   |  10 +-
 aeon/datatypes/tests/test_panel_converters.py |   4 +-
 .../tests/test_series_to_panel_converters.py  |   2 +-
 aeon/forecasting/base/tests/test_base.py      |   2 +-
 aeon/transformations/collection/segment.py    |   2 +-
 aeon/transformations/collection/tsfresh.py    |   2 +-
 aeon/utils/_testing/estimator_checks.py       |   2 +-
 aeon/utils/validation/panel.py                |   4 +-
 examples/datasets/data_conversions.ipynb      | 216 ++++++++++++------
 20 files changed, 183 insertions(+), 104 deletions(-)
 rename aeon/datatypes/{_panel => _collection}/__init__.py (51%)
 rename aeon/datatypes/{_panel => _collection}/_check.py (100%)
 rename aeon/datatypes/{_panel => _collection}/_convert.py (99%)
 rename aeon/datatypes/{_panel => _collection}/_examples.py (100%)
 rename aeon/datatypes/{_panel => _collection}/_registry.py (77%)

diff --git a/aeon/classification/tests/test_base.py b/aeon/classification/tests/test_base.py
index b90bcbccd7..c2fa94cd2e 100644
--- a/aeon/classification/tests/test_base.py
+++ b/aeon/classification/tests/test_base.py
@@ -9,7 +9,7 @@
 
 from aeon.classification import DummyClassifier
 from aeon.classification.base import BaseClassifier
-from aeon.datatypes._panel._convert import (
+from aeon.datatypes._collection._convert import (
     from_nested_to_dflist_adp,
     from_nested_to_multi_index,
 )
diff --git a/aeon/datasets/_dataframe_loaders.py b/aeon/datasets/_dataframe_loaders.py
index dbd5483620..e83424d6a1 100644
--- a/aeon/datasets/_dataframe_loaders.py
+++ b/aeon/datasets/_dataframe_loaders.py
@@ -24,7 +24,7 @@
 
 from aeon.datasets._data_generators import _convert_tsf_to_hierarchical
 from aeon.datatypes import MTYPE_LIST_HIERARCHICAL, convert
-from aeon.datatypes._panel._convert import from_long_to_nested
+from aeon.datatypes._collection._convert import from_long_to_nested
 
 DIRNAME = "data"
 MODULE = os.path.dirname(__file__)
diff --git a/aeon/datatypes/_check.py b/aeon/datatypes/_check.py
index 3861b640a0..23f600dbec 100644
--- a/aeon/datatypes/_check.py
+++ b/aeon/datatypes/_check.py
@@ -29,8 +29,8 @@
 import numpy as np
 
 from aeon.datatypes._alignment import check_dict_Alignment
+from aeon.datatypes._collection import check_dict_Panel
 from aeon.datatypes._hierarchical import check_dict_Hierarchical
-from aeon.datatypes._panel import check_dict_Panel
 from aeon.datatypes._proba import check_dict_Proba
 from aeon.datatypes._registry import AMBIGUOUS_MTYPES, SCITYPE_LIST, mtype_to_scitype
 from aeon.datatypes._series import check_dict_Series
diff --git a/aeon/datatypes/_panel/__init__.py b/aeon/datatypes/_collection/__init__.py
similarity index 51%
rename from aeon/datatypes/_panel/__init__.py
rename to aeon/datatypes/_collection/__init__.py
index bab0affda4..02aa9e0c06 100644
--- a/aeon/datatypes/_panel/__init__.py
+++ b/aeon/datatypes/_collection/__init__.py
@@ -1,16 +1,16 @@
 # -*- coding: utf-8 -*-
 """Module exports: Panel type checkers, converters and mtype inference."""
 
-from aeon.datatypes._panel._check import check_dict as check_dict_Panel
-from aeon.datatypes._panel._convert import convert_dict as convert_dict_Panel
-from aeon.datatypes._panel._examples import example_dict as example_dict_Panel
-from aeon.datatypes._panel._examples import (
+from aeon.datatypes._collection._check import check_dict as check_dict_Panel
+from aeon.datatypes._collection._convert import convert_dict as convert_dict_Panel
+from aeon.datatypes._collection._examples import example_dict as example_dict_Panel
+from aeon.datatypes._collection._examples import (
     example_dict_lossy as example_dict_lossy_Panel,
 )
-from aeon.datatypes._panel._examples import (
+from aeon.datatypes._collection._examples import (
     example_dict_metadata as example_dict_metadata_Panel,
 )
-from aeon.datatypes._panel._registry import MTYPE_LIST_PANEL, MTYPE_REGISTER_PANEL
+from aeon.datatypes._collection._registry import MTYPE_LIST_PANEL, MTYPE_REGISTER_PANEL
 
 __all__ = [
     "check_dict_Panel",
diff --git a/aeon/datatypes/_panel/_check.py b/aeon/datatypes/_collection/_check.py
similarity index 100%
rename from aeon/datatypes/_panel/_check.py
rename to aeon/datatypes/_collection/_check.py
diff --git a/aeon/datatypes/_panel/_convert.py b/aeon/datatypes/_collection/_convert.py
similarity index 99%
rename from aeon/datatypes/_panel/_convert.py
rename to aeon/datatypes/_collection/_convert.py
index fe3b2f1ee5..4b5d4fdd9a 100644
--- a/aeon/datatypes/_panel/_convert.py
+++ b/aeon/datatypes/_collection/_convert.py
@@ -34,8 +34,8 @@
     "convert_dict",
 ]
 
+from aeon.datatypes._collection._registry import MTYPE_LIST_PANEL
 from aeon.datatypes._convert_utils._convert import _extend_conversions
-from aeon.datatypes._panel._registry import MTYPE_LIST_PANEL
 from aeon.utils.validation._dependencies import _check_soft_dependencies
 
 # dictionary indexed by triples of types
diff --git a/aeon/datatypes/_panel/_examples.py b/aeon/datatypes/_collection/_examples.py
similarity index 100%
rename from aeon/datatypes/_panel/_examples.py
rename to aeon/datatypes/_collection/_examples.py
diff --git a/aeon/datatypes/_panel/_registry.py b/aeon/datatypes/_collection/_registry.py
similarity index 77%
rename from aeon/datatypes/_panel/_registry.py
rename to aeon/datatypes/_collection/_registry.py
index 94e0f39da6..51f1f6d79b 100644
--- a/aeon/datatypes/_panel/_registry.py
+++ b/aeon/datatypes/_collection/_registry.py
@@ -1,5 +1,5 @@
 # -*- coding: utf-8 -*-
-"""Registry of mtypes for Panel scitype. See datatypes._registry for API."""
+"""Registry of mtypes for Collections. See datatypes._registry for API."""
 
 import pandas as pd
 
@@ -19,12 +19,12 @@
     (
         "numpy3D",
         "Panel",
-        "3D np.array of format (n_instances, n_channels, n_timepoints)",
+        "3D np.ndarray of format (n_cases, n_channels, n_timepoints)",
     ),
     (
         "numpyflat",
         "Panel",
-        "2D np.array of format (n_instances, n_columns*n_timepoints)",
+        "2D np.ndarray of format (n_cases, n_channels*n_timepoints)",
     ),
     ("pd-multiindex", "Panel", "pd.DataFrame with multi-index (instances, timepoints)"),
     ("pd-wide", "Panel", "pd.DataFrame in wide format, cols = (instance*timepoints)"),
@@ -42,7 +42,8 @@
     (
         "np-list",
         "Panel",
-        "list of n_cases, each case a 2D np.array of shape (n_channels, series_length)",
+        "list of length [n_cases], each case a 2D np.ndarray of shape (n_channels, "
+        "n_timepoints)",
     ),
 ]
 
diff --git a/aeon/datatypes/_convert.py b/aeon/datatypes/_convert.py
index c06b5d4965..4afa17ce81 100644
--- a/aeon/datatypes/_convert.py
+++ b/aeon/datatypes/_convert.py
@@ -71,8 +71,8 @@
 import pandas as pd
 
 from aeon.datatypes._check import mtype as infer_mtype
+from aeon.datatypes._collection import convert_dict_Panel
 from aeon.datatypes._hierarchical import convert_dict_Hierarchical
-from aeon.datatypes._panel import convert_dict_Panel
 from aeon.datatypes._proba import convert_dict_Proba
 from aeon.datatypes._registry import AMBIGUOUS_MTYPES, mtype_to_scitype
 from aeon.datatypes._series import convert_dict_Series
diff --git a/aeon/datatypes/_examples.py b/aeon/datatypes/_examples.py
index be06bc52f9..d860d1f26c 100644
--- a/aeon/datatypes/_examples.py
+++ b/aeon/datatypes/_examples.py
@@ -23,16 +23,16 @@
 ]
 
 from aeon.datatypes._alignment import example_dict_Alignment
+from aeon.datatypes._collection import (
+    example_dict_lossy_Panel,
+    example_dict_metadata_Panel,
+    example_dict_Panel,
+)
 from aeon.datatypes._hierarchical import (
     example_dict_Hierarchical,
     example_dict_lossy_Hierarchical,
     example_dict_metadata_Hierarchical,
 )
-from aeon.datatypes._panel import (
-    example_dict_lossy_Panel,
-    example_dict_metadata_Panel,
-    example_dict_Panel,
-)
 from aeon.datatypes._proba import (
     example_dict_lossy_Proba,
     example_dict_metadata_Proba,
diff --git a/aeon/datatypes/_hierarchical/_check.py b/aeon/datatypes/_hierarchical/_check.py
index 1181180dfe..4c29115a2e 100644
--- a/aeon/datatypes/_hierarchical/_check.py
+++ b/aeon/datatypes/_hierarchical/_check.py
@@ -44,7 +44,7 @@
 
 import numpy as np
 
-from aeon.datatypes._panel._check import check_pdmultiindex_panel
+from aeon.datatypes._collection._check import check_pdmultiindex_panel
 from aeon.utils.validation._dependencies import _check_soft_dependencies
 
 
diff --git a/aeon/datatypes/_registry.py b/aeon/datatypes/_registry.py
index 57fcc2d8cf..d13a63e240 100644
--- a/aeon/datatypes/_registry.py
+++ b/aeon/datatypes/_registry.py
@@ -43,16 +43,16 @@
     MTYPE_LIST_ALIGNMENT,
     MTYPE_REGISTER_ALIGNMENT,
 )
+from aeon.datatypes._collection._registry import (
+    MTYPE_LIST_PANEL,
+    MTYPE_REGISTER_PANEL,
+    MTYPE_SOFT_DEPS_PANEL,
+)
 from aeon.datatypes._hierarchical._registry import (
     MTYPE_LIST_HIERARCHICAL,
     MTYPE_REGISTER_HIERARCHICAL,
     MTYPE_SOFT_DEPS_HIERARCHICAL,
 )
-from aeon.datatypes._panel._registry import (
-    MTYPE_LIST_PANEL,
-    MTYPE_REGISTER_PANEL,
-    MTYPE_SOFT_DEPS_PANEL,
-)
 from aeon.datatypes._proba._registry import MTYPE_LIST_PROBA, MTYPE_REGISTER_PROBA
 from aeon.datatypes._series._registry import (
     MTYPE_LIST_SERIES,
diff --git a/aeon/datatypes/tests/test_panel_converters.py b/aeon/datatypes/tests/test_panel_converters.py
index adc68997c0..7a0737b368 100644
--- a/aeon/datatypes/tests/test_panel_converters.py
+++ b/aeon/datatypes/tests/test_panel_converters.py
@@ -6,12 +6,12 @@
 
 from aeon.datasets import make_example_long_table, make_example_multi_index_dataframe
 from aeon.datatypes._adapter import convert_from_multiindex_to_listdataset
-from aeon.datatypes._panel._check import (
+from aeon.datatypes._collection._check import (
     are_columns_nested,
     check_nplist_panel,
     is_nested_dataframe,
 )
-from aeon.datatypes._panel._convert import (
+from aeon.datatypes._collection._convert import (
     from_2d_array_to_nested,
     from_3d_numpy_to_2d_array,
     from_3d_numpy_to_multi_index,
diff --git a/aeon/datatypes/tests/test_series_to_panel_converters.py b/aeon/datatypes/tests/test_series_to_panel_converters.py
index 152d5ea5ec..5181900b82 100644
--- a/aeon/datatypes/tests/test_series_to_panel_converters.py
+++ b/aeon/datatypes/tests/test_series_to_panel_converters.py
@@ -4,7 +4,7 @@
 import numpy as np
 import pandas as pd
 
-from aeon.datatypes._panel._convert import from_3d_numpy_to_multi_index
+from aeon.datatypes._collection._convert import from_3d_numpy_to_multi_index
 from aeon.datatypes._series_as_panel import (
     convert_Panel_to_Series,
     convert_Series_to_Panel,
diff --git a/aeon/forecasting/base/tests/test_base.py b/aeon/forecasting/base/tests/test_base.py
index bb1ae83a6a..bd98210fd7 100644
--- a/aeon/forecasting/base/tests/test_base.py
+++ b/aeon/forecasting/base/tests/test_base.py
@@ -12,7 +12,7 @@
 from pandas.testing import assert_series_equal
 
 from aeon.datatypes import check_is_mtype, convert
-from aeon.datatypes._panel._convert import from_nested_to_multi_index
+from aeon.datatypes._collection._convert import from_nested_to_multi_index
 from aeon.datatypes._utilities import get_cutoff, get_window
 from aeon.forecasting.arima import ARIMA
 from aeon.utils._testing.collection import make_3d_test_data, make_nested_dataframe_data
diff --git a/aeon/transformations/collection/segment.py b/aeon/transformations/collection/segment.py
index a6af5b87b1..6757b3ce73 100644
--- a/aeon/transformations/collection/segment.py
+++ b/aeon/transformations/collection/segment.py
@@ -6,7 +6,7 @@
 import pandas as pd
 from sklearn.utils import check_random_state
 
-from aeon.datatypes._panel._convert import _concat_nested_arrays, _get_time_index
+from aeon.datatypes._collection._convert import _concat_nested_arrays, _get_time_index
 from aeon.transformations.base import BaseTransformer
 from aeon.utils.validation import check_window_length
 
diff --git a/aeon/transformations/collection/tsfresh.py b/aeon/transformations/collection/tsfresh.py
index 21146b4e48..f47a02a2bd 100644
--- a/aeon/transformations/collection/tsfresh.py
+++ b/aeon/transformations/collection/tsfresh.py
@@ -5,7 +5,7 @@
 __author__ = ["AyushmaanSeth", "mloning", "Alwin Wang", "MatthewMiddlehurst"]
 __all__ = ["TSFreshFeatureExtractor", "TSFreshRelevantFeatureExtractor"]
 
-from aeon.datatypes._panel._convert import from_3d_numpy_to_long
+from aeon.datatypes._collection._convert import from_3d_numpy_to_long
 from aeon.transformations.collection.base import BaseCollectionTransformer
 from aeon.utils.validation import check_n_jobs
 from aeon.utils.validation._dependencies import _check_soft_dependencies
diff --git a/aeon/utils/_testing/estimator_checks.py b/aeon/utils/_testing/estimator_checks.py
index 6b4c5b1b76..6a4021f24d 100644
--- a/aeon/utils/_testing/estimator_checks.py
+++ b/aeon/utils/_testing/estimator_checks.py
@@ -17,7 +17,7 @@
 from aeon.classification.base import BaseClassifier
 from aeon.classification.early_classification import BaseEarlyClassifier
 from aeon.clustering.base import BaseClusterer
-from aeon.datatypes._panel._check import is_nested_dataframe
+from aeon.datatypes._collection._check import is_nested_dataframe
 from aeon.forecasting.base import BaseForecaster
 from aeon.regression.base import BaseRegressor
 from aeon.tests._config import VALID_ESTIMATOR_TYPES
diff --git a/aeon/utils/validation/panel.py b/aeon/utils/validation/panel.py
index 8799b3d063..71c848e703 100644
--- a/aeon/utils/validation/panel.py
+++ b/aeon/utils/validation/panel.py
@@ -12,8 +12,8 @@
 import pandas as pd
 from sklearn.utils.validation import check_consistent_length
 
-from aeon.datatypes._panel._check import is_nested_dataframe
-from aeon.datatypes._panel._convert import (
+from aeon.datatypes._collection._check import is_nested_dataframe
+from aeon.datatypes._collection._convert import (
     from_3d_numpy_to_nested,
     from_nested_to_3d_numpy,
 )
diff --git a/examples/datasets/data_conversions.ipynb b/examples/datasets/data_conversions.ipynb
index 225eaf9d47..2276c1a4a3 100644
--- a/examples/datasets/data_conversions.ipynb
+++ b/examples/datasets/data_conversions.ipynb
@@ -26,20 +26,23 @@
     "\n",
     "Single time series can be stored in the following data structures\n",
     "\n",
-    "pd.Series: a univariate time series\n",
-    "pd.DataFrame: a univariate or multivariate time series\n",
-    "np.ndarray: 2D numpy.ndarray of shape `(n_timepoints, n_channels)`.\n",
-    "xr.DataArray: a univariate or multivariate time series\n",
-    "dask_series: Dask DataFrame: a univariate or multivariate time series\n",
+    "- \"pd.Series\": Pandas Series storing a univariate time series\n",
+    "- \"pd.DataFrame\": Pandas DataFrame storing a univariate or multivariate time series\n",
+    "- \"np.ndarray\": numpy 2d array for series of shape `(n_timepoints, n_channels)`.\n",
+    "- \"xr.DataArray\": xarray DataArray for a univariate or multivariate time series\n",
+    "- \"dask_series\": Dask DataFrame for a univariate or multivariate time series\n",
     "\n",
-    "NOTE the 2D numpy array representation is not consistent with that used in\n",
+    "The above strings are used to internally specify each different data structure. NOTE the\n",
+    " 2D numpy array representation is not consistent with that used in\n",
     "collections. This is an unfortunate difference that is a result of legacy design and\n",
-    "norms in different research fields. We recommend not using numpy arrays with\n",
-    "forecasting.\n",
+    "norms in different research fields.\n",
     "\n",
-    "Conversion to and from these data structures is fairly straightforward. `aeon` contains\n",
-    "converters that are part of the legacy code base. There is a wrapper to hide all this\n",
-    " code, but we also show under the hood. This code is not likely to be maintained."
+    "Conversion to and from these data structures is fairly straightforward, but we\n",
+    "provide tools to help. `aeon` contains converters that are wrapped by the function\n",
+    "`convert`. This function will attempt to convert from one of the five types to another,\n",
+    "and raise an exception if the conversion is invalid (e.g. if the object is not in\n",
+    "fact of the type given by `from_type`). Note that series estimators will attempt to\n",
+    "perform this conversion automatically to the internal type specified by that estimator."
    ],
    "metadata": {
     "collapsed": false
@@ -47,13 +50,13 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 8,
+   "execution_count": 27,
    "outputs": [
     {
      "data": {
       "text/plain": "xarray.core.dataarray.DataArray"
      },
-     "execution_count": 8,
+     "execution_count": 27,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -74,9 +77,8 @@
   {
    "cell_type": "markdown",
    "source": [
-    "All the actual converter functions for series are in the following file `aeon.datatypes._series._convert`. We stress,\n",
-    "this is legacy code. `aeon` thinks it better the user is responsible for getting the\n",
-    "data into the best format for the estimators."
+    "The function `convert` wraps the actual converter functions in the module\n",
+    "`aeon.datatypes._series._convert`. Some examples are shown below."
    ],
    "metadata": {
     "collapsed": false
@@ -84,13 +86,13 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 9,
+   "execution_count": 28,
    "outputs": [
     {
      "data": {
       "text/plain": "pandas.core.frame.DataFrame"
      },
-     "execution_count": 9,
+     "execution_count": 28,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -111,13 +113,13 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 10,
+   "execution_count": 29,
    "outputs": [
     {
      "data": {
       "text/plain": "dask.dataframe.core.DataFrame"
      },
-     "execution_count": 10,
+     "execution_count": 29,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -132,13 +134,13 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 11,
+   "execution_count": 30,
    "outputs": [
     {
      "data": {
       "text/plain": "xarray.core.dataarray.DataArray"
      },
-     "execution_count": 11,
+     "execution_count": 30,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -151,11 +153,82 @@
     "collapsed": false
    }
   },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "# Collections Converters\n",
+    "\n",
+    "Previously, collections of time series were called panels (a term from econometrics,\n",
+    "not machine learning), and there are still references to panel. The main\n",
+    "data structures for storing collections are as follows\n",
+    "\n",
+    "- \"numpy3D\": 3D np.ndarray of shape `(n_cases, n_channels, n_timepoints)`\n",
+    "- \"np-list\": Python list of `n_cases` 2D np.ndarrays, each of shape\n",
+    "`(n_channels, n_timepoints_i)`\n",
+    "- \"df-list\": Python list of `n_cases` pd.DataFrames, each of shape\n",
+    "`(n_timepoints_i, n_channels)`\n",
+    "- \"numpyflat\": 2D np.ndarray of shape `(n_cases, n_channels*n_timepoints)`\n",
+    "\n",
+    "Other supported types which may be useful in forecasting are\n",
+    "\n",
+    "- \"nested_univ\": a pd.DataFrame of shape `(n_cases, n_channels)` where each cell is a\n",
+    "pd.Series of length `n_timepoints`\n",
+    "- \"pd-multiindex\": pd.DataFrame with multi-index `(cases, timepoints)`\n",
+    "- \"pd-wide\": pd.DataFrame in wide format, `cols = (instance*timepoints)`\n",
+    "- \"dask_panel\": dask frame with one instance and one time index\n",
+    "\n",
+    "As with series, conversion is performed with the function `convert`, and automatic\n",
+    "conversion happens in the estimator base classes. `convert` wraps the functions in\n",
+    "`aeon.datatypes._collection._convert`."
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
   {
    "cell_type": "code",
-   "execution_count": 11,
-   "outputs": [],
-   "source": [],
+   "execution_count": 35,
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      " Type = <class 'list'>, type first <class 'numpy.ndarray'> shape first (3, 100)\n"
+     ]
+    }
+   ],
+   "source": [
+    "# 10 multivariate time series with 3 channels of length 100 in \"numpy3D\" format\n",
+    "multi = np.random.random(size=(10, 3, 100))\n",
+    "np_list = convert(multi, from_type=\"numpy3D\", to_type=\"np-list\")\n",
+    "print(\n",
+    "    f\" Type = {type(np_list)}, type first {type(np_list[0])} shape first \"\n",
+    "    f\"{np_list[0].shape}\"\n",
+    ")"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 36,
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      " Type = <class 'list'>, type first <class 'pandas.core.frame.DataFrame'> shape first (100, 3)\n"
+     ]
+    }
+   ],
+   "source": [
+    "df_list = convert(multi, from_type=\"numpy3D\", to_type=\"df-list\")\n",
+    "print(\n",
+    "    f\" Type = {type(df_list)}, type first {type(df_list[0])} shape first \"\n",
+    "    f\"{df_list[0].shape}\"\n",
+    ")"
+   ],
    "metadata": {
     "collapsed": false
    }
@@ -163,51 +236,56 @@
   {
    "cell_type": "markdown",
    "source": [
-    "# Collections Converters\n",
-    "\n",
-    "Previously, collections of time series were called panels (a term from econometrics,\n",
-    "not machine learning), and there are still references to panel. Collections can be\n",
-    "stored as follows\n",
-    "\n",
-    "numpy3D: 3D np.array of format (n_instances, n_channels, n_timepoints)\n",
-    "np-list:\n",
-    "\n",
+    "Note again the difference in storage convention: series in 2D numpy are stored in\n",
+    "`(n_channels, n_timepoints)`, whereas dataframes use `(n_timepoints, n_channels)`.\n",
+    "We know this is confusing, and are thinking about the best way of reconciling this\n",
+    "distinction. See [this issue](https://github.com/aeon-toolkit/aeon/issues/537).\n",
+    "The underlying converter functions can also be called directly:\n"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 39,
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      " Type = <class 'pandas.core.frame.DataFrame'>,shape (3000, 4)\n"
+     ]
+    }
+   ],
+   "source": [
+    "from aeon.datatypes._collection._convert import (\n",
+    "    from_3d_numpy_to_long,\n",
+    "    from_3d_numpy_to_multi_index,\n",
+    ")\n",
     "\n",
-    "MTYPE_REGISTER_PANEL = [\n",
-    "    (\n",
-    "        \"nested_univ\",\n",
-    "        \"Panel\",\n",
-    "        \"pd.DataFrame with one column per channel, pd.Series in cells\",\n",
-    "    ),\n",
-    "    (\n",
-    "        \"numpy3D\",\n",
-    "        \"Panel\",\n",
-    "        \"3D np.array of format (n_instances, n_channels, n_timepoints)\",\n",
-    "    ),\n",
-    "    (\n",
-    "        \"numpyflat\",\n",
-    "        \"Panel\",\n",
-    "        \"2D np.array of format (n_instances, n_columns*n_timepoints)\",\n",
-    "    ),\n",
-    "    (\"pd-multiindex\", \"Panel\", \"pd.DataFrame with multi-index (instances, timepoints)\"),\n",
-    "    (\"pd-wide\", \"Panel\", \"pd.DataFrame in wide format, cols = (instance*timepoints)\"),\n",
-    "    (\n",
-    "        \"pd-long\",\n",
-    "        \"Panel\",\n",
-    "        \"pd.DataFrame in long format, cols = (index, time_index, column)\",\n",
-    "    ),\n",
-    "    (\"df-list\", \"Panel\", \"list of pd.DataFrame\"),\n",
-    "    (\n",
-    "        \"dask_panel\",\n",
-    "        \"Panel\",\n",
-    "        \"dask frame with one instance and one time index, as per dask_to_pd convention\",\n",
-    "    ),\n",
-    "    (\n",
-    "        \"np-list\",\n",
-    "        \"Panel\",\n",
-    "        \"list of n_cases, each case a 2D np.array of shape (n_channels, series_length)\",\n",
-    "    ),\n",
-    "]\n"
+    "long = from_3d_numpy_to_long(multi)\n",
+    "print(f\" Type = {type(long)},shape {long.shape}\")"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 40,
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      " Type = <class 'pandas.core.frame.DataFrame'>,shape (1000, 3)\n"
+     ]
+    }
+   ],
+   "source": [
+    "mi = from_3d_numpy_to_multi_index(multi)\n",
+    "print(f\" Type = {type(mi)},shape {mi.shape}\")"
    ],
    "metadata": {
     "collapsed": false

From 8821a6e0e76f18621cc4e2616a27f9fa2ab83b63 Mon Sep 17 00:00:00 2001
From: Tony Bagnall <ajb@uea.ac.uk>
Date: Wed, 5 Jul 2023 10:09:15 +0100
Subject: [PATCH 03/14] revert collection to panel to find circular import

---
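Note for reviewers: the ImportError captured in the notebook output in this
patch ("cannot import name ... from partially initialized module ... most
likely due to a circular import") is the usual symptom of two modules importing
each other at load time. The sketch below reproduces the same kind of message
with two throwaway modules. The names `registry_demo`, `convert_demo` and
`MTYPE_LIST_DEMO` are hypothetical stand-ins, not aeon modules, and this is not
a diagnosis of the actual cycle in `aeon.datatypes`.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def reproduce_circular_import() -> str:
    """Write two modules that import each other and return the error text."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # registry_demo imports convert_demo BEFORE it defines MTYPE_LIST_DEMO
        (root / "registry_demo.py").write_text(
            "from convert_demo import convert_dict\n"
            "MTYPE_LIST_DEMO = ['demo']\n"
        )
        # convert_demo needs a name that registry_demo has not defined yet
        (root / "convert_demo.py").write_text(
            "from registry_demo import MTYPE_LIST_DEMO\n"
            "convert_dict = {}\n"
        )
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module("registry_demo")
        except ImportError as err:
            return str(err)
        finally:
            # leave the interpreter as we found it
            sys.path.remove(tmp)
            sys.modules.pop("registry_demo", None)
            sys.modules.pop("convert_demo", None)
    return ""


if __name__ == "__main__":
    print(reproduce_circular_import())
```

Moving the shared constant into a module that imports nothing from its
consumers is one common way to break such a cycle; whether that applies here is
left to the follow-up patches.
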
 aeon/classification/tests/test_base.py        |  2 +-
 aeon/datasets/_dataframe_loaders.py           |  2 +-
 aeon/datatypes/_check.py                      |  2 +-
 aeon/datatypes/_convert.py                    |  2 +-
 aeon/datatypes/_examples.py                   | 10 ++---
 aeon/datatypes/_hierarchical/_check.py        |  2 +-
 .../{_collection => _panel}/__init__.py       | 12 +++---
 .../{_collection => _panel}/_check.py         |  0
 .../{_collection => _panel}/_convert.py       |  2 +-
 .../{_collection => _panel}/_examples.py      |  0
 .../{_collection => _panel}/_registry.py      |  0
 aeon/datatypes/_registry.py                   | 10 ++---
 aeon/datatypes/tests/test_panel_converters.py |  4 +-
 .../tests/test_series_to_panel_converters.py  |  2 +-
 aeon/forecasting/base/tests/test_base.py      |  2 +-
 aeon/transformations/collection/segment.py    |  2 +-
 aeon/transformations/collection/tsfresh.py    |  2 +-
 aeon/utils/_testing/estimator_checks.py       |  2 +-
 aeon/utils/validation/panel.py                |  4 +-
 examples/AA_datatypes_and_datasets.ipynb      | 39 +++++++++++--------
 20 files changed, 54 insertions(+), 47 deletions(-)
 rename aeon/datatypes/{_collection => _panel}/__init__.py (51%)
 rename aeon/datatypes/{_collection => _panel}/_check.py (100%)
 rename aeon/datatypes/{_collection => _panel}/_convert.py (99%)
 rename aeon/datatypes/{_collection => _panel}/_examples.py (100%)
 rename aeon/datatypes/{_collection => _panel}/_registry.py (100%)

diff --git a/aeon/classification/tests/test_base.py b/aeon/classification/tests/test_base.py
index c2fa94cd2e..b90bcbccd7 100644
--- a/aeon/classification/tests/test_base.py
+++ b/aeon/classification/tests/test_base.py
@@ -9,7 +9,7 @@
 
 from aeon.classification import DummyClassifier
 from aeon.classification.base import BaseClassifier
-from aeon.datatypes._collection._convert import (
+from aeon.datatypes._panel._convert import (
     from_nested_to_dflist_adp,
     from_nested_to_multi_index,
 )
diff --git a/aeon/datasets/_dataframe_loaders.py b/aeon/datasets/_dataframe_loaders.py
index e83424d6a1..dbd5483620 100644
--- a/aeon/datasets/_dataframe_loaders.py
+++ b/aeon/datasets/_dataframe_loaders.py
@@ -24,7 +24,7 @@
 
 from aeon.datasets._data_generators import _convert_tsf_to_hierarchical
 from aeon.datatypes import MTYPE_LIST_HIERARCHICAL, convert
-from aeon.datatypes._collection._convert import from_long_to_nested
+from aeon.datatypes._panel._convert import from_long_to_nested
 
 DIRNAME = "data"
 MODULE = os.path.dirname(__file__)
diff --git a/aeon/datatypes/_check.py b/aeon/datatypes/_check.py
index 23f600dbec..3861b640a0 100644
--- a/aeon/datatypes/_check.py
+++ b/aeon/datatypes/_check.py
@@ -29,8 +29,8 @@
 import numpy as np
 
 from aeon.datatypes._alignment import check_dict_Alignment
-from aeon.datatypes._collection import check_dict_Panel
 from aeon.datatypes._hierarchical import check_dict_Hierarchical
+from aeon.datatypes._panel import check_dict_Panel
 from aeon.datatypes._proba import check_dict_Proba
 from aeon.datatypes._registry import AMBIGUOUS_MTYPES, SCITYPE_LIST, mtype_to_scitype
 from aeon.datatypes._series import check_dict_Series
diff --git a/aeon/datatypes/_convert.py b/aeon/datatypes/_convert.py
index 4afa17ce81..c06b5d4965 100644
--- a/aeon/datatypes/_convert.py
+++ b/aeon/datatypes/_convert.py
@@ -71,8 +71,8 @@
 import pandas as pd
 
 from aeon.datatypes._check import mtype as infer_mtype
-from aeon.datatypes._collection import convert_dict_Panel
 from aeon.datatypes._hierarchical import convert_dict_Hierarchical
+from aeon.datatypes._panel import convert_dict_Panel
 from aeon.datatypes._proba import convert_dict_Proba
 from aeon.datatypes._registry import AMBIGUOUS_MTYPES, mtype_to_scitype
 from aeon.datatypes._series import convert_dict_Series
diff --git a/aeon/datatypes/_examples.py b/aeon/datatypes/_examples.py
index d860d1f26c..be06bc52f9 100644
--- a/aeon/datatypes/_examples.py
+++ b/aeon/datatypes/_examples.py
@@ -23,16 +23,16 @@
 ]
 
 from aeon.datatypes._alignment import example_dict_Alignment
-from aeon.datatypes._collection import (
-    example_dict_lossy_Panel,
-    example_dict_metadata_Panel,
-    example_dict_Panel,
-)
 from aeon.datatypes._hierarchical import (
     example_dict_Hierarchical,
     example_dict_lossy_Hierarchical,
     example_dict_metadata_Hierarchical,
 )
+from aeon.datatypes._panel import (
+    example_dict_lossy_Panel,
+    example_dict_metadata_Panel,
+    example_dict_Panel,
+)
 from aeon.datatypes._proba import (
     example_dict_lossy_Proba,
     example_dict_metadata_Proba,
diff --git a/aeon/datatypes/_hierarchical/_check.py b/aeon/datatypes/_hierarchical/_check.py
index 4c29115a2e..1181180dfe 100644
--- a/aeon/datatypes/_hierarchical/_check.py
+++ b/aeon/datatypes/_hierarchical/_check.py
@@ -44,7 +44,7 @@
 
 import numpy as np
 
-from aeon.datatypes._collection._check import check_pdmultiindex_panel
+from aeon.datatypes._panel._check import check_pdmultiindex_panel
 from aeon.utils.validation._dependencies import _check_soft_dependencies
 
 
diff --git a/aeon/datatypes/_collection/__init__.py b/aeon/datatypes/_panel/__init__.py
similarity index 51%
rename from aeon/datatypes/_collection/__init__.py
rename to aeon/datatypes/_panel/__init__.py
index 02aa9e0c06..bab0affda4 100644
--- a/aeon/datatypes/_collection/__init__.py
+++ b/aeon/datatypes/_panel/__init__.py
@@ -1,16 +1,16 @@
 # -*- coding: utf-8 -*-
 """Module exports: Panel type checkers, converters and mtype inference."""
 
-from aeon.datatypes._collection._check import check_dict as check_dict_Panel
-from aeon.datatypes._collection._convert import convert_dict as convert_dict_Panel
-from aeon.datatypes._collection._examples import example_dict as example_dict_Panel
-from aeon.datatypes._collection._examples import (
+from aeon.datatypes._panel._check import check_dict as check_dict_Panel
+from aeon.datatypes._panel._convert import convert_dict as convert_dict_Panel
+from aeon.datatypes._panel._examples import example_dict as example_dict_Panel
+from aeon.datatypes._panel._examples import (
     example_dict_lossy as example_dict_lossy_Panel,
 )
-from aeon.datatypes._collection._examples import (
+from aeon.datatypes._panel._examples import (
     example_dict_metadata as example_dict_metadata_Panel,
 )
-from aeon.datatypes._collection._registry import MTYPE_LIST_PANEL, MTYPE_REGISTER_PANEL
+from aeon.datatypes._panel._registry import MTYPE_LIST_PANEL, MTYPE_REGISTER_PANEL
 
 __all__ = [
     "check_dict_Panel",
diff --git a/aeon/datatypes/_collection/_check.py b/aeon/datatypes/_panel/_check.py
similarity index 100%
rename from aeon/datatypes/_collection/_check.py
rename to aeon/datatypes/_panel/_check.py
diff --git a/aeon/datatypes/_collection/_convert.py b/aeon/datatypes/_panel/_convert.py
similarity index 99%
rename from aeon/datatypes/_collection/_convert.py
rename to aeon/datatypes/_panel/_convert.py
index 4b5d4fdd9a..fe3b2f1ee5 100644
--- a/aeon/datatypes/_collection/_convert.py
+++ b/aeon/datatypes/_panel/_convert.py
@@ -34,8 +34,8 @@
     "convert_dict",
 ]
 
-from aeon.datatypes._collection._registry import MTYPE_LIST_PANEL
 from aeon.datatypes._convert_utils._convert import _extend_conversions
+from aeon.datatypes._panel._registry import MTYPE_LIST_PANEL
 from aeon.utils.validation._dependencies import _check_soft_dependencies
 
 # dictionary indexed by triples of types
diff --git a/aeon/datatypes/_collection/_examples.py b/aeon/datatypes/_panel/_examples.py
similarity index 100%
rename from aeon/datatypes/_collection/_examples.py
rename to aeon/datatypes/_panel/_examples.py
diff --git a/aeon/datatypes/_collection/_registry.py b/aeon/datatypes/_panel/_registry.py
similarity index 100%
rename from aeon/datatypes/_collection/_registry.py
rename to aeon/datatypes/_panel/_registry.py
diff --git a/aeon/datatypes/_registry.py b/aeon/datatypes/_registry.py
index d13a63e240..57fcc2d8cf 100644
--- a/aeon/datatypes/_registry.py
+++ b/aeon/datatypes/_registry.py
@@ -43,16 +43,16 @@
     MTYPE_LIST_ALIGNMENT,
     MTYPE_REGISTER_ALIGNMENT,
 )
-from aeon.datatypes._collection._registry import (
-    MTYPE_LIST_PANEL,
-    MTYPE_REGISTER_PANEL,
-    MTYPE_SOFT_DEPS_PANEL,
-)
 from aeon.datatypes._hierarchical._registry import (
     MTYPE_LIST_HIERARCHICAL,
     MTYPE_REGISTER_HIERARCHICAL,
     MTYPE_SOFT_DEPS_HIERARCHICAL,
 )
+from aeon.datatypes._panel._registry import (
+    MTYPE_LIST_PANEL,
+    MTYPE_REGISTER_PANEL,
+    MTYPE_SOFT_DEPS_PANEL,
+)
 from aeon.datatypes._proba._registry import MTYPE_LIST_PROBA, MTYPE_REGISTER_PROBA
 from aeon.datatypes._series._registry import (
     MTYPE_LIST_SERIES,
diff --git a/aeon/datatypes/tests/test_panel_converters.py b/aeon/datatypes/tests/test_panel_converters.py
index 7a0737b368..adc68997c0 100644
--- a/aeon/datatypes/tests/test_panel_converters.py
+++ b/aeon/datatypes/tests/test_panel_converters.py
@@ -6,12 +6,12 @@
 
 from aeon.datasets import make_example_long_table, make_example_multi_index_dataframe
 from aeon.datatypes._adapter import convert_from_multiindex_to_listdataset
-from aeon.datatypes._collection._check import (
+from aeon.datatypes._panel._check import (
     are_columns_nested,
     check_nplist_panel,
     is_nested_dataframe,
 )
-from aeon.datatypes._collection._convert import (
+from aeon.datatypes._panel._convert import (
     from_2d_array_to_nested,
     from_3d_numpy_to_2d_array,
     from_3d_numpy_to_multi_index,
diff --git a/aeon/datatypes/tests/test_series_to_panel_converters.py b/aeon/datatypes/tests/test_series_to_panel_converters.py
index 5181900b82..152d5ea5ec 100644
--- a/aeon/datatypes/tests/test_series_to_panel_converters.py
+++ b/aeon/datatypes/tests/test_series_to_panel_converters.py
@@ -4,7 +4,7 @@
 import numpy as np
 import pandas as pd
 
-from aeon.datatypes._collection._convert import from_3d_numpy_to_multi_index
+from aeon.datatypes._panel._convert import from_3d_numpy_to_multi_index
 from aeon.datatypes._series_as_panel import (
     convert_Panel_to_Series,
     convert_Series_to_Panel,
diff --git a/aeon/forecasting/base/tests/test_base.py b/aeon/forecasting/base/tests/test_base.py
index bd98210fd7..bb1ae83a6a 100644
--- a/aeon/forecasting/base/tests/test_base.py
+++ b/aeon/forecasting/base/tests/test_base.py
@@ -12,7 +12,7 @@
 from pandas.testing import assert_series_equal
 
 from aeon.datatypes import check_is_mtype, convert
-from aeon.datatypes._collection._convert import from_nested_to_multi_index
+from aeon.datatypes._panel._convert import from_nested_to_multi_index
 from aeon.datatypes._utilities import get_cutoff, get_window
 from aeon.forecasting.arima import ARIMA
 from aeon.utils._testing.collection import make_3d_test_data, make_nested_dataframe_data
diff --git a/aeon/transformations/collection/segment.py b/aeon/transformations/collection/segment.py
index 6757b3ce73..a6af5b87b1 100644
--- a/aeon/transformations/collection/segment.py
+++ b/aeon/transformations/collection/segment.py
@@ -6,7 +6,7 @@
 import pandas as pd
 from sklearn.utils import check_random_state
 
-from aeon.datatypes._collection._convert import _concat_nested_arrays, _get_time_index
+from aeon.datatypes._panel._convert import _concat_nested_arrays, _get_time_index
 from aeon.transformations.base import BaseTransformer
 from aeon.utils.validation import check_window_length
 
diff --git a/aeon/transformations/collection/tsfresh.py b/aeon/transformations/collection/tsfresh.py
index f47a02a2bd..21146b4e48 100644
--- a/aeon/transformations/collection/tsfresh.py
+++ b/aeon/transformations/collection/tsfresh.py
@@ -5,7 +5,7 @@
 __author__ = ["AyushmaanSeth", "mloning", "Alwin Wang", "MatthewMiddlehurst"]
 __all__ = ["TSFreshFeatureExtractor", "TSFreshRelevantFeatureExtractor"]
 
-from aeon.datatypes._collection._convert import from_3d_numpy_to_long
+from aeon.datatypes._panel._convert import from_3d_numpy_to_long
 from aeon.transformations.collection.base import BaseCollectionTransformer
 from aeon.utils.validation import check_n_jobs
 from aeon.utils.validation._dependencies import _check_soft_dependencies
diff --git a/aeon/utils/_testing/estimator_checks.py b/aeon/utils/_testing/estimator_checks.py
index 6a4021f24d..6b4c5b1b76 100644
--- a/aeon/utils/_testing/estimator_checks.py
+++ b/aeon/utils/_testing/estimator_checks.py
@@ -17,7 +17,7 @@
 from aeon.classification.base import BaseClassifier
 from aeon.classification.early_classification import BaseEarlyClassifier
 from aeon.clustering.base import BaseClusterer
-from aeon.datatypes._collection._check import is_nested_dataframe
+from aeon.datatypes._panel._check import is_nested_dataframe
 from aeon.forecasting.base import BaseForecaster
 from aeon.regression.base import BaseRegressor
 from aeon.tests._config import VALID_ESTIMATOR_TYPES
diff --git a/aeon/utils/validation/panel.py b/aeon/utils/validation/panel.py
index 71c848e703..8799b3d063 100644
--- a/aeon/utils/validation/panel.py
+++ b/aeon/utils/validation/panel.py
@@ -12,8 +12,8 @@
 import pandas as pd
 from sklearn.utils.validation import check_consistent_length
 
-from aeon.datatypes._collection._check import is_nested_dataframe
-from aeon.datatypes._collection._convert import (
+from aeon.datatypes._panel._check import is_nested_dataframe
+from aeon.datatypes._panel._convert import (
     from_3d_numpy_to_nested,
     from_nested_to_3d_numpy,
 )
diff --git a/examples/AA_datatypes_and_datasets.ipynb b/examples/AA_datatypes_and_datasets.ipynb
index 0bbec22f95..34ba963c25 100644
--- a/examples/AA_datatypes_and_datasets.ipynb
+++ b/examples/AA_datatypes_and_datasets.ipynb
@@ -15,7 +15,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 3,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -87,9 +87,26 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 5,
    "metadata": {},
-   "outputs": [],
+   "outputs": [
+    {
+     "ename": "ImportError",
+     "evalue": "cannot import name 'MTYPE_LIST_SERIES' from partially initialized module 'aeon.datatypes._registry' (most likely due to a circular import) (C:\\Code\\aeon\\aeon\\datatypes\\_registry.py)",
+     "output_type": "error",
+     "traceback": [
+      "\u001B[1;31m---------------------------------------------------------------------------\u001B[0m",
+      "\u001B[1;31mImportError\u001B[0m                               Traceback (most recent call last)",
+      "Cell \u001B[1;32mIn[5], line 2\u001B[0m\n\u001B[0;32m      1\u001B[0m \u001B[38;5;66;03m# import to retrieve examples\u001B[39;00m\n\u001B[1;32m----> 2\u001B[0m \u001B[38;5;28;01mfrom\u001B[39;00m \u001B[38;5;21;01maeon\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01mdatatypes\u001B[39;00m \u001B[38;5;28;01mimport\u001B[39;00m get_examples\n",
+      "File \u001B[1;32mC:\\Code\\aeon\\aeon\\datatypes\\__init__.py:6\u001B[0m\n\u001B[0;32m      2\u001B[0m \u001B[38;5;124;03m\"\"\"Module exports: data type definitions, checks, validation, fixtures, converters.\"\"\"\u001B[39;00m\n\u001B[0;32m      4\u001B[0m __author__ \u001B[38;5;241m=\u001B[39m [\u001B[38;5;124m\"\u001B[39m\u001B[38;5;124mfkiraly\u001B[39m\u001B[38;5;124m\"\u001B[39m]\n\u001B[1;32m----> 6\u001B[0m \u001B[38;5;28;01mfrom\u001B[39;00m \u001B[38;5;21;01maeon\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01mdatatypes\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01m_check\u001B[39;00m \u001B[38;5;28;01mimport\u001B[39;00m (\n\u001B[0;32m      7\u001B[0m     check_is_mtype,\n\u001B[0;32m      8\u001B[0m     check_is_scitype,\n\u001B[0;32m      9\u001B[0m     check_raise,\n\u001B[0;32m     10\u001B[0m     mtype,\n\u001B[0;32m     11\u001B[0m     scitype,\n\u001B[0;32m     12\u001B[0m )\n\u001B[0;32m     13\u001B[0m \u001B[38;5;28;01mfrom\u001B[39;00m \u001B[38;5;21;01maeon\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01mdatatypes\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01m_convert\u001B[39;00m \u001B[38;5;28;01mimport\u001B[39;00m convert, convert_to\n\u001B[0;32m     14\u001B[0m \u001B[38;5;28;01mfrom\u001B[39;00m \u001B[38;5;21;01maeon\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01mdatatypes\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01m_examples\u001B[39;00m \u001B[38;5;28;01mimport\u001B[39;00m get_examples\n",
+      "File \u001B[1;32mC:\\Code\\aeon\\aeon\\datatypes\\_check.py:35\u001B[0m\n\u001B[0;32m     33\u001B[0m \u001B[38;5;28;01mfrom\u001B[39;00m \u001B[38;5;21;01maeon\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01mdatatypes\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01m_hierarchical\u001B[39;00m \u001B[38;5;28;01mimport\u001B[39;00m check_dict_Hierarchical\n\u001B[0;32m     34\u001B[0m \u001B[38;5;28;01mfrom\u001B[39;00m \u001B[38;5;21;01maeon\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01mdatatypes\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01m_proba\u001B[39;00m \u001B[38;5;28;01mimport\u001B[39;00m check_dict_Proba\n\u001B[1;32m---> 35\u001B[0m \u001B[38;5;28;01mfrom\u001B[39;00m \u001B[38;5;21;01maeon\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01mdatatypes\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01m_registry\u001B[39;00m \u001B[38;5;28;01mimport\u001B[39;00m AMBIGUOUS_MTYPES, SCITYPE_LIST, mtype_to_scitype\n\u001B[0;32m     36\u001B[0m \u001B[38;5;28;01mfrom\u001B[39;00m \u001B[38;5;21;01maeon\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01mdatatypes\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01m_series\u001B[39;00m \u001B[38;5;28;01mimport\u001B[39;00m check_dict_Series\n\u001B[0;32m     37\u001B[0m \u001B[38;5;28;01mfrom\u001B[39;00m \u001B[38;5;21;01maeon\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01mdatatypes\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01m_table\u001B[39;00m \u001B[38;5;28;01mimport\u001B[39;00m check_dict_Table\n",
+      "File \u001B[1;32mC:\\Code\\aeon\\aeon\\datatypes\\_registry.py:57\u001B[0m\n\u001B[0;32m     51\u001B[0m \u001B[38;5;28;01mfrom\u001B[39;00m \u001B[38;5;21;01maeon\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01mdatatypes\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01m_hierarchical\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01m_registry\u001B[39;00m \u001B[38;5;28;01mimport\u001B[39;00m (\n\u001B[0;32m     52\u001B[0m     MTYPE_LIST_HIERARCHICAL,\n\u001B[0;32m     53\u001B[0m     MTYPE_REGISTER_HIERARCHICAL,\n\u001B[0;32m     54\u001B[0m     MTYPE_SOFT_DEPS_HIERARCHICAL,\n\u001B[0;32m     55\u001B[0m )\n\u001B[0;32m     56\u001B[0m \u001B[38;5;28;01mfrom\u001B[39;00m \u001B[38;5;21;01maeon\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01mdatatypes\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01m_proba\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01m_registry\u001B[39;00m \u001B[38;5;28;01mimport\u001B[39;00m MTYPE_LIST_PROBA, MTYPE_REGISTER_PROBA\n\u001B[1;32m---> 57\u001B[0m \u001B[38;5;28;01mfrom\u001B[39;00m \u001B[38;5;21;01maeon\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01mdatatypes\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01m_series\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01m_registry\u001B[39;00m \u001B[38;5;28;01mimport\u001B[39;00m (\n\u001B[0;32m     58\u001B[0m     MTYPE_LIST_SERIES,\n\u001B[0;32m     59\u001B[0m     MTYPE_REGISTER_SERIES,\n\u001B[0;32m     60\u001B[0m     MTYPE_SOFT_DEPS_SERIES,\n\u001B[0;32m     61\u001B[0m )\n\u001B[0;32m     62\u001B[0m \u001B[38;5;28;01mfrom\u001B[39;00m \u001B[38;5;21;01maeon\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01mdatatypes\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01m_table\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01m_registry\u001B[39;00m \u001B[38;5;28;01mimport\u001B[39;00m MTYPE_LIST_TABLE, 
MTYPE_REGISTER_TABLE\n\u001B[0;32m     64\u001B[0m MTYPE_REGISTER \u001B[38;5;241m=\u001B[39m []\n",
+      "File \u001B[1;32mC:\\Code\\aeon\\aeon\\datatypes\\_series\\__init__.py:5\u001B[0m\n\u001B[0;32m      2\u001B[0m \u001B[38;5;124;03m\"\"\"Module exports: Series type checkers, converters and mtype inference.\"\"\"\u001B[39;00m\n\u001B[0;32m      4\u001B[0m \u001B[38;5;28;01mfrom\u001B[39;00m \u001B[38;5;21;01maeon\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01mdatatypes\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01m_series\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01m_check\u001B[39;00m \u001B[38;5;28;01mimport\u001B[39;00m check_dict \u001B[38;5;28;01mas\u001B[39;00m check_dict_Series\n\u001B[1;32m----> 5\u001B[0m \u001B[38;5;28;01mfrom\u001B[39;00m \u001B[38;5;21;01maeon\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01mdatatypes\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01m_series\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01m_convert\u001B[39;00m \u001B[38;5;28;01mimport\u001B[39;00m convert_dict \u001B[38;5;28;01mas\u001B[39;00m convert_dict_Series\n\u001B[0;32m      6\u001B[0m \u001B[38;5;28;01mfrom\u001B[39;00m \u001B[38;5;21;01maeon\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01mdatatypes\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01m_series\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01m_examples\u001B[39;00m \u001B[38;5;28;01mimport\u001B[39;00m example_dict \u001B[38;5;28;01mas\u001B[39;00m example_dict_Series\n\u001B[0;32m      7\u001B[0m \u001B[38;5;28;01mfrom\u001B[39;00m \u001B[38;5;21;01maeon\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01mdatatypes\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01m_series\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01m_examples\u001B[39;00m \u001B[38;5;28;01mimport\u001B[39;00m (\n\u001B[0;32m      8\u001B[0m     example_dict_lossy \u001B[38;5;28;01mas\u001B[39;00m example_dict_lossy_Series,\n\u001B[0;32m      9\u001B[0m )\n",
+      "File \u001B[1;32mC:\\Code\\aeon\\aeon\\datatypes\\_series\\_convert.py:41\u001B[0m\n\u001B[0;32m     37\u001B[0m \u001B[38;5;66;03m##############################################################\u001B[39;00m\n\u001B[0;32m     38\u001B[0m \u001B[38;5;66;03m# methods to convert one machine type to another machine type\u001B[39;00m\n\u001B[0;32m     39\u001B[0m \u001B[38;5;66;03m##############################################################\u001B[39;00m\n\u001B[0;32m     40\u001B[0m \u001B[38;5;28;01mfrom\u001B[39;00m \u001B[38;5;21;01maeon\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01mdatatypes\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01m_convert_utils\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01m_convert\u001B[39;00m \u001B[38;5;28;01mimport\u001B[39;00m _extend_conversions\n\u001B[1;32m---> 41\u001B[0m \u001B[38;5;28;01mfrom\u001B[39;00m \u001B[38;5;21;01maeon\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01mdatatypes\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01m_registry\u001B[39;00m \u001B[38;5;28;01mimport\u001B[39;00m MTYPE_LIST_SERIES\n\u001B[0;32m     42\u001B[0m \u001B[38;5;28;01mfrom\u001B[39;00m \u001B[38;5;21;01maeon\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01mutils\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01mvalidation\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01m_dependencies\u001B[39;00m \u001B[38;5;28;01mimport\u001B[39;00m _check_soft_dependencies\n\u001B[0;32m     44\u001B[0m convert_dict \u001B[38;5;241m=\u001B[39m \u001B[38;5;28mdict\u001B[39m()\n",
+      "\u001B[1;31mImportError\u001B[0m: cannot import name 'MTYPE_LIST_SERIES' from partially initialized module 'aeon.datatypes._registry' (most likely due to a circular import) (C:\\Code\\aeon\\aeon\\datatypes\\_registry.py)"
+     ]
+    }
+   ],
    "source": [
     "# import to retrieve examples\n",
     "from aeon.datatypes import get_examples"
@@ -748,23 +765,13 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 2,
+   "execution_count": null,
    "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": "              dask_series  np.ndarray  pd.DataFrame  pd.Series  xr.DataArray\ndask_series             1           1             1          1             1\nnp.ndarray              1           1             1          1             1\npd.DataFrame            1           1             1          1             1\npd.Series               1           1             1          1             1\nxr.DataArray            1           1             1          1             1",
-      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>dask_series</th>\n      <th>np.ndarray</th>\n      <th>pd.DataFrame</th>\n      <th>pd.Series</th>\n      <th>xr.DataArray</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>dask_series</th>\n      <td>1</td>\n      <td>1</td>\n      <td>1</td>\n      <td>1</td>\n      <td>1</td>\n    </tr>\n    <tr>\n      <th>np.ndarray</th>\n      <td>1</td>\n      <td>1</td>\n      <td>1</td>\n      <td>1</td>\n      <td>1</td>\n    </tr>\n    <tr>\n      <th>pd.DataFrame</th>\n      <td>1</td>\n      <td>1</td>\n      <td>1</td>\n      <td>1</td>\n      <td>1</td>\n    </tr>\n    <tr>\n      <th>pd.Series</th>\n      <td>1</td>\n      <td>1</td>\n      <td>1</td>\n      <td>1</td>\n      <td>1</td>\n    </tr>\n    <tr>\n      <th>xr.DataArray</th>\n      <td>1</td>\n      <td>1</td>\n      <td>1</td>\n      <td>1</td>\n      <td>1</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
-     },
-     "execution_count": 2,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
+   "outputs": [],
    "source": [
     "from aeon.datatypes._convert import _conversions_defined\n",
     "\n",
-    "_conversions_defined(scitype=\"Series\")"
+    "_conversions_defined(scitype=\"Panel\")"
    ]
   },
   {

From 77f37ca2f915db3b1900d30e8b328557c181ee4a Mon Sep 17 00:00:00 2001
From: Tony Bagnall <ajb@uea.ac.uk>
Date: Wed, 5 Jul 2023 10:18:00 +0100
Subject: [PATCH 04/14] revert notebook to _panel

---
 examples/datasets/data_conversions.ipynb | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/examples/datasets/data_conversions.ipynb b/examples/datasets/data_conversions.ipynb
index 2276c1a4a3..a228a7affd 100644
--- a/examples/datasets/data_conversions.ipynb
+++ b/examples/datasets/data_conversions.ipynb
@@ -179,7 +179,7 @@
     "\n",
     "As with series, conversion is performed with the method `convert` and auto conversion\n",
     " happens in estimator base classes. These wrap methods in `aeon.datatypes\n",
-    "._collection._convert`"
+    "._panel._convert`"
    ],
    "metadata": {
     "collapsed": false
@@ -259,7 +259,7 @@
     }
    ],
    "source": [
-    "from aeon.datatypes._collection._convert import (\n",
+    "from aeon.datatypes._panel._convert import (\n",
     "    from_3d_numpy_to_long,\n",
     "    from_3d_numpy_to_multi_index,\n",
     ")\n",

From c24b6ae9bc4c83e218179fd408e6fcbc3f516907 Mon Sep 17 00:00:00 2001
From: chrisholder <chrisholder987@hotmail.com>
Date: Wed, 5 Jul 2023 16:44:16 +0100
Subject: [PATCH 05/14] removed isinstance

---
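Note for reviewers: this patch threads an optional `g_arr` keyword through the
"erp" branches of the dispatch functions. The sketch below shows the dispatch
pattern with a textbook, unwindowed ERP recurrence. It is not aeon's numba
implementation: `erp_sketch` and `distance_sketch` are hypothetical names, the
Euclidean point cost is a choice made for the example, and the meaning of
`g_arr` (one gap value per channel) is an assumption, since the patch shown
here does not define it.

```python
import numpy as np


def erp_sketch(x, y, g=0.0, g_arr=None):
    """Textbook ERP between two (n_channels, n_timepoints) arrays, no window."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Assumption: g_arr holds one gap value per channel; otherwise broadcast g.
    gap = np.full(x.shape[0], g) if g_arr is None else np.asarray(g_arr, dtype=float)
    n, m = x.shape[1], y.shape[1]
    # Cost of aligning each time point with the gap value
    gx = np.sqrt(((x - gap[:, None]) ** 2).sum(axis=0))
    gy = np.sqrt(((y - gap[:, None]) ** 2).sum(axis=0))
    cost = np.zeros((n + 1, m + 1))
    cost[1:, 0] = np.cumsum(gx)
    cost[0, 1:] = np.cumsum(gy)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = np.sqrt(((x[:, i - 1] - y[:, j - 1]) ** 2).sum())
            cost[i, j] = min(
                cost[i - 1, j - 1] + match,  # align x_i with y_j
                cost[i - 1, j] + gx[i - 1],  # align x_i with a gap
                cost[i, j - 1] + gy[j - 1],  # align y_j with a gap
            )
    return float(cost[n, m])


def distance_sketch(x, y, metric, **kwargs):
    """Forward optional keywords with defaults, as the dispatch in this patch does."""
    if metric == "erp":
        return erp_sketch(x, y, kwargs.get("g", 0.0), kwargs.get("g_arr", None))
    raise ValueError(f"unknown metric: {metric}")
```

Passing both keywords positionally with `kwargs.get(..., default)` keeps the
callee's signature free of `**kwargs`, which presumably matters for compiled
functions; that motivation is inferred, not stated in the patch.
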
 aeon/distances/_distance.py                   | 16 +++-
 aeon/distances/_erp.py                        | 95 +++++++++++--------
 .../tests/test_numba_distance_parameters.py   |  4 +-
 setup.cfg                                     | 18 ++--
 4 files changed, 79 insertions(+), 54 deletions(-)

diff --git a/aeon/distances/_distance.py b/aeon/distances/_distance.py
index c2da8f19ff..6674a685b3 100644
--- a/aeon/distances/_distance.py
+++ b/aeon/distances/_distance.py
@@ -132,7 +132,9 @@ def distance(
     elif metric == "lcss":
         return lcss_distance(x, y, kwargs.get("window"), kwargs.get("epsilon", 1.0))
     elif metric == "erp":
-        return erp_distance(x, y, kwargs.get("window"), kwargs.get("g", 0.0))
+        return erp_distance(
+            x, y, kwargs.get("window"), kwargs.get("g", 0.0), kwargs.get("g_arr", None)
+        )
     elif metric == "edr":
         return edr_distance(x, y, kwargs.get("window"), kwargs.get("epsilon"))
     elif metric == "twe":
@@ -243,7 +245,9 @@ def pairwise_distance(
             x, y, kwargs.get("window"), kwargs.get("epsilon", 1.0)
         )
     elif metric == "erp":
-        return erp_pairwise_distance(x, y, kwargs.get("window"), kwargs.get("g", 0.0))
+        return erp_pairwise_distance(
+            x, y, kwargs.get("window"), kwargs.get("g", 0.0), kwargs.get("g_arr", None)
+        )
     elif metric == "edr":
         return edr_pairwise_distance(x, y, kwargs.get("window"), kwargs.get("epsilon"))
     elif metric == "twe":
@@ -374,7 +378,9 @@ def alignment_path(
             x, y, kwargs.get("window"), kwargs.get("epsilon", 1.0)
         )
     elif metric == "erp":
-        return erp_alignment_path(x, y, kwargs.get("window"), kwargs.get("g", 0.0))
+        return erp_alignment_path(
+            x, y, kwargs.get("window"), kwargs.get("g", 0.0), kwargs.get("g_arr", None)
+        )
     elif metric == "edr":
         return edr_alignment_path(x, y, kwargs.get("window"), kwargs.get("epsilon"))
     elif metric == "twe":
@@ -460,7 +466,9 @@ def cost_matrix(
     elif metric == "lcss":
         return lcss_cost_matrix(x, y, kwargs.get("window"), kwargs.get("epsilon", 1.0))
     elif metric == "erp":
-        return erp_cost_matrix(x, y, kwargs.get("window"), kwargs.get("g", 0.0))
+        return erp_cost_matrix(
+            x, y, kwargs.get("window"), kwargs.get("g", 0.0), kwargs.get("g_arr", None)
+        )
     elif metric == "edr":
         return edr_cost_matrix(x, y, kwargs.get("window"), kwargs.get("epsilon"))
     elif metric == "twe":
diff --git a/aeon/distances/_erp.py b/aeon/distances/_erp.py
index 56e966ee32..fdf426b6ca 100644
--- a/aeon/distances/_erp.py
+++ b/aeon/distances/_erp.py
@@ -34,7 +34,8 @@ def erp_distance(
     x: np.ndarray,
     y: np.ndarray,
     window: float = None,
-    g: Union[float, np.ndarray] = 0.0,
+    g: float = 0.0,
+    g_arr: np.ndarray = None,
 ) -> float:
     """Compute the ERP distance between two time series.
 
@@ -58,10 +59,10 @@ def erp_distance(
     window: float, defaults=None
         The window to use for the bounding matrix. If None, no bounding matrix
         is used.
-    g: float or np.ndarray of shape (n_channels), defaults=0.
-        The reference value to penalise gaps. The default is 0. If it is an array
-        then it must be the length of the number of channels in x and y. If a single
-        value is provided then that value is used across each channel
+    g: float, defaults=0.
+        The reference value to penalise gaps. Ignored if g_arr is given.
+    g_arr: np.ndarray of shape (n_channels), defaults=None
+        Per-channel reference values to penalise gaps, one per channel in x and y.
 
     Returns
     -------
@@ -91,10 +92,10 @@ def erp_distance(
         _x = x.reshape((1, x.shape[0]))
         _y = y.reshape((1, y.shape[0]))
         bounding_matrix = create_bounding_matrix(_x.shape[1], _y.shape[1], window)
-        return _erp_distance(_x, _y, bounding_matrix, g)
+        return _erp_distance(_x, _y, bounding_matrix, g, g_arr)
     if x.ndim == 2 and y.ndim == 2:
         bounding_matrix = create_bounding_matrix(x.shape[1], y.shape[1], window)
-        return _erp_distance(x, y, bounding_matrix, g)
+        return _erp_distance(x, y, bounding_matrix, g, g_arr)
     raise ValueError("x and y must be 1D or 2D")
 
 
@@ -104,6 +105,7 @@ def erp_cost_matrix(
     y: np.ndarray,
     window: float = None,
     g: Union[float, np.ndarray] = 0.0,
+    g_arr: np.ndarray = None,
 ) -> np.ndarray:
     """Compute the ERP cost matrix between two time series.
 
@@ -121,10 +123,10 @@ def erp_cost_matrix(
     window: float, defaults=None
         The window to use for the bounding matrix. If None, no bounding matrix
         is used.
-    g: float or np.ndarray of shape (n_channels), defaults=0.
-        The reference value to penalise gaps. The default is 0. If it is an array
-        then it must be the length of the number of channels in x and y. If a single
-        value is provided then that value is used across each channel.
+    g: float, defaults=0.
+        The reference value to penalise gaps. Ignored if g_arr is given.
+    g_arr: np.ndarray of shape (n_channels), defaults=None
+        Per-channel reference values to penalise gaps, one per channel in x and y.
 
     Returns
     -------
@@ -158,10 +160,10 @@ def erp_cost_matrix(
         _x = x.reshape((1, x.shape[0]))
         _y = y.reshape((1, y.shape[0]))
         bounding_matrix = create_bounding_matrix(_x.shape[1], _y.shape[1], window)
-        return _erp_cost_matrix(_x, _y, bounding_matrix, g)
+        return _erp_cost_matrix(_x, _y, bounding_matrix, g, g_arr)
     if x.ndim == 2 and y.ndim == 2:
         bounding_matrix = create_bounding_matrix(x.shape[1], y.shape[1], window)
-        return _erp_cost_matrix(x, y, bounding_matrix, g)
+        return _erp_cost_matrix(x, y, bounding_matrix, g, g_arr)
     raise ValueError("x and y must be 1D or 2D")
 
 
@@ -170,9 +172,12 @@ def _erp_distance(
     x: np.ndarray,
     y: np.ndarray,
     bounding_matrix: np.ndarray,
-    g: Union[float, np.ndarray],
+    g: float,
+    g_arr: np.ndarray,
 ) -> float:
-    return _erp_cost_matrix(x, y, bounding_matrix, g)[x.shape[1] - 1, y.shape[1] - 1]
+    return _erp_cost_matrix(x, y, bounding_matrix, g, g_arr)[
+        x.shape[1] - 1, y.shape[1] - 1
+    ]
 
 
 @njit(cache=True, fastmath=True)
@@ -180,15 +185,16 @@ def _erp_cost_matrix(
     x: np.ndarray,
     y: np.ndarray,
     bounding_matrix: np.ndarray,
-    g: Union[float, np.ndarray],
+    g: float,
+    g_arr: np.ndarray,
 ) -> np.ndarray:
     x_size = x.shape[1]
     y_size = y.shape[1]
 
     cost_matrix = np.zeros((x_size + 1, y_size + 1))
 
-    gx_distance, x_sum = _precompute_g(x, g)
-    gy_distance, y_sum = _precompute_g(y, g)
+    gx_distance, x_sum = _precompute_g(x, g, g_arr)
+    gy_distance, y_sum = _precompute_g(y, g, g_arr)
 
     cost_matrix[1:, 0] = x_sum
     cost_matrix[0, 1:] = y_sum
@@ -208,15 +214,15 @@ def _erp_cost_matrix(
 
 @njit(cache=True, fastmath=True)
 def _precompute_g(
-    x: np.ndarray, g: Union[float, np.ndarray]
+    x: np.ndarray, g: float, g_array: np.ndarray
 ) -> Tuple[np.ndarray, float]:
     gx_distance = np.zeros(x.shape[1])
-    if isinstance(g, float):
+    if g_array is None:
         g_arr = np.full(x.shape[0], g)
     else:
-        if g.shape[0] != x.shape[0]:
+        if g_array.shape[0] != x.shape[0]:
             raise ValueError("g must be a float or an array with shape (x.shape[0],)")
-        g_arr = g
+        g_arr = g_array
     x_sum = 0
 
     for i in range(x.shape[1]):
@@ -231,7 +237,8 @@ def erp_pairwise_distance(
     X: np.ndarray,
     y: np.ndarray = None,
     window: float = None,
-    g: Union[float, np.ndarray] = 0.0,
+    g: float = 0.0,
+    g_arr: np.ndarray = None,
 ) -> np.ndarray:
     """Compute the erp pairwise distance between a set of time series.
 
@@ -251,10 +258,10 @@ def erp_pairwise_distance(
     window: float, default=None
         The window to use for the bounding matrix. If None, no bounding matrix
         is used.
-    g: float or np.ndarray of shape (n_channels), defaults=0
-        The reference value to penalise gaps. The default is 0. If it is an array
-        then it must be the length of the number of channels in x and y. If a single
-        value is provided then that value is used across each channel.
+    g: float, defaults=0.
+        The reference value to penalise gaps. Ignored if g_arr is given.
+    g_arr: np.ndarray of shape (n_channels), defaults=None
+        Per-channel reference values to penalise gaps, one per channel in x and y.
 
     Returns
     -------
@@ -297,18 +304,21 @@ def erp_pairwise_distance(
     if y is None:
         # To self
         if X.ndim == 3:
-            return _erp_pairwise_distance(X, window, g)
+            return _erp_pairwise_distance(X, window, g, g_arr)
         if X.ndim == 2:
             _X = X.reshape((X.shape[0], 1, X.shape[1]))
-            return _erp_pairwise_distance(_X, window, g)
+            return _erp_pairwise_distance(_X, window, g, g_arr)
         raise ValueError("x and y must be 2D or 3D arrays")
     _x, _y = reshape_pairwise_to_multiple(X, y)
-    return _erp_from_multiple_to_multiple_distance(_x, _y, window, g)
+    return _erp_from_multiple_to_multiple_distance(_x, _y, window, g, g_arr)
 
 
 @njit(cache=True, fastmath=True)
 def _erp_pairwise_distance(
-    X: np.ndarray, window: float, g: Union[float, np.ndarray]
+    X: np.ndarray,
+    window: float,
+    g: float,
+    g_arr: np.ndarray,
 ) -> np.ndarray:
     n_instances = X.shape[0]
     distances = np.zeros((n_instances, n_instances))
@@ -316,7 +326,7 @@ def _erp_pairwise_distance(
 
     for i in range(n_instances):
         for j in range(i + 1, n_instances):
-            distances[i, j] = _erp_distance(X[i], X[j], bounding_matrix, g)
+            distances[i, j] = _erp_distance(X[i], X[j], bounding_matrix, g, g_arr)
             distances[j, i] = distances[i, j]
 
     return distances
@@ -324,7 +334,11 @@ def _erp_pairwise_distance(
 
 @njit(cache=True, fastmath=True)
 def _erp_from_multiple_to_multiple_distance(
-    x: np.ndarray, y: np.ndarray, window: float, g: Union[float, np.ndarray]
+    x: np.ndarray,
+    y: np.ndarray,
+    window: float,
+    g: float,
+    g_arr: np.ndarray,
 ) -> np.ndarray:
     n_instances = x.shape[0]
     m_instances = y.shape[0]
@@ -333,7 +347,7 @@ def _erp_from_multiple_to_multiple_distance(
 
     for i in range(n_instances):
         for j in range(m_instances):
-            distances[i, j] = _erp_distance(x[i], y[j], bounding_matrix, g)
+            distances[i, j] = _erp_distance(x[i], y[j], bounding_matrix, g, g_arr)
     return distances
 
 
@@ -342,7 +356,8 @@ def erp_alignment_path(
     x: np.ndarray,
     y: np.ndarray,
     window: float = None,
-    g: Union[float, np.ndarray] = 0.0,
+    g: float = 0.0,
+    g_arr: np.ndarray = None,
 ) -> Tuple[List[Tuple[int, int]], float]:
     """Compute the erp alignment path between two time series.
 
@@ -360,10 +375,10 @@ def erp_alignment_path(
     window: float, default=None
         The window to use for the bounding matrix. If None, no bounding matrix
         is used.
-    g: float or np.ndarray of shape (n_channels), defaults=0.
-        The reference value to penalise gaps. The default is 0. If it is an array
-        then it must be the length of the number of channels in x and y. If a single
-        value is provided then that value is used across each channel.
+    g: float, defaults=0.
+        The reference value to penalise gaps. Ignored if g_arr is given.
+    g_arr: np.ndarray of shape (n_channels), defaults=None
+        Per-channel reference values to penalise gaps, one per channel in x and y.
 
     Returns
     -------
@@ -390,7 +405,7 @@ def erp_alignment_path(
     """
     bounding_matrix = create_bounding_matrix(x.shape[-1], y.shape[-1], window)
     cost_matrix = _add_inf_to_out_of_bounds_cost_matrix(
-        erp_cost_matrix(x, y, window, g), bounding_matrix
+        erp_cost_matrix(x, y, window, g, g_arr), bounding_matrix
     )
     return (
         compute_min_return_path(cost_matrix),
diff --git a/aeon/distances/tests/test_numba_distance_parameters.py b/aeon/distances/tests/test_numba_distance_parameters.py
index cded18a764..808f50147c 100644
--- a/aeon/distances/tests/test_numba_distance_parameters.py
+++ b/aeon/distances/tests/test_numba_distance_parameters.py
@@ -33,7 +33,9 @@ def _test_distance_params(
         curr_results = []
         for x, y in test_ts:
             if g_none:
-                param_dict["g"] = np.std([x, y], axis=0).sum(axis=1)
+                param_dict["g_arr"] = np.std([x, y], axis=0).sum(axis=1)
+                if "g" in param_dict:
+                    del param_dict["g"]
             results = []
             results.append(distance_func(x, y, **param_dict))
             results.append(distance(x, y, metric=distance_str, **param_dict))
diff --git a/setup.cfg b/setup.cfg
index 66eb4a7dae..2c57f80ab3 100644
--- a/setup.cfg
+++ b/setup.cfg
@@ -11,15 +11,15 @@ addopts =
     --ignore build_tools
     --ignore examples
     --ignore docs
-    --doctest-modules
-    --durations 10
-    --timeout 600
-    --cov aeon
-    --cov-report xml
-    --cov-report html
-    --showlocals
-    --matrixdesign True
-    -n auto
+;    --doctest-modules
+;    --durations 10
+;    --timeout 600
+;    --cov aeon
+;    --cov-report xml
+;    --cov-report html
+;    --showlocals
+;    --matrixdesign True
+;    -n auto
 filterwarnings =
     ignore::UserWarning
     ignore:numpy.dtype size changed

From 6017bc7dba3ad11928df2cf04f91821a81ba0234 Mon Sep 17 00:00:00 2001
From: chrisholder <chrisholder987@hotmail.com>
Date: Wed, 5 Jul 2023 17:03:08 +0100
Subject: [PATCH 06/14] fixed the bug

---
 .../metrics/averaging/_barycenter_averaging.py   |  5 +++--
 aeon/clustering/tests/test_k_means.py            | 16 ++++++++++++++++
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/aeon/clustering/metrics/averaging/_barycenter_averaging.py b/aeon/clustering/metrics/averaging/_barycenter_averaging.py
index f788db621e..15cca3466f 100644
--- a/aeon/clustering/metrics/averaging/_barycenter_averaging.py
+++ b/aeon/clustering/metrics/averaging/_barycenter_averaging.py
@@ -110,7 +110,6 @@ def _ba_update(
 ) -> Tuple[np.ndarray, float]:
     X_size, X_dims, X_timepoints = X.shape
     sum = np.zeros(X_timepoints)
-
     alignment = np.zeros((X_dims, X_timepoints))
     cost = 0.0
     for i in range(X_size):
@@ -134,7 +133,9 @@ def _ba_update(
                 curr_ts, center, window, independent, c
             )
         else:
-            raise ValueError(f"Metric must be a known string, got {metric}")
+            # Once numba version > 0.57, make this error more informative by
+            # including the metric that was passed.
+            raise ValueError("Metric parameter invalid")
         for j, k in curr_alignment:
             alignment[:, k] += curr_ts[:, j]
             sum[k] += 1
diff --git a/aeon/clustering/tests/test_k_means.py b/aeon/clustering/tests/test_k_means.py
index f730b45e11..cb8f395eb7 100644
--- a/aeon/clustering/tests/test_k_means.py
+++ b/aeon/clustering/tests/test_k_means.py
@@ -179,3 +179,19 @@ def test_kmeans_dba():
 
     for val in proba:
         assert np.count_nonzero(val == 1.0) == 1
+
+def test_kmeans_bug():
+    import numpy as np
+    from aeon.clustering.k_means import TimeSeriesKMeans
+    X_train = np.random.random(size=(100, 1, 100))
+
+    k_means = TimeSeriesKMeans(
+        n_clusters=13,  # Number of desired centers
+        init_algorithm="forgy",  # Center initialisation technique
+        max_iter=10,  # Maximum number of iterations for refinement on training set
+        metric="dtw",  # Distance metric to use
+        averaging_method="dba",  # Averaging technique to use
+        random_state=1,
+    )
+
+    k_means.fit(X_train)
\ No newline at end of file

From 7c17e8ab885e12051fef91b33926b9ea6510004d Mon Sep 17 00:00:00 2001
From: chrisholder <chrisholder987@hotmail.com>
Date: Wed, 5 Jul 2023 17:03:26 +0100
Subject: [PATCH 07/14] setup

---
 setup.cfg | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/setup.cfg b/setup.cfg
index 2c57f80ab3..66eb4a7dae 100644
--- a/setup.cfg
+++ b/setup.cfg
@@ -11,15 +11,15 @@ addopts =
     --ignore build_tools
     --ignore examples
     --ignore docs
-;    --doctest-modules
-;    --durations 10
-;    --timeout 600
-;    --cov aeon
-;    --cov-report xml
-;    --cov-report html
-;    --showlocals
-;    --matrixdesign True
-;    -n auto
+    --doctest-modules
+    --durations 10
+    --timeout 600
+    --cov aeon
+    --cov-report xml
+    --cov-report html
+    --showlocals
+    --matrixdesign True
+    -n auto
 filterwarnings =
     ignore::UserWarning
     ignore:numpy.dtype size changed

From ad0cf250bbc112b857f8d3990d47bdac5237ad9b Mon Sep 17 00:00:00 2001
From: chrisholder <chrisholder987@hotmail.com>
Date: Wed, 5 Jul 2023 17:07:54 +0100
Subject: [PATCH 08/14] removed test

---
 aeon/clustering/tests/test_k_means.py | 16 ----------------
 1 file changed, 16 deletions(-)

diff --git a/aeon/clustering/tests/test_k_means.py b/aeon/clustering/tests/test_k_means.py
index cb8f395eb7..f730b45e11 100644
--- a/aeon/clustering/tests/test_k_means.py
+++ b/aeon/clustering/tests/test_k_means.py
@@ -179,19 +179,3 @@ def test_kmeans_dba():
 
     for val in proba:
         assert np.count_nonzero(val == 1.0) == 1
-
-def test_kmeans_bug():
-    import numpy as np
-    from aeon.clustering.k_means import TimeSeriesKMeans
-    X_train = np.random.random(size=(100, 1, 100))
-
-    k_means = TimeSeriesKMeans(
-        n_clusters=13,  # Number of desired centers
-        init_algorithm="forgy",  # Center initialisation technique
-        max_iter=10,  # Maximum number of iterations for refinement on training set
-        metric="dtw",  # Distance metric to use
-        averaging_method="dba",  # Averaging technique to use
-        random_state=1,
-    )
-
-    k_means.fit(X_train)
\ No newline at end of file

From cfda9b50d3f1686d3a422129de4a59d6f835ea81 Mon Sep 17 00:00:00 2001
From: Tony Bagnall <ajb@uea.ac.uk>
Date: Sun, 9 Jul 2023 22:03:00 +0100
Subject: [PATCH 09/14] new converters

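The new module identifies a collection by a type string and looks up a
converter keyed on a (from_type, to_type) pair. A minimal NumPy-only
sketch of that pattern follows. detect_type, CONVERTERS and convert are
hypothetical stand-ins for get_type, convert_dict and convertX, and they
cover only the NumPy-based types, not the pandas ones.

```python
import numpy as np


def detect_type(X):
    """Return a type string for X, covering only the NumPy-based cases."""
    if isinstance(X, np.ndarray):
        if X.ndim == 3:
            return "numpy3D"
        if X.ndim == 2:
            return "numpyflat"
        raise ValueError("np.ndarray must be either 2D or 3D")
    if isinstance(X, list) and len(X) > 0:
        if all(isinstance(a, np.ndarray) and a.ndim == 2 for a in X):
            return "np-list"
    raise ValueError(f"unknown input type {type(X)}")


# Converters keyed on (from_type, to_type); identity entries map a type to itself.
CONVERTERS = {(t, t): (lambda X: X) for t in ("numpy3D", "numpyflat", "np-list")}
# Iterating a 3D array yields one 2D (n_channels, n_timepoints) array per case.
CONVERTERS[("numpy3D", "np-list")] = lambda X: [arr for arr in X]


def convert(X, to_type):
    """Look up and apply the converter for the detected input type."""
    return CONVERTERS[(detect_type(X), to_type)](X)


X_list = convert(np.zeros((10, 3, 20)), "np-list")
```

A missing (from_type, to_type) pair surfaces as a KeyError in this
sketch, which is one way to make unsupported conversions explicit.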
---
 aeon/datasets/_data_generators.py             |  20 ++
 aeon/utils/validation/_convert_collection.py  | 192 +++++++++++++++++
 aeon/utils/validation/collection.py           | 193 ++++++++++++++++++
 .../utils/validation/tests/test_collection.py |  63 ++++++
 4 files changed, 468 insertions(+)
 create mode 100644 aeon/utils/validation/_convert_collection.py
 create mode 100644 aeon/utils/validation/collection.py
 create mode 100644 aeon/utils/validation/tests/test_collection.py

diff --git a/aeon/datasets/_data_generators.py b/aeon/datasets/_data_generators.py
index 8ae0d07f5d..697acfce40 100644
--- a/aeon/datasets/_data_generators.py
+++ b/aeon/datasets/_data_generators.py
@@ -179,6 +179,26 @@ def make_example_long_table(n_cases=50, n_channels=2, n_timepoints=20):
     return df
 
 
+def make_example_nested_dataframe(n_instances=10, n_channels=3, n_timepoints=20):
+    """Generate example nested dataframe, type "nested_univ".
+
+    Parameters
+    ----------
+    n_instances : int
+        Number of instances.
+    n_channels : int
+        Number of channels (columns) in the nested DataFrame.
+    n_timepoints : int
+        Number of timepoints per instance-column pair.
+
+    Returns
+    -------
+    nested_df : pd.DataFrame, each cell a pd.Series of length n_timepoints
+
+    """
+    return None
+
+
 def make_example_multi_index_dataframe(n_instances=50, n_channels=3, n_timepoints=20):
     """Generate example multi-index DataFrame.
 
diff --git a/aeon/utils/validation/_convert_collection.py b/aeon/utils/validation/_convert_collection.py
new file mode 100644
index 0000000000..ef226dc26f
--- /dev/null
+++ b/aeon/utils/validation/_convert_collection.py
@@ -0,0 +1,192 @@
+# -*- coding: utf-8 -*-
+"""Collection data converters."""
+import numpy as np
+import pandas as pd
+
+from aeon.utils.validation.collection import DATA_TYPES
+
+convert_dict = dict()
+
+
+def convert_identity(obj, store=None):
+    """Convert identity."""
+    return obj
+
+
+# assign identity function to type conversion to self
+for x in DATA_TYPES:
+    convert_dict[(x, x)] = convert_identity
+
+
+def from_numpy3d_to_pd_multiindex(X):
+    """Convert numpy3D collection to pandas multi-index Panel.
+
+    Parameters
+    ----------
+    X : np.ndarray
+        3-dimensional NumPy array (n_instances, n_channels, n_timepoints)
+
+    Returns
+    -------
+    X_mi : pd.DataFrame
+        The multi-indexed pandas DataFrame
+    """
+    if X.ndim != 3:
+        msg = " ".join(
+            [
+                "Input should be 3-dimensional NumPy array with shape",
+                "(n_instances, n_channels, n_timepoints).",
+            ]
+        )
+        raise TypeError(msg)
+
+    n_instances, n_channels, n_timepoints = X.shape
+    multi_index = pd.MultiIndex.from_product(
+        [range(n_instances), range(n_channels), range(n_timepoints)],
+        names=["instances", "columns", "timepoints"],
+    )
+
+    X_mi = pd.DataFrame({"X": X.flatten()}, index=multi_index)
+    X_mi = X_mi.unstack(level="columns")
+    X_mi.columns = [f"var_{i}" for i in range(n_channels)]
+    return X_mi
+
+
+def from_numpy3d_to_nested_univ(X):
+    """Convert numpy3D collection to nested_univ pd.DataFrame.
+
+    Convert NumPy ndarray with shape (n_instances, n_channels, n_timepoints)
+    into nested pandas DataFrame (with time series as pandas Series in cells)
+
+    Parameters
+    ----------
+    X : np.ndarray
+        3-dimensional NumPy array (n_instances, n_channels, n_timepoints)
+
+    Returns
+    -------
+    df : pd.DataFrame
+    """
+    n_instances, n_channels, n_timepoints = X.shape
+    array_type = X.dtype
+    container = pd.Series
+    column_names = [f"var_{i}" for i in range(n_channels)]
+    column_list = []
+    for j, column in enumerate(column_names):
+        nested_column = (
+            pd.DataFrame(X[:, j, :])
+            .apply(lambda x: [container(x, dtype=array_type)], axis=1)
+            .str[0]
+            .rename(column)
+        )
+        column_list.append(nested_column)
+    df = pd.concat(column_list, axis=1)
+    return df
+
+
+def from_numpy3d_to_np_list(X, store=None):
+    """Convert 3D np.ndarray to a list of 2D numpy arrays.
+
+    Converts 3D numpy array (n_instances, n_channels, n_timepoints) to
+    a list of length [n_instances], each item of shape (n_channels, n_timepoints)
+
+    Parameters
+    ----------
+    X : np.ndarray
+        The input array with shape (n_instances, n_channels, n_timepoints)
+
+    Returns
+    -------
+    list : list [n_instances] np.ndarray
+        A list of np.ndarray
+    """
+    np_list = []
+    for arr in X:
+        np_list.append(arr)
+    return np_list
+
+
+def from_numpy3d_to_df_list(X, store=None):
+    """Convert 3D np.ndarray to a list of dataframes in wide format.
+
+    Converts 3D numpy array (n_instances, n_channels, n_timepoints) to
+    a list of length [n_instances] of pd.DataFrames shape (n_channels, n_timepoints)
+
+    Parameters
+    ----------
+    X : np.ndarray
+        The input array with shape (n_instances, n_channels, n_timepoints)
+
+    Returns
+    -------
+    df_list : list [n_instances] of pd.DataFrame
+    """
+    df_list = []
+    for arr in X:
+        df_list.append(pd.DataFrame(arr))
+    return df_list
+
+
+def from_numpy3d_to_pd_wide(X, store=None):
+    """Convert 3D np.ndarray to a pd.DataFrame in wide format.
+
+    Only valid with univariate time series. Converts 3D numpy array (n_instances, 1,
+    n_timepoints) to a dataframe (n_instances, n_timepoints)
+
+    Parameters
+    ----------
+    X : np.ndarray
+        The input array with shape (n_instances, 1, n_timepoints)
+
+    Returns
+    -------
+    df : a dataframe (n_instances, n_timepoints)
+
+    Raises
+    ------
+    ValueError if X has n_channels>1
+    """
+    if X.shape[1] > 1:
+        raise ValueError(
+            "Error, numpy3D passed with > 1 channel, cannot convert to pd-wide"
+        )
+    return pd.DataFrame(X.squeeze(axis=1))
+
+
+def from_numpyflat_to_nested_univ(X):
+    """Convert np.ndarray to nested_univ format pd.DataFrame with a single column.
+
+    Parameters
+    ----------
+    X : np.ndarray shape (n_cases, n_timepoints)
+
+    Returns
+    -------
+    Xt : pd.DataFrame
+        DataFrame with a single column of pd.Series
+    """
+    container = pd.Series
+    n_instances, n_timepoints = X.shape
+    time_index = np.arange(n_timepoints)
+    kwargs = {"index": time_index}
+
+    Xt = pd.DataFrame(
+        pd.Series([container(X[i, :], **kwargs) for i in range(n_instances)])
+    )
+    return Xt
+
+
+def from_pd_wide_to_nested_univ(X):
+    """Convert wide pd.DataFrame to nested_univ format pd.DataFrame.
+
+    Parameters
+    ----------
+    X : pd.DataFrame shape (n_cases, n_timepoints)
+
+    Returns
+    -------
+    Xt : pd.DataFrame
+        Transformed DataFrame with a single column of pd.Series
+    """
+    X = X.to_numpy()
+    return from_numpyflat_to_nested_univ(X)
diff --git a/aeon/utils/validation/collection.py b/aeon/utils/validation/collection.py
new file mode 100644
index 0000000000..39fa22d045
--- /dev/null
+++ b/aeon/utils/validation/collection.py
@@ -0,0 +1,193 @@
+# -*- coding: utf-8 -*-
+"""Conversion and checking for collections of time series."""
+import numpy as np
+import pandas as pd
+
+from aeon.datatypes._panel._convert import convert_dict
+
+DATA_TYPES = [
+    "numpy3D",  # 3D np.ndarray of format (n_cases, n_channels, n_timepoints)
+    "np-list",  # python list of 2D np.ndarray of length [n_cases], each of shape
+    # (n_channels, n_timepoints_i)
+    "df-list",  # python list of 2D pd.DataFrames of length [n_cases], each of
+    # shape (n_timepoints_i, n_channels)
+    "numpyflat",  # 2D np.ndarray of shape (n_cases, n_timepoints)
+    "pd-wide",  # 2D pd.DataFrame of shape (n_cases, n_timepoints)
+    "nested_univ",  # pd.DataFrame (n_cases, n_channels) with each cell a pd.Series,
+]
+# "pd-multiindex", pd.DataFrame with multi-index,
+# "dask_panel": not used anywhere
+
+
+def convertX(X, to_type):
+    """Convert from one of DATA_TYPE to another.
+
+    Parameters
+    ----------
+    X : data structure.
+    to_type : string, one of DATA_TYPES
+
+    Returns
+    -------
+    Data structure conforming to "to_type"
+
+    Raises
+    ------
+    ValueError if
+        X is an np.ndarray but of the wrong dimension.
+        X is a list but not of np.ndarray or pd.DataFrame.
+        X is a pd.DataFrame of non-float primitives.
+
+    Example
+    -------
+    >>> X = convertX(np.zeros(shape=(10, 3, 20)), "np-list")
+    >>> type(X)
+    <class 'list'>
+    """
+    input_type = get_type(X)
+    return convert_dict[(input_type, to_type, "Panel")](X)
+
+
+def get_type(X):
+    """Get the string identifier associated with different data structures.
+
+    Parameters
+    ----------
+    X : data structure.
+
+    Returns
+    -------
+    input_type : string, one of DATA_TYPES
+
+    Raises
+    ------
+    ValueError if
+        X is an np.ndarray but of the wrong dimension.
+        X is a list but not of np.ndarray or pd.DataFrame.
+        X is a pd.DataFrame of non-float primitives.
+
+    Example
+    -------
+    >>> get_type(np.zeros(shape=(10, 3, 20)))
+    'numpy3D'
+    """
+    if isinstance(X, np.ndarray):  # "numpy3D" or "numpyflat"
+        if X.ndim == 3:
+            return "numpy3D"
+        elif X.ndim == 2:
+            return "numpyflat"
+        else:
+            raise ValueError("ERROR np.ndarray must be either 2D or 3D")
+    elif isinstance(X, list):  # np-list or df-list
+        if isinstance(X[0], np.ndarray):  # if one a numpy they must all be 2D numpy
+            for a in X:
+                if not (isinstance(a, np.ndarray) and a.ndim == 2):
+                    raise ValueError("ERROR np-list must only contain 2D np.ndarray")
+            return "np-list"
+        elif isinstance(X[0], pd.DataFrame):
+            for a in X:
+                if not isinstance(a, pd.DataFrame):
+                    raise ValueError("ERROR df-list must only contain pd.DataFrame")
+            return "df-list"
+    elif isinstance(X, pd.DataFrame):  # Nested univariate, hierarchical or pd-wide
+        if _is_nested_dataframe(X):
+            return "nested_univ"
+        if isinstance(X.index, pd.MultiIndex):
+            return "pd-multiindex"
+        elif _is_pd_wide(X):
+            return "pd-wide"
+        raise ValueError(
+            "ERROR unknown pd.DataFrame, contains non float values, "
+            "not hierarchical nor is it nested pd.Series"
+        )
+    #    if isinstance(X, dask.dataframe.core.DataFrame):
+    #        return "dask_panel"
+    raise ValueError(f"ERROR unknown input type {type(X)}")
+
+
+def equal_length(X, input_type):
+    """Test if X contains equal length time series.
+
+    Assumes input_type is a valid type (DATA_TYPES).
+
+    Parameters
+    ----------
+    X : data structure.
+    input_type : string, one of DATA_TYPES
+
+    Returns
+    -------
+    boolean: True if all series in X are equal length, False otherwise
+
+    Raises
+    ------
+    ValueError if input_type equals "dask_panel" or not in DATA_TYPES.
+
+    Example
+    -------
+    >>> equal_length(np.zeros(shape=(10, 3, 20)), "numpy3D")
+    True
+    """
+    always_equal = {"numpy3D", "numpyflat", "pd-wide"}
+    if input_type in always_equal:
+        return True
+    if input_type == "np-list":
+        first = X[0].shape[1]
+        for i in range(1, len(X)):
+            if X[i].shape[1] != first:
+                return False
+        return True
+    if input_type == "df-list":
+        first = X[0].shape[0]
+        for i in range(1, len(X)):
+            if X[i].shape[0] != first:
+                return False
+        return True
+    if input_type == "nested_univ":  # Nested univariate or hierarchical
+        return _nested_uni_is_equal(X)
+    if input_type == "pd-multiindex":
+        # TEMPORARY: WORK OUT HOW TO TEST THESE
+        return True
+    #        raise ValueError(" Multi index not supported here ")
+    if input_type == "dask_panel":
+        raise ValueError(" DASK panel not supported here ")
+    raise ValueError(f" unknown input type {input_type}")
+    return False
+
+
+def has_missing(X, input_type):
+    """Check if X has missing values."""
+    #    if isinstance(X, np.ndarray):   # "numpy3D" or "numpyflat"
+    #    elif isinstance(X, list): # np-list or df-list
+    return False
+
+
+def _nested_uni_is_equal(X):
+    """Check whether all series in a nested DataFrame are equal length."""
+    length = X.iloc[0, 0].size
+    for _, column in X.items():
+        if any(series.size != length for series in column):
+            return False
+    return True
+
+
+def _is_nested_dataframe(X):
+    """Check if X is nested dataframe."""
+    # Otherwise check all entries are pd.Series
+    if not isinstance(X, pd.DataFrame):
+        return False
+    for _, series in X.items():
+        for cell in series:
+            if not isinstance(cell, pd.Series):
+                return False
+    return True
+
+
+def _is_pd_wide(X):
+    """Check whether the input DataFrame is "pd-wide" type."""
+    # The only test is that every column has a floating point dtype, so a single
+    # non-float column rules out "pd-wide".
+    for dtype in X.dtypes:
+        if not pd.api.types.is_float_dtype(dtype):
+            return False
+    return True
diff --git a/aeon/utils/validation/tests/test_collection.py b/aeon/utils/validation/tests/test_collection.py
new file mode 100644
index 0000000000..090f944ec1
--- /dev/null
+++ b/aeon/utils/validation/tests/test_collection.py
@@ -0,0 +1,63 @@
+#!/usr/bin/env python3 -u
+# -*- coding: utf-8 -*-
+"""Unit tests for aeon.utils.validation.collection check/convert functions."""
+import numpy as np
+import pandas as pd
+import pytest
+
+# from aeon.datasets._data_generators import make_example_multi_index_dataframe
+from aeon.utils._testing.tests.test_collection import make_nested_dataframe_data
+from aeon.utils.validation.collection import (  # _nested_uni_is_equal,; has_missing,
+    DATA_TYPES,
+    _is_nested_dataframe,
+    convertX,
+    equal_length,
+    get_type,
+)
+
+np_list = []
+for _ in range(10):
+    np_list.append(np.zeros(shape=(20, 2)))
+df_list = []
+for _ in range(10):
+    df_list.append(pd.DataFrame(np.zeros(shape=(20, 2))))
+nested, _ = make_nested_dataframe_data()
+# multi = make_example_multi_index_dataframe()
+
+DATA_EXAMPLES = {
+    "numpy3D": np.zeros(shape=(10, 3, 20)),
+    "numpyflat": np.zeros(shape=(10, 20)),
+    "np-list": np_list,
+    "df-list": df_list,
+    "pd-wide": pd.DataFrame(np.zeros(shape=(10, 20))),
+    "nested_univ": nested,
+}
+#    "pd-multiindex": multi,
+
+
+@pytest.mark.parametrize("data", DATA_TYPES)
+def test_equal_length(data):
+    assert equal_length(DATA_EXAMPLES[data], data)
+
+
+@pytest.mark.parametrize("data", DATA_TYPES)
+def test_get_type(data):
+    assert get_type(DATA_EXAMPLES[data]) == data
+
+
+@pytest.mark.parametrize("data", DATA_TYPES)
+def test_is_nested_dataframe(data):
+    if data == "nested_univ":
+        assert _is_nested_dataframe(DATA_EXAMPLES[data])
+    else:
+        assert not _is_nested_dataframe(DATA_EXAMPLES[data])
+
+
+@pytest.mark.parametrize("input_data", DATA_TYPES)
+@pytest.mark.parametrize("output_data", DATA_TYPES)
+def test_convertX(input_data, output_data):
+    # Don't test conversion from unequal supporting to equal only, or multivariate to
+    # univariate only. pd-wide seems unsupported.
+    X = convertX(DATA_EXAMPLES[input_data], output_data)
+    t = get_type(X)
+    assert t == output_data
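
Reviewer note: the `np-list` branch of `equal_length` compares `shape[1]`, which matches the `(n_channels, n_timepoints)` layout documented in `DATA_TYPES`. Below is a minimal standalone sketch of that check, assuming only NumPy. The helper name `np_list_equal_length` is hypothetical and is not part of aeon.

```python
import numpy as np


def np_list_equal_length(X):
    """Return True if all 2D arrays in X share the same number of time points.

    Assumes each array has shape (n_channels, n_timepoints).
    """
    # Collect the distinct series lengths; zero or one distinct value means equal.
    lengths = {x.shape[1] for x in X}
    return len(lengths) <= 1


# Ten two-channel series of length 20, then one extra series of length 15.
equal = [np.zeros((2, 20)) for _ in range(10)]
unequal = equal + [np.zeros((2, 15))]
print(np_list_equal_length(equal))
print(np_list_equal_length(unequal))
```

Under this layout the `np_list` fixture in `test_collection.py`, built with `shape=(20, 2)`, reads as 20 channels of length 2. It still passes the equal-length test because every array has the same shape, but `shape=(2, 20)` may be what was intended.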

From a6925171e1795b81fb0edfd7e02bab4c299ac180 Mon Sep 17 00:00:00 2001
From: Tony Bagnall <ajb@uea.ac.uk>
Date: Wed, 12 Jul 2023 16:37:21 +0100
Subject: [PATCH 10/14] remove conversions from this PR

---
 aeon/utils/validation/_convert_collection.py  | 192 -----------------
 aeon/utils/validation/collection.py           | 193 ------------------
 .../utils/validation/tests/test_collection.py |  63 ------
 3 files changed, 448 deletions(-)
 delete mode 100644 aeon/utils/validation/_convert_collection.py
 delete mode 100644 aeon/utils/validation/collection.py
 delete mode 100644 aeon/utils/validation/tests/test_collection.py
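
For context, the converters removed here were looked up in a dictionary keyed by type pairs (`convert_dict[(x, x)] = convert_identity`). The following is a minimal sketch of that registry pattern, assuming only NumPy. The names `CONVERTERS`, `register` and `convert` are hypothetical illustrations and not the aeon API.

```python
import numpy as np

# Hypothetical registry mapping (from_type, to_type) to a converter function.
CONVERTERS = {}


def register(from_type, to_type):
    """Decorator that stores a converter under its (from_type, to_type) key."""

    def decorator(func):
        CONVERTERS[(from_type, to_type)] = func
        return func

    return decorator


@register("numpy3D", "np-list")
def numpy3d_to_np_list(X):
    # Iterating over the first axis yields one (n_channels, n_timepoints)
    # array per case.
    return list(X)


@register("np-list", "numpy3D")
def np_list_to_numpy3d(X):
    # np.stack raises ValueError if the arrays differ in shape (unequal length).
    return np.stack(X)


def convert(X, from_type, to_type):
    """Convert X between two registered types; identity if they match."""
    if from_type == to_type:
        return X
    try:
        converter = CONVERTERS[(from_type, to_type)]
    except KeyError:
        raise ValueError(
            f"no converter registered from {from_type} to {to_type}"
        ) from None
    return converter(X)


X = np.zeros(shape=(10, 3, 20))
as_list = convert(X, "numpy3D", "np-list")
print(type(as_list).__name__, len(as_list), as_list[0].shape)
```

Keying on the type pair keeps each converter small and makes unsupported conversions fail with an explicit error instead of returning data in the wrong format.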

diff --git a/aeon/utils/validation/_convert_collection.py b/aeon/utils/validation/_convert_collection.py
deleted file mode 100644
index ef226dc26f..0000000000
--- a/aeon/utils/validation/_convert_collection.py
+++ /dev/null
@@ -1,192 +0,0 @@
-# -*- coding: utf-8 -*-
-"""Collection data converters."""
-import numpy as np
-import pandas as pd
-
-from aeon.utils.validation.collection import DATA_TYPES
-
-convert_dict = dict()
-
-
-def convert_identity(obj, store=None):
-    """Convert identity."""
-    return obj
-
-
-# assign identity function to type conversion to self
-for x in DATA_TYPES:
-    convert_dict[(x, x)] = convert_identity
-
-
-def from_numpy3d_to_pd_multiindex(X):
-    """Convert numpy3D collection to pandas multi-index Panel.
-
-    Parameters
-    ----------
-    X : np.ndarray
-        3-dimensional NumPy array (n_instances, n_channels, n_timepoints)
-
-    Returns
-    -------
-    X_mi : pd.DataFrame
-        The multi-indexed pandas DataFrame
-    """
-    if X.ndim != 3:
-        msg = " ".join(
-            [
-                "Input should be 3-dimensional NumPy array with shape",
-                "(n_instances, n_channels, n_timepoints).",
-            ]
-        )
-        raise TypeError(msg)
-
-    n_instances, n_channels, n_timepoints = X.shape
-    multi_index = pd.MultiIndex.from_product(
-        [range(n_instances), range(n_channels), range(n_timepoints)],
-        names=["instances", "columns", "timepoints"],
-    )
-
-    X_mi = pd.DataFrame({"X": X.flatten()}, index=multi_index)
-    X_mi = X_mi.unstack(level="columns")
-    X_mi.columns = [f"var_{i}" for i in range(n_channels)]
-    return X_mi
-
-
-def from_numpy3d_to_nested_univ(X):
-    """Convert numpy3D collection to nested_univ pd.DataFrame.
-
-    Convert NumPy ndarray with shape (n_instances, n_channels, n_timepoints)
-    into nested pandas DataFrame (with time series as pandas Series in cells)
-
-    Parameters
-    ----------
-    X : np.ndarray
-        3-dimensional NumPy array (n_instances, n_channels, n_timepoints)
-
-    Returns
-    -------
-    df : pd.DataFrame
-    """
-    n_instances, n_channels, n_timepoints = X.shape
-    array_type = X.dtype
-    container = pd.Series
-    column_names = [f"var_{i}" for i in range(n_channels)]
-    column_list = []
-    for j, column in enumerate(column_names):
-        nested_column = (
-            pd.DataFrame(X[:, j, :])
-            .apply(lambda x: [container(x, dtype=array_type)], axis=1)
-            .str[0]
-            .rename(column)
-        )
-        column_list.append(nested_column)
-    df = pd.concat(column_list, axis=1)
-    return df
-
-
-def from_numpy3d_to_np_list(X, store=None):
-    """Convert 3D np.darray to a list of 2D numpy.
-
-    Converts 3D numpy array (n_instances, n_channels, n_timepoints) to
-    a 2D list length [n_instances] each of shape (n_channels, n_timepoints)
-
-    Parameters
-    ----------
-    X : np.ndarray
-        The input array with shape (n_instances, n_channels, n_timepoints)
-
-    Returns
-    -------
-    list : list [n_instances] np.ndarray
-        A list of np.ndarray
-    """
-    np_list = []
-    for arr in X:
-        np_list.append(arr)
-    return np_list
-
-
-def from_numpy3d_to_df_list(X, store=None):
-    """Convert 3D np.darray to a list of dataframes in wide format.
-
-    Converts 3D numpy array (n_instances, n_channels, n_timepoints) to
-    a 2D list length [n_instances] of pd.DataFrames shape (n_channels, n_timepoints)
-
-    Parameters
-    ----------
-    X : np.ndarray
-        The input array with shape (n_instances, n_channels, n_timepoints)
-
-    Returns
-    -------
-    df : pd.DataFrame
-    """
-    df_list = []
-    for arr in X:
-        df_list.append(pd.DataFrame(arr))
-    return df_list
-
-
-def from_numpy3d_to_pd_wide(X, store=None):
-    """Convert 3D np.darray to a list of dataframes in wide format.
-
-    Only valid with univariate time series. Converts 3D numpy array (n_instances, 1,
-    n_timepoints) to a dataframe (n_instances, n_timepoints)
-
-    Parameters
-    ----------
-    X : np.ndarray
-        The input array with shape (n_instances, 1, n_timepoints)
-
-    Returns
-    -------
-    df : a dataframe (n_instances, n_timepoints)
-
-    Raise
-    -----
-    ValueError if X has n_channels>1
-    """
-    if X.shape[1] > 1:
-        raise ValueError(
-            "Error, numpy3D passed with > 1 channel, cannot convert to " "pd-wide"
-        )
-    return pd.DataFrame(X.squeeze())
-
-
-def from_numpyflat_to_nested_univ(X):
-    """Convert np.ndarray to nested_univ format pd.DataFrame with a single column.
-
-    Parameters
-    ----------
-    X : np.ndarray shape (n_cases, n_timepoints)
-
-    Returns
-    -------
-    Xt : pd.DataFrame
-        DataFrame with a single column of pd.Series
-    """
-    container = pd.Series
-    n_instances, n_timepoints = X.shape
-    time_index = np.arange(n_timepoints)
-    kwargs = {"index": time_index}
-
-    Xt = pd.DataFrame(
-        pd.Series([container(X[i, :], **kwargs) for i in range(n_instances)])
-    )
-    return Xt
-
-
-def from_pd_wide_to_nested_univ(X):
-    """Convert wide pd.DataFrame to nested_univ format pd.DataFrame.
-
-    Parameters
-    ----------
-    X : pd.DataFrame shape (n_cases, n_timepoints)
-
-    Returns
-    -------
-    Xt : pd.DataFrame
-        Transformed DataFrame with a single column of pd.Series
-    """
-    X = X.to_numpy()
-    return from_numpyflat_to_nested_univ(X)
diff --git a/aeon/utils/validation/collection.py b/aeon/utils/validation/collection.py
deleted file mode 100644
index 39fa22d045..0000000000
--- a/aeon/utils/validation/collection.py
+++ /dev/null
@@ -1,193 +0,0 @@
-# -*- coding: utf-8 -*-
-"""Conversion and checking for collections of time series."""
-import numpy as np
-import pandas as pd
-
-from aeon.datatypes._panel._convert import convert_dict
-
-DATA_TYPES = [
-    "numpy3D",  # 3D np.ndarray of format (n_cases, n_channels, n_timepoints)
-    "np-list",  # python list of 2D numpy array of length [n_cases], each of shape (
-    # n_channels, n_timepoints_i)
-    "df-list",  # python list of 2D pd.DataFrames of length [n_cases], each a of
-    # shape (n_timepoints_i, n_channels)
-    "numpyflat",  # 2D np.ndarray of shape (n_cases, n_timepoints)
-    "pd-wide",  # 2D pd.DataFrame of shape (n_cases, n_timepoints)
-    "nested_univ",  # pd.DataFrame (n_cases, n_channels) with each cell a pd.Series,
-]
-# "pd-multiindex", d.DataFrame with multi-index,
-# "dask_panel": not used anywhere
-
-
-def convertX(X, to_type):
-    """Convert from one of DATA_TYPE to another.
-
-    Parameters
-    ----------
-    X : data structure.
-    to_type : string, one of DATA_TYPES
-
-    Returns
-    -------
-    Data structure conforming to "to_type"
-
-    Raises
-    ------
-    ValueError if
-        X pd.ndarray but wrong dimension
-        X is list but not of np.ndarray or p.DataFrame.
-        X is a pd.DataFrame on non float primitives.
-
-    Example
-    -------
-    >>> X=convertX(np.zeros(shape=(10, 3, 20)), "np-list")
-    >>> type(X)
-    list
-    """
-    input_type = get_type(X)
-    return convert_dict[(input_type, to_type, "Panel")](X)
-
-
-def get_type(X):
-    """Get the string identifier associated with different data structures.
-
-    Parameters
-    ----------
-    X : data structure.
-
-    Returns
-    -------
-    input_type : string, one of DATA_TYPES
-
-    Raises
-    ------
-    ValueError if
-        X pd.ndarray but wrong dimension
-        X is list but not of np.ndarray or p.DataFrame.
-        X is a pd.DataFrame on non float primitives.
-
-    Example
-    -------
-    >>> equal_length( np.zeros(shape=(10, 3, 20)), "numpy3D")
-    True
-    """
-    if isinstance(X, np.ndarray):  # “numpy3D” or numpyflat
-        if X.ndim == 3:
-            return "numpy3D"
-        elif X.ndim == 2:
-            return "numpyflat"
-        else:
-            raise ValueError("ERROR np.ndarray must be either 2D or 3D")
-    elif isinstance(X, list):  # np-list or df-list
-        if isinstance(X[0], np.ndarray):  # if one a numpy they must all be 2D numpy
-            for a in X:
-                if not (isinstance(a, np.ndarray) and a.ndim == 2):
-                    raise ValueError("ERROR np-list np.ndarray must be either 2D or 3D")
-            return "np-list"
-        elif isinstance(X[0], pd.DataFrame):
-            for a in X:
-                if not isinstance(a, pd.DataFrame):
-                    raise ValueError("ERROR df-list must only contain pd.DataFrame")
-            return "df-list"
-    elif isinstance(X, pd.DataFrame):  # Nested univariate, hierachical or pd-wide
-        if _is_nested_dataframe(X):
-            return "nested_univ"
-        if isinstance(X.index, pd.MultiIndex):
-            return "pd-multiindex"
-        elif _is_pd_wide(X):
-            return "pd-wide"
-        raise ValueError(
-            "ERROR unknown pd.DataFrame, contains non float values, "
-            "not hierarchical nor is it nested pd.Series"
-        )
-    #    if isinstance(X, dask.dataframe.core.DataFrame):
-    #        return "dask_panel"
-    raise ValueError(f"ERROR unknown input type {type(X)}")
-
-
-def equal_length(X, input_type):
-    """Test if X contains equal length time series.
-
-    Assumes input_type is a valid type (DATA_TYPES).
-
-    Parameters
-    ----------
-    X : data structure.
-    input_type : string, one of DATA_TYPES
-
-    Returns
-    -------
-    boolean: True if all series in X are equal length, False otherwise
-
-    Raises
-    ------
-    ValueError if input_type equals "dask_panel" or not in DATA_TYPES.
-
-    Example
-    -------
-    >>> equal_length( np.zeros(shape=(10, 3, 20)), "numpy3D")
-    True
-    """
-    always_equal = {"numpy3D", "numpyflat", "pd-wide"}
-    if input_type in always_equal:
-        return True
-    if input_type == "np-list":
-        first = X[0].shape[1]
-        for i in range(1, len(X)):
-            if X[i].shape[1] != first:
-                return False
-        return True
-    if input_type == "df-list":
-        first = X[0].shape[0]
-        for i in range(1, len(X)):
-            if X[i].shape[0] != first:
-                return False
-        return True
-    if input_type == "nested_univ":  # Nested univariate or hierachical
-        return _nested_uni_is_equal(X)
-    if input_type == "pd-multiindex":
-        # TEMPORARY: WORK OUT HOW TO TEST THESE
-        return True
-    #        raise ValueError(" Multi index not supported here ")
-    if input_type == "dask_panel":
-        raise ValueError(" DASK panel not supported here ")
-    raise ValueError(f" unknown input type {input_type}")
-    return False
-
-
-def has_missing(X, input_type):
-    """Check if X has missing values."""
-    #    if isinstance(X, np.ndarray):   # “numpy3D” or numpyflat
-    #    elif isinstance(X, list): # np-list or df-list
-    return False
-
-
-def _nested_uni_is_equal(X):
-    """Check whether series are unequal length."""
-    length = X.iloc[0, 0].size
-    for series in X.iloc[0]:
-        if series.size != length:
-            return False
-    return True
-
-
-def _is_nested_dataframe(X):
-    """Check if X is nested dataframe."""
-    # Otherwise check all entries are pd.Series
-    if not isinstance(X, pd.DataFrame):
-        return False
-    for _, series in X.items():
-        for cell in series:
-            if not isinstance(cell, pd.Series):
-                return False
-    return True
-
-
-def _is_pd_wide(X):
-    """Check whether the input nested DataFrame is "pd-wide" type."""
-    # only test is if all values are float. This from chatgpt seems stupid
-    float_cols = X.select_dtypes(include=[np.float]).columns
-    for col in float_cols:
-        if not np.issubdtype(X[col].dtype, np.floating):
-            return False
-    return True
diff --git a/aeon/utils/validation/tests/test_collection.py b/aeon/utils/validation/tests/test_collection.py
deleted file mode 100644
index 090f944ec1..0000000000
--- a/aeon/utils/validation/tests/test_collection.py
+++ /dev/null
@@ -1,63 +0,0 @@
-#!/usr/bin/env python3 -u
-# -*- coding: utf-8 -*-
-"""Unit tests for aeon.utils.validation.collection check/convert functions."""
-import numpy as np
-import pandas as pd
-import pytest
-
-# from aeon.datasets._data_generators import make_example_multi_index_dataframe
-from aeon.utils._testing.tests.test_collection import make_nested_dataframe_data
-from aeon.utils.validation.collection import (  # _nested_uni_is_equal,; has_missing,
-    DATA_TYPES,
-    _is_nested_dataframe,
-    convertX,
-    equal_length,
-    get_type,
-)
-
-np_list = []
-for _ in range(10):
-    np_list.append(np.zeros(shape=(20, 2)))
-df_list = []
-for _ in range(10):
-    df_list.append(pd.DataFrame(np.zeros(shape=(20, 2))))
-nested, _ = make_nested_dataframe_data()
-# multi = make_example_multi_index_dataframe()
-
-DATA_EXAMPLES = {
-    "numpy3D": np.zeros(shape=(10, 3, 20)),
-    "numpyflat": np.zeros(shape=(10, 20)),
-    "np-list": np_list,
-    "df-list": df_list,
-    "pd-wide": pd.DataFrame(np.zeros(shape=(10, 20))),
-    "nested_univ": nested,
-}
-#    "pd-multiindex": multi,
-
-
-@pytest.mark.parametrize("data", DATA_TYPES)
-def test_equal_length(data):
-    assert equal_length(DATA_EXAMPLES[data], data)
-
-
-@pytest.mark.parametrize("data", DATA_TYPES)
-def test_get_type(data):
-    assert get_type(DATA_EXAMPLES[data]) == data
-
-
-@pytest.mark.parametrize("data", DATA_TYPES)
-def test_is_nested_dataframe(data):
-    if data == "nested_univ":
-        assert _is_nested_dataframe(DATA_EXAMPLES[data])
-    else:
-        assert not _is_nested_dataframe(DATA_EXAMPLES[data])
-
-
-@pytest.mark.parametrize("input_data", DATA_TYPES)
-@pytest.mark.parametrize("output_data", DATA_TYPES)
-def test_convertX(input_data, output_data):
-    # dont test conversion from unequal supporting to equal only, or multivariate to
-    # univariate only. pd-wide seems unsupported.
-    X = convertX(DATA_EXAMPLES[input_data], output_data)
-    t = get_type(X)
-    assert t == output_data

From 065fc9e2907ab3948d8f08e13a3769293a619466 Mon Sep 17 00:00:00 2001
From: Tony Bagnall <ajb@uea.ac.uk>
Date: Sat, 22 Jul 2023 20:02:20 +0100
Subject: [PATCH 11/14] remove method stub

---
 aeon/datasets/_data_generators.py | 20 --------------------
 1 file changed, 20 deletions(-)

diff --git a/aeon/datasets/_data_generators.py b/aeon/datasets/_data_generators.py
index 697acfce40..8ae0d07f5d 100644
--- a/aeon/datasets/_data_generators.py
+++ b/aeon/datasets/_data_generators.py
@@ -179,26 +179,6 @@ def make_example_long_table(n_cases=50, n_channels=2, n_timepoints=20):
     return df
 
 
-def make_example_nested_dataframe(n_instances=10, n_channels=3, n_timepoints=20):
-    """Generate example nested dataframe, type "nested_univ".
-
-    Parameters
-    ----------
-    n_instances : int
-        Number of instances.
-    n_channels : int
-        Number of columns (series) in multi-indexed DataFrame.
-    n_timepoints : int
-        Number of timepoints per instance-column pair.
-
-    Returns
-    -------
-    nested_df : pd.DataFrame. each cell a pd.Series length n_timepoints
-
-    """
-    return None
-
-
 def make_example_multi_index_dataframe(n_instances=50, n_channels=3, n_timepoints=20):
     """Generate example multi-index DataFrame.
 

From 5e5f6dec5c1854c9a82c447e8c7b66c796c745ea Mon Sep 17 00:00:00 2001
From: MatthewMiddlehurst <m.middlehurst@uea.ac.uk>
Date: Mon, 24 Jul 2023 17:02:27 +0100
Subject: [PATCH 12/14] storage and benchmarking

---
 examples/datasets/benchmarking_data.ipynb | 164 +++++----
 examples/datasets/data_conversions.ipynb  |  10 +-
 examples/datasets/data_loading.ipynb      |  22 +-
 examples/datasets/data_storage.ipynb      | 417 +++++++++++++++-------
 4 files changed, 401 insertions(+), 212 deletions(-)

diff --git a/examples/datasets/benchmarking_data.ipynb b/examples/datasets/benchmarking_data.ipynb
index 514aac7a38..7ad658f6a5 100644
--- a/examples/datasets/benchmarking_data.ipynb
+++ b/examples/datasets/benchmarking_data.ipynb
@@ -6,16 +6,20 @@
     "# Downloading and loading benchmarking datasets\n",
     "\n",
     "It is common to use standard collections of data to compare different estimators for\n",
-    "classification, clustering, regression and forecasting. Some of these datasets are\n",
-    "shipped with aeon in the datasets/data directory. However, the files are far too\n",
-    "big to include them all. aeon p[rovides tools to download these data to use in benchmarking experiments.\n",
-    "Classification and regression data are stored in .ts format. Forecasting\n",
-    "data are stored in the equivalent .tsf format. See the [data formats notebook](examples/data_formats.ipynb) for more info.\n",
+    "classification, clustering, regression and forecasting. Some of the smaller datasets from\n",
+    "these collections are included with `aeon` in the `aeon/datasets/data` directory. However,\n",
+    "there are far too many datasets to include them all, and some of the files are far too big\n",
+    "to consider including in the package. `aeon` provides tools to download these data to use\n",
+    "in benchmarking experiments. Classification and regression data are stored in .ts format.\n",
+    "Forecasting data are stored in the equivalent .tsf format. See the\n",
+    "[data loading notebook](examples/data_loading.ipynb) for more info.\n",
     "\n",
-    "Classification and regression are loaded into 3D numpy arrays of shape `(n_cases, n_channels, n_timepoints)` if equal length\n",
-    "or a list of `[n_cases]` of 2D numpy if `n_timepoints` is different for different\n",
-    "cases. Forecasting data are loaded into pd.DataFrame. For more information on\n",
-    "aeon data types see the [data storage notebook](examples/data_storage.ipynb).\n",
+    "Classification and regression data are loaded into 3D numpy arrays of shape\n",
+    "`(n_cases, n_channels, n_timepoints)` if equal length or a list of length\n",
+    "`n_cases` of 2D numpy arrays of shape `(n_channels, n_timepoints)` if\n",
+    "`n_timepoints` differs between cases. Forecasting data are loaded into a\n",
+    "`pd.DataFrame`. For more information on `aeon` data types see the\n",
+    "[data storage notebook](examples/data_storage.ipynb).\n",
     "\n",
     "Note that this notebook is dependent on external websites, so will not function if\n",
     "you are not online or the associated website is down. We use the following three\n",
@@ -27,13 +31,17 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 1,
+   "execution_count": 9,
    "outputs": [],
    "source": [
     "from aeon.datasets import load_classification, load_forecasting, load_regression"
    ],
    "metadata": {
-    "collapsed": false
+    "collapsed": false,
+    "ExecuteTime": {
+     "end_time": "2023-07-24T15:29:38.987856700Z",
+     "start_time": "2023-07-24T15:29:38.941979900Z"
+    }
    }
   },
   {
@@ -41,14 +49,14 @@
    "source": [
     "## Time Series Classification Archive\n",
     "\n",
-    "[UCR/TSML Time Series Classification Archive](https://timeseriesclassification.com)\n",
-    "hosts the UCR univariate TSC archive [1], also available from [UCR](ucrweb) and\n",
+    "The [UCR/TSML Time Series Classification Archive](https://timeseriesclassification.com)\n",
+    "hosts the UCR univariate TSC archive (also available from\n",
+    "[UCR](https://www.cs.ucr.edu/%7Eeamonn/time_series_data_2018/)) [1], and\n",
     "the multivariate archive [2] (previously called the UEA archive, soon to change). We\n",
-    "provide seven of these in the datasets/data directort: ACSF1, ArrowHead, BasicMotions,\n",
+    "provide seven of these in the datasets/data directory: ACSF1, ArrowHead, BasicMotions,\n",
     "GunPoint, ItalyPowerDemand, JapaneseVowels and PLAID. The archive is much bigger. The\n",
-    " last batch release was for 128 univariate [1] and 33 multivariate [2]. If you just\n",
-    " want to download them all, please go to the [website]\n",
-    " (https://timeseriesclassification.com)"
+    "last batch release was for 128 univariate [1] and 33 multivariate [2] datasets. If you just\n",
+    "want to download them all, please go to the [website](https://timeseriesclassification.com)."
    ],
    "metadata": {
     "collapsed": false
@@ -56,13 +64,13 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 2,
+   "execution_count": 10,
    "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "Univariate length =  127\n",
+      "Univariate length =  128\n",
       "Multivariate length =  33\n"
      ]
     }
@@ -75,7 +83,11 @@
     "print(\"Multivariate length = \", len(multivariate))"
    ],
    "metadata": {
-    "collapsed": false
+    "collapsed": false,
+    "ExecuteTime": {
+     "end_time": "2023-07-24T15:29:38.988882400Z",
+     "start_time": "2023-07-24T15:29:38.949956800Z"
+    }
    }
   },
   {
@@ -87,9 +99,9 @@
     "        <extract_path>/Chinatown/Chinatown_TRAIN.ts\n",
     "        <extract_path>/Chinatown/Chinatown_TEST.ts\n",
     "\n",
-    "You can load these problems directly from TSC.com and load them into memory. Note by\n",
-    "default, these functions return the data and associated metadata. This usage combines\n",
-    " the train and test splits and loads them into one `X` and one `y` array."
+    "You can download these problems directly from [timeseriesclassification.com](https://timeseriesclassification.com)\n",
+    "and load them into memory. Note that by default, these functions return the data and associated metadata.\n",
+    "This usage combines the train and test splits and loads them into one `X` and one `y` array."
    ],
    "metadata": {
     "collapsed": false
@@ -97,7 +109,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 3,
+   "execution_count": 11,
    "outputs": [
     {
      "name": "stdout",
@@ -118,20 +130,25 @@
     "print(\"\\nMeta data = \", meta)"
    ],
    "metadata": {
-    "collapsed": false
+    "collapsed": false,
+    "ExecuteTime": {
+     "end_time": "2023-07-24T15:29:39.067643100Z",
+     "start_time": "2023-07-24T15:29:38.954944100Z"
+    }
    }
   },
   {
    "cell_type": "markdown",
    "source": [
-    "If you look in aeon/datasets you should see a directory called `local_data`\n",
+    "If you look in `aeon/datasets/local_data/` you should see a directory called `Chinatown`\n",
     "containing the Chinatown datasets. All of the zips have `.ts` files. Some also have\n",
     "`.arff` and `.txt` files. If you load again, it will not download again if the file is\n",
-    "already there. If you want to store data somewhere else, you can specify a file path.\n",
-    " Also, you can load the train and test separately. This code will download the data\n",
-    " to Temp once, and load into separate train/test splits. The split argument is not\n",
-    " case sensitive. Once downloaded, `load_classification` is a equivalent to a call to\n",
-    " `load_from_tsfile`"
+    "already there. If you want to store data somewhere else, you can specify a file path\n",
+    "using the `extract_path` parameter. Additionally, you can load the train and test\n",
+    "separately as shown below.\n",
+    "\n",
+    "This code will download the data and load it into separate train/test splits. The split argument is not\n",
+    "case sensitive. Once downloaded, `load_classification` is equivalent to a call to `load_from_tsfile`."
    ],
    "metadata": {
     "collapsed": false
@@ -139,46 +156,31 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 4,
+   "execution_count": 12,
    "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
       "Train shape =  (20, 1, 512)\n",
-      "Test shape =  (20, 1, 512)\n",
-      "Loaded directly shape =  (20, 1, 512)\n"
+      "Test shape =  (20, 1, 512)\n"
      ]
-    },
-    {
-     "data": {
-      "text/plain": "array([1.7400873, 1.7331051, 1.7091917, 1.6333304, 1.5405759])"
-     },
-     "execution_count": 4,
-     "metadata": {},
-     "output_type": "execute_result"
     }
    ],
    "source": [
     "X_train, y_train = load_classification(\n",
-    "    \"BeetleFly\", extract_path=\"C://Temp/\", split=\"TRAIN\", return_metadata=False\n",
-    ")\n",
-    "X_test, y_test = load_classification(\n",
-    "    \"BeetleFly\", extract_path=\"C://Temp/\", split=\"test\", return_metadata=False\n",
+    "    \"BeetleFly\", split=\"TRAIN\", return_metadata=False\n",
     ")\n",
+    "X_test, y_test = load_classification(\"BeetleFly\", split=\"test\", return_metadata=False)\n",
     "print(\"Train shape = \", X_train.shape)\n",
-    "print(\"Test shape = \", X_test.shape)\n",
-    "from aeon.datasets import load_from_tsfile\n",
-    "\n",
-    "X_train, y_train = load_from_tsfile(\n",
-    "    full_file_path_and_name=\"C://Temp/BeetleFly/BeetleFLY_TRAIN\"\n",
-    ")\n",
-    "print(\"Loaded directly shape = \", X_train.shape)\n",
-    "\n",
-    "X_test[0][0][:5]"
+    "print(\"Test shape = \", X_test.shape)"
    ],
    "metadata": {
-    "collapsed": false
+    "collapsed": false,
+    "ExecuteTime": {
+     "end_time": "2023-07-24T15:29:39.229225500Z",
+     "start_time": "2023-07-24T15:29:39.068640Z"
+    }
    }
   },
   {
@@ -186,10 +188,10 @@
    "source": [
     "## Time Series (Extrinsic) Regression\n",
     "\n",
-    "[The Monash Time Series Extrinsic Regression Archive]() [3] repo (called extrinsic to\n",
-    " diffentiate if from sliding window based regression) currently contains 19\n",
-    " regression problems in .ts format. One of these, Covid3Month, is in `datasets\\data`.\n",
-    "  The usage of `load_regression` is identical to `load_classification`\n"
+    "The [Monash Time Series Extrinsic Regression Archive](http://tseregression.org/) [3] repo\n",
+    "(called extrinsic to differentiate it from sliding window based regression) currently\n",
+    "contains 19 regression problems in `.ts` format. One of these, Covid3Month, is in\n",
+    "`datasets/data`. The usage of `load_regression` is identical to `load_classification`.\n"
    ],
    "metadata": {
     "collapsed": false
@@ -197,13 +199,13 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 5,
+   "execution_count": 13,
    "outputs": [
     {
      "data": {
       "text/plain": "['AppliancesEnergy',\n 'AustraliaRainfall',\n 'BIDMCHR',\n 'BIDMCRR',\n 'BIDMCSpO2',\n 'BeijingPM10Quality',\n 'BeijingPM25Quality',\n 'BenzeneConcentration',\n 'Covid3Month',\n 'FloodModeling1',\n 'FloodModeling2',\n 'FloodModeling3',\n 'HouseholdPowerConsumption1',\n 'HouseholdPowerConsumption2',\n 'IEEEPPG',\n 'LiveFuelMoistureContent',\n 'NewsHeadlineSentiment',\n 'NewsTitleSentiment',\n 'PPGDalia']"
      },
-     "execution_count": 5,
+     "execution_count": 13,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -214,12 +216,16 @@
     "list_available_tser_datasets()"
    ],
    "metadata": {
-    "collapsed": false
+    "collapsed": false,
+    "ExecuteTime": {
+     "end_time": "2023-07-24T15:29:39.237204700Z",
+     "start_time": "2023-07-24T15:29:39.230223400Z"
+    }
    }
   },
   {
    "cell_type": "code",
-   "execution_count": 6,
+   "execution_count": 14,
    "outputs": [
     {
      "name": "stdout",
@@ -234,7 +240,11 @@
     "print(\"Shape of X = \", X.shape)"
    ],
    "metadata": {
-    "collapsed": false
+    "collapsed": false,
+    "ExecuteTime": {
+     "end_time": "2023-07-24T15:29:42.341536600Z",
+     "start_time": "2023-07-24T15:29:39.237204700Z"
+    }
    }
   },
   {
@@ -253,13 +263,13 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 7,
+   "execution_count": 15,
    "outputs": [
     {
      "data": {
       "text/plain": "['australian_electricity_demand_dataset',\n 'car_parts_dataset_with_missing_values',\n 'car_parts_dataset_without_missing_values',\n 'cif_2016_dataset',\n 'covid_deaths_dataset',\n 'covid_mobility_dataset_with_missing_values',\n 'covid_mobility_dataset_without_missing_values',\n 'dominick_dataset',\n 'elecdemand_dataset',\n 'electricity_hourly_dataset',\n 'electricity_weekly_dataset',\n 'fred_md_dataset',\n 'hospital_dataset',\n 'kaggle_web_traffic_dataset_with_missing_values',\n 'kaggle_web_traffic_dataset_without_missing_values',\n 'kaggle_web_traffic_weekly_dataset',\n 'kdd_cup_2018_dataset_with_missing_values',\n 'kdd_cup_2018_dataset_without_missing_values',\n 'london_smart_meters_dataset_with_missing_values',\n 'london_smart_meters_dataset_without_missing_values',\n 'm1_monthly_dataset',\n 'm1_quarterly_dataset',\n 'm1_yearly_dataset',\n 'm3_monthly_dataset',\n 'm3_other_dataset',\n 'm3_quarterly_dataset',\n 'm3_yearly_dataset',\n 'm4_daily_dataset',\n 'm4_hourly_dataset',\n 'm4_monthly_dataset',\n 'm4_quarterly_dataset',\n 'm4_weekly_dataset',\n 'm4_yearly_dataset',\n 'nn5_daily_dataset_with_missing_values',\n 'nn5_daily_dataset_without_missing_values',\n 'nn5_weekly_dataset',\n 'pedestrian_counts_dataset',\n 'saugeenday_dataset',\n 'solar_10_minutes_dataset',\n 'solar_4_seconds_dataset',\n 'solar_weekly_dataset',\n 'sunspot_dataset_with_missing_values',\n 'sunspot_dataset_without_missing_values',\n 'tourism_monthly_dataset',\n 'tourism_quarterly_dataset',\n 'tourism_yearly_dataset',\n 'traffic_hourly_dataset',\n 'traffic_weekly_dataset',\n 'us_births_dataset',\n 'weather_dataset',\n 'wind_4_seconds_dataset',\n 'wind_farms_minutely_dataset_with_missing_values',\n 'wind_farms_minutely_dataset_without_missing_values']"
      },
-     "execution_count": 7,
+     "execution_count": 15,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -270,12 +280,16 @@
     "list_available_tsf_datasets()"
    ],
    "metadata": {
-    "collapsed": false
+    "collapsed": false,
+    "ExecuteTime": {
+     "end_time": "2023-07-24T15:29:42.347519900Z",
+     "start_time": "2023-07-24T15:29:42.341536600Z"
+    }
    }
   },
   {
    "cell_type": "code",
-   "execution_count": 8,
+   "execution_count": 16,
    "outputs": [
     {
      "name": "stdout",
@@ -307,7 +321,11 @@
     "print(data)"
    ],
    "metadata": {
-    "collapsed": false
+    "collapsed": false,
+    "ExecuteTime": {
+     "end_time": "2023-07-24T15:29:49.815481300Z",
+     "start_time": "2023-07-24T15:29:42.347519900Z"
+    }
    }
   },
   {
@@ -319,9 +337,9 @@
     " and experimental evaluation of recent algorithmic advances, Data Mining and\n",
     " Knowledge Discovery 35(2), 2021\n",
     "[3] Tan et. al, Time Series Extrinsic Regression, Data Mining and Knowledge\n",
-    "Discovery, 2021\n",
-    "[4] Godahewa et. al, Monash Time Series Forecasting Archive,Neural Information\n",
-    "Processing Systems Track on Datasets and Benchmarks, 2021\n"
+    " Discovery, 2021\n",
+    "[4] Godahewa et. al, Monash Time Series Forecasting Archive, Neural Information\n",
+    " Processing Systems Track on Datasets and Benchmarks, 2021\n"
    ],
    "metadata": {
     "collapsed": false
diff --git a/examples/datasets/data_conversions.ipynb b/examples/datasets/data_conversions.ipynb
index a228a7affd..6aa41242d4 100644
--- a/examples/datasets/data_conversions.ipynb
+++ b/examples/datasets/data_conversions.ipynb
@@ -6,11 +6,11 @@
     "# Data conversions in aeon\n",
     "\n",
     "We recommend you follow the data storage described in the [data storage notebook](examples/datasets/data_storage.ipynb)\n",
-    "which can be summarised as follows: Use `pd.Series` or `pd.DataFrame` for forecasting\n",
-    " and for classification, clustering and regression, use 3D numpy of shape `(n_cases,\n",
-    " n_channels, n_timepoints)` if your collection of time series are equal length, or a\n",
-    "  list of 2D numpy of length `[n_cases]` if not equal length. All are [data loaders]\n",
-    "  (examples/datasets/data_loading.ipynb)  use this format.\n",
+    "which can be summarised as follows: Use `pd.Series` or `pd.DataFrame` for tasks\n",
+    "which focus on a single series, such as forecasting, and for tasks such as classification,\n",
+    "clustering and regression use a 3D numpy array of shape `(n_cases, n_channels, n_timepoints)`\n",
+    "if your collection of time series is of equal length, or a list of 2D numpy arrays of length `[n_cases]`\n",
+    "if not equal length. All our [data loaders](examples/datasets/data_loading.ipynb) use this format.\n",
     "\n",
     "However, `aeon` provides a range of converters in the `datatypes` package. These are\n",
     "grouped into converters for single series and converters for collections of series"
diff --git a/examples/datasets/data_loading.ipynb b/examples/datasets/data_loading.ipynb
index 758106e8ed..045531830e 100644
--- a/examples/datasets/data_loading.ipynb
+++ b/examples/datasets/data_loading.ipynb
@@ -3,15 +3,23 @@
   {
    "cell_type": "markdown",
    "source": [
-    "# Loading data into aeon\n",
-    "aeon supports a range of data input formats. Example problems are described in\n",
-    "provided_data.ipyn. Downloading data is described in benchmarking_data.ipynb. You\n",
-    "can of course load and format the data so that it conforms to the input types\n",
-    "describe in data_storage. aeon also provides data formats for time series for both\n",
-    "forecasting and machine learning. These are all text files with a particular\n",
+    "# Loading data in aeon\n",
+    "\n",
+    "`aeon` supports a range of data input formats. Accepted datatypes are provided in the\n",
+    "[data conversions](examples/datasets/data_conversions.ipynb) and\n",
+    "[data storage](examples/datasets/data_storage.ipynb) notebooks. Example problems are\n",
+    "described in the [provided data notebook](examples/datasets/provided_data.ipynb), with\n",
+    "guidance on downloading popular benchmark data provided in the\n",
+    "[benchmarking data notebook](examples/datasets/benchmarking_data.ipynb).\n",
+    "\n",
+    "This notebook provides guidance on loading data from a few popular data file formats used in\n",
+    "time series machine learning and forecasting scenarios.\n",
+    "You can of course load data from whatever format you wish and then format the data so that\n",
+    "it conforms to the input types described. The formats covered here are all text files with a particular\n",
     "structure. Both formats store a single time series per row.\n",
     "\n",
-    "1. The `.ts` and `.tsf` format used by the aeon packages and the [time series](https://timeseriesclassification.com) and [forecasting](https://forecastingdata.org)\n",
+    "1. `.csv`\n",
+    "2. The `.ts` and `.tsf` format used by the aeon packages and the [time series](https://timeseriesclassification.com) and [forecasting](https://forecastingdata.org)\n",
     " repositories. More information on the `.tsf` format is\n",
     "[here](https://openreview.net/pdf?id=wEc1mgAjU-)\n",
     "Links to download all of the UCR univariate and the tsml multivariate data in `.ts`\n",
diff --git a/examples/datasets/data_storage.ipynb b/examples/datasets/data_storage.ipynb
index 880c4c0b5f..e8bf8fc67f 100644
--- a/examples/datasets/data_storage.ipynb
+++ b/examples/datasets/data_storage.ipynb
@@ -5,36 +5,44 @@
    "source": [
     "# Storing data to use for aeon estimators\n",
     "\n",
-    "aeon includes time series forecasting and machine learning. These two communities\n",
-    "have different conventions on how to store data and what to call data structures.\n",
-    "Some of the differences are\n",
+    "`aeon` includes multiple time series tasks such as forecasting and machine learning\n",
+    "(i.e. classification, regression and clustering). These two communities have different\n",
+    "conventions and requirements for storing data and what to call data structures. We try\n",
+    "to accommodate both, which leads to some differences between `aeon` packages. Some\n",
+    "differences are:\n",
     "\n",
-    "1. Forecasters almost always stores data in pandas data structures, whereas machine\n",
-    "learners use numpy arrays almost exclusively.\n",
-    "2. n forecasting a 2 dimensional data is almost always shape `(n_timepoints, n_timeseries)` whereas in\n",
-    "machine learning we would tend to store data in a `(n_timeseries, n_timepoints)`  array.\n",
-    "3. In forecasting, a variable `y` refers to a time series for which we are attempting\n",
+    "1. Forecasters almost always store data in pandas data structures internally, whereas machine\n",
+    " learners use numpy arrays almost exclusively.\n",
+    "2. Most forecasting estimators (but not all) will take a single series as a 1D or 2D array-like\n",
+    " as the data to learn from, whereas machine learning estimators will take a collection of series\n",
+    " as a 3D or 2D array-like.\n",
+    "3. In forecasting, 2D arrays are almost always single series of shape `(n_timepoints, n_channels)`\n",
+    " whereas in machine learning we would tend to store data in a `(n_cases, n_timepoints)`\n",
+    " collection of series.\n",
+    "4. In forecasting, a variable `y` refers to a time series for which we are attempting\n",
     " to make a forecast, hence `y` is assumed to be ordered. In machine learning,\n",
     " `y` is a list of either class labels (for classification) or observations of a\n",
-    " response vairable (for regression). The ordering of values in `y` is determined by\n",
+    " response variable (for regression). The ordering of values in `y` is determined by\n",
     " the ordering of the `X` input.\n",
     "\n",
-    "Because of these sources of confusion, we recommend that you store data in\n",
-    "pandas data structures for forecasting and numpy arrays for machine learning."
-   ],
-   "metadata": {
-    "collapsed": false
-   }
-  },
-  {
-   "cell_type": "markdown",
-   "source": [
+    "Because of these sources of confusion, we recommend carefully reading the documentation for the task\n",
+    "prior to usage to ensure you are using the correct input data type. We also recommend that you store\n",
+    "data in pandas data structures for forecasting and numpy arrays for machine learning tasks. All of\n",
+    "our accepted input types can be used provided they are compatible with the algorithms (see the\n",
+    "[data conversions notebook](examples/datasets/data_conversions.ipynb) for more accepted types), but\n",
+    "keeping to the recommended types is likely to reduce the number of data conversions and make finding help\n",
+    "easier.\n",
+    "\n",
+    "In the following, we provide guidance and examples for storing data for forecasting and machine learning\n",
+    "using our recommended data types.\n",
+    "\n",
     "## Forecasting data\n",
     "\n",
-    "aeon forecasting uses pd.Series, pd.DataFrame and pd.Multiindex to store data. It  has\n",
-    "some built in forecasting datasets and tools for downloading commonly used\n",
-    "benchmarks, loading_data.ipynb forecasting section. For details of the forecasting\n",
-    "functionality, see the numerous forecasting notebooks.\n",
+    "The `aeon` forecasting module primarily uses `pd.Series`, `pd.DataFrame` and `pd.MultiIndex` to store data.\n",
+    "It has some built-in forecasting datasets and tools for downloading commonly used benchmarks, see the\n",
+    "[data loading notebook](examples/datasets/data_loading.ipynb) forecasting section. For details of\n",
+    "the forecasting functionality, see the [forecasting user guide](examples/forecasting/forecasting.ipynb)\n",
+    "and the numerous forecasting notebooks on the [examples page](examples).\n",
     "\n",
     "`pd.Series` are used to store a univariate time series with entries corresponding to\n",
     "different time points."
@@ -45,13 +53,13 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 1,
+   "execution_count": 19,
    "outputs": [
     {
      "data": {
-      "text/plain": "5    120.0\n6    140.0\n7    160.0\ndtype: float64"
+      "text/plain": "0     20.0\n1     40.0\n2     60.0\n3     80.0\n4    100.0\ndtype: float64"
      },
-     "execution_count": 1,
+     "execution_count": 19,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -61,23 +69,48 @@
     "import numpy as np\n",
     "import pandas as pd\n",
     "\n",
+    "y = pd.Series([20.0, 40.0, 60.0, 80.0, 100.0])\n",
+    "y"
+   ],
+   "metadata": {
+    "collapsed": false,
+    "ExecuteTime": {
+     "end_time": "2023-07-24T11:45:57.192506600Z",
+     "start_time": "2023-07-24T11:45:57.132654400Z"
+    }
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 20,
+   "outputs": [
+    {
+     "data": {
+      "text/plain": "5    120.0\n6    140.0\n7    160.0\ndtype: float64"
+     },
+     "execution_count": 20,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
     "from aeon.forecasting.trend import TrendForecaster\n",
     "\n",
-    "y = pd.Series([20.0, 40.0, 60.0, 80.0, 100.0])\n",
-    "forecaster = TrendForecaster()\n",
-    "forecaster.fit(y)  # fit the forecaster\n",
-    "forecaster.predict(fh=[1, 2, 3])  # forecast the next 3 values"
+    "tf = TrendForecaster()\n",
+    "tf.fit(y)  # fit the forecaster\n",
+    "tf.predict(fh=[1, 2, 3])  # forecast the next 3 values"
    ],
    "metadata": {
-    "collapsed": false
+    "collapsed": false,
+    "ExecuteTime": {
+     "end_time": "2023-07-24T11:45:57.194505900Z",
+     "start_time": "2023-07-24T11:45:57.140619300Z"
+    }
    }
   },
   {
    "cell_type": "markdown",
    "source": [
-    "`pd.Series` are used to store a univariate time series with entries corresponding to\n",
-    "different time points.\n",
-    "\n",
     "`pd.DataFrame` are used to store multiple time series, where each column is a time\n",
     "series, and each row corresponds to a different, distinct time point. The index\n",
     "is the time point and should be monotonic. This creates two series called Sales and\n",
@@ -89,27 +122,14 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 2,
+   "execution_count": 21,
    "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "   Sales  Temperature\n",
-      "0    111           26\n",
-      "1    100           21\n",
-      "2     90           19\n",
-      "3     80           14\n",
-      "4     65           12\n",
-      "5     89           22\n"
-     ]
-    },
     {
      "data": {
-      "text/plain": "   Sales  Temperature\n6   89.0         22.0\n7   89.0         22.0\n8   89.0         22.0",
-      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>Sales</th>\n      <th>Temperature</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>6</th>\n      <td>89.0</td>\n      <td>22.0</td>\n    </tr>\n    <tr>\n      <th>7</th>\n      <td>89.0</td>\n      <td>22.0</td>\n    </tr>\n    <tr>\n      <th>8</th>\n      <td>89.0</td>\n      <td>22.0</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
+      "text/plain": "   Sales  Temperature\n0    111           26\n1    100           21\n2     90           19\n3     80           14\n4     65           12\n5     89           22",
+      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>Sales</th>\n      <th>Temperature</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>111</td>\n      <td>26</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>100</td>\n      <td>21</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>90</td>\n      <td>19</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>80</td>\n      <td>14</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>65</td>\n      <td>12</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>89</td>\n      <td>22</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
      },
-     "execution_count": 2,
+     "execution_count": 21,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -121,15 +141,43 @@
     "}\n",
     "# Create DataFrame\n",
     "ice_creams = pd.DataFrame(ice_creams)\n",
-    "print(ice_creams)\n",
+    "ice_creams"
+   ],
+   "metadata": {
+    "collapsed": false,
+    "ExecuteTime": {
+     "end_time": "2023-07-24T11:45:57.245339100Z",
+     "start_time": "2023-07-24T11:45:57.148598Z"
+    }
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 22,
+   "outputs": [
+    {
+     "data": {
+      "text/plain": "   Sales  Temperature\n6   89.0         22.0\n7   89.0         22.0\n8   89.0         22.0",
+      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>Sales</th>\n      <th>Temperature</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>6</th>\n      <td>89.0</td>\n      <td>22.0</td>\n    </tr>\n    <tr>\n      <th>7</th>\n      <td>89.0</td>\n      <td>22.0</td>\n    </tr>\n    <tr>\n      <th>8</th>\n      <td>89.0</td>\n      <td>22.0</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
+     },
+     "execution_count": 22,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
     "from aeon.forecasting.exp_smoothing import ExponentialSmoothing\n",
     "\n",
-    "forecaster = ExponentialSmoothing()\n",
-    "forecaster.fit(ice_creams)\n",
-    "forecaster.predict(fh=[1, 2, 3])"
+    "es = ExponentialSmoothing()\n",
+    "es.fit(ice_creams)\n",
+    "es.predict(fh=[1, 2, 3])"
    ],
    "metadata": {
-    "collapsed": false
+    "collapsed": false,
+    "ExecuteTime": {
+     "end_time": "2023-07-24T11:45:57.256309400Z",
+     "start_time": "2023-07-24T11:45:57.156602400Z"
+    }
    }
   },
   {
@@ -143,21 +191,16 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 3,
+   "execution_count": 23,
    "outputs": [
     {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "                     Sales  Temperature\n",
-      "datetime                               \n",
-      "2018-01-06 23:15:00    111           26\n",
-      "2019-02-09 01:48:00    100           21\n",
-      "2020-08-06 13:20:00     90           19\n",
-      "2021-07-03 14:50:00     80           14\n",
-      "2022-07-06 11:50:00     65           12\n",
-      "2023-03-05 16:50:00     89           22\n"
-     ]
+     "data": {
+      "text/plain": "                     Sales  Temperature\ndatetime                               \n2018-01-06 23:15:00    111           26\n2019-02-09 01:48:00    100           21\n2020-08-06 13:20:00     90           19\n2021-07-03 14:50:00     80           14\n2022-07-06 11:50:00     65           12\n2023-03-05 16:50:00     89           22",
+      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>Sales</th>\n      <th>Temperature</th>\n    </tr>\n    <tr>\n      <th>datetime</th>\n      <th></th>\n      <th></th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>2018-01-06 23:15:00</th>\n      <td>111</td>\n      <td>26</td>\n    </tr>\n    <tr>\n      <th>2019-02-09 01:48:00</th>\n      <td>100</td>\n      <td>21</td>\n    </tr>\n    <tr>\n      <th>2020-08-06 13:20:00</th>\n      <td>90</td>\n      <td>19</td>\n    </tr>\n    <tr>\n      <th>2021-07-03 14:50:00</th>\n      <td>80</td>\n      <td>14</td>\n    </tr>\n    <tr>\n      <th>2022-07-06 11:50:00</th>\n      <td>65</td>\n      <td>12</td>\n    </tr>\n    <tr>\n      <th>2023-03-05 16:50:00</th>\n      <td>89</td>\n      <td>22</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
+     },
+     "execution_count": 23,
+     "metadata": {},
+     "output_type": "execute_result"
     }
    ],
    "source": [
@@ -172,17 +215,21 @@
     "    ]\n",
     ")\n",
     "ice_creams = ice_creams.set_index(\"datetime\")\n",
-    "print(ice_creams)"
+    "ice_creams"
    ],
    "metadata": {
-    "collapsed": false
+    "collapsed": false,
+    "ExecuteTime": {
+     "end_time": "2023-07-24T11:45:57.257307Z",
+     "start_time": "2023-07-24T11:45:57.179516200Z"
+    }
    }
   },
   {
    "cell_type": "markdown",
    "source": [
     "`pd.DataFrame` also have the capability to store multiple indexes, which can be used\n",
-    "to represent whats called Panel data in forecasting hierarchical data. A Panel is a\n",
+    "to represent what's called Panel data in forecasting hierarchical data. A Panel is a\n",
     "collection of (possibly) multivariate data."
    ],
    "metadata": {
@@ -191,14 +238,14 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 4,
+   "execution_count": 24,
    "outputs": [
     {
      "data": {
-      "text/plain": "                            c0\nh0   h1   time                \nh0_0 h1_0 2000-01-01  2.199534\n          2000-01-02  5.267746\n          2000-01-03  4.792742\n          2000-01-04  3.115800\n          2000-01-05  5.581822",
-      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th></th>\n      <th></th>\n      <th>c0</th>\n    </tr>\n    <tr>\n      <th>h0</th>\n      <th>h1</th>\n      <th>time</th>\n      <th></th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th rowspan=\"5\" valign=\"top\">h0_0</th>\n      <th rowspan=\"5\" valign=\"top\">h1_0</th>\n      <th>2000-01-01</th>\n      <td>2.199534</td>\n    </tr>\n    <tr>\n      <th>2000-01-02</th>\n      <td>5.267746</td>\n    </tr>\n    <tr>\n      <th>2000-01-03</th>\n      <td>4.792742</td>\n    </tr>\n    <tr>\n      <th>2000-01-04</th>\n      <td>3.115800</td>\n    </tr>\n    <tr>\n      <th>2000-01-05</th>\n      <td>5.581822</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
+      "text/plain": "                            c0\nh0   h1   time                \nh0_0 h1_0 2000-01-01  4.249534\n          2000-01-02  2.899939\n          2000-01-03  2.671320\n          2000-01-04  4.380220\n          2000-01-05  5.538047\n...                        ...\nh0_1 h1_3 2000-01-08  3.658460\n          2000-01-09  3.672319\n          2000-01-10  2.938018\n          2000-01-11  2.902982\n          2000-01-12  2.871146\n\n[96 rows x 1 columns]",
+      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th></th>\n      <th></th>\n      <th>c0</th>\n    </tr>\n    <tr>\n      <th>h0</th>\n      <th>h1</th>\n      <th>time</th>\n      <th></th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th rowspan=\"5\" valign=\"top\">h0_0</th>\n      <th rowspan=\"5\" valign=\"top\">h1_0</th>\n      <th>2000-01-01</th>\n      <td>4.249534</td>\n    </tr>\n    <tr>\n      <th>2000-01-02</th>\n      <td>2.899939</td>\n    </tr>\n    <tr>\n      <th>2000-01-03</th>\n      <td>2.671320</td>\n    </tr>\n    <tr>\n      <th>2000-01-04</th>\n      <td>4.380220</td>\n    </tr>\n    <tr>\n      <th>2000-01-05</th>\n      <td>5.538047</td>\n    </tr>\n    <tr>\n      <th>...</th>\n      <th>...</th>\n      <th>...</th>\n      <td>...</td>\n    </tr>\n    <tr>\n      <th rowspan=\"5\" valign=\"top\">h0_1</th>\n      <th rowspan=\"5\" valign=\"top\">h1_3</th>\n      <th>2000-01-08</th>\n      <td>3.658460</td>\n    </tr>\n    <tr>\n      <th>2000-01-09</th>\n      <td>3.672319</td>\n    </tr>\n    <tr>\n      <th>2000-01-10</th>\n      <td>2.938018</td>\n    </tr>\n    <tr>\n      <th>2000-01-11</th>\n      <td>2.902982</td>\n    </tr>\n    <tr>\n      <th>2000-01-12</th>\n      <td>2.871146</td>\n    </tr>\n  </tbody>\n</table>\n<p>96 rows × 1 columns</p>\n</div>"
      },
-     "execution_count": 4,
+     "execution_count": 24,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -207,40 +254,47 @@
     "from aeon.utils._testing.hierarchical import _make_hierarchical\n",
     "\n",
     "y = _make_hierarchical()\n",
-    "y.head()"
+    "y"
    ],
    "metadata": {
-    "collapsed": false
+    "collapsed": false,
+    "ExecuteTime": {
+     "end_time": "2023-07-24T11:45:57.258304600Z",
+     "start_time": "2023-07-24T11:45:57.188516600Z"
+    }
    }
   },
   {
    "cell_type": "code",
-   "execution_count": 5,
+   "execution_count": 25,
    "outputs": [
     {
      "data": {
-      "text/plain": "                            c0\nh0   h1   time                \nh0_0 h1_0 2000-01-13  4.076904\n          2000-01-14  4.076904\n     h1_1 2000-01-13  5.185745\n          2000-01-14  5.185745\n     h1_2 2000-01-13  3.773312\n          2000-01-14  3.773312\n     h1_3 2000-01-13  2.851027\n          2000-01-14  2.851027\nh0_1 h1_0 2000-01-13  3.468474\n          2000-01-14  3.468474\n     h1_1 2000-01-13  4.421536\n          2000-01-14  4.421536\n     h1_2 2000-01-13  3.791238\n          2000-01-14  3.791238\n     h1_3 2000-01-13  4.026049\n          2000-01-14  4.026049",
-      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th></th>\n      <th></th>\n      <th>c0</th>\n    </tr>\n    <tr>\n      <th>h0</th>\n      <th>h1</th>\n      <th>time</th>\n      <th></th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th rowspan=\"8\" valign=\"top\">h0_0</th>\n      <th rowspan=\"2\" valign=\"top\">h1_0</th>\n      <th>2000-01-13</th>\n      <td>4.076904</td>\n    </tr>\n    <tr>\n      <th>2000-01-14</th>\n      <td>4.076904</td>\n    </tr>\n    <tr>\n      <th rowspan=\"2\" valign=\"top\">h1_1</th>\n      <th>2000-01-13</th>\n      <td>5.185745</td>\n    </tr>\n    <tr>\n      <th>2000-01-14</th>\n      <td>5.185745</td>\n    </tr>\n    <tr>\n      <th rowspan=\"2\" valign=\"top\">h1_2</th>\n      <th>2000-01-13</th>\n      <td>3.773312</td>\n    </tr>\n    <tr>\n      <th>2000-01-14</th>\n      <td>3.773312</td>\n    </tr>\n    <tr>\n      <th rowspan=\"2\" valign=\"top\">h1_3</th>\n      <th>2000-01-13</th>\n      <td>2.851027</td>\n    </tr>\n    <tr>\n      <th>2000-01-14</th>\n      <td>2.851027</td>\n    </tr>\n    <tr>\n      <th rowspan=\"8\" valign=\"top\">h0_1</th>\n      <th rowspan=\"2\" valign=\"top\">h1_0</th>\n      <th>2000-01-13</th>\n      <td>3.468474</td>\n    </tr>\n    <tr>\n      <th>2000-01-14</th>\n      <td>3.468474</td>\n    </tr>\n    <tr>\n      <th rowspan=\"2\" valign=\"top\">h1_1</th>\n      <th>2000-01-13</th>\n      <td>4.421536</td>\n    </tr>\n    <tr>\n      <th>2000-01-14</th>\n      <td>4.421536</td>\n    </tr>\n    <tr>\n      <th rowspan=\"2\" valign=\"top\">h1_2</th>\n      <th>2000-01-13</th>\n      <td>3.791238</td>\n    </tr>\n    <tr>\n      <th>2000-01-14</th>\n      <td>3.791238</td>\n    </tr>\n    <tr>\n      <th rowspan=\"2\" valign=\"top\">h1_3</th>\n      <th>2000-01-13</th>\n      <td>4.026049</td>\n    </tr>\n    <tr>\n      <th>2000-01-14</th>\n      <td>4.026049</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
+      "text/plain": "                            c0\nh0   h1   time                \nh0_0 h1_0 2000-01-13  4.200625\n          2000-01-14  4.200625\n     h1_1 2000-01-13  3.714500\n          2000-01-14  3.714500\n     h1_2 2000-01-13  3.982618\n          2000-01-14  3.982618\n     h1_3 2000-01-13  3.911963\n          2000-01-14  3.911963\nh0_1 h1_0 2000-01-13  3.627664\n          2000-01-14  3.627664\n     h1_1 2000-01-13  3.844651\n          2000-01-14  3.844651\n     h1_2 2000-01-13  3.889248\n          2000-01-14  3.889248\n     h1_3 2000-01-13  3.119286\n          2000-01-14  3.119286",
+      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th></th>\n      <th></th>\n      <th>c0</th>\n    </tr>\n    <tr>\n      <th>h0</th>\n      <th>h1</th>\n      <th>time</th>\n      <th></th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th rowspan=\"8\" valign=\"top\">h0_0</th>\n      <th rowspan=\"2\" valign=\"top\">h1_0</th>\n      <th>2000-01-13</th>\n      <td>4.200625</td>\n    </tr>\n    <tr>\n      <th>2000-01-14</th>\n      <td>4.200625</td>\n    </tr>\n    <tr>\n      <th rowspan=\"2\" valign=\"top\">h1_1</th>\n      <th>2000-01-13</th>\n      <td>3.714500</td>\n    </tr>\n    <tr>\n      <th>2000-01-14</th>\n      <td>3.714500</td>\n    </tr>\n    <tr>\n      <th rowspan=\"2\" valign=\"top\">h1_2</th>\n      <th>2000-01-13</th>\n      <td>3.982618</td>\n    </tr>\n    <tr>\n      <th>2000-01-14</th>\n      <td>3.982618</td>\n    </tr>\n    <tr>\n      <th rowspan=\"2\" valign=\"top\">h1_3</th>\n      <th>2000-01-13</th>\n      <td>3.911963</td>\n    </tr>\n    <tr>\n      <th>2000-01-14</th>\n      <td>3.911963</td>\n    </tr>\n    <tr>\n      <th rowspan=\"8\" valign=\"top\">h0_1</th>\n      <th rowspan=\"2\" valign=\"top\">h1_0</th>\n      <th>2000-01-13</th>\n      <td>3.627664</td>\n    </tr>\n    <tr>\n      <th>2000-01-14</th>\n      <td>3.627664</td>\n    </tr>\n    <tr>\n      <th rowspan=\"2\" valign=\"top\">h1_1</th>\n      <th>2000-01-13</th>\n      <td>3.844651</td>\n    </tr>\n    <tr>\n      <th>2000-01-14</th>\n      <td>3.844651</td>\n    </tr>\n    <tr>\n      <th rowspan=\"2\" valign=\"top\">h1_2</th>\n      <th>2000-01-13</th>\n      <td>3.889248</td>\n    </tr>\n    <tr>\n      <th>2000-01-14</th>\n      <td>3.889248</td>\n    </tr>\n    <tr>\n      <th rowspan=\"2\" valign=\"top\">h1_3</th>\n      <th>2000-01-13</th>\n      <td>3.119286</td>\n    </tr>\n    <tr>\n      <th>2000-01-14</th>\n      <td>3.119286</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
      },
-     "execution_count": 5,
+     "execution_count": 25,
      "metadata": {},
      "output_type": "execute_result"
     }
    ],
    "source": [
-    "forecaster.fit(y, fh=[1, 2]).predict()"
+    "es.fit(y, fh=[1, 2]).predict()"
    ],
    "metadata": {
-    "collapsed": false
+    "collapsed": false,
+    "ExecuteTime": {
+     "end_time": "2023-07-24T11:45:57.418875200Z",
+     "start_time": "2023-07-24T11:45:57.200459Z"
+    }
    }
   },
   {
    "cell_type": "markdown",
    "source": [
     "`np.ndarray` can be used with the forecasters in aeon, although we recommend using\n",
-    "pandas. One dimensional np.ndarray are treated as a single time series. 2D numpy\n",
-    "array are treated as multiple series of shape `(n_timeseries, n_timepoints)`.\n",
-    "Forecasters fit independently on each series."
+    "pandas. A one-dimensional `np.ndarray` is treated as a single time series. 2D numpy\n",
+    "arrays are treated as multiple series of shape `(n_timepoints, n_timeseries)`."
    ],
    "metadata": {
     "collapsed": false
@@ -248,13 +302,13 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 6,
+   "execution_count": 26,
    "outputs": [
     {
      "data": {
       "text/plain": "array([[120.],\n       [140.],\n       [160.]])"
      },
-     "execution_count": 6,
+     "execution_count": 26,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -266,18 +320,22 @@
     "forecaster.predict(fh=[1, 2, 3])  # forecast the next 3 values"
    ],
    "metadata": {
-    "collapsed": false
+    "collapsed": false,
+    "ExecuteTime": {
+     "end_time": "2023-07-24T11:45:57.420869900Z",
+     "start_time": "2023-07-24T11:45:57.299224700Z"
+    }
    }
   },
   {
    "cell_type": "code",
-   "execution_count": 7,
+   "execution_count": 27,
    "outputs": [
     {
      "data": {
       "text/plain": "array([[120.,  50.],\n       [140.,  40.],\n       [160.,  30.]])"
      },
-     "execution_count": 7,
+     "execution_count": 27,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -290,7 +348,11 @@
     "forecaster.predict(fh=[1, 2, 3])  # forecast the next 3 values"
    ],
    "metadata": {
-    "collapsed": false
+    "collapsed": false,
+    "ExecuteTime": {
+     "end_time": "2023-07-24T11:45:57.444806900Z",
+     "start_time": "2023-07-24T11:45:57.308171700Z"
+    }
    }
   },
   {
@@ -310,7 +372,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 8,
+   "execution_count": 28,
    "outputs": [
     {
      "name": "stdout",
@@ -330,12 +392,16 @@
     "print(\"X shape = \", X.shape, \" First series =\", X[0], \"second series = \", X[1])"
    ],
    "metadata": {
-    "collapsed": false
+    "collapsed": false,
+    "ExecuteTime": {
+     "end_time": "2023-07-24T11:45:57.445803400Z",
+     "start_time": "2023-07-24T11:45:57.324129700Z"
+    }
    }
   },
   {
    "cell_type": "code",
-   "execution_count": 9,
+   "execution_count": 29,
    "outputs": [
     {
      "name": "stdout",
@@ -351,14 +417,6 @@
       " [ 14.  70.  60.  22.]\n",
       " [ 49.  49.  66.   9.]]\n"
      ]
-    },
-    {
-     "data": {
-      "text/plain": "array([0, 1, 1, 1], dtype=int64)"
-     },
-     "execution_count": 9,
-     "metadata": {},
-     "output_type": "execute_result"
     }
    ],
    "source": [
@@ -371,7 +429,30 @@
     "    ]\n",
     ")\n",
     "# n_cases = 4, n_channels =3, n_timepoints = 4\n",
-    "print(\"X shape = \", X.shape, \"\\n First series =\\n\", X[0], \"\\nsecond series = \\n\", X[1])\n",
+    "print(\"X shape = \", X.shape, \"\\n First series =\\n\", X[0], \"\\nsecond series = \\n\", X[1])"
+   ],
+   "metadata": {
+    "collapsed": false,
+    "ExecuteTime": {
+     "end_time": "2023-07-24T11:45:57.446800500Z",
+     "start_time": "2023-07-24T11:45:57.330112900Z"
+    }
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 30,
+   "outputs": [
+    {
+     "data": {
+      "text/plain": "array([0, 1, 1, 1], dtype=int64)"
+     },
+     "execution_count": 30,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
     "from aeon.clustering.k_means import TimeSeriesKMeans\n",
     "\n",
     "kmeans = TimeSeriesKMeans(metric=\"euclidean\", n_clusters=2)\n",
@@ -379,7 +460,11 @@
     "kmeans.predict(X)"
    ],
    "metadata": {
-    "collapsed": false
+    "collapsed": false,
+    "ExecuteTime": {
+     "end_time": "2023-07-24T11:45:57.473727600Z",
+     "start_time": "2023-07-24T11:45:57.337094Z"
+    }
    }
   },
   {
@@ -394,13 +479,13 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 10,
+   "execution_count": 31,
    "outputs": [
     {
      "data": {
       "text/plain": "array(['pass', 'pass', 'fail', 'fail'], dtype='<U4')"
      },
-     "execution_count": 10,
+     "execution_count": 31,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -408,15 +493,42 @@
    "source": [
     "y = np.array([1, 1, 0, 0])\n",
     "y2 = np.array([\"pass\", \"pass\", \"fail\", \"fail\"])\n",
+    "y2"
+   ],
+   "metadata": {
+    "collapsed": false,
+    "ExecuteTime": {
+     "end_time": "2023-07-24T11:45:57.496701Z",
+     "start_time": "2023-07-24T11:45:57.346070900Z"
+    }
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 32,
+   "outputs": [
+    {
+     "data": {
+      "text/plain": "array(['pass', 'pass', 'fail', 'fail'], dtype='<U4')"
+     },
+     "execution_count": 32,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
     "from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier\n",
     "\n",
     "knn = KNeighborsTimeSeriesClassifier(distance=\"dtw\")\n",
-    "knn.fit(X, y)\n",
     "knn.fit(X, y2)\n",
     "knn.predict(X)"
    ],
    "metadata": {
-    "collapsed": false
+    "collapsed": false,
+    "ExecuteTime": {
+     "end_time": "2023-07-24T11:45:57.497692900Z",
+     "start_time": "2023-07-24T11:45:57.353051100Z"
+    }
    }
   },
   {
@@ -430,19 +542,43 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 11,
+   "execution_count": 33,
    "outputs": [
     {
      "data": {
       "text/plain": "array([ 1.5,  4.3, -2. , 10. ])"
      },
-     "execution_count": 11,
+     "execution_count": 33,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "y3 = np.array([1.5, 4.3, -2.0, 10])\n",
+    "y3"
+   ],
+   "metadata": {
+    "collapsed": false,
+    "ExecuteTime": {
+     "end_time": "2023-07-24T11:45:57.516613700Z",
+     "start_time": "2023-07-24T11:45:57.360032Z"
+    }
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 34,
+   "outputs": [
+    {
+     "data": {
+      "text/plain": "array([1., 1., 0., 0.])"
+     },
+     "execution_count": 34,
      "metadata": {},
      "output_type": "execute_result"
     }
    ],
    "source": [
-    "y = np.array([1.5, 4.3, -2.0, 10])\n",
     "from aeon.regression.distance_based import KNeighborsTimeSeriesRegressor\n",
     "\n",
     "knn_r = KNeighborsTimeSeriesRegressor(distance=\"dtw\")\n",
@@ -450,7 +586,11 @@
     "knn_r.predict(X)"
    ],
    "metadata": {
-    "collapsed": false
+    "collapsed": false,
+    "ExecuteTime": {
+     "end_time": "2023-07-24T11:45:57.547556200Z",
+     "start_time": "2023-07-24T11:45:57.367014200Z"
+    }
    }
   },
   {
@@ -459,7 +599,7 @@
     "If the time series are not all equal length, they should be stored as a list of 2D\n",
     "numpy arrays. Some estimators can deal with unequal length series. Those that can't\n",
     "will raise an exception if passed unequal length series. Note we assume that channels\n",
-    " are all the same length for any given series."
+    "are all the same length for any given series."
    ],
    "metadata": {
     "collapsed": false
@@ -467,13 +607,13 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 12,
+   "execution_count": 35,
    "outputs": [
     {
      "data": {
-      "text/plain": "array([0, 0, 1])"
+      "text/plain": "[array([[20. , 40. , 60. , 55. , 66. ],\n        [10. , 11. , 12. , 11. , 66. ],\n        [-4. , 15. ,  6.6, 12. , 44. ]]),\n array([[10, 90, 80],\n        [70, 60, 22],\n        [49, 66,  9]]),\n array([[ 22,  93,  18, 100],\n        [ 34, 170,   0,  87],\n        [ 49,  49,  33,  49]])]"
      },
-     "execution_count": 12,
+     "execution_count": 35,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -486,27 +626,50 @@
     "X_uneq.append(x0)\n",
     "X_uneq.append(x1)\n",
     "X_uneq.append(x2)\n",
-    "y = np.array([0, 0, 1])\n",
-    "knn.fit(X_uneq, y)\n",
-    "knn.predict(X_uneq)"
+    "\n",
+    "X_uneq"
    ],
    "metadata": {
-    "collapsed": false
+    "collapsed": false,
+    "ExecuteTime": {
+     "end_time": "2023-07-24T11:45:57.548553800Z",
+     "start_time": "2023-07-24T11:45:57.372997800Z"
+    }
    }
   },
   {
-   "cell_type": "markdown",
+   "cell_type": "code",
+   "execution_count": 36,
+   "outputs": [
+    {
+     "data": {
+      "text/plain": "array([0, 0, 1])"
+     },
+     "execution_count": 36,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
    "source": [
-    "aeon has several standard problems baked in, and facilities for loading data from\n",
-    "external sources. Please see [the data loading notebook](examples/datasets/loading_data.ipynb)"
+    "y = np.array([0, 0, 1])\n",
+    "knn.fit(X_uneq, y)\n",
+    "knn.predict(X_uneq)"
    ],
    "metadata": {
-    "collapsed": false
+    "collapsed": false,
+    "ExecuteTime": {
+     "end_time": "2023-07-24T11:45:57.548553800Z",
+     "start_time": "2023-07-24T11:45:57.379979300Z"
+    }
    }
   },
   {
    "cell_type": "markdown",
-   "source": [],
+   "source": [
+    "`aeon` has several standard problems baked in, and facilities for loading data from\n",
+    "external sources. Please see the [provided data notebook](examples/datasets/provided_data.ipynb)\n",
+    "and [data loading notebook](examples/datasets/data_loading.ipynb)."
+   ],
    "metadata": {
     "collapsed": false
    }

From 1940abba0a66849aeb8d24cad1bf283410dd2127 Mon Sep 17 00:00:00 2001
From: MatthewMiddlehurst <m.middlehurst@uea.ac.uk>
Date: Wed, 26 Jul 2023 11:17:35 +0100
Subject: [PATCH 13/14] fixes

---
 examples/datasets/benchmarking_data.ipynb | 7 ++++++-
 examples/datasets/data_loading.ipynb      | 3 +--
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/examples/datasets/benchmarking_data.ipynb b/examples/datasets/benchmarking_data.ipynb
index 7ad658f6a5..76b0d21134 100644
--- a/examples/datasets/benchmarking_data.ipynb
+++ b/examples/datasets/benchmarking_data.ipynb
@@ -99,7 +99,8 @@
     "        <extract_path>/Chinatown/Chinatown_TRAIN.ts\n",
     "        <extract_path>/Chinatown/Chinatown_TEST.ts\n",
     "\n",
-    "You can load these problems directly from [https://timeseriesclassification.com] and load\n",
+    "You can load these problems directly from\n",
+    "[https://timeseriesclassification.com](https://timeseriesclassification.com) and load\n",
     "them into memory. Note by default, these functions return the data and associated metadata.\n",
     "This usage combines the train and test splits and loads them into one `X` and one `y` array."
    ],
@@ -332,12 +333,16 @@
    "cell_type": "markdown",
    "source": [
     "## References\n",
+    "\n",
     "[1] Dau et. al, The UCR time series archive, IEEE/CAA Journal of Automatica Sinica, 2019\n",
+    "\n",
     "[2] Ruiz et. al, The great multivariate time series classification bake off: a review\n",
     " and experimental evaluation of recent algorithmic advances, Data Mining and\n",
     " Knowledge Discovery 35(2), 2021\n",
+    "\n",
     "[3] Tan et. al, Time Series Extrinsic Regression, Data Mining and Knowledge\n",
     " Discovery, 2021\n",
+    "\n",
     "[4] Godahewa et. al, Monash Time Series Forecasting Archive, Neural Information\n",
     " Processing Systems Track on Datasets and Benchmarks, 2021\n"
    ],
diff --git a/examples/datasets/data_loading.ipynb b/examples/datasets/data_loading.ipynb
index 045531830e..74b13b464a 100644
--- a/examples/datasets/data_loading.ipynb
+++ b/examples/datasets/data_loading.ipynb
@@ -18,8 +18,7 @@
     "it conforms to the input types described. These are all text files with a particular\n",
     "structure. Both formats store a single time series per row.\n",
     "\n",
-    "1. `.csv`\n",
-    "2. The `.ts` and `.tsf` format used by the aeon packages and the [time series](https://timeseriesclassification.com) and [forecasting](https://forecastingdata.org)\n",
+    "1. The `.ts` and `.tsf` formats used by the aeon packages and the [time series](https://timeseriesclassification.com) and [forecasting](https://forecastingdata.org)\n",
     " repositories. More information on the `.tsf` format is\n",
     "[here](https://openreview.net/pdf?id=wEc1mgAjU-)\n",
     "Links to download all of the UCR univariate and the tsml multivariate data in `.ts`\n",

From 7db60f252b5eafbcdd63e12d71f195078c4e2d33 Mon Sep 17 00:00:00 2001
From: MatthewMiddlehurst <m.middlehurst@uea.ac.uk>
Date: Mon, 9 Oct 2023 14:23:34 +0100
Subject: [PATCH 14/14] rename

---
 examples/datasets/benchmarking_data.ipynb  | 375 ---------------------
 examples/datasets/load_data_from_web.ipynb | 167 +++++----
 2 files changed, 95 insertions(+), 447 deletions(-)
 delete mode 100644 examples/datasets/benchmarking_data.ipynb

diff --git a/examples/datasets/benchmarking_data.ipynb b/examples/datasets/benchmarking_data.ipynb
deleted file mode 100644
index 76b0d21134..0000000000
--- a/examples/datasets/benchmarking_data.ipynb
+++ /dev/null
@@ -1,375 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "source": [
-    "# Downloading and loading benchmarking datasets\n",
-    "\n",
-    "It is common to use standard collections of data to compare different estimators for\n",
-    "classification, clustering, regression and forecasting. Some of the smaller datasets from\n",
-    "these datasets included with `aeon` in the `aeon/datasets/data` directory. However,\n",
-    "there is way to many datasets to include them all, and some of the files are far too big\n",
-    "to consider including in the package. `aeon` provides tools to download these data to use\n",
-    "in benchmarking experiments. Classification and regression data are stored in .ts format.\n",
-    "Forecasting data are stored in the equivalent .tsf format. See the\n",
-    "[data loading notebook](examples/data_loading.ipynb) for more info.\n",
-    "\n",
-    "Classification and regression are loaded into 3D numpy arrays of shape\n",
-    "`(n_cases, n_channels, n_timepoints)` if equal length or a list of length\n",
-    "`n_cases` of 2D numpy arrays of shape `(n_channels, n_timepoints)` if\n",
-    "`n_timepoints` is different between cases. Forecasting data are loaded into\n",
-    "pd.DataFrame. For more information on aeon data types see the\n",
-    "[data storage notebook](examples/data_storage.ipynb).\n",
-    "\n",
-    "Note that this notebook is dependent on external websites, so will not function if\n",
-    "you are not online or the associated website is down. We use the following three\n",
-    "functions"
-   ],
-   "metadata": {
-    "collapsed": false
-   }
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 9,
-   "outputs": [],
-   "source": [
-    "from aeon.datasets import load_classification, load_forecasting, load_regression"
-   ],
-   "metadata": {
-    "collapsed": false,
-    "ExecuteTime": {
-     "end_time": "2023-07-24T15:29:38.987856700Z",
-     "start_time": "2023-07-24T15:29:38.941979900Z"
-    }
-   }
-  },
-  {
-   "cell_type": "markdown",
-   "source": [
-    "## Time Series Classification Archive\n",
-    "\n",
-    "The [UCR/TSML Time Series Classification Archive](https://timeseriesclassification.com)\n",
-    "hosts the UCR univariate TSC archive (also available from\n",
-    "[UCR](https://www.cs.ucr.edu/%7Eeamonn/time_series_data_2018/)) [1], and\n",
-    "the multivariate archive [2] (previously called the UEA archive, soon to change). We\n",
-    "provide seven of these in the datasets/data directory: ACSF1, ArrowHead, BasicMotions,\n",
-    "GunPoint, ItalyPowerDemand, JapaneseVowels and PLAID. The archive is much bigger. The\n",
-    "last batch release was for 128 univariate [1] and 33 multivariate [2] datasets. If you just\n",
-    "want to download them all, please go to the [website](https://timeseriesclassification.com)."
-   ],
-   "metadata": {
-    "collapsed": false
-   }
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 10,
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Univariate length =  128\n",
-      "Multivariate length =  33\n"
-     ]
-    }
-   ],
-   "source": [
-    "from aeon.datasets.tsc_data_lists import multivariate, univariate\n",
-    "\n",
-    "# This file also contains sub lists by type, e.g. unequal length\n",
-    "print(\"Univariate length = \", len(univariate))\n",
-    "print(\"Multivariate length = \", len(multivariate))"
-   ],
-   "metadata": {
-    "collapsed": false,
-    "ExecuteTime": {
-     "end_time": "2023-07-24T15:29:38.988882400Z",
-     "start_time": "2023-07-24T15:29:38.949956800Z"
-    }
-   }
-  },
-  {
-   "cell_type": "markdown",
-   "source": [
-    "A default train and test split is provided for this data. The file structure for a\n",
-    "problem such as Chinatown is\n",
-    "\n",
-    "        <extract_path>/Chinatown/Chinatown_TRAIN.ts\n",
-    "        <extract_path>/Chinatown/Chinatown_TEST.ts\n",
-    "\n",
-    "You can load these problems directly from\n",
-    "[https://timeseriesclassification.com](https://timeseriesclassification.com) and load\n",
-    "them into memory. Note by default, these functions return the data and associated metadata.\n",
-    "This usage combines the train and test splits and loads them into one `X` and one `y` array."
-   ],
-   "metadata": {
-    "collapsed": false
-   }
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 11,
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Shape of X =  (363, 1, 24)\n",
-      "First case =  [ 573.  375.  301.  212.   55.   34.   25.   33.  113.  143.  303.  615.\n",
-      " 1226. 1281. 1221. 1081.  866. 1096. 1039.  975.  746.  581.  409.  182.]  has label =  1\n",
-      "\n",
-      "Meta data =  {'problemname': 'chinatown', 'timestamps': False, 'missing': False, 'univariate': True, 'equallength': True, 'classlabel': True, 'targetlabel': False, 'class_values': ['1', '2']}\n"
-     ]
-    }
-   ],
-   "source": [
-    "X, y, meta = load_classification(\"Chinatown\")\n",
-    "print(\"Shape of X = \", X.shape)\n",
-    "print(\"First case = \", X[0][0], \" has label = \", y[0])\n",
-    "print(\"\\nMeta data = \", meta)"
-   ],
-   "metadata": {
-    "collapsed": false,
-    "ExecuteTime": {
-     "end_time": "2023-07-24T15:29:39.067643100Z",
-     "start_time": "2023-07-24T15:29:38.954944100Z"
-    }
-   }
-  },
-  {
-   "cell_type": "markdown",
-   "source": [
-    "If you look in `aeon/datasets/local_data/` you should see a directory called `Chinatown`\n",
-    "containing the Chinatown datasets. All of the zips have `.ts` files. Some also have\n",
-    "`.arff` and `.txt` files. If you load again, it will not download again if the file is\n",
-    "already there. If you want to store data somewhere else, you can specify a file path\n",
-    "using the `extract_path` parameter. Additionally, you can load the train and test\n",
-    "separately as shown below.\n",
-    "\n",
-    "This code will download the data and load into separate train/test splits. The split argument is not\n",
-    "case sensitive. Once downloaded, `load_classification` is a equivalent to a call to `load_from_tsfile`."
-   ],
-   "metadata": {
-    "collapsed": false
-   }
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 12,
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Train shape =  (20, 1, 512)\n",
-      "Test shape =  (20, 1, 512)\n"
-     ]
-    }
-   ],
-   "source": [
-    "X_train, y_train = load_classification(\n",
-    "    \"BeetleFly\", split=\"TRAIN\", return_metadata=False\n",
-    ")\n",
-    "X_test, y_test = load_classification(\"BeetleFly\", split=\"test\", return_metadata=False)\n",
-    "print(\"Train shape = \", X_train.shape)\n",
-    "print(\"Test shape = \", X_test.shape)"
-   ],
-   "metadata": {
-    "collapsed": false,
-    "ExecuteTime": {
-     "end_time": "2023-07-24T15:29:39.229225500Z",
-     "start_time": "2023-07-24T15:29:39.068640Z"
-    }
-   }
-  },
-  {
-   "cell_type": "markdown",
-   "source": [
-    "## Time Series (Extrinsic) Regression\n",
-    "\n",
-    "The [Monash Time Series Extrinsic Regression Archive](http://tseregression.org/) [3] repo\n",
-    "(called extrinsic to differentiate if from sliding window based regression) currently\n",
-    "contains 19 regression problems in `.ts` format. One of these, Covid3Month, is in\n",
-    "`datasets\\data`. The usage of `load_regression` is identical to `load_classification`\n"
-   ],
-   "metadata": {
-    "collapsed": false
-   }
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 13,
-   "outputs": [
-    {
-     "data": {
-      "text/plain": "['AppliancesEnergy',\n 'AustraliaRainfall',\n 'BIDMCHR',\n 'BIDMCRR',\n 'BIDMCSpO2',\n 'BeijingPM10Quality',\n 'BeijingPM25Quality',\n 'BenzeneConcentration',\n 'Covid3Month',\n 'FloodModeling1',\n 'FloodModeling2',\n 'FloodModeling3',\n 'HouseholdPowerConsumption1',\n 'HouseholdPowerConsumption2',\n 'IEEEPPG',\n 'LiveFuelMoistureContent',\n 'NewsHeadlineSentiment',\n 'NewsTitleSentiment',\n 'PPGDalia']"
-     },
-     "execution_count": 13,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "from aeon.datasets.dataset_collections import list_available_tser_datasets\n",
-    "\n",
-    "list_available_tser_datasets()"
-   ],
-   "metadata": {
-    "collapsed": false,
-    "ExecuteTime": {
-     "end_time": "2023-07-24T15:29:39.237204700Z",
-     "start_time": "2023-07-24T15:29:39.230223400Z"
-    }
-   }
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 14,
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Shape of X =  (673, 1, 266)\n"
-     ]
-    }
-   ],
-   "source": [
-    "X, y, meta = load_regression(\"FloodModeling1\")\n",
-    "print(\"Shape of X = \", X.shape)"
-   ],
-   "metadata": {
-    "collapsed": false,
-    "ExecuteTime": {
-     "end_time": "2023-07-24T15:29:42.341536600Z",
-     "start_time": "2023-07-24T15:29:39.237204700Z"
-    }
-   }
-  },
-  {
-   "cell_type": "markdown",
-   "source": [
-    "## Time Series Forecasting\n",
-    "\n",
-    "The [Monash time series forecasting](https://forecastingdata.org/) repo contains a\n",
-    "large number of forecasting data, including competition data such as M1, M3 and M4.\n",
-    "Usage is the same as the other problems, although there is no provided train/test\n",
-    "splits.\n"
-   ],
-   "metadata": {
-    "collapsed": false
-   }
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 15,
-   "outputs": [
-    {
-     "data": {
-      "text/plain": "['australian_electricity_demand_dataset',\n 'car_parts_dataset_with_missing_values',\n 'car_parts_dataset_without_missing_values',\n 'cif_2016_dataset',\n 'covid_deaths_dataset',\n 'covid_mobility_dataset_with_missing_values',\n 'covid_mobility_dataset_without_missing_values',\n 'dominick_dataset',\n 'elecdemand_dataset',\n 'electricity_hourly_dataset',\n 'electricity_weekly_dataset',\n 'fred_md_dataset',\n 'hospital_dataset',\n 'kaggle_web_traffic_dataset_with_missing_values',\n 'kaggle_web_traffic_dataset_without_missing_values',\n 'kaggle_web_traffic_weekly_dataset',\n 'kdd_cup_2018_dataset_with_missing_values',\n 'kdd_cup_2018_dataset_without_missing_values',\n 'london_smart_meters_dataset_with_missing_values',\n 'london_smart_meters_dataset_without_missing_values',\n 'm1_monthly_dataset',\n 'm1_quarterly_dataset',\n 'm1_yearly_dataset',\n 'm3_monthly_dataset',\n 'm3_other_dataset',\n 'm3_quarterly_dataset',\n 'm3_yearly_dataset',\n 'm4_daily_dataset',\n 'm4_hourly_dataset',\n 'm4_monthly_dataset',\n 'm4_quarterly_dataset',\n 'm4_weekly_dataset',\n 'm4_yearly_dataset',\n 'nn5_daily_dataset_with_missing_values',\n 'nn5_daily_dataset_without_missing_values',\n 'nn5_weekly_dataset',\n 'pedestrian_counts_dataset',\n 'saugeenday_dataset',\n 'solar_10_minutes_dataset',\n 'solar_4_seconds_dataset',\n 'solar_weekly_dataset',\n 'sunspot_dataset_with_missing_values',\n 'sunspot_dataset_without_missing_values',\n 'tourism_monthly_dataset',\n 'tourism_quarterly_dataset',\n 'tourism_yearly_dataset',\n 'traffic_hourly_dataset',\n 'traffic_weekly_dataset',\n 'us_births_dataset',\n 'weather_dataset',\n 'wind_4_seconds_dataset',\n 'wind_farms_minutely_dataset_with_missing_values',\n 'wind_farms_minutely_dataset_without_missing_values']"
-     },
-     "execution_count": 15,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "from aeon.datasets.dataset_collections import list_available_tsf_datasets\n",
-    "\n",
-    "list_available_tsf_datasets()"
-   ],
-   "metadata": {
-    "collapsed": false,
-    "ExecuteTime": {
-     "end_time": "2023-07-24T15:29:42.347519900Z",
-     "start_time": "2023-07-24T15:29:42.341536600Z"
-    }
-   }
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 16,
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "(23000, 3)\n",
-      "{'frequency': 'yearly', 'forecast_horizon': 6, 'contain_missing_values': False, 'contain_equal_length': False}\n",
-      "  series_name     start_timestamp  \\\n",
-      "0          T1 1979-01-01 12:00:00   \n",
-      "1          T2 1979-01-01 12:00:00   \n",
-      "2          T3 1979-01-01 12:00:00   \n",
-      "3          T4 1979-01-01 12:00:00   \n",
-      "4          T5 1979-01-01 12:00:00   \n",
-      "\n",
-      "                                        series_value  \n",
-      "0  [5172.1, 5133.5, 5186.9, 5084.6, 5182.0, 5414....  \n",
-      "1  [2070.0, 2104.0, 2394.0, 1651.0, 1492.0, 1348....  \n",
-      "2  [2760.0, 2980.0, 3200.0, 3450.0, 3670.0, 3850....  \n",
-      "3  [3380.0, 3670.0, 3960.0, 4190.0, 4440.0, 4700....  \n",
-      "4  [1980.0, 2030.0, 2220.0, 2530.0, 2610.0, 2720....  \n"
-     ]
-    }
-   ],
-   "source": [
-    "X, metadata = load_forecasting(\"m4_yearly_dataset\")\n",
-    "print(X.shape)\n",
-    "print(metadata)\n",
-    "data = X.head()\n",
-    "print(data)"
-   ],
-   "metadata": {
-    "collapsed": false,
-    "ExecuteTime": {
-     "end_time": "2023-07-24T15:29:49.815481300Z",
-     "start_time": "2023-07-24T15:29:42.347519900Z"
-    }
-   }
-  },
-  {
-   "cell_type": "markdown",
-   "source": [
-    "## References\n",
-    "\n",
-    "[1] Dau et. al, The UCR time series archive, IEEE/CAA Journal of Automatica Sinica, 2019\n",
-    "\n",
-    "[2] Ruiz et. al, The great multivariate time series classification bake off: a review\n",
-    " and experimental evaluation of recent algorithmic advances, Data Mining and\n",
-    " Knowledge Discovery 35(2), 2021\n",
-    "\n",
-    "[3] Tan et. al, Time Series Extrinsic Regression, Data Mining and Knowledge\n",
-    " Discovery, 2021\n",
-    "\n",
-    "[4] Godahewa et. al, Monash Time Series Forecasting Archive, Neural Information\n",
-    " Processing Systems Track on Datasets and Benchmarks, 2021\n"
-   ],
-   "metadata": {
-    "collapsed": false
-   }
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 2
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython2",
-   "version": "2.7.6"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 0
-}
diff --git a/examples/datasets/load_data_from_web.ipynb b/examples/datasets/load_data_from_web.ipynb
index c2eff91acf..76b0d21134 100644
--- a/examples/datasets/load_data_from_web.ipynb
+++ b/examples/datasets/load_data_from_web.ipynb
@@ -6,16 +6,20 @@
     "# Downloading and loading benchmarking datasets\n",
     "\n",
     "It is common to use standard collections of data to compare different estimators for\n",
-    "classification, clustering, regression and forecasting. Some of these datasets are\n",
-    "shipped with aeon in the datasets/data directory. However, the files are far too\n",
-    "big to include them all. aeon p[rovides tools to download these data to use in benchmarking experiments.\n",
-    "Classification and regression data are stored in .ts format. Forecasting\n",
-    "data are stored in the equivalent .tsf format. See the [data loading notebook](data_loading.ipynb) for more info.\n",
+    "classification, clustering, regression and forecasting. Some of the smaller datasets from\n",
+    "these collections are included with `aeon` in the `aeon/datasets/data` directory. However,\n",
+    "there are too many datasets to include them all, and some of the files are far too big\n",
+    "to consider including in the package. `aeon` provides tools to download these data to use\n",
+    "in benchmarking experiments. Classification and regression data are stored in .ts format.\n",
+    "Forecasting data are stored in the equivalent .tsf format. See the\n",
+    "[data loading notebook](examples/data_loading.ipynb) for more info.\n",
     "\n",
-    "Classification and regression are loaded into 3D numpy arrays of shape `(n_cases, n_channels, n_timepoints)` if equal length\n",
-    "or a list of `[n_cases]` of 2D numpy if `n_timepoints` is different for different\n",
-    "cases. Forecasting data are loaded into pd.DataFrame. For more information on\n",
-    "aeon data types see the [data structures notebook](data_structures.ipynb).\n",
+    "Classification and regression are loaded into 3D numpy arrays of shape\n",
+    "`(n_cases, n_channels, n_timepoints)` if equal length or a list of length\n",
+    "`n_cases` of 2D numpy arrays of shape `(n_channels, n_timepoints)` if\n",
+    "`n_timepoints` is different between cases. Forecasting data are loaded into\n",
+    "pd.DataFrame. For more information on aeon data types see the\n",
+    "[data storage notebook](examples/data_storage.ipynb).\n",
     "\n",
     "Note that this notebook is dependent on external websites, so will not function if\n",
     "you are not online or the associated website is down. We use the following three\n",
@@ -27,13 +31,17 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 1,
+   "execution_count": 9,
    "outputs": [],
    "source": [
     "from aeon.datasets import load_classification, load_forecasting, load_regression"
    ],
    "metadata": {
-    "collapsed": false
+    "collapsed": false,
+    "ExecuteTime": {
+     "end_time": "2023-07-24T15:29:38.987856700Z",
+     "start_time": "2023-07-24T15:29:38.941979900Z"
+    }
    }
   },
   {
@@ -41,14 +49,14 @@
    "source": [
     "## Time Series Classification Archive\n",
     "\n",
-    "[UCR/TSML Time Series Classification Archive](https://timeseriesclassification.com)\n",
-    "hosts the UCR univariate TSC archive [1], also available from [UCR](https://www.cs.ucr.edu/~eamonn/time_series_data_2018/) and\n",
+    "The [UCR/TSML Time Series Classification Archive](https://timeseriesclassification.com)\n",
+    "hosts the UCR univariate TSC archive (also available from\n",
+    "[UCR](https://www.cs.ucr.edu/%7Eeamonn/time_series_data_2018/)) [1], and\n",
     "the multivariate archive [2] (previously called the UEA archive, soon to change). We\n",
-    "provide seven of these in the datasets/data directort: ACSF1, ArrowHead, BasicMotions,\n",
+    "provide seven of these in the datasets/data directory: ACSF1, ArrowHead, BasicMotions,\n",
     "GunPoint, ItalyPowerDemand, JapaneseVowels and PLAID. The archive is much bigger. The\n",
-    " last batch release was for 128 univariate [1] and 33 multivariate [2]. If you just\n",
-    " want to download them all, please go to the [website]\n",
-    " (https://timeseriesclassification.com)"
+    "last batch release was for 128 univariate [1] and 33 multivariate [2] datasets. If you just\n",
+    "want to download them all, please go to the [website](https://timeseriesclassification.com)."
    ],
    "metadata": {
     "collapsed": false
@@ -56,7 +64,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 2,
+   "execution_count": 10,
    "outputs": [
     {
      "name": "stdout",
@@ -75,7 +83,11 @@
     "print(\"Multivariate length = \", len(multivariate))"
    ],
    "metadata": {
-    "collapsed": false
+    "collapsed": false,
+    "ExecuteTime": {
+     "end_time": "2023-07-24T15:29:38.988882400Z",
+     "start_time": "2023-07-24T15:29:38.949956800Z"
+    }
    }
   },
   {
@@ -87,9 +99,10 @@
     "        <extract_path>/Chinatown/Chinatown_TRAIN.ts\n",
     "        <extract_path>/Chinatown/Chinatown_TEST.ts\n",
     "\n",
-    "You can load these problems directly from TSC.com and load them into memory. Note by\n",
-    "default, these functions return the data and associated metadata. This usage combines\n",
-    " the train and test splits and loads them into one `X` and one `y` array."
+    "You can load these problems directly from\n",
+    "[https://timeseriesclassification.com](https://timeseriesclassification.com) and load\n",
+    "them into memory. Note that by default, these functions return the data and associated metadata.\n",
+    "This usage combines the train and test splits and loads them into one `X` and one `y` array."
    ],
    "metadata": {
     "collapsed": false
@@ -97,7 +110,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 3,
+   "execution_count": 11,
    "outputs": [
     {
      "name": "stdout",
@@ -118,20 +131,25 @@
     "print(\"\\nMeta data = \", meta)"
    ],
    "metadata": {
-    "collapsed": false
+    "collapsed": false,
+    "ExecuteTime": {
+     "end_time": "2023-07-24T15:29:39.067643100Z",
+     "start_time": "2023-07-24T15:29:38.954944100Z"
+    }
    }
   },
   {
    "cell_type": "markdown",
    "source": [
-    "If you look in aeon/datasets you should see a directory called `local_data`\n",
+    "If you look in `aeon/datasets/local_data/` you should see a directory called `Chinatown`\n",
     "containing the Chinatown datasets. All of the zips have `.ts` files. Some also have\n",
     "`.arff` and `.txt` files. If you load again, it will not download again if the file is\n",
-    "already there. If you want to store data somewhere else, you can specify a file path.\n",
-    " Also, you can load the train and test separately. This code will download the data\n",
-    " to Temp once, and load into separate train/test splits. The split argument is not\n",
-    " case sensitive. Once downloaded, `load_classification` is a equivalent to a call to\n",
-    " `load_from_tsfile`"
+    "already there. If you want to store data somewhere else, you can specify a file path\n",
+    "using the `extract_path` parameter. Additionally, you can load the train and test\n",
+    "separately as shown below.\n",
+    "\n",
+    "The code below downloads the data and loads it into separate train/test splits. The split argument is not\n",
+    "case sensitive. Once downloaded, `load_classification` is equivalent to a call to `load_from_tsfile`."
    ],
    "metadata": {
     "collapsed": false
@@ -139,46 +157,31 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 9,
+   "execution_count": 12,
    "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
       "Train shape =  (20, 1, 512)\n",
-      "Test shape =  (20, 1, 512)\n",
-      "Loaded directly shape =  (20, 1, 512)\n"
+      "Test shape =  (20, 1, 512)\n"
      ]
-    },
-    {
-     "data": {
-      "text/plain": "array([1.7400873, 1.7331051, 1.7091917, 1.6333304, 1.5405759])"
-     },
-     "execution_count": 9,
-     "metadata": {},
-     "output_type": "execute_result"
     }
    ],
    "source": [
     "X_train, y_train = load_classification(\n",
-    "    \"BeetleFly\", extract_path=\"./Temp/\", split=\"TRAIN\", return_metadata=False\n",
-    ")\n",
-    "X_test, y_test = load_classification(\n",
-    "    \"BeetleFly\", extract_path=\"./Temp/\", split=\"test\", return_metadata=False\n",
+    "    \"BeetleFly\", split=\"TRAIN\", return_metadata=False\n",
     ")\n",
+    "X_test, y_test = load_classification(\"BeetleFly\", split=\"test\", return_metadata=False)\n",
     "print(\"Train shape = \", X_train.shape)\n",
-    "print(\"Test shape = \", X_test.shape)\n",
-    "from aeon.datasets import load_from_tsfile\n",
-    "\n",
-    "X_train, y_train = load_from_tsfile(\n",
-    "    full_file_path_and_name=\"./Temp/BeetleFly/BeetleFLY_TRAIN\"\n",
-    ")\n",
-    "print(\"Loaded directly shape = \", X_train.shape)\n",
-    "\n",
-    "X_test[0][0][:5]"
+    "print(\"Test shape = \", X_test.shape)"
    ],
    "metadata": {
-    "collapsed": false
+    "collapsed": false,
+    "ExecuteTime": {
+     "end_time": "2023-07-24T15:29:39.229225500Z",
+     "start_time": "2023-07-24T15:29:39.068640Z"
+    }
    }
   },
   {
@@ -186,10 +189,10 @@
    "source": [
     "## Time Series (Extrinsic) Regression\n",
     "\n",
-    "[The Monash Time Series Extrinsic Regression Archive]() [3] repo (called extrinsic to\n",
-    " diffentiate if from sliding window based regression) currently contains 19\n",
-    " regression problems in .ts format. One of these, Covid3Month, is in `datasets\\data`.\n",
-    "  The usage of `load_regression` is identical to `load_classification`\n"
+    "The [Monash Time Series Extrinsic Regression Archive](http://tseregression.org/) [3] repo\n",
+    "(called extrinsic to differentiate it from sliding-window-based regression) currently\n",
+    "contains 19 regression problems in `.ts` format. One of these, Covid3Month, is in\n",
+    "`datasets\\data`. The usage of `load_regression` is identical to `load_classification`\n"
    ],
    "metadata": {
     "collapsed": false
@@ -197,13 +200,13 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 5,
+   "execution_count": 13,
    "outputs": [
     {
      "data": {
       "text/plain": "['AppliancesEnergy',\n 'AustraliaRainfall',\n 'BIDMCHR',\n 'BIDMCRR',\n 'BIDMCSpO2',\n 'BeijingPM10Quality',\n 'BeijingPM25Quality',\n 'BenzeneConcentration',\n 'Covid3Month',\n 'FloodModeling1',\n 'FloodModeling2',\n 'FloodModeling3',\n 'HouseholdPowerConsumption1',\n 'HouseholdPowerConsumption2',\n 'IEEEPPG',\n 'LiveFuelMoistureContent',\n 'NewsHeadlineSentiment',\n 'NewsTitleSentiment',\n 'PPGDalia']"
      },
-     "execution_count": 5,
+     "execution_count": 13,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -214,12 +217,16 @@
     "list_available_tser_datasets()"
    ],
    "metadata": {
-    "collapsed": false
+    "collapsed": false,
+    "ExecuteTime": {
+     "end_time": "2023-07-24T15:29:39.237204700Z",
+     "start_time": "2023-07-24T15:29:39.230223400Z"
+    }
    }
   },
   {
    "cell_type": "code",
-   "execution_count": 6,
+   "execution_count": 14,
    "outputs": [
     {
      "name": "stdout",
@@ -234,7 +241,11 @@
     "print(\"Shape of X = \", X.shape)"
    ],
    "metadata": {
-    "collapsed": false
+    "collapsed": false,
+    "ExecuteTime": {
+     "end_time": "2023-07-24T15:29:42.341536600Z",
+     "start_time": "2023-07-24T15:29:39.237204700Z"
+    }
    }
   },
   {
@@ -253,13 +264,13 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 7,
+   "execution_count": 15,
    "outputs": [
     {
      "data": {
       "text/plain": "['australian_electricity_demand_dataset',\n 'car_parts_dataset_with_missing_values',\n 'car_parts_dataset_without_missing_values',\n 'cif_2016_dataset',\n 'covid_deaths_dataset',\n 'covid_mobility_dataset_with_missing_values',\n 'covid_mobility_dataset_without_missing_values',\n 'dominick_dataset',\n 'elecdemand_dataset',\n 'electricity_hourly_dataset',\n 'electricity_weekly_dataset',\n 'fred_md_dataset',\n 'hospital_dataset',\n 'kaggle_web_traffic_dataset_with_missing_values',\n 'kaggle_web_traffic_dataset_without_missing_values',\n 'kaggle_web_traffic_weekly_dataset',\n 'kdd_cup_2018_dataset_with_missing_values',\n 'kdd_cup_2018_dataset_without_missing_values',\n 'london_smart_meters_dataset_with_missing_values',\n 'london_smart_meters_dataset_without_missing_values',\n 'm1_monthly_dataset',\n 'm1_quarterly_dataset',\n 'm1_yearly_dataset',\n 'm3_monthly_dataset',\n 'm3_other_dataset',\n 'm3_quarterly_dataset',\n 'm3_yearly_dataset',\n 'm4_daily_dataset',\n 'm4_hourly_dataset',\n 'm4_monthly_dataset',\n 'm4_quarterly_dataset',\n 'm4_weekly_dataset',\n 'm4_yearly_dataset',\n 'nn5_daily_dataset_with_missing_values',\n 'nn5_daily_dataset_without_missing_values',\n 'nn5_weekly_dataset',\n 'pedestrian_counts_dataset',\n 'saugeenday_dataset',\n 'solar_10_minutes_dataset',\n 'solar_4_seconds_dataset',\n 'solar_weekly_dataset',\n 'sunspot_dataset_with_missing_values',\n 'sunspot_dataset_without_missing_values',\n 'tourism_monthly_dataset',\n 'tourism_quarterly_dataset',\n 'tourism_yearly_dataset',\n 'traffic_hourly_dataset',\n 'traffic_weekly_dataset',\n 'us_births_dataset',\n 'weather_dataset',\n 'wind_4_seconds_dataset',\n 'wind_farms_minutely_dataset_with_missing_values',\n 'wind_farms_minutely_dataset_without_missing_values']"
      },
-     "execution_count": 7,
+     "execution_count": 15,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -270,12 +281,16 @@
     "list_available_tsf_datasets()"
    ],
    "metadata": {
-    "collapsed": false
+    "collapsed": false,
+    "ExecuteTime": {
+     "end_time": "2023-07-24T15:29:42.347519900Z",
+     "start_time": "2023-07-24T15:29:42.341536600Z"
+    }
    }
   },
   {
    "cell_type": "code",
-   "execution_count": 8,
+   "execution_count": 16,
    "outputs": [
     {
      "name": "stdout",
@@ -307,21 +322,29 @@
     "print(data)"
    ],
    "metadata": {
-    "collapsed": false
+    "collapsed": false,
+    "ExecuteTime": {
+     "end_time": "2023-07-24T15:29:49.815481300Z",
+     "start_time": "2023-07-24T15:29:42.347519900Z"
+    }
    }
   },
   {
    "cell_type": "markdown",
    "source": [
     "## References\n",
+    "\n",
     "[1] Dau et. al, The UCR time series archive, IEEE/CAA Journal of Automatica Sinica, 2019\n",
+    "\n",
     "[2] Ruiz et. al, The great multivariate time series classification bake off: a review\n",
     " and experimental evaluation of recent algorithmic advances, Data Mining and\n",
     " Knowledge Discovery 35(2), 2021\n",
+    "\n",
     "[3] Tan et. al, Time Series Extrinsic Regression, Data Mining and Knowledge\n",
-    "Discovery, 2021\n",
-    "[4] Godahewa et. al, Monash Time Series Forecasting Archive,Neural Information\n",
-    "Processing Systems Track on Datasets and Benchmarks, 2021\n"
+    " Discovery, 2021\n",
+    "\n",
+    "[4] Godahewa et. al, Monash Time Series Forecasting Archive, Neural Information\n",
+    " Processing Systems Track on Datasets and Benchmarks, 2021\n"
    ],
    "metadata": {
     "collapsed": false
