Revert "update Read input arguments (#444)"

This reverts commit bae2d89.
icesat2py · Jan 5, 2024 · bf2a40a
1 parent 6d9acf6
commit bf2a40a
Showing 7 changed files with 183 additions and 353 deletions.
diff --git a/doc/source/example_notebooks/IS2_data_read-in.ipynb b/doc/source/example_notebooks/IS2_data_read-in.ipynb
@@ -63,8 +63,9 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "path_root = '/full/path/to/your/ATL06_data/'\n",
-    "reader = ipx.Read(path_root)"
+    "path_root = '/full/path/to/your/data/'\n",
+    "pattern = \"processed_ATL{product:2}_{datetime:%Y%m%d%H%M%S}_{rgt:4}{cycle:2}{orbitsegment:2}_{version:3}_{revision:2}.h5\"\n",
+    "reader = ipx.Read(path_root, \"ATL06\", pattern) # or ipx.Read(filepath, \"ATLXX\") if your filenames match the default pattern"
    ]
   },
   {
@@ -110,9 +111,10 @@
     "\n",
     "Reading in ICESat-2 data with icepyx happens in a few simple steps:\n",
     "1. Let icepyx know where to find your data (this might be local files or urls to data in cloud storage)\n",
-    "2. Create an icepyx `Read` object\n",
-    "3. Make a list of the variables you want to read in (does not apply for gridded products)\n",
-    "4. Load your data into memory (or read it in lazily, if you're using Dask)\n",
+    "2. Tell icepyx how to interpret the filename format\n",
+    "3. Create an icepyx `Read` object\n",
+    "4. Make a list of the variables you want to read in (does not apply for gridded products)\n",
+    "5. Load your data into memory (or read it in lazily, if you're using Dask)\n",
     "\n",
     "We go through each of these steps in more detail in this notebook."
    ]
@@ -166,18 +168,21 @@
   {
    "cell_type": "markdown",
    "id": "e8da42c1",
-   "metadata": {
-    "user_expressions": []
-   },
+   "metadata": {},
    "source": [
     "### Step 1: Set data source path\n",
     "\n",
     "Provide a full path to the data to be read in (i.e. opened).\n",
     "Currently accepted inputs are:\n",
-    "* a string path to directory - all files from the directory will be opened\n",
-    "* a string path to single file - one file will be opened\n",
-    "* a list of filepaths - all files in the list will be opened\n",
-    "* a glob string (see [glob](https://docs.python.org/3/library/glob.html)) - any files matching the glob pattern will be opened"
+    "* a directory\n",
+    "* a single file\n",
+    "\n",
+    "All files to be read in *must* have a consistent filename pattern.\n",
+    "If a directory is supplied as the data source, all files in any subdirectories that match the filename pattern will be included.\n",
+    "\n",
+    "S3 bucket data access is currently under development, and requires you are registered with NSIDC as a beta tester for cloud-based ICESat-2 data.\n",
+    "icepyx is working to ensure a smooth transition to working with remote files.\n",
+    "We'd love your help exploring and testing these features as they become available!"
    ]
   },
   {
@@ -203,135 +208,69 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "fac636c2-e0eb-4e08-adaa-8f47623e46a1",
+   "id": "e683ebf7",
    "metadata": {},
    "outputs": [],
    "source": [
-    "# list_of_files = ['/my/data/ATL06/processed_ATL06_20190226005526_09100205_006_02.h5', \n",
-    "#                  '/my/other/data/ATL06/processed_ATL06_20191202102922_10160505_006_01.h5']"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "ba3ebeb0-3091-4712-b0f7-559ddb95ca5a",
-   "metadata": {
-    "user_expressions": []
-   },
-   "source": [
-    "#### Glob Strings\n",
-    "\n",
-    "[glob](https://docs.python.org/3/library/glob.html) is a Python library which allows users to list files in their file systems whose paths match a given pattern. Icepyx uses the glob library to give users greater flexibility over their input file lists.\n",
-    "\n",
-    "glob works using `*` and `?` as wildcard characters, where `*` matches any number of characters and `?` matches a single character. For example:\n",
-    "\n",
-    "* `/this/path/*.h5`: refers to all `.h5` files in the `/this/path` folder (Example matches: \"/this/path/processed_ATL03_20191130221008_09930503_006_01.h5\" or \"/this/path/myfavoriteicsat-2file.h5\")\n",
-    "* `/this/path/*ATL07*.h5`: refers to all `.h5` files in the `/this/path` folder that have ATL07 in the filename. (Example matches: \"/this/path/ATL07-02_20221012220720_03391701_005_01.h5\" or \"/this/path/processed_ATL07.h5\")\n",
-    "* `/this/path/ATL??/*.h5`: refers to all `.h5` files that are in a subfolder of `/this/path` and a subdirectory of `ATL` followed by any 2 characters (Example matches: \"/this/path/ATL03/processed_ATL03_20191130221008_09930503_006_01.h5\", \"/this/path/ATL06/myfile.h5\")\n",
-    "\n",
-    "See the glob documentation or other online explainer tutorials for more in depth explanation, or advanced glob paths such as character classes and ranges."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "20286c76-5632-4420-b2c9-a5a6b1952672",
-   "metadata": {
-    "user_expressions": []
-   },
-   "source": [
-    "#### Recursive Directory Search"
+    "# urlpath = 's3://nsidc-cumulus-prod-protected/ATLAS/ATL03/004/2019/11/30/ATL03_20191130221008_09930503_004_01.h5'"
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "632bd1ce-2397-4707-a63f-9d5d2fc02fbc",
+   "id": "92743496",
    "metadata": {
     "user_expressions": []
    },
    "source": [
-    "glob will not by default search all of the subdirectories for matching filepaths, but it has the ability to do so.\n",
+    "### Step 2: Create a filename pattern for your data files\n",
     "\n",
-    "If you would like to search recursively, you can achieve this by either:\n",
-    "1. passing the `recursive` argument into `glob_kwargs` and including `\\**\\` in your filepath\n",
-    "2. using glob directly to create a list of filepaths\n",
+    "Files provided by NSIDC typically match the format `\"ATL{product:2}_{datetime:%Y%m%d%H%M%S}_{rgt:4}{cycle:2}{orbitsegment:2}_{version:3}_{revision:2}.h5\"` where the parameters in curly brackets indicate a parameter name (left of the colon) and character length or format (right of the colon).\n",
+    "Some of this information is used during data opening to help correctly read and label the data within the data structure, particularly when multiple files are opened simultaneously.\n",
     "\n",
-    "Each of these two methods are shown below."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "da0cacd8-9ddc-4c31-86b6-167d850b989e",
-   "metadata": {
-    "user_expressions": []
-   },
-   "source": [
-    "Method 1: passing the `recursive` argument into `glob_kwargs`"
+    "By default, icepyx will assume your filenames follow the default format.\n",
+    "However, you can easily read in other ICESat-2 data files by supplying your own filename pattern.\n",
+    "For instance, `pattern=\"ATL{product:2}-{datetime:%Y%m%d%H%M%S}-Sample.h5\"`. A few example patterns are provided below."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "e276b876-9ec7-4991-8520-05c97824b896",
+   "id": "7318abd0",
    "metadata": {},
    "outputs": [],
    "source": [
-    "ipx.Read('/path/to/**/folder', glob_kwargs={'recursive': True})"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "f5a1e85e-fc4a-405f-9710-0cb61b827f2c",
-   "metadata": {
-    "user_expressions": []
-   },
-   "source": [
-    "You can use `glob_kwargs` for any additional argument to Python's builtin `glob.glob` that you would like to pass in via icepyx."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "76de9539-710c-49f6-9e9e-238849382c33",
-   "metadata": {
-    "user_expressions": []
-   },
-   "source": [
-    "Method 2: using glob directly to create a list of filepaths"
+    "# pattern = 'ATL06-{datetime:%Y%m%d%H%M%S}-Sample.h5'\n",
+    "# pattern = 'ATL{product:2}-{datetime:%Y%m%d%H%M%S}-Sample.h5'"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "be79b0dd-efcf-4d50-bdb0-8e3ae8e8e38c",
+   "id": "f43e8664",
    "metadata": {},
    "outputs": [],
    "source": [
-    "import glob"
+    "# pattern = \"ATL{product:2}_{datetime:%Y%m%d%H%M%S}_{rgt:4}{cycle:2}{orbitsegment:2}_{version:3}_{revision:2}.h5\""
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "5d088571-496d-479a-9fb7-833ed7e98676",
-   "metadata": {
-    "tags": []
-   },
+   "id": "992a77fb",
+   "metadata": {},
    "outputs": [],
    "source": [
-    "list_of_files = glob.glob('/path/to/**/folder', recursive=True)\n",
-    "ipx.Read(list_of_files)"
+    "# grid_pattern = \"ATL{product:2}_GL_0311_{res:3}m_{version:3}_{revision:2}.nc\""
    ]
   },
   {
-   "cell_type": "markdown",
-   "id": "08df2874-7c54-4670-8f37-9135ea296ff5",
-   "metadata": {
-    "user_expressions": []
-   },
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6aec1a70",
+   "metadata": {},
+   "outputs": [],
    "source": [
-    "```{admonition} Read Module Update\n",
-    "Previously, icepyx required two additional conditions: 1) a `product` argument and 2) that your files either matched the default `filename_pattern` or that the user provided their own `filename_pattern`. These two requirements have been removed. `product` is now read directly from the file metadata (the root group's `short_name` attribute). Flexibility to specify multiple files via the `filename_pattern` has been replaced with the [glob string](https://docs.python.org/3/library/glob.html) feature, and by allowing a list of filepaths as an argument.\n",
-    "\n",
-    "The `product` and `filename_pattern` arguments have been maintained for backwards compatibility, but will be fully removed in icepyx version 1.0.0.\n",
-    "```"
+    "pattern = \"processed_ATL{product:2}_{datetime:%Y%m%d%H%M%S}_{rgt:4}{cycle:2}{orbitsegment:2}_{version:3}_{revision:2}.h5\""
    ]
   },
   {
@@ -341,9 +280,14 @@
     "user_expressions": []
    },
    "source": [
-    "### Step 2: Create an icepyx read object\n",
+    "### Step 3: Create an icepyx read object\n",
     "\n",
-    "Using the `data_source` described in Step 1, we can create our Read object."
+    "The `Read` object has two required inputs:\n",
+    "- `path` = a string with the full file path or full directory path to your hdf5 (.h5) format files.\n",
+    "- `product` = the data product you're working with, also known as the \"short name\".\n",
+    "\n",
+    "The `Read` object also accepts the optional keyword input:\n",
+    "- `pattern` = a formatted string indicating the filename pattern required for Intake's path_as_pattern argument."
    ]
   },
   {
@@ -355,17 +299,7 @@
    },
    "outputs": [],
    "source": [
-    "reader = ipx.Read(data_source=path_root)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "7b2acfdb-75eb-4c64-b583-2ab19326aaee",
-   "metadata": {
-    "user_expressions": []
-   },
-   "source": [
-    "The Read object now contains the list of matching files that will eventually be loaded into Python. You can inspect its properties, such as the files that were located or the identified product, directly on the Read object."
+    "reader = ipx.Read(data_source=path_root, product=\"ATL06\", filename_pattern=pattern) # or ipx.Read(filepath, \"ATLXX\") if your filenames match the default pattern"
    ]
   },
   {
@@ -375,17 +309,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "reader.filelist"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "7455ee3f-f9ab-486e-b4c7-2fa2314d4084",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "reader.product"
+    "reader._filelist"
    ]
   },
   {
@@ -395,7 +319,7 @@
     "user_expressions": []
    },
    "source": [
-    "### Step 3: Specify variables to be read in\n",
+    "### Step 4: Specify variables to be read in\n",
     "\n",
     "To load your data into memory or prepare it for analysis, icepyx needs to know which variables you'd like to read in.\n",
     "If you've used icepyx to download data from NSIDC with variable subsetting (which is the default), then you may already be familiar with the icepyx `Variables` module and how to create and modify lists of variables.\n",
@@ -502,7 +426,7 @@
     "user_expressions": []
    },
    "source": [
-    "### Step 4: Loading your data\n",
+    "### Step 5: Loading your data\n",
     "\n",
     "Now that you've set up all the options, you're ready to read your ICESat-2 data into memory!"
    ]
@@ -617,9 +541,9 @@
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "icepyx-dev",
+   "display_name": "general",
    "language": "python",
-   "name": "icepyx-dev"
+   "name": "general"
   },
   "language_info": {
    "codemirror_mode": {