From ece278c8203e9860069fd383417855ef50b68db3 Mon Sep 17 00:00:00 2001 From: Rachel Wegener <35503632+rwegener2@users.noreply.github.com> Date: Mon, 4 Sep 2023 19:22:21 -0400 Subject: [PATCH 01/21] Remove intake catalog from Read module (#438) * delete is2cat.py and references * remove intake and related modules Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com> Co-authored-by: Jessica Scheick --- .../example_notebooks/IS2_data_read-in.ipynb | 255 +++++------------- .../documentation/classes_dev_uml.svg | 63 +++-- .../documentation/classes_user_uml.svg | 31 ++- .../documentation/packages_user_uml.svg | 44 ++- doc/source/user_guide/documentation/read.rst | 1 - icepyx/core/is2cat.py | 178 ------------ icepyx/core/read.py | 115 ++++---- icepyx/tests/test_read.py | 3 +- requirements.txt | 2 - 9 files changed, 188 insertions(+), 504 deletions(-) delete mode 100644 icepyx/core/is2cat.py diff --git a/doc/source/example_notebooks/IS2_data_read-in.ipynb b/doc/source/example_notebooks/IS2_data_read-in.ipynb index dc9d8ed31..115c63044 100644 --- a/doc/source/example_notebooks/IS2_data_read-in.ipynb +++ b/doc/source/example_notebooks/IS2_data_read-in.ipynb @@ -3,7 +3,9 @@ { "cell_type": "markdown", "id": "552e9ef9", - "metadata": {}, + "metadata": { + "user_expressions": [] + }, "source": [ "# Reading ICESat-2 Data in for Analysis\n", "This notebook ({nb-download}`download `) illustrates the use of icepyx for reading ICESat-2 data files, loading them into a data object.\n", @@ -20,10 +22,7 @@ "Instead of needing to manually iterate through the beam pairs, you can provide a few options to the `Read` object and icepyx will do the heavy lifting for you (as detailed in this notebook).\n", "\n", "### Approach\n", - "If you're interested in what's happening under the hood: icepyx turns your instructions into something called a catalog, then uses the Intake library and the catalog to actually load the data into memory. Specifically, icepyx creates an [Intake](https://intake.readthedocs.io/en/latest/) data [catalog](https://intake.readthedocs.io/en/latest/catalog.html) for each requested variable and then merges the read-in data from each of the variables to create a single data object.\n", - "\n", - "Intake catalogs are powerful (and the tool we selected) because they can be saved, shared, modified, and reused to reproducibly read in a set of data files in a consistent way as part of an analysis workflow.\n", - "This approach streamlines the transition between data sources (local/downloaded files or, ultimately, cloud/bucket access) and data object types (e.g. [Xarray Dataset](http://xarray.pydata.org/en/stable/generated/xarray.Dataset.html) or [GeoPandas GeoDataFrame](https://geopandas.org/docs/reference/api/geopandas.GeoDataFrame.html))." + "If you're interested in what's happening under the hood: icepyx uses the [xarray](https://docs.xarray.dev/en/stable/) library to read in each of the requested variables of the dataset. icepyx formats each requested variable and then merges the read-in data from each of the variables to create a single data object. The use of xarray is powerful, because the returned data object can be used with relevant xarray processing tools." 
] }, { @@ -47,7 +46,9 @@ { "cell_type": "markdown", "id": "1ffb9a0c", - "metadata": {}, + "metadata": { + "user_expressions": [] + }, "source": [ "---------------------------------\n", "\n", @@ -101,7 +102,9 @@ { "cell_type": "markdown", "id": "b8875936", - "metadata": {}, + "metadata": { + "user_expressions": [] + }, "source": [ "---------------------------------------\n", "## Key steps for loading (reading) ICESat-2 data\n", @@ -119,7 +122,9 @@ { "cell_type": "markdown", "id": "9bf6d38c", - "metadata": {}, + "metadata": { + "user_expressions": [] + }, "source": [ "### Step 0: Get some data if you haven't already\n", "Here are a few lines of code to get you set up with a few data files if you don't already have some on your local system." @@ -213,7 +218,9 @@ { "cell_type": "markdown", "id": "92743496", - "metadata": {}, + "metadata": { + "user_expressions": [] + }, "source": [ "### Step 2: Create a filename pattern for your data files\n", "\n", @@ -269,7 +276,9 @@ { "cell_type": "markdown", "id": "4275b04c", - "metadata": {}, + "metadata": { + "user_expressions": [] + }, "source": [ "### Step 3: Create an icepyx read object\n", "\n", @@ -277,9 +286,8 @@ "- `path` = a string with the full file path or full directory path to your hdf5 (.h5) format files.\n", "- `product` = the data product you're working with, also known as the \"short name\".\n", "\n", - "The `Read` object also accepts two optional keyword inputs:\n", - "- `pattern` = a formatted string indicating the filename pattern required for Intake's path_as_pattern argument.\n", - "- `catalog` = a string with the full path to an Intake catalog, for users who wish to use their own catalog (note this may have unintended consequenses if multiple granules are being combined)." + "The `Read` object also accepts the optional keyword input:\n", + "- `pattern` = a formatted string indicating the filename pattern required for Intake's path_as_pattern argument." ] }, { @@ -307,7 +315,9 @@ { "cell_type": "markdown", "id": "da8d8024", - "metadata": {}, + "metadata": { + "user_expressions": [] + }, "source": [ "### Step 4: Specify variables to be read in\n", "\n", @@ -333,7 +343,9 @@ { "cell_type": "markdown", "id": "b2449941", - "metadata": {}, + "metadata": { + "user_expressions": [] + }, "source": [ "To make things easier, you can use icepyx's built-in default list that loads commonly used variables for your non-gridded data product, or create your own list of variables to be read in.\n", "icepyx will determine what variables are available for you to read in by creating a list from one of your source files.\n", @@ -349,7 +361,9 @@ { "cell_type": "markdown", "id": "55092d1b", - "metadata": {}, + "metadata": { + "user_expressions": [] + }, "source": [ "For a basic case, let's say we want to read in height, latitude, and longitude for all beam pairs.\n", "We create our variables list as" @@ -368,7 +382,9 @@ { "cell_type": "markdown", "id": "fff0bb19", - "metadata": {}, + "metadata": { + "user_expressions": [] + }, "source": [ "Then we can view a dictionary of the variables we'd like to read in." ] @@ -386,7 +402,9 @@ { "cell_type": "markdown", "id": "9d5b50b5", - "metadata": {}, + "metadata": { + "user_expressions": [] + }, "source": [ "Don't forget - if you need to start over, and re-generate your wanted variables list, it's easy!" 
] @@ -404,13 +422,23 @@ { "cell_type": "markdown", "id": "473de4d7", - "metadata": {}, + "metadata": { + "user_expressions": [] + }, "source": [ "### Step 5: Loading your data\n", "\n", "Now that you've set up all the options, you're ready to read your ICESat-2 data into memory!" ] }, + { + "cell_type": "code", + "execution_count": null, + "id": "4a66d889-8d2d-4b9a-821a-96a394ff8d66", + "metadata": {}, + "outputs": [], + "source": [] + }, { "cell_type": "code", "execution_count": null, @@ -424,7 +452,9 @@ { "cell_type": "markdown", "id": "db6560f1", - "metadata": {}, + "metadata": { + "user_expressions": [] + }, "source": [ "Within a Jupyter Notebook, you can get a summary view of your data object.\n", "\n", @@ -446,7 +476,9 @@ { "cell_type": "markdown", "id": "b1d7de2d", - "metadata": {}, + "metadata": { + "user_expressions": [] + }, "source": [ "## On to data analysis!\n", "\n", @@ -469,7 +501,9 @@ { "cell_type": "markdown", "id": "6421f67c", - "metadata": {}, + "metadata": { + "user_expressions": [] + }, "source": [ "A developer note to users:\n", "our next steps will be to create an xarray extension with ICESat-2 aware functions (like \"get_strong_beams\", etc.).\n", @@ -478,191 +512,38 @@ }, { "cell_type": "markdown", - "id": "6edfbb25", - "metadata": {}, - "source": [ - "### More on Intake catalogs and the read object\n", - "\n", - "As anyone familiar with ICESat-2 hdf5 files knows, one of the challenges to reading in data is looping through all of the beam pairs for each track.\n", - "The icepyx read module takes advantage of icepyx's variables module, which has some awareness of ICESat-2 data and uses that to save the user the trouble of having to loop through each beam pair.\n", - "The `reader.load()` function does this by automatically creating minimal Intake catalogs for each variable path, reading in the data, and merging each variable into a ready-to-analyze Xarray DataSet.\n", - "The Intake savvy user may wish to view the template catalog or use an existing catalog." - ] - }, - { - "cell_type": "markdown", - "id": "0f0076f9", - "metadata": {}, - "source": [ - "#### Viewing the template catalog\n", - "\n", - "You can access the ICESat-2 catalog template as an attribute of the read object.\n", - "\n", - "***NOTE: accessing `reader.is2catalog` creates a template with a placeholder in the 'group' parameter; thus, it will not work to actually read in data***" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "2de29fd8", + "id": "1b0cb477", "metadata": { - "scrolled": true + "user_expressions": [] }, - "outputs": [], "source": [ - "reader.is2catalog" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "7a5deef8", - "metadata": {}, - "outputs": [], - "source": [ - "reader.is2catalog.gui" - ] - }, - { - "cell_type": "markdown", - "id": "fef43556", - "metadata": {}, - "source": [ - "#### Use an existing catalog\n", - "If you already have a catalog for your data, you can supply that when you create the read object." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "64986a60", - "metadata": {}, - "outputs": [], - "source": [ - "catpath = path_root + 'test_catalog.yml'\n", - "reader = ipx.Read(filepath, pattern, catpath)" - ] - }, - { - "cell_type": "markdown", - "id": "cf930e0a", - "metadata": {}, - "source": [ - "Then, you can use the catalog you supplied by calling intake's `read` directly to read in the specified data variable." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "dd0e086a", - "metadata": {}, - "outputs": [], - "source": [ - "ds = reader.is2catalog.read()" - ] - }, - { - "cell_type": "markdown", - "id": "60b1a304", - "metadata": {}, - "source": [ - "***NOTE: this means that you will only be able to read in a single data variable!***\n", - "\n", - "To take advantage of icepyx's knowledge of ICESat-2 data nesting of beam pairs and read in multiple related variables at once, you must use the variable approach outlined earlier in this tutorial." + "#### Credits\n", + "* original notebook by: Jessica Scheick\n", + "* notebook contributors: Wei Ji and Tian" ] }, { "cell_type": "code", "execution_count": null, - "id": "f5e3a221", + "id": "aaf6f5a6-355b-456a-99fd-ce0b51045b58", "metadata": {}, "outputs": [], - "source": [ - "ds = reader.load()\n", - "ds" - ] - }, - { - "cell_type": "markdown", - "id": "d56fc41c", - "metadata": {}, - "source": [ - "### More customization options\n", - "\n", - "If you'd like to use the icepyx ICESat-2 Catalog template to create your own customized catalog, we recommend that you access the `build_catalog` function directly, which returns an Intake Catalog instance.\n", - "\n", - "You'll need to supply the required `data_source`, `path_pattern`, and `source_type` arguments. `data_source` and `path_pattern` are described in Steps 2 and 3 of this tutorial. `source_type` is the string you'd like to use for your Local Catalog entry.\n", - "\n", - "This function accepts as keyword input arguments (kwargs) dictionaries with appropriate keys (depending on the Intake driver you are using).\n", - "The simplest version of this is specifying the variable parameters and paths of interest.\n", - "`grp_paths` may contain \"variables\", each of which must then be further defined by `grp_path_params`.\n", - "You cannot use glob-like path syntax to access variables (so `grp_path = '/*/land_ice_segments'` is NOT VALID)." - ] + "source": [] }, { "cell_type": "code", "execution_count": null, - "id": "f174f885", - "metadata": { - "scrolled": true - }, - "outputs": [], - "source": [ - "import icepyx.core.is2cat as is2cat\n", - "\n", - "# build a custom ICESat-2 catalog with a group and parameter\n", - "cat = is2cat.build_catalog(data_source = path_root,\n", - " path_pattern = pattern,\n", - " source_type = \"manual_catalog\",\n", - " grp_paths = \"/{{gt}}/land_ice_segments\",\n", - " grp_path_params = [{\"name\": \"gt\",\n", - " \"description\": \"Ground track\",\n", - " \"type\": \"str\",\n", - " \"default\": \"gt1l\",\n", - " \"allowed\": [\"gt1l\", \"gt1r\", \"gt2l\", \"gt2r\", \"gt3l\", \"gt3r\"]\n", - " }]\n", - " )" - ] - }, - { - "cell_type": "markdown", - "id": "bab9c949", - "metadata": {}, - "source": [ - "#### Saving your catalog\n", - "If you create a highly customized ICESat-2 catalog, you can use Intake's `save` to export it as a .yml file.\n", - "\n", - "Don't forget you can easily use an existing catalog (such as this highly customized one you just made) to read in your data with `reader = ipx.Read(filepath, pattern, catalog)` (so it's as easy as re-creating your reader object with your modified catalog)." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "30f0122a", + "id": "8ea1987f-b6bf-44df-a869-949290f498cb", "metadata": {}, "outputs": [], - "source": [ - "catpath = path_root + 'test_catalog.yml'\n", - "cat.save(catpath)" - ] - }, - { - "cell_type": "markdown", - "id": "1b0cb477", - "metadata": {}, - "source": [ - "#### Credits\n", - "* original notebook by: Jessica Scheick\n", - "* notebook contributors: Wei Ji and Tian\n", - "* templates for default ICESat-2 Intake catalogs from: [Wei Ji](https://github.com/icesat2py/icepyx/issues/106) and [Tian](https://github.com/icetianli/ICESat2_xarray)." - ] + "source": [] } ], "metadata": { "kernelspec": { - "display_name": "Python 3 (ipykernel)", + "display_name": "general", "language": "python", - "name": "python3" + "name": "general" }, "language_info": { "codemirror_mode": { @@ -674,7 +555,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.12" + "version": "3.11.4" } }, "nbformat": 4, diff --git a/doc/source/user_guide/documentation/classes_dev_uml.svg b/doc/source/user_guide/documentation/classes_dev_uml.svg index 7db958018..3a0b5c396 100644 --- a/doc/source/user_guide/documentation/classes_dev_uml.svg +++ b/doc/source/user_guide/documentation/classes_dev_uml.svg @@ -24,7 +24,7 @@ EarthdataAuthMixin -_auth : NoneType, Auth +_auth : Auth, NoneType _s3_initial_ts : datetime, NoneType _s3login_credentials : NoneType, dict _session : Session, NoneType @@ -77,7 +77,7 @@ _file_vars _granules _order_vars -_prod : str, NoneType +_prod : NoneType, str _readable_granule_name : list _reqparams _source : str @@ -226,29 +226,26 @@ icepyx.core.read.Read - -Read - -_catalog_path -_filelist : list, NoneType -_is2catalog : Catalog -_out_obj : Dataset -_pattern : str -_prod : str -_read_vars -_source_type : str -data_source -is2catalog -vars - -__init__(data_source, product, filename_pattern, catalog, out_obj_type) -_add_vars_to_ds(is2ds, ds, grp_path, wanted_groups_tiered, wanted_dict) -_build_dataset_template(file) -_build_single_file_dataset(file, groups_list) -_check_source_for_pattern(source, filename_pattern) -_combine_nested_vars(is2ds, ds, grp_path, wanted_dict) -_read_single_grp(file, grp_path) -load() + +Read + +_filelist : list, NoneType +_out_obj : Dataset +_pattern : str +_prod : str +_read_vars +_source_type : str +data_source +vars + +__init__(data_source, product, filename_pattern, out_obj_type) +_add_vars_to_ds(is2ds, ds, grp_path, wanted_groups_tiered, wanted_dict) +_build_dataset_template(file) +_build_single_file_dataset(file, groups_list) +_check_source_for_pattern(source, filename_pattern) +_combine_nested_vars(is2ds, ds, grp_path, wanted_dict) +_read_single_grp(file, grp_path) +load() @@ -312,7 +309,7 @@ Variables -_avail : NoneType, list +_avail : list, NoneType _vartype _version : NoneType path : NoneType @@ -360,8 +357,8 @@ icepyx.core.variables.Variables->icepyx.core.read.Read - - + + _read_vars @@ -370,11 +367,11 @@ Visualize -bbox : list -cycles : NoneType -date_range : NoneType -product : str, NoneType -tracks : NoneType +bbox : list +cycles : NoneType +date_range : NoneType +product : NoneType, str +tracks : NoneType __init__(query_obj, product, spatial_extent, date_range, cycles, tracks) generate_OA_parameters(): list diff --git a/doc/source/user_guide/documentation/classes_user_uml.svg b/doc/source/user_guide/documentation/classes_user_uml.svg index c5882022d..b6d850951 100644 --- a/doc/source/user_guide/documentation/classes_user_uml.svg +++ 
b/doc/source/user_guide/documentation/classes_user_uml.svg @@ -192,14 +192,13 @@ icepyx.core.read.Read - -Read - -data_source -is2catalog -vars - -load() + +Read + +data_source +vars + +load() @@ -255,7 +254,7 @@ path : NoneType product : NoneType -wanted : NoneType, dict +wanted : dict, NoneType append(defaults, var_list, beam_list, keyword_list) avail(options, internal) @@ -292,8 +291,8 @@ icepyx.core.variables.Variables->icepyx.core.read.Read - - + + _read_vars @@ -302,11 +301,11 @@ Visualize -bbox : list -cycles : NoneType -date_range : NoneType -product : NoneType, str -tracks : NoneType +bbox : list +cycles : NoneType +date_range : NoneType +product : str, NoneType +tracks : NoneType generate_OA_parameters(): list grid_bbox(binsize): list diff --git a/doc/source/user_guide/documentation/packages_user_uml.svg b/doc/source/user_guide/documentation/packages_user_uml.svg index 93b1d45e9..3422b44e8 100644 --- a/doc/source/user_guide/documentation/packages_user_uml.svg +++ b/doc/source/user_guide/documentation/packages_user_uml.svg @@ -4,11 +4,11 @@ - + packages_user_uml - + icepyx.core @@ -45,20 +45,14 @@ icepyx.core.icesat2data - - -icepyx.core.is2cat - -icepyx.core.is2cat - - + icepyx.core.is2ref - -icepyx.core.is2ref + +icepyx.core.is2ref - + icepyx.core.query icepyx.core.query @@ -76,7 +70,7 @@ - + icepyx.core.variables icepyx.core.variables @@ -88,7 +82,7 @@ - + icepyx.core.visualization icepyx.core.visualization @@ -100,7 +94,7 @@ - + icepyx.core.read icepyx.core.read @@ -112,22 +106,22 @@ - + icepyx.core.spatial - -icepyx.core.spatial + +icepyx.core.spatial - + icepyx.core.temporal - -icepyx.core.temporal + +icepyx.core.temporal - + icepyx.core.validate_inputs - -icepyx.core.validate_inputs + +icepyx.core.validate_inputs diff --git a/doc/source/user_guide/documentation/read.rst b/doc/source/user_guide/documentation/read.rst index b076ef210..a5beedf4e 100644 --- a/doc/source/user_guide/documentation/read.rst +++ b/doc/source/user_guide/documentation/read.rst @@ -19,7 +19,6 @@ Attributes .. autosummary:: :toctree: ../../_icepyx/ - Read.is2catalog Read.vars diff --git a/icepyx/core/is2cat.py b/icepyx/core/is2cat.py deleted file mode 100644 index f4e66a7bf..000000000 --- a/icepyx/core/is2cat.py +++ /dev/null @@ -1,178 +0,0 @@ -from intake.catalog import Catalog - -# Need to post on intake's page to see if this would be a useful contribution... -# https://github.com/intake/intake/blob/0.6.4/intake/source/utils.py#L216 -def _pattern_to_glob(pattern): - """ - Adapted from intake.source.utils.path_to_glob to convert a path as pattern into a glob style path - that uses the pattern's indicated number of '?' instead of '*' where an int was specified. - - Returns pattern if pattern is not a string. - - Parameters - ---------- - pattern : str - Path as pattern optionally containing format_strings - - Returns - ------- - glob_path : str - Path with int format strings replaced with the proper number of '?' and '*' otherwise. 
- - Examples - -------- - >>> _pattern_to_glob('{year}/{month}/{day}.csv') - '*/*/*.csv' - >>> _pattern_to_glob('{year:4}/{month:2}/{day:2}.csv') - '????/??/??.csv' - >>> _pattern_to_glob('data/{year:4}{month:02}{day:02}.csv') - 'data/????????.csv' - >>> _pattern_to_glob('data/*.csv') - 'data/*.csv' - """ - from string import Formatter - - if not isinstance(pattern, str): - return pattern - - fmt = Formatter() - glob_path = "" - # prev_field_name = None - for literal_text, field_name, format_specs, _ in fmt.parse(format_string=pattern): - glob_path += literal_text - if field_name and (glob_path != "*"): - try: - glob_path += "?" * int(format_specs) - except ValueError: - glob_path += "*" - # alternatively, you could use bits=utils._get_parts_of_format_string(resolved_string, literal_texts, format_specs) - # and then use len(bits[i]) to get the length of each format_spec - # print(glob_path) - return glob_path - - -def build_catalog( - data_source, - path_pattern, - source_type, - grp_paths=None, - grp_path_params=None, - extra_engine_kwargs=None, - **kwargs -): - """ - Build a general Intake catalog for reading in ICESat-2 data. - This function is used by the read class object to create catalogs from lists of ICESat-2 variables. - - Parameters - ---------- - data_source : string - A string with a full file path or full directory path to ICESat-2 hdf5 (.h5) format files. - Files within a directory must have a consistent filename pattern that includes the "ATL??" data product name. - Files must all be within a single directory. - - path_pattern : string - String that shows the filename pattern as required for Intake's path_as_pattern argument. - - source_type : string - String to use as the Local Catalog Entry name. - - grp_paths : str, default None - Variable paths to load. - Can include general parameter names, which must be contained within double curly brackets and further - described in `grp_path_params`. - Default list based on data product of provided files. - If multiple data products are included in the files, the default list will be for the product of the first file. - This may result in errors during read-in if all files do not have the same variable paths. - - grp_path_params : [dict], default None - List of dictionaries with a keyword for each parameter name specified in the `grp_paths` string. - Each parameter keyword should contain a dictionary with the acceptable keyword-value pairs for the driver being used. - - **kwargs : - Keyword arguments to be passed through to `intake.catalog.Catalog.from_dict()`. - Keywords needed to override default inputs include: - - `source_args_dict` # highest level source information; keys include: "urlpath", "path_as_pattern", driver-specific ("xarray_kwargs" is default) - - `metadata_dict` - - `source_dict` # individual source entry information (default is supplied by data object; "name", "description", "driver", "args") - - `defaults_dict` # catalog "name", "description", "metadata", "entries", etc. - - Returns - ------- - intake.catalog.Catalog object - - See Also - -------- - read.Read - - """ - from intake.catalog.local import LocalCatalogEntry, UserParameter - import intake_xarray - - import icepyx.core.APIformatting as apifmt - - assert ( - grp_paths - ), "You must enter a variable path or you will not be able to read in any data." - - # generalize this/make it so the [engine] values can be entered as kwargs... 
- engine_key = "xarray_kwargs" - xarray_kwargs_dict = {"engine": "h5netcdf", "group": grp_paths} - if extra_engine_kwargs: - for key in extra_engine_kwargs.keys(): - xarray_kwargs_dict[key] = extra_engine_kwargs[key] - - source_args_dict = { - "urlpath": data_source, - "path_as_pattern": path_pattern, - engine_key: xarray_kwargs_dict, - } - - metadata_dict = {"version": 1} - - source_dict = { - "name": source_type, - "description": "", - "driver": "intake_xarray.netcdf.NetCDFSource", # NOTE: this must be a string or the catalog cannot be imported after saving - "args": source_args_dict, - } - - if grp_path_params: - source_dict = apifmt.combine_params( - source_dict, - {"parameters": [UserParameter(**params) for params in grp_path_params]}, - ) - - # NOTE: LocalCatalogEntry has some required positional args (name, description, driver) - # I tried doing this generally with *source_dict after the positional args (instead of as part of the if) - # but apparently I don't quite get something about passing dicts with * and ** and couldn't make it work - local_cat_source = { - source_type: LocalCatalogEntry( - name=source_dict.pop("name"), - description=source_dict.pop("description"), - driver=source_dict.pop("driver"), - parameters=source_dict.pop("parameters"), - args=source_dict.pop("args"), - ) - } - - else: - local_cat_source = { - source_type: LocalCatalogEntry( - name=source_dict.pop("name"), - description=source_dict.pop("description"), - driver=source_dict.pop("driver"), - args=source_dict.pop("args"), - ) - } - - defaults_dict = { - "name": "IS2-hdf5-icepyx-intake-catalog", - "description": "an icepyx-generated catalog for creating local ICESat-2 intake entries", - "metadata": metadata_dict, - "entries": local_cat_source, - } - - build_cat_dict = apifmt.combine_params(defaults_dict, kwargs) - - return Catalog.from_dict(**build_cat_dict) diff --git a/icepyx/core/read.py b/icepyx/core/read.py index 5a497279a..2ffe32cb7 100644 --- a/icepyx/core/read.py +++ b/icepyx/core/read.py @@ -5,7 +5,6 @@ import numpy as np import xarray as xr -import icepyx.core.is2cat as is2cat import icepyx.core.is2ref as is2ref from icepyx.core.variables import Variables as Variables from icepyx.core.variables import list_of_dict_vals @@ -206,11 +205,61 @@ def _run_fast_scandir(dir, fn_glob): return subfolders, files +# Need to post on intake's page to see if this would be a useful contribution... +# https://github.com/intake/intake/blob/0.6.4/intake/source/utils.py#L216 +def _pattern_to_glob(pattern): + """ + Adapted from intake.source.utils.path_to_glob to convert a path as pattern into a glob style path + that uses the pattern's indicated number of '?' instead of '*' where an int was specified. + + Returns pattern if pattern is not a string. + + Parameters + ---------- + pattern : str + Path as pattern optionally containing format_strings + + Returns + ------- + glob_path : str + Path with int format strings replaced with the proper number of '?' and '*' otherwise. 
+ + Examples + -------- + >>> _pattern_to_glob('{year}/{month}/{day}.csv') + '*/*/*.csv' + >>> _pattern_to_glob('{year:4}/{month:2}/{day:2}.csv') + '????/??/??.csv' + >>> _pattern_to_glob('data/{year:4}{month:02}{day:02}.csv') + 'data/????????.csv' + >>> _pattern_to_glob('data/*.csv') + 'data/*.csv' + """ + from string import Formatter + + if not isinstance(pattern, str): + return pattern + + fmt = Formatter() + glob_path = "" + # prev_field_name = None + for literal_text, field_name, format_specs, _ in fmt.parse(format_string=pattern): + glob_path += literal_text + if field_name and (glob_path != "*"): + try: + glob_path += "?" * int(format_specs) + except ValueError: + glob_path += "*" + # alternatively, you could use bits=utils._get_parts_of_format_string(resolved_string, literal_texts, format_specs) + # and then use len(bits[i]) to get the length of each format_spec + # print(glob_path) + return glob_path + # To do: test this class and functions therein class Read: """ - Data object to create and use Intake catalogs to read ICESat-2 data into the specified formats. + Data object to read ICESat-2 data into the specified formats. Provides flexiblity for reading nested hdf5 files into common analysis formats. Parameters @@ -229,10 +278,6 @@ class Read: The default describes files downloaded directly from NSIDC (subsetted and non-subsetted) for most products (e.g. ATL06). The ATL11 filename pattern from NSIDC is: 'ATL{product:2}_{rgt:4}{orbitsegment:2}_{cycles:4}_{version:3}_{revision:2}.h5'. - catalog : string, default None - Full path to an Intake catalog for reading in data. - If you still need to create a catalog, leave as default. - out_obj_type : object, default xarray.Dataset The desired format for the data to be read in. Currently, only xarray.Dataset objects (default) are available. @@ -255,10 +300,8 @@ def __init__( data_source=None, product=None, filename_pattern="ATL{product:2}_{datetime:%Y%m%d%H%M%S}_{rgt:4}{cycle:2}{orbitsegment:2}_{version:3}_{revision:2}.h5", - catalog=None, out_obj_type=None, # xr.Dataset, ): - if data_source is None: raise ValueError("Please provide a data source.") else: @@ -271,7 +314,6 @@ def __init__( ) else: self._prod = is2ref._validate_product(product) - pattern_ck, filelist = Read._check_source_for_pattern( data_source, filename_pattern ) @@ -298,11 +340,6 @@ def __init__( self._filelist = filelist # after validation, use the notebook code and code outline to start implementing the rest of the class - if catalog is not None: - assert os.path.isfile( - catalog - ), f"Your catalog path '{catalog}' does not point to a valid file." - self._catalog_path = catalog if out_obj_type is not None: print( @@ -314,28 +351,6 @@ def __init__( # ---------------------------------------------------------------------- # Properties - @property - def is2catalog(self): - """ - Print a generic ICESat-2 Intake catalog. - This catalog does not specify groups, so it cannot be used to read in data. 
- - """ - if not hasattr(self, "_is2catalog") and hasattr(self, "_catalog_path"): - from intake import open_catalog - - self._is2catalog = open_catalog(self._catalog_path) - - else: - self._is2catalog = is2cat.build_catalog( - self.data_source, - self._pattern, - self._source_type, - grp_paths="/paths/to/variables", - ) - - return self._is2catalog - # I cut and pasted this directly out of the Query class - going to need to reconcile the _source/file stuff there @property @@ -370,7 +385,7 @@ def _check_source_for_pattern(source, filename_pattern): """ Check that the entered data source contains files that match the input filename_pattern """ - glob_pattern = is2cat._pattern_to_glob(filename_pattern) + glob_pattern = _pattern_to_glob(filename_pattern) if os.path.isdir(source): _, filelist = _run_fast_scandir(source, glob_pattern) @@ -601,9 +616,6 @@ def load(self): All items in the wanted variables list will be loaded from the files into memory. If you do not provide a wanted variables list, a default one will be created for you. - - If you would like to use the Intake catalog you provided to read in a single data variable, - simply call Intake's `read()` function on the is2catalog property (e.g. `reader.is2catalog.read()`). """ # todo: @@ -668,7 +680,7 @@ def _build_dataset_template(self, file): def _read_single_grp(self, file, grp_path): """ - For a given file and variable group path, construct an Intake catalog and use it to read in the data. + For a given file and variable group path, construct an xarray Dataset. Parameters ---------- @@ -685,24 +697,8 @@ def _read_single_grp(self, file, grp_path): """ - try: - grpcat = is2cat.build_catalog( - file, self._pattern, self._source_type, grp_paths=grp_path - ) - ds = grpcat[self._source_type].read() - - # NOTE: could also do this with h5py, but then would have to read in each variable in the group separately - except ValueError: - grpcat = is2cat.build_catalog( - file, - self._pattern, - self._source_type, - grp_paths=grp_path, - extra_engine_kwargs={"phony_dims": "access"}, - ) - ds = grpcat[self._source_type].read() - - return ds + return xr.open_dataset(file, group=grp_path, engine='h5netcdf', + backend_kwargs={'phony_dims': 'access'}) def _build_single_file_dataset(self, file, groups_list): """ @@ -722,7 +718,6 @@ def _build_single_file_dataset(self, file, groups_list): ------- Xarray Dataset """ - file_product = self._read_single_grp(file, "/").attrs["identifier_product_type"] assert ( file_product == self._prod diff --git a/icepyx/tests/test_read.py b/icepyx/tests/test_read.py index 9748ae992..018435968 100644 --- a/icepyx/tests/test_read.py +++ b/icepyx/tests/test_read.py @@ -63,7 +63,6 @@ def test_validate_source_str_not_a_dir_or_file(): ), sorted( [ - "./icepyx/core/is2cat.py", "./icepyx/core/is2ref.py", "./icepyx/tests/is2class_query.py", ] @@ -73,7 +72,7 @@ def test_validate_source_str_not_a_dir_or_file(): ( "./icepyx/core", "is2*.py", - ([], ["./icepyx/core/is2cat.py", "./icepyx/core/is2ref.py"]), + ([], ["./icepyx/core/is2ref.py"]), ), ( "./icepyx", diff --git a/requirements.txt b/requirements.txt index 86618f108..06f4ad9a7 100644 --- a/requirements.txt +++ b/requirements.txt @@ -7,8 +7,6 @@ h5netcdf h5py holoviews hvplot -intake -intake-xarray matplotlib numpy requests From aa6dd4ac07b395e28b165c32d6151226e254b6cd Mon Sep 17 00:00:00 2001 From: Jessica Scheick Date: Tue, 12 Sep 2023 16:31:32 -0400 Subject: [PATCH 02/21] add atl23 to product lists and tests (#445) --- icepyx/core/is2ref.py | 1 + icepyx/core/read.py | 10 
++++++++-- icepyx/tests/test_is2ref.py | 12 ++++++++++++ 3 files changed, 21 insertions(+), 2 deletions(-) diff --git a/icepyx/core/is2ref.py b/icepyx/core/is2ref.py index 883772a9e..52cf0e3a1 100644 --- a/icepyx/core/is2ref.py +++ b/icepyx/core/is2ref.py @@ -39,6 +39,7 @@ def _validate_product(product): "ATL19", "ATL20", "ATL21", + "ATL23", ], "Please enter a valid product" else: raise TypeError("Please enter a product string") diff --git a/icepyx/core/read.py b/icepyx/core/read.py index 2ffe32cb7..c276c5e53 100644 --- a/icepyx/core/read.py +++ b/icepyx/core/read.py @@ -205,6 +205,7 @@ def _run_fast_scandir(dir, fn_glob): return subfolders, files + # Need to post on intake's page to see if this would be a useful contribution... # https://github.com/intake/intake/blob/0.6.4/intake/source/utils.py#L216 def _pattern_to_glob(pattern): @@ -697,8 +698,12 @@ def _read_single_grp(self, file, grp_path): """ - return xr.open_dataset(file, group=grp_path, engine='h5netcdf', - backend_kwargs={'phony_dims': 'access'}) + return xr.open_dataset( + file, + group=grp_path, + engine="h5netcdf", + backend_kwargs={"phony_dims": "access"}, + ) def _build_single_file_dataset(self, file, groups_list): """ @@ -740,6 +745,7 @@ def _build_single_file_dataset(self, file, groups_list): "ATL19", "ATL20", "ATL21", + "ATL23", ]: is2ds = xr.open_dataset(file) diff --git a/icepyx/tests/test_is2ref.py b/icepyx/tests/test_is2ref.py index c2ddf6e5e..8d50568fe 100644 --- a/icepyx/tests/test_is2ref.py +++ b/icepyx/tests/test_is2ref.py @@ -273,6 +273,18 @@ def test_atl21_product(): assert obs == expected +def test_atl23_product(): + lcds = "atl23" + obs = is2ref._validate_product(lcds) + expected = "ATL23" + assert obs == expected + + ucds = "ATL23" + obs = is2ref._validate_product(ucds) + expected = "ATL23" + assert obs == expected + + ########## about_product ########## # Note: requires internet connection # could the github flat data option be used here? 
https://octo.github.com/projects/flat-data From ac17ff3aee51325b341eb102b0ca390d91f89113 Mon Sep 17 00:00:00 2001 From: Rachel Wegener <35503632+rwegener2@users.noreply.github.com> Date: Wed, 13 Sep 2023 12:38:43 -0400 Subject: [PATCH 03/21] raise error for use of catalog in Read (#446) * add deprecationerror and raise for use of catalog --------- Co-authored-by: Jessica Scheick --- .../documentation/classes_dev_uml.svg | 529 +++++++++--------- .../documentation/classes_user_uml.svg | 395 ++++++------- .../documentation/packages_user_uml.svg | 48 +- icepyx/core/exceptions.py | 9 + icepyx/core/read.py | 14 + 5 files changed, 521 insertions(+), 474 deletions(-) diff --git a/doc/source/user_guide/documentation/classes_dev_uml.svg b/doc/source/user_guide/documentation/classes_dev_uml.svg index 3a0b5c396..fd5033938 100644 --- a/doc/source/user_guide/documentation/classes_dev_uml.svg +++ b/doc/source/user_guide/documentation/classes_dev_uml.svg @@ -4,11 +4,11 @@ - + classes_dev_uml - + icepyx.core.auth.AuthenticationError @@ -18,369 +18,378 @@ - + +icepyx.core.exceptions.DeprecationError + +DeprecationError + + + + + + icepyx.core.auth.EarthdataAuthMixin - -EarthdataAuthMixin - -_auth : Auth, NoneType -_s3_initial_ts : datetime, NoneType -_s3login_credentials : NoneType, dict -_session : Session, NoneType -auth -s3login_credentials -session - -__init__(auth) -__str__() -earthdata_login(uid, email, s3token): None + +EarthdataAuthMixin + +_auth : Auth, NoneType +_s3_initial_ts : NoneType, datetime +_s3login_credentials : NoneType, dict +_session : NoneType, Session +auth +s3login_credentials +session + +__init__(auth) +__str__() +earthdata_login(uid, email, s3token): None - + icepyx.core.query.GenQuery - -GenQuery - -_spatial -_temporal - -__init__(spatial_extent, date_range, start_time, end_time) -__str__() + +GenQuery + +_spatial +_temporal + +__init__(spatial_extent, date_range, start_time, end_time) +__str__() - + icepyx.core.granules.Granules - -Granules - -avail : list -orderIDs : list - -__init__ -() -download(verbose, path, session, restart) -get_avail(CMRparams, reqparams, cloud) -place_order(CMRparams, reqparams, subsetparams, verbose, subset, session, geom_filepath) + +Granules + +avail : list +orderIDs : list + +__init__ +() +download(verbose, path, session, restart) +get_avail(CMRparams, reqparams, cloud) +place_order(CMRparams, reqparams, subsetparams, verbose, subset, session, geom_filepath) - + icepyx.core.query.Query - -Query - -CMRparams -_CMRparams -_about_product -_cust_options : dict -_cycles : list -_file_vars -_granules -_order_vars -_prod : NoneType, str -_readable_granule_name : list -_reqparams -_source : str -_subsetparams : NoneType -_tracks : list -_version -cycles -dataset -dates -end_time -file_vars -granules -order_vars -product -product_version -reqparams -spatial -spatial_extent -start_time -temporal -tracks - -__init__(product, spatial_extent, date_range, start_time, end_time, version, cycles, tracks, files, auth) -__str__() -avail_granules(ids, cycles, tracks, cloud) -download_granules(path, verbose, subset, restart) -latest_version() -order_granules(verbose, subset, email) -product_all_info() -product_summary_info() -show_custom_options(dictview) -subsetparams() -visualize_elevation() -visualize_spatial_extent() + +Query + +CMRparams +_CMRparams +_about_product +_cust_options : dict +_cycles : list +_file_vars +_granules +_order_vars +_prod : NoneType, str +_readable_granule_name : list +_reqparams +_source : str +_subsetparams : NoneType +_tracks : list 
+_version +cycles +dataset +dates +end_time +file_vars +granules +order_vars +product +product_version +reqparams +spatial +spatial_extent +start_time +temporal +tracks + +__init__(product, spatial_extent, date_range, start_time, end_time, version, cycles, tracks, files, auth) +__str__() +avail_granules(ids, cycles, tracks, cloud) +download_granules(path, verbose, subset, restart) +latest_version() +order_granules(verbose, subset, email) +product_all_info() +product_summary_info() +show_custom_options(dictview) +subsetparams() +visualize_elevation() +visualize_spatial_extent() icepyx.core.granules.Granules->icepyx.core.query.Query - - -_granules + + +_granules icepyx.core.granules.Granules->icepyx.core.query.Query - - -_granules + + +_granules - + icepyx.core.icesat2data.Icesat2Data - -Icesat2Data - - -__init__() + +Icesat2Data + + +__init__() - + icepyx.core.exceptions.NsidcQueryError - -NsidcQueryError - -errmsg -msgtxt : str - -__init__(errmsg, msgtxt) -__str__() + +NsidcQueryError + +errmsg +msgtxt : str + +__init__(errmsg, msgtxt) +__str__() - + icepyx.core.exceptions.QueryError - -QueryError - - - + +QueryError + + + icepyx.core.exceptions.NsidcQueryError->icepyx.core.exceptions.QueryError - - + + - + icepyx.core.APIformatting.Parameters - -Parameters - -_fmted_keys : NoneType, dict -_poss_keys : dict -_reqtype : NoneType, str -fmted_keys -partype -poss_keys - -__init__(partype, values, reqtype) -_check_valid_keys() -_get_possible_keys() -build_params() -check_req_values() -check_values() + +Parameters + +_fmted_keys : NoneType, dict +_poss_keys : dict +_reqtype : NoneType, str +fmted_keys +partype +poss_keys + +__init__(partype, values, reqtype) +_check_valid_keys() +_get_possible_keys() +build_params() +check_req_values() +check_values() icepyx.core.APIformatting.Parameters->icepyx.core.query.Query - - -_CMRparams + + +_CMRparams icepyx.core.APIformatting.Parameters->icepyx.core.query.Query - - -_reqparams + + +_reqparams icepyx.core.APIformatting.Parameters->icepyx.core.query.Query - - -_subsetparams + + +_subsetparams icepyx.core.APIformatting.Parameters->icepyx.core.query.Query - - -_subsetparams + + +_subsetparams icepyx.core.query.Query->icepyx.core.auth.EarthdataAuthMixin - - + + icepyx.core.query.Query->icepyx.core.query.GenQuery - - + + - + icepyx.core.read.Read - -Read - -_filelist : list, NoneType -_out_obj : Dataset -_pattern : str -_prod : str -_read_vars -_source_type : str -data_source -vars - -__init__(data_source, product, filename_pattern, out_obj_type) -_add_vars_to_ds(is2ds, ds, grp_path, wanted_groups_tiered, wanted_dict) -_build_dataset_template(file) -_build_single_file_dataset(file, groups_list) -_check_source_for_pattern(source, filename_pattern) -_combine_nested_vars(is2ds, ds, grp_path, wanted_dict) -_read_single_grp(file, grp_path) -load() + +Read + +_filelist : list, NoneType +_out_obj : Dataset +_pattern : str +_prod : str +_read_vars +_source_type : str +data_source +vars + +__init__(data_source, product, filename_pattern, catalog, out_obj_type) +_add_vars_to_ds(is2ds, ds, grp_path, wanted_groups_tiered, wanted_dict) +_build_dataset_template(file) +_build_single_file_dataset(file, groups_list) +_check_source_for_pattern(source, filename_pattern) +_combine_nested_vars(is2ds, ds, grp_path, wanted_dict) +_read_single_grp(file, grp_path) +load() - + icepyx.core.spatial.Spatial - -Spatial - -_ext_type : str -_gdf_spat : DataFrame, GeoDataFrame -_geom_file : NoneType -_spatial_ext -_xdateln -extent -extent_as_gdf -extent_file -extent_type - 
-__init__(spatial_extent) -__str__() -fmt_for_CMR() -fmt_for_EGI() + +Spatial + +_ext_type : str +_gdf_spat : GeoDataFrame, DataFrame +_geom_file : NoneType +_spatial_ext +_xdateln +extent +extent_as_gdf +extent_file +extent_type + +__init__(spatial_extent) +__str__() +fmt_for_CMR() +fmt_for_EGI() icepyx.core.spatial.Spatial->icepyx.core.query.GenQuery - - -_spatial + + +_spatial icepyx.core.spatial.Spatial->icepyx.core.query.GenQuery - - -_spatial + + +_spatial - + icepyx.core.temporal.Temporal - -Temporal - -_end : datetime -_start : datetime -end -start - -__init__(date_range, start_time, end_time) -__str__() + +Temporal + +_end : datetime +_start : datetime +end +start + +__init__(date_range, start_time, end_time) +__str__() icepyx.core.temporal.Temporal->icepyx.core.query.GenQuery - - -_temporal + + +_temporal - + icepyx.core.variables.Variables - -Variables - -_avail : list, NoneType -_vartype -_version : NoneType -path : NoneType -product : NoneType -wanted : NoneType, dict - -__init__(vartype, avail, wanted, product, version, path, auth) -_check_valid_lists(vgrp, allpaths, var_list, beam_list, keyword_list) -_get_combined_list(beam_list, keyword_list) -_get_sum_varlist(var_list, all_vars, defaults) -_iter_paths(sum_varlist, req_vars, vgrp, beam_list, keyword_list) -_iter_vars(sum_varlist, req_vars, vgrp) -append(defaults, var_list, beam_list, keyword_list) -avail(options, internal) -parse_var_list(varlist, tiered, tiered_vars) -remove(all, var_list, beam_list, keyword_list) + +Variables + +_avail : NoneType, list +_vartype +_version : NoneType +path : NoneType +product : NoneType +wanted : NoneType, dict + +__init__(vartype, avail, wanted, product, version, path, auth) +_check_valid_lists(vgrp, allpaths, var_list, beam_list, keyword_list) +_get_combined_list(beam_list, keyword_list) +_get_sum_varlist(var_list, all_vars, defaults) +_iter_paths(sum_varlist, req_vars, vgrp, beam_list, keyword_list) +_iter_vars(sum_varlist, req_vars, vgrp) +append(defaults, var_list, beam_list, keyword_list) +avail(options, internal) +parse_var_list(varlist, tiered, tiered_vars) +remove(all, var_list, beam_list, keyword_list) icepyx.core.variables.Variables->icepyx.core.auth.EarthdataAuthMixin - - + + icepyx.core.variables.Variables->icepyx.core.query.Query - - -_order_vars + + +_order_vars icepyx.core.variables.Variables->icepyx.core.query.Query - - -_order_vars + + +_order_vars icepyx.core.variables.Variables->icepyx.core.query.Query - - -_file_vars + + +_file_vars icepyx.core.variables.Variables->icepyx.core.read.Read - - -_read_vars + + +_read_vars - + icepyx.core.visualization.Visualize - -Visualize - -bbox : list -cycles : NoneType -date_range : NoneType -product : NoneType, str -tracks : NoneType - -__init__(query_obj, product, spatial_extent, date_range, cycles, tracks) -generate_OA_parameters(): list -grid_bbox(binsize): list -make_request(base_url, payload) -parallel_request_OA(): da.array -query_icesat2_filelist(): tuple -request_OA_data(paras): da.array -viz_elevation(): (hv.DynamicMap, hv.Layout) + +Visualize + +bbox : list +cycles : NoneType +date_range : NoneType +product : NoneType, str +tracks : NoneType + +__init__(query_obj, product, spatial_extent, date_range, cycles, tracks) +generate_OA_parameters(): list +grid_bbox(binsize): list +make_request(base_url, payload) +parallel_request_OA(): da.array +query_icesat2_filelist(): tuple +request_OA_data(paras): da.array +viz_elevation(): (hv.DynamicMap, hv.Layout) diff --git a/doc/source/user_guide/documentation/classes_user_uml.svg 
b/doc/source/user_guide/documentation/classes_user_uml.svg index b6d850951..1c9184379 100644 --- a/doc/source/user_guide/documentation/classes_user_uml.svg +++ b/doc/source/user_guide/documentation/classes_user_uml.svg @@ -4,11 +4,11 @@ - + classes_user_uml - + icepyx.core.auth.AuthenticationError @@ -18,302 +18,311 @@ - + +icepyx.core.exceptions.DeprecationError + +DeprecationError + + + + + + icepyx.core.auth.EarthdataAuthMixin - -EarthdataAuthMixin - -auth -s3login_credentials -session - -earthdata_login(uid, email, s3token): None + +EarthdataAuthMixin + +auth +s3login_credentials +session + +earthdata_login(uid, email, s3token): None - + icepyx.core.query.GenQuery - -GenQuery - - - + +GenQuery + + + - + icepyx.core.granules.Granules - -Granules - -avail : list -orderIDs : list - -download(verbose, path, session, restart) -get_avail(CMRparams, reqparams, cloud) -place_order(CMRparams, reqparams, subsetparams, verbose, subset, session, geom_filepath) + +Granules + +avail : list +orderIDs : list + +download(verbose, path, session, restart) +get_avail(CMRparams, reqparams, cloud) +place_order(CMRparams, reqparams, subsetparams, verbose, subset, session, geom_filepath) - + icepyx.core.query.Query - -Query - -CMRparams -cycles -dataset -dates -end_time -file_vars -granules -order_vars -product -product_version -reqparams -spatial -spatial_extent -start_time -temporal -tracks - -avail_granules(ids, cycles, tracks, cloud) -download_granules(path, verbose, subset, restart) -latest_version() -order_granules(verbose, subset, email) -product_all_info() -product_summary_info() -show_custom_options(dictview) -subsetparams() -visualize_elevation() -visualize_spatial_extent() + +Query + +CMRparams +cycles +dataset +dates +end_time +file_vars +granules +order_vars +product +product_version +reqparams +spatial +spatial_extent +start_time +temporal +tracks + +avail_granules(ids, cycles, tracks, cloud) +download_granules(path, verbose, subset, restart) +latest_version() +order_granules(verbose, subset, email) +product_all_info() +product_summary_info() +show_custom_options(dictview) +subsetparams() +visualize_elevation() +visualize_spatial_extent() icepyx.core.granules.Granules->icepyx.core.query.Query - - -_granules + + +_granules icepyx.core.granules.Granules->icepyx.core.query.Query - - -_granules + + +_granules - + icepyx.core.icesat2data.Icesat2Data - -Icesat2Data - - - + +Icesat2Data + + + - + icepyx.core.exceptions.NsidcQueryError - -NsidcQueryError - -errmsg -msgtxt : str - - + +NsidcQueryError + +errmsg +msgtxt : str + + - + icepyx.core.exceptions.QueryError - -QueryError - - - + +QueryError + + + icepyx.core.exceptions.NsidcQueryError->icepyx.core.exceptions.QueryError - - + + - + icepyx.core.APIformatting.Parameters - -Parameters - -fmted_keys -partype -poss_keys - -build_params() -check_req_values() -check_values() + +Parameters + +fmted_keys +partype +poss_keys + +build_params() +check_req_values() +check_values() icepyx.core.APIformatting.Parameters->icepyx.core.query.Query - - -_CMRparams + + +_CMRparams icepyx.core.APIformatting.Parameters->icepyx.core.query.Query - - -_reqparams + + +_reqparams icepyx.core.APIformatting.Parameters->icepyx.core.query.Query - - -_subsetparams + + +_subsetparams icepyx.core.APIformatting.Parameters->icepyx.core.query.Query - - -_subsetparams + + +_subsetparams icepyx.core.query.Query->icepyx.core.auth.EarthdataAuthMixin - - + + icepyx.core.query.Query->icepyx.core.query.GenQuery - - + + - + icepyx.core.read.Read - -Read - -data_source -vars - -load() + +Read 
+ +data_source +vars + +load() - + icepyx.core.spatial.Spatial - -Spatial - -extent -extent_as_gdf -extent_file -extent_type - -fmt_for_CMR() -fmt_for_EGI() + +Spatial + +extent +extent_as_gdf +extent_file +extent_type + +fmt_for_CMR() +fmt_for_EGI() icepyx.core.spatial.Spatial->icepyx.core.query.GenQuery - - -_spatial + + +_spatial icepyx.core.spatial.Spatial->icepyx.core.query.GenQuery - - -_spatial + + +_spatial - + icepyx.core.temporal.Temporal - -Temporal - -end -start - - + +Temporal + +end +start + + icepyx.core.temporal.Temporal->icepyx.core.query.GenQuery - - -_temporal + + +_temporal - + icepyx.core.variables.Variables - -Variables - -path : NoneType -product : NoneType -wanted : dict, NoneType - -append(defaults, var_list, beam_list, keyword_list) -avail(options, internal) -parse_var_list(varlist, tiered, tiered_vars) -remove(all, var_list, beam_list, keyword_list) + +Variables + +path : NoneType +product : NoneType +wanted : NoneType, dict + +append(defaults, var_list, beam_list, keyword_list) +avail(options, internal) +parse_var_list(varlist, tiered, tiered_vars) +remove(all, var_list, beam_list, keyword_list) icepyx.core.variables.Variables->icepyx.core.auth.EarthdataAuthMixin - - + + icepyx.core.variables.Variables->icepyx.core.query.Query - - -_order_vars + + +_order_vars icepyx.core.variables.Variables->icepyx.core.query.Query - - -_order_vars + + +_order_vars icepyx.core.variables.Variables->icepyx.core.query.Query - - -_file_vars + + +_file_vars icepyx.core.variables.Variables->icepyx.core.read.Read - - -_read_vars + + +_read_vars - + icepyx.core.visualization.Visualize - -Visualize - -bbox : list -cycles : NoneType -date_range : NoneType -product : str, NoneType -tracks : NoneType - -generate_OA_parameters(): list -grid_bbox(binsize): list -make_request(base_url, payload) -parallel_request_OA(): da.array -query_icesat2_filelist(): tuple -request_OA_data(paras): da.array -viz_elevation(): (hv.DynamicMap, hv.Layout) + +Visualize + +bbox : list +cycles : NoneType +date_range : NoneType +product : NoneType, str +tracks : NoneType + +generate_OA_parameters(): list +grid_bbox(binsize): list +make_request(base_url, payload) +parallel_request_OA(): da.array +query_icesat2_filelist(): tuple +request_OA_data(paras): da.array +viz_elevation(): (hv.DynamicMap, hv.Layout) diff --git a/doc/source/user_guide/documentation/packages_user_uml.svg b/doc/source/user_guide/documentation/packages_user_uml.svg index 3422b44e8..44a041c77 100644 --- a/doc/source/user_guide/documentation/packages_user_uml.svg +++ b/doc/source/user_guide/documentation/packages_user_uml.svg @@ -4,11 +4,11 @@ - + packages_user_uml - + icepyx.core @@ -30,8 +30,8 @@ icepyx.core.exceptions - -icepyx.core.exceptions + +icepyx.core.exceptions @@ -42,14 +42,14 @@ icepyx.core.icesat2data - -icepyx.core.icesat2data + +icepyx.core.icesat2data icepyx.core.is2ref - -icepyx.core.is2ref + +icepyx.core.is2ref @@ -96,35 +96,41 @@ icepyx.core.read - -icepyx.core.read + +icepyx.core.read - + +icepyx.core.read->icepyx.core.exceptions + + + + + icepyx.core.read->icepyx.core.variables - - + + icepyx.core.spatial - -icepyx.core.spatial + +icepyx.core.spatial icepyx.core.temporal - -icepyx.core.temporal + +icepyx.core.temporal icepyx.core.validate_inputs - -icepyx.core.validate_inputs + +icepyx.core.validate_inputs - + icepyx.core.variables->icepyx.core.auth diff --git a/icepyx/core/exceptions.py b/icepyx/core/exceptions.py index 2c29f1fa0..a36a1b645 100644 --- a/icepyx/core/exceptions.py +++ b/icepyx/core/exceptions.py @@ -1,3 +1,10 @@ 
+class DeprecationError(Exception): + """ + Class raised for use of functionality that is no longer supported by icepyx. + """ + pass + + class QueryError(Exception): """ Base class for Query object exceptions @@ -20,3 +27,5 @@ def __init__( def __str__(self): return f"{self.msgtxt}: {self.errmsg}" + + diff --git a/icepyx/core/read.py b/icepyx/core/read.py index c276c5e53..a7ee15db7 100644 --- a/icepyx/core/read.py +++ b/icepyx/core/read.py @@ -5,6 +5,7 @@ import numpy as np import xarray as xr +from icepyx.core.exceptions import DeprecationError import icepyx.core.is2ref as is2ref from icepyx.core.variables import Variables as Variables from icepyx.core.variables import list_of_dict_vals @@ -278,6 +279,11 @@ class Read: String that shows the filename pattern as required for Intake's path_as_pattern argument. The default describes files downloaded directly from NSIDC (subsetted and non-subsetted) for most products (e.g. ATL06). The ATL11 filename pattern from NSIDC is: 'ATL{product:2}_{rgt:4}{orbitsegment:2}_{cycles:4}_{version:3}_{revision:2}.h5'. + + catalog : string, default None + Full path to an Intake catalog for reading in data. + If you still need to create a catalog, leave as default. + **Deprecation warning:** This argument has been depreciated. Please use the data_source argument to pass in valid data. out_obj_type : object, default xarray.Dataset The desired format for the data to be read in. @@ -301,8 +307,16 @@ def __init__( data_source=None, product=None, filename_pattern="ATL{product:2}_{datetime:%Y%m%d%H%M%S}_{rgt:4}{cycle:2}{orbitsegment:2}_{version:3}_{revision:2}.h5", + catalog=None, out_obj_type=None, # xr.Dataset, ): + # Raise error for depreciated argument + if catalog: + raise DeprecationError( + 'The `catalog` argument has been deprecated and intake is no longer supported. ' + 'Please use the `data_source` argument to specify your dataset instead.' + ) + if data_source is None: raise ValueError("Please provide a data source.") else: From eec037e4d1a8ea3dd0df04380fd1d6ce187a5298 Mon Sep 17 00:00:00 2001 From: Jessica Scheick Date: Thu, 14 Sep 2023 09:28:45 -0400 Subject: [PATCH 04/21] Release v0.8.0 (#443) --------- Co-authored-by: Rachel Wegener --- doc/source/user_guide/changelog/index.rst | 10 ++- doc/source/user_guide/changelog/template.rst | 2 - doc/source/user_guide/changelog/v0.8.0.rst | 86 ++++++++++++++++++++ 3 files changed, 95 insertions(+), 3 deletions(-) create mode 100644 doc/source/user_guide/changelog/v0.8.0.rst diff --git a/doc/source/user_guide/changelog/index.rst b/doc/source/user_guide/changelog/index.rst index c781ea041..edd1c9884 100644 --- a/doc/source/user_guide/changelog/index.rst +++ b/doc/source/user_guide/changelog/index.rst @@ -6,9 +6,17 @@ icepyx ChangeLog This is the list of changes made to icepyx in between each release. Full details can be found in the `commit logs `_. -Latest Release (Version 0.7.0) +Latest Release (Version 0.8.0) ------------------------------ +.. toctree:: + :maxdepth: 2 + + v0.8.0 + +Version 0.7.0 +------------- + .. toctree:: :maxdepth: 2 diff --git a/doc/source/user_guide/changelog/template.rst b/doc/source/user_guide/changelog/template.rst index e7957d265..c09dd972f 100644 --- a/doc/source/user_guide/changelog/template.rst +++ b/doc/source/user_guide/changelog/template.rst @@ -1,5 +1,3 @@ -.. 
_whatsnew_0x0: - What's new in 0.4.0 (DD MONTH YYYY) ----------------------------------- diff --git a/doc/source/user_guide/changelog/v0.8.0.rst b/doc/source/user_guide/changelog/v0.8.0.rst new file mode 100644 index 000000000..4f60f57f4 --- /dev/null +++ b/doc/source/user_guide/changelog/v0.8.0.rst @@ -0,0 +1,86 @@ +What's new in 0.8.0 (12 September 2023) +----------------------------------- + +These are the changes in icepyx 0.8.0 See :ref:`release` for a full changelog +including other versions of icepyx. + + +New Features +~~~~~~~~~~~~ + +- create temporal module and add input types and testing (#327) + + - create temporal module + - create temporal testing module + - add support for more temporal input types (datetime objects) and formats (dict) + - temporal docstring, user guide updates + - updated example notebook for new temporal inputs + - update temporal info in data access tutorial example notebook + - GitHub action UML generation auto-update + +- Refactor authentication (#435) + + - modularize authentication using a mixin class + - add docstrings and update example notebooks + - add tests + +- add atl23 (new product) to lists and tests (#445) + + +Deprecations +~~~~~~~~~~~~ + +- Remove intake catalog from Read module (#438) + + - delete is2cat.py and references + - remove intake and related modules + +- Raise warning for use of catalog in Read module (#446) + + +Maintenance +^^^^^^^^^^^ + +- update codecov action and remove from deps (#421) + +- is2ref tests for product formatting and default var lists (#424) + +- get s3urls for all data products and update doctests to v006 (#426) + + - Always send CMR query to provider NSIDC_CPRD to make sure s3 urls are returned. + +- Traffic updates 2023 Feb-Aug (#442) + +Documentation +^^^^^^^^^^^^^ + +- update install instructions (#409) + + - add s3fs as requirement to make cloud access default + - transition to recommending mamba over conda + +- add release guide to docs (#255) + +- docs maintenance and pubs/citations update (#422) + + - add JOSS to bib and badges + - switch zenodo links to nonversioned icepyx + + +Other +^^^^^ + +- JOSS submission (#361) + + Matches Release v0.6.4_JOSS per #420 plus a few editorial edits available via the pubs/joss branch. + +- update and clarify authorship, citation, and attribution policies (#419) + + - add CITATION.cff file + - update citation docs with Zenodo doi and 'icepyx Developers' as author + + +Contributors +~~~~~~~~~~~~ + +.. 
contributors:: v0.7.0..v0.8.0|HEAD From eb668bdbd7a11adfa441fc9321040c011fc22919 Mon Sep 17 00:00:00 2001 From: Jessica Scheick Date: Tue, 14 Nov 2023 14:57:27 -0500 Subject: [PATCH 05/21] fix ATL08 delta_time dimension read error (#470) Co-authored-by: GitHub Action --- .../documentation/classes_dev_uml.svg | 88 +++++++++---------- icepyx/core/read.py | 13 ++- 2 files changed, 49 insertions(+), 52 deletions(-) diff --git a/doc/source/user_guide/documentation/classes_dev_uml.svg b/doc/source/user_guide/documentation/classes_dev_uml.svg index fd5033938..8e83d4dc1 100644 --- a/doc/source/user_guide/documentation/classes_dev_uml.svg +++ b/doc/source/user_guide/documentation/classes_dev_uml.svg @@ -33,13 +33,13 @@ EarthdataAuthMixin -_auth : Auth, NoneType -_s3_initial_ts : NoneType, datetime -_s3login_credentials : NoneType, dict -_session : NoneType, Session -auth -s3login_credentials -session +_auth : NoneType +_s3_initial_ts : NoneType, datetime +_s3login_credentials : NoneType +_session : NoneType +auth +s3login_credentials +session __init__(auth) __str__() @@ -48,14 +48,14 @@ icepyx.core.query.GenQuery - -GenQuery - -_spatial -_temporal - -__init__(spatial_extent, date_range, start_time, end_time) -__str__() + +GenQuery + +_spatial +_temporal + +__init__(spatial_extent, date_range, start_time, end_time) +__str__() @@ -229,8 +229,8 @@ icepyx.core.query.Query->icepyx.core.query.GenQuery - - + + @@ -238,7 +238,7 @@ Read -_filelist : list, NoneType +_filelist : NoneType, list _out_obj : Dataset _pattern : str _prod : str @@ -259,37 +259,37 @@ icepyx.core.spatial.Spatial - -Spatial - -_ext_type : str -_gdf_spat : GeoDataFrame, DataFrame -_geom_file : NoneType -_spatial_ext -_xdateln -extent -extent_as_gdf -extent_file -extent_type - -__init__(spatial_extent) -__str__() -fmt_for_CMR() -fmt_for_EGI() + +Spatial + +_ext_type : str +_gdf_spat : GeoDataFrame +_geom_file : NoneType +_spatial_ext +_xdateln +extent +extent_as_gdf +extent_file +extent_type + +__init__(spatial_extent) +__str__() +fmt_for_CMR() +fmt_for_EGI() icepyx.core.spatial.Spatial->icepyx.core.query.GenQuery - - -_spatial + + +_spatial icepyx.core.spatial.Spatial->icepyx.core.query.GenQuery - - -_spatial + + +_spatial @@ -308,9 +308,9 @@ icepyx.core.temporal.Temporal->icepyx.core.query.GenQuery - - -_temporal + + +_temporal diff --git a/icepyx/core/read.py b/icepyx/core/read.py index a7ee15db7..627395be2 100644 --- a/icepyx/core/read.py +++ b/icepyx/core/read.py @@ -279,7 +279,7 @@ class Read: String that shows the filename pattern as required for Intake's path_as_pattern argument. The default describes files downloaded directly from NSIDC (subsetted and non-subsetted) for most products (e.g. ATL06). The ATL11 filename pattern from NSIDC is: 'ATL{product:2}_{rgt:4}{orbitsegment:2}_{cycles:4}_{version:3}_{revision:2}.h5'. - + catalog : string, default None Full path to an Intake catalog for reading in data. If you still need to create a catalog, leave as default. @@ -313,8 +313,8 @@ def __init__( # Raise error for depreciated argument if catalog: raise DeprecationError( - 'The `catalog` argument has been deprecated and intake is no longer supported. ' - 'Please use the `data_source` argument to specify your dataset instead.' + "The `catalog` argument has been deprecated and intake is no longer supported. " + "Please use the `data_source` argument to specify your dataset instead." 
) if data_source is None: @@ -616,11 +616,8 @@ def _combine_nested_vars(is2ds, ds, grp_path, wanted_dict): except (AttributeError, KeyError): pass - try: - is2ds = is2ds.assign(ds[grp_spec_vars]) - except xr.MergeError: - ds = ds[grp_spec_vars].reset_coords() - is2ds = is2ds.assign(ds) + ds = ds[grp_spec_vars].swap_dims({"delta_time": "photon_idx"}) + is2ds = is2ds.assign(ds) return is2ds From 6d7c170bcbf440b47f33d72d0d37d3d6994105da Mon Sep 17 00:00:00 2001 From: Jessica Scheick Date: Wed, 15 Nov 2023 09:34:06 -0500 Subject: [PATCH 06/21] v0.8.1 minor/patch release docs (#472) --- doc/source/user_guide/changelog/index.rst | 10 +++++++++- doc/source/user_guide/changelog/v0.8.1.rst | 18 ++++++++++++++++++ 2 files changed, 27 insertions(+), 1 deletion(-) create mode 100644 doc/source/user_guide/changelog/v0.8.1.rst diff --git a/doc/source/user_guide/changelog/index.rst b/doc/source/user_guide/changelog/index.rst index edd1c9884..eaaffd658 100644 --- a/doc/source/user_guide/changelog/index.rst +++ b/doc/source/user_guide/changelog/index.rst @@ -6,9 +6,17 @@ icepyx ChangeLog This is the list of changes made to icepyx in between each release. Full details can be found in the `commit logs `_. -Latest Release (Version 0.8.0) +Latest Release (Version 0.8.1) ------------------------------ +.. toctree:: + :maxdepth: 2 + + v0.8.1 + +Version 0.8.0 +------------- + .. toctree:: :maxdepth: 2 diff --git a/doc/source/user_guide/changelog/v0.8.1.rst b/doc/source/user_guide/changelog/v0.8.1.rst new file mode 100644 index 000000000..5b86c5dec --- /dev/null +++ b/doc/source/user_guide/changelog/v0.8.1.rst @@ -0,0 +1,18 @@ +What's new in 0.8.1 (14 November 2023) +------------------------------------- + +These are the changes in icepyx 0.8.1 See :ref:`release` for a full changelog +including other versions of icepyx. + + + +Bug fixes +~~~~~~~~~ + +- fix the ATL08 delta_time dimension read error (#470) + + +Contributors +~~~~~~~~~~~~ + +.. contributors:: v0.8.0..v0.8.1|HEAD From c92cd07bb650cb6de70b0eef550f3efd47fbfb49 Mon Sep 17 00:00:00 2001 From: Jessica Scheick Date: Mon, 25 Sep 2023 11:45:10 -0400 Subject: [PATCH 07/21] update QUEST and GenQuery classes for argo integration (#441) * Adding argo search and download script * Create get_argo.py Download the 'classic' argo data with physical variables only * begin implementing argo dataset * 1st draft implementing argo dataset * implement search_data for physical argo * doctests and general cleanup for physical argo query * beginning of BGC Argo download * parse BGC profiles into DF * plan to query BGC profiles * validate BGC param input function * order BGC params in order in which they should be queried * fix bug in parse_into_df() - init blank df to take in union of params from all profiles * identify profiles from initial API request containing all required params * creates df with only profiles that contain all user specified params Need to dload additional params * modified to populate prof df by querying individual profiles * finished up BGC argo download! 
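A minimal sketch of the user-facing effect of the `catalog` deprecation handled in the Read module above, assuming an icepyx version that includes the `DeprecationError` class added in this patch series; the file path is a placeholder, not a real granule on disk.

import icepyx as ipx
from icepyx.core.exceptions import DeprecationError

# Passing the removed `catalog` keyword now fails fast with a clear message.
try:
    ipx.Read(
        data_source="/placeholder/path/processed_ATL06_20190226005526_09100205_006_02.h5",
        product="ATL06",
        catalog="my_old_catalog.yml",
    )
except DeprecationError as err:
    print(err)

# The supported pattern: point `data_source` at the file(s) instead.
reader = ipx.Read(
    data_source="/placeholder/path/processed_ATL06_20190226005526_09100205_006_02.h5",
    product="ATL06",  # optional in later versions, which read the product from the file itself
)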
* assert bounding box type in Argo init, begin framework for unit tests * Adding argo search and download script * Create get_argo.py Download the 'classic' argo data with physical variables only * begin implementing argo dataset * 1st draft implementing argo dataset * implement search_data for physical argo * doctests and general cleanup for physical argo query * beginning of BGC Argo download * parse BGC profiles into DF * plan to query BGC profiles * validate BGC param input function * order BGC params in order in which they should be queried * fix bug in parse_into_df() - init blank df to take in union of params from all profiles * identify profiles from initial API request containing all required params * creates df with only profiles that contain all user specified params Need to dload additional params * modified to populate prof df by querying individual profiles * finished up BGC argo download! * assert bounding box type in Argo init, begin framework for unit tests * need to confirm spatial extent is bbox * begin test case for available profiles * add tests for argo.py * add typing, add example json, and use it to test parsing * update argo to submit successful api request (update keys and values submitted) * first pass at porting argo over to metadata+per profile download (WIP) * basic working argo script * simplify parameter validation (ordered list no longer needed) * add option to delete existing data before new download * continue cleaning up argo.py * fix download_by_profile to properly store all downloaded data * remove old get_argo.py script * remove _filter_profiles function in favor of submitting data kwarg in request * start filling in docstrings * clean up nearly duplicate functions * add more docstrings * get a few minimal argo tests working * add bgc argo params. begin adding merge for second download runs * some changes * WIP test commit to see if can push to GH * WIP handling argo merge issue * update profile to df to return df and move merging to get_dataframe * merge profiles with existing df * clean up docstrings and code * add test_argo.py * add prelim test case for adding to Argo df * remove sandbox files * remove bgc argo test file * update variables notebook from development * simplify import statements * quickfix for granules error * draft subpage on available QUEST datasets * small reference fix in text * add reference to top of .rst file * test argo df merge * add functionality to Quest class to pass search criteria to all datasets * add functionality to Quest class to pass search criteria to all datasets * update dataset docstrings; reorder argo.py to match * implement quest search+download for IS2 * move spatial and temporal properties from query to genquery * add query docstring test for cycles,tracks to test file * add quest test module * standardize print outputs for quest search and download; is2 download needs auth updates * remove extra files from this branch * comment out argo portions of quest for PR * remove argo-branch-only init file * remove argo script from branch * remove argo test file from branch * comment out another line of argo stuff * Update quest.py Added Docstrings to functions within quest.py and edited the primary docstring for the QUEST class here. Note I did not add Docstrings to the implicit __self__ function. 
* Update test_quest.py Added comments (not Docstrings) to test functions * Update dataset.py Minor edits to the doc strings * Update quest.py Edited docstrings * catch error with downloading datasets in Quest; template test case for multi dataset query --------- Co-authored-by: Kelsey Bisson <48059682+kelseybisson@users.noreply.github.com> Co-authored-by: Romina Co-authored-by: zachghiaccio Co-authored-by: Zach Fair <48361714+zachghiaccio@users.noreply.github.com> --- .../contributing/quest-available-datasets.rst | 25 ++ icepyx/core/query.py | 345 +++++++++--------- icepyx/quest/__init__.py | 0 icepyx/quest/dataset_scripts/dataset.py | 90 +++-- icepyx/quest/quest.py | 104 +++++- icepyx/tests/test_query.py | 12 + icepyx/tests/test_quest.py | 80 ++++ 7 files changed, 424 insertions(+), 232 deletions(-) create mode 100644 doc/source/contributing/quest-available-datasets.rst delete mode 100644 icepyx/quest/__init__.py create mode 100644 icepyx/tests/test_quest.py diff --git a/doc/source/contributing/quest-available-datasets.rst b/doc/source/contributing/quest-available-datasets.rst new file mode 100644 index 000000000..91a6283a0 --- /dev/null +++ b/doc/source/contributing/quest-available-datasets.rst @@ -0,0 +1,25 @@ +.. _quest_supported_label: + +QUEST Supported Datasets +======================== + +On this page, we outline the datasets that are supported by the QUEST module. Click on the links for each dataset to view information about the API and sensor/data platform used. + + +List of Datasets +---------------- + +* `Argo `_ + * The Argo mission involves a series of floats that are designed to capture vertical ocean profiles of temperature, salinity, and pressure down to ~2000 m. Some floats are in support of BGC-Argo, which also includes data relevant for biogeochemical applications: oxygen, nitrate, chlorophyll, backscatter, and solar irradiance. + * (Link Kelsey's paper here) + * (Link to example workbook here) + + +Adding a Dataset to QUEST +------------------------- + +Want to add a new dataset to QUEST? No problem! QUEST includes a template script (``dataset.py``) that may be used to create your own querying module for a dataset of interest. + +Guidelines on how to construct your dataset module may be found here: (link to be added) + +Once you have developed a script with the template, you may request for the module to be added to QUEST via Github. Please see the How to Contribute page :ref:`dev_guide_label` for instructions on how to contribute to icepyx. \ No newline at end of file diff --git a/icepyx/core/query.py b/icepyx/core/query.py index e8f1d8e7c..3459fd132 100644 --- a/icepyx/core/query.py +++ b/icepyx/core/query.py @@ -12,11 +12,9 @@ import icepyx.core.APIformatting as apifmt from icepyx.core.auth import EarthdataAuthMixin import icepyx.core.granules as granules -from icepyx.core.granules import Granules as Granules +# QUESTION: why doesn't from granules import Granules work, since granules=icepyx.core.granules? +from icepyx.core.granules import Granules import icepyx.core.is2ref as is2ref - -# QUESTION: why doesn't from granules import Granules as Granules work, since granules=icepyx.core.granules? 
-# from icepyx.core.granules import Granules import icepyx.core.spatial as spat import icepyx.core.temporal as tp import icepyx.core.validate_inputs as val @@ -148,6 +146,177 @@ def __str__(self): ) return str + # ---------------------------------------------------------------------- + # Properties + + @property + def temporal(self): + """ + Return the Temporal object containing date/time range information for the query object. + + See Also + -------- + temporal.Temporal.start + temporal.Temporal.end + temporal.Temporal + + Examples + -------- + >>> reg_a = GenQuery([-55, 68, -48, 71],['2019-02-20','2019-02-28']) + >>> print(reg_a.temporal) + Start date and time: 2019-02-20 00:00:00 + End date and time: 2019-02-28 23:59:59 + + >>> reg_a = GenQuery([-55, 68, -48, 71],cycles=['03','04','05','06','07'], tracks=['0849','0902']) + >>> print(reg_a.temporal) + ['No temporal parameters set'] + """ + + if hasattr(self, "_temporal"): + return self._temporal + else: + return ["No temporal parameters set"] + + @property + def spatial(self): + """ + Return the spatial object, which provides the underlying functionality for validating + and formatting geospatial objects. The spatial object has several properties to enable + user access to the stored spatial extent in multiple formats. + + See Also + -------- + spatial.Spatial.spatial_extent + spatial.Spatial.extent_type + spatial.Spatial.extent_file + spatial.Spatial + + Examples + -------- + >>> reg_a = ipx.GenQuery([-55, 68, -48, 71],['2019-02-20','2019-02-28']) + >>> reg_a.spatial # doctest: +SKIP + + + >>> print(reg_a.spatial) + Extent type: bounding_box + Coordinates: [-55.0, 68.0, -48.0, 71.0] + + """ + return self._spatial + + @property + def spatial_extent(self): + """ + Return an array showing the spatial extent of the query object. + Spatial extent is returned as an input type (which depends on how + you initially entered your spatial data) followed by the geometry data. + Bounding box data is [lower-left-longitude, lower-left-latitute, upper-right-longitude, upper-right-latitude]. + Polygon data is [longitude1, latitude1, longitude2, latitude2, + ... longitude_n,latitude_n, longitude1,latitude1]. + + Returns + ------- + tuple of length 2 + First tuple element is the spatial type ("bounding box" or "polygon"). + Second tuple element is the spatial extent as a list of coordinates. + + Examples + -------- + + # Note: coordinates returned as float, not int + >>> reg_a = GenQuery([-55, 68, -48, 71],['2019-02-20','2019-02-28']) + >>> reg_a.spatial_extent + ('bounding_box', [-55.0, 68.0, -48.0, 71.0]) + + >>> reg_a = GenQuery([(-55, 68), (-55, 71), (-48, 71), (-48, 68), (-55, 68)],['2019-02-20','2019-02-28']) + >>> reg_a.spatial_extent + ('polygon', [-55.0, 68.0, -55.0, 71.0, -48.0, 71.0, -48.0, 68.0, -55.0, 68.0]) + + # NOTE Is this where we wanted to put the file-based test/example? + # The test file path is: examples/supporting_files/simple_test_poly.gpkg + + See Also + -------- + Spatial.extent + Spatial.extent_type + Spatial.extent_as_gdf + + """ + + return (self._spatial._ext_type, self._spatial._spatial_ext) + + @property + def dates(self): + """ + Return an array showing the date range of the query object. + Dates are returned as an array containing the start and end datetime objects, inclusive, in that order. 
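A short sketch of the net effect of relocating these properties onto `GenQuery`: any subclass (including the `Quest` class added later in this patch) exposes the same spatial and temporal accessors without re-implementing them. The printed values mirror the doctests above.

import icepyx as ipx

# GenQuery alone (no product) now carries the validated space/time information.
region = ipx.GenQuery([-55, 68, -48, 71], ["2019-02-20", "2019-02-28"])

print(region.spatial_extent)  # ('bounding_box', [-55.0, 68.0, -48.0, 71.0])
print(region.dates)           # ['2019-02-20', '2019-02-28']
print(region.start_time)      # '00:00:00'
print(region.end_time)        # '23:59:59'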
+ + Examples + -------- + >>> reg_a = ipx.GenQuery([-55, 68, -48, 71],['2019-02-20','2019-02-28']) + >>> reg_a.dates + ['2019-02-20', '2019-02-28'] + + >>> reg_a = GenQuery([-55, 68, -48, 71]) + >>> reg_a.dates + ['No temporal parameters set'] + """ + if not hasattr(self, "_temporal"): + return ["No temporal parameters set"] + else: + return [ + self._temporal._start.strftime("%Y-%m-%d"), + self._temporal._end.strftime("%Y-%m-%d"), + ] # could also use self._start.date() + + @property + def start_time(self): + """ + Return the start time specified for the start date. + + Examples + -------- + >>> reg_a = ipx.GenQuery([-55, 68, -48, 71],['2019-02-20','2019-02-28']) + >>> reg_a.start_time + '00:00:00' + + >>> reg_a = ipx.GenQuery([-55, 68, -48, 71],['2019-02-20','2019-02-28'], start_time='12:30:30') + >>> reg_a.start_time + '12:30:30' + + >>> reg_a = GenQuery([-55, 68, -48, 71]) + >>> reg_a.start_time + ['No temporal parameters set'] + """ + if not hasattr(self, "_temporal"): + return ["No temporal parameters set"] + else: + return self._temporal._start.strftime("%H:%M:%S") + + @property + def end_time(self): + """ + Return the end time specified for the end date. + + Examples + -------- + >>> reg_a = ipx.GenQuery([-55, 68, -48, 71],['2019-02-20','2019-02-28']) + >>> reg_a.end_time + '23:59:59' + + >>> reg_a = ipx.GenQuery([-55, 68, -48, 71],['2019-02-20','2019-02-28'], end_time='10:20:20') + >>> reg_a.end_time + '10:20:20' + + >>> reg_a = GenQuery([-55, 68, -48, 71]) + >>> reg_a.end_time + ['No temporal parameters set'] + """ + if not hasattr(self, "_temporal"): + return ["No temporal parameters set"] + else: + return self._temporal._end.strftime("%H:%M:%S") + # DevGoal: update docs throughout to allow for polygon spatial extent # Note: add files to docstring once implemented @@ -333,174 +502,6 @@ def product_version(self): """ return self._version - @property - def temporal(self): - """ - Return the Temporal object containing date/time range information for the query object. - - See Also - -------- - temporal.Temporal.start - temporal.Temporal.end - temporal.Temporal - - Examples - -------- - >>> reg_a = Query('ATL06',[-55, 68, -48, 71],['2019-02-20','2019-02-28']) - >>> print(reg_a.temporal) - Start date and time: 2019-02-20 00:00:00 - End date and time: 2019-02-28 23:59:59 - - >>> reg_a = Query('ATL06',[-55, 68, -48, 71],cycles=['03','04','05','06','07'], tracks=['0849','0902']) - >>> print(reg_a.temporal) - ['No temporal parameters set'] - """ - - if hasattr(self, "_temporal"): - return self._temporal - else: - return ["No temporal parameters set"] - - @property - def spatial(self): - """ - Return the spatial object, which provides the underlying functionality for validating - and formatting geospatial objects. The spatial object has several properties to enable - user access to the stored spatial extent in multiple formats. - - See Also - -------- - spatial.Spatial.spatial_extent - spatial.Spatial.extent_type - spatial.Spatial.extent_file - spatial.Spatial - - Examples - -------- - >>> reg_a = ipx.Query('ATL06',[-55, 68, -48, 71],['2019-02-20','2019-02-28']) - >>> reg_a.spatial # doctest: +SKIP - - - >>> print(reg_a.spatial) - Extent type: bounding_box - Coordinates: [-55.0, 68.0, -48.0, 71.0] - - """ - return self._spatial - - @property - def spatial_extent(self): - """ - Return an array showing the spatial extent of the query object. - Spatial extent is returned as an input type (which depends on how - you initially entered your spatial data) followed by the geometry data. 
- Bounding box data is [lower-left-longitude, lower-left-latitute, upper-right-longitude, upper-right-latitude]. - Polygon data is [longitude1, latitude1, longitude2, latitude2, - ... longitude_n,latitude_n, longitude1,latitude1]. - - Returns - ------- - tuple of length 2 - First tuple element is the spatial type ("bounding box" or "polygon"). - Second tuple element is the spatial extent as a list of coordinates. - - Examples - -------- - - # Note: coordinates returned as float, not int - >>> reg_a = Query('ATL06',[-55, 68, -48, 71],['2019-02-20','2019-02-28']) - >>> reg_a.spatial_extent - ('bounding_box', [-55.0, 68.0, -48.0, 71.0]) - - >>> reg_a = Query('ATL06',[(-55, 68), (-55, 71), (-48, 71), (-48, 68), (-55, 68)],['2019-02-20','2019-02-28']) - >>> reg_a.spatial_extent - ('polygon', [-55.0, 68.0, -55.0, 71.0, -48.0, 71.0, -48.0, 68.0, -55.0, 68.0]) - - # NOTE Is this where we wanted to put the file-based test/example? - # The test file path is: examples/supporting_files/simple_test_poly.gpkg - - See Also - -------- - Spatial.extent - Spatial.extent_type - Spatial.extent_as_gdf - - """ - - return (self._spatial._ext_type, self._spatial._spatial_ext) - - @property - def dates(self): - """ - Return an array showing the date range of the query object. - Dates are returned as an array containing the start and end datetime objects, inclusive, in that order. - - Examples - -------- - >>> reg_a = ipx.Query('ATL06',[-55, 68, -48, 71],['2019-02-20','2019-02-28']) - >>> reg_a.dates - ['2019-02-20', '2019-02-28'] - - >>> reg_a = Query('ATL06',[-55, 68, -48, 71],cycles=['03','04','05','06','07'], tracks=['0849','0902']) - >>> reg_a.dates - ['No temporal parameters set'] - """ - if not hasattr(self, "_temporal"): - return ["No temporal parameters set"] - else: - return [ - self._temporal._start.strftime("%Y-%m-%d"), - self._temporal._end.strftime("%Y-%m-%d"), - ] # could also use self._start.date() - - @property - def start_time(self): - """ - Return the start time specified for the start date. - - Examples - -------- - >>> reg_a = ipx.Query('ATL06',[-55, 68, -48, 71],['2019-02-20','2019-02-28']) - >>> reg_a.start_time - '00:00:00' - - >>> reg_a = ipx.Query('ATL06',[-55, 68, -48, 71],['2019-02-20','2019-02-28'], start_time='12:30:30') - >>> reg_a.start_time - '12:30:30' - - >>> reg_a = Query('ATL06',[-55, 68, -48, 71],cycles=['03','04','05','06','07'], tracks=['0849','0902']) - >>> reg_a.start_time - ['No temporal parameters set'] - """ - if not hasattr(self, "_temporal"): - return ["No temporal parameters set"] - else: - return self._temporal._start.strftime("%H:%M:%S") - - @property - def end_time(self): - """ - Return the end time specified for the end date. 
- - Examples - -------- - >>> reg_a = ipx.Query('ATL06',[-55, 68, -48, 71],['2019-02-20','2019-02-28']) - >>> reg_a.end_time - '23:59:59' - - >>> reg_a = ipx.Query('ATL06',[-55, 68, -48, 71],['2019-02-20','2019-02-28'], end_time='10:20:20') - >>> reg_a.end_time - '10:20:20' - - >>> reg_a = Query('ATL06',[-55, 68, -48, 71],cycles=['03','04','05','06','07'], tracks=['0849','0902']) - >>> reg_a.end_time - ['No temporal parameters set'] - """ - if not hasattr(self, "_temporal"): - return ["No temporal parameters set"] - else: - return self._temporal._end.strftime("%H:%M:%S") - @property def cycles(self): """ diff --git a/icepyx/quest/__init__.py b/icepyx/quest/__init__.py deleted file mode 100644 index e69de29bb..000000000 diff --git a/icepyx/quest/dataset_scripts/dataset.py b/icepyx/quest/dataset_scripts/dataset.py index 13e926229..e76081e08 100644 --- a/icepyx/quest/dataset_scripts/dataset.py +++ b/icepyx/quest/dataset_scripts/dataset.py @@ -1,4 +1,5 @@ import warnings +from icepyx.core.query import GenQuery warnings.filterwarnings("ignore") @@ -6,78 +7,75 @@ class DataSet: """ - Parent Class for all supported datasets (i.e. ATL03, ATL07, MODIS, etc.) - all sub classes must support the following methods for use in - colocated data class + Template parent class for all QUEST supported datasets (i.e. ICESat-2, Argo BGC, Argo, MODIS, etc.). + All sub-classes must support the following methods for use via the QUEST class. """ - def __init__(self, boundingbox, timeframe): + def __init__( + self, spatial_extent=None, date_range=None, start_time=None, end_time=None + ): """ - * use existing Icepyx functionality to initialise this - :param timeframe: datetime + Complete any dataset specific initializations (i.e. beyond space and time) required here. + For instance, ICESat-2 requires a product, and Argo requires parameters. + One can also check that the "default" space and time supplied by QUEST are the right format + (e.g. if the spatial extent must be a bounding box). """ - self.bounding_box = boundingbox - self.time_frame = timeframe - - def _fmt_coordinates(self): - # use icepyx geospatial module (icepyx core) raise NotImplementedError - def _fmt_timerange(self): + # ---------------------------------------------------------------------- + # Formatting API Inputs + + def _fmt_coordinates(self): """ - will return list of datetime objects [start_time, end_time] + Convert spatial extent into format needed by DataSet API, + if different than the formats available directly from SuperQuery. """ raise NotImplementedError - # todo: merge with Icepyx SuperQuery - def _validate_input(self): + def _fmt_timerange(self): """ - This may already be done in icepyx. - Not sure if we need this here + Convert temporal information into format needed by DataSet API, + if different than the formats available directly from SuperQuery. """ raise NotImplementedError - def search_data(self, delta_t): + # ---------------------------------------------------------------------- + # Validation + + def _validate_inputs(self): """ - query dataset given the spatio temporal criteria - and other params specic to the dataset + Create any additional validation functions for verifying inputs. + This function is not explicitly called by QUEST, + but is frequently needed for preparing API requests. 
+ + See Also + -------- + quest.dataset_scripts.argo.Argo._validate_parameters """ raise NotImplementedError - def download(self, out_path): + # ---------------------------------------------------------------------- + # Querying and Getting Data + + def search_data(self): """ - once data is querried, the user may choose to dowload the - data locally + Query the dataset (i.e. search for available data) + given the spatiotemporal criteria and other parameters specific to the dataset. """ raise NotImplementedError - def visualize(self): + def download(self): """ - (once data is downloaded)?, makes a quick plot showing where - data are located - e.g. Plots location of Argo profile or highlights ATL03 photon track + Download the data to your local machine. """ raise NotImplementedError - def _add2colocated_plot(self): + # ---------------------------------------------------------------------- + # Working with Data + + def visualize(self): """ - Takes visualise() functionality and adds the plot to central - plot with other coincident data. This will be called by - show_area_overlap() in Colocateddata class + Tells QUEST how to plot data (for instance, which parameters to plot) on a basemap. + For ICESat-2, it might show a photon track, and for Argo it might show a profile location. """ raise NotImplementedError - - """ - The following are low priority functions - Not sure these are even worth keeping. Doesn't make sense for - all datasets. - """ - - # def get_meltpond_fraction(self): - # raise NotImplementedError - # - # def get_sea_ice_fraction(self): - # raise NotImplementedError - # - # def get_roughness(self): - # raise NotImplementedError diff --git a/icepyx/quest/quest.py b/icepyx/quest/quest.py index 2855a879c..c54e49b73 100644 --- a/icepyx/quest/quest.py +++ b/icepyx/quest/quest.py @@ -1,25 +1,26 @@ import matplotlib.pyplot as plt -from icepyx.core.query import GenQuery +from icepyx.core.query import GenQuery, Query + +# from icepyx.quest.dataset_scripts.argo import Argo # todo: implement the subclass inheritance class Quest(GenQuery): """ QUEST - Query Unify Explore SpatioTemporal - object to query, obtain, and perform basic - operations on datasets for combined analysis with ICESat-2 data products. - A new dataset can be added using the `dataset.py` template. - A list of already supported datasets is available at: - Expands the icepyx GenQuery superclass. + operations on datasets (i.e. Argo, BGC Argo, MODIS, etc) for combined analysis with ICESat-2 + data products. A new dataset can be added using the `dataset.py` template. + QUEST expands the icepyx GenQuery superclass. See the doc page for GenQuery for details on temporal and spatial input parameters. Parameters ---------- - projection : proj4 string - Not yet implemented - Ex text: a string name of projection to be used for plotting (e.g. 'Mercator', 'NorthPolarStereographic') + proj : proj4 string + Geospatial projection. + Not yet implemented Returns ------- @@ -38,7 +39,6 @@ class Quest(GenQuery): Date range: (2019-02-20 00:00:00, 2019-02-28 23:59:59) Data sets: None - # todo: make this work with real datasets Add datasets to the quest object. >>> reg_a.datasets = {'ATL07':None, 'Argo':None} @@ -61,13 +61,11 @@ def __init__( end_time=None, proj="Default", ): + """ + Tells QUEST to initialize data given the user input spatiotemporal data. 
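A hypothetical sketch of how a new QUEST dataset module might fill in the `DataSet` template above; the class name, the "buoy" data source, and the method bodies are illustrative only and do not correspond to an existing icepyx module.

from icepyx.quest.dataset_scripts.dataset import DataSet

class MyBuoyData(DataSet):
    """Hypothetical QUEST dataset wrapping a fictional buoy-data API."""

    def __init__(self, spatial_extent=None, date_range=None, start_time=None, end_time=None):
        # The template __init__ raises NotImplementedError, so store the
        # QUEST-supplied space/time information directly instead of calling super().
        self.spatial_extent = spatial_extent
        self.date_range = date_range

    def search_data(self):
        # Query the (fictional) API for records inside the stored extent and dates.
        print("searching buoy records within", self.spatial_extent, self.date_range)

    def download(self):
        # Retrieve and locally store the records found by search_data.
        print("downloading buoy records...")

    def visualize(self):
        # Tell QUEST which parameters to draw on the shared basemap.
        print("plotting buoy locations...")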
+ """ super().__init__(spatial_extent, date_range, start_time, end_time) self.datasets = {} - self.projection = self._determine_proj(proj) - - # todo: maybe move this to icepyx superquery class - def _determine_proj(self, proj): - return None def __str__(self): str = super(Quest, self).__str__() @@ -83,4 +81,82 @@ def __str__(self): return str + # ---------------------------------------------------------------------- + # Datasets + + def add_icesat2( + self, + product=None, + start_time=None, + end_time=None, + version=None, + cycles=None, + tracks=None, + files=None, + **kwargs, + ): + """ + Adds ICESat-2 datasets to QUEST structure. + """ + + query = Query( + product, + self._spatial.extent, + [self._temporal.start, self._temporal.end], + start_time, + end_time, + version, + cycles, + tracks, + files, + **kwargs, + ) + + self.datasets["icesat2"] = query + + # def add_argo(self, params=["temperature"], presRange=None): + + # argo = Argo(self._spatial, self._temporal, params, presRange) + # self.datasets["argo"] = argo + + # ---------------------------------------------------------------------- + # Methods (on all datasets) + + # error handling? what happens when one of i fails... + def search_all(self): + """ + Searches for requred dataset within platform (i.e. ICESat-2, Argo) of interest. + """ + print("\nSearching all datasets...") + + for i in self.datasets.values(): + print() + try: + # querying ICESat-2 data + if isinstance(i, Query): + print("---ICESat-2---") + msg = i.avail_granules() + print(msg) + else: # querying all other data sets + print(i) + i.search_data() + except: + dataset_name = type(i).__name__ + print("Error querying data from {0}".format(dataset_name)) + + # error handling? what happens when one of i fails... + def download_all(self, path=""): + ' ' 'Downloads requested dataset(s).' 
' ' + print("\nDownloading all datasets...") + + for i in self.datasets.values(): + print() + if isinstance(i, Query): + print("---ICESat-2---") + msg = i.download_granules(path) + print(msg) + else: + i.download() + print(i) + # DEVNOTE: see colocated data branch and phyto team files for code that expands quest functionality diff --git a/icepyx/tests/test_query.py b/icepyx/tests/test_query.py index 55b25ef4a..7738c424a 100644 --- a/icepyx/tests/test_query.py +++ b/icepyx/tests/test_query.py @@ -41,6 +41,18 @@ def test_icepyx_boundingbox_query(): assert obs_tuple == exp_tuple +def test_temporal_properties_cycles_tracks(): + reg_a = ipx.Query( + "ATL06", + [-55, 68, -48, 71], + cycles=["03", "04", "05", "06", "07"], + tracks=["0849", "0902"], + ) + exp = ["No temporal parameters set"] + + assert [obs == exp for obs in (reg_a.dates, reg_a.start_time, reg_a.end_time)] + + # Tests need to add (given can't do them within docstrings/they're behind NSIDC login) # reqparams post-order # product_all_info diff --git a/icepyx/tests/test_quest.py b/icepyx/tests/test_quest.py new file mode 100644 index 000000000..043ee159e --- /dev/null +++ b/icepyx/tests/test_quest.py @@ -0,0 +1,80 @@ +import pytest +import re + +import icepyx as ipx +from icepyx.quest.quest import Quest + + +@pytest.fixture +def quest_instance(scope="module", autouse=True): + bounding_box = [-150, 30, -120, 60] + date_range = ["2022-06-07", "2022-06-14"] + my_quest = Quest(spatial_extent=bounding_box, date_range=date_range) + return my_quest + + +########## PER-DATASET ADDITION TESTS ########## + +# Paramaterize these add_dataset tests once more datasets are added +def test_add_is2(quest_instance): + # Add ATL06 as a test to QUEST + + prod = "ATL06" + quest_instance.add_icesat2(product=prod) + exp_key = "icesat2" + exp_type = ipx.Query + + obs = quest_instance.datasets + + assert type(obs) == dict + assert exp_key in obs.keys() + assert type(obs[exp_key]) == exp_type + assert quest_instance.datasets[exp_key].product == prod + + +# def test_add_argo(quest_instance): +# params = ["down_irradiance412", "temperature"] +# quest_instance.add_argo(params=params) +# exp_key = "argo" +# exp_type = ipx.quest.dataset_scripts.argo.Argo + +# obs = quest_instance.datasets + +# assert type(obs) == dict +# assert exp_key in obs.keys() +# assert type(obs[exp_key]) == exp_type +# assert quest_instance.datasets[exp_key].params == params + +# def test_add_multiple_datasets(): +# bounding_box = [-150, 30, -120, 60] +# date_range = ["2022-06-07", "2022-06-14"] +# my_quest = Quest(spatial_extent=bounding_box, date_range=date_range) +# +# # print(my_quest.spatial) +# # print(my_quest.temporal) +# +# # my_quest.add_argo(params=["down_irradiance412", "temperature"]) +# # print(my_quest.datasets["argo"].params) +# +# my_quest.add_icesat2(product="ATL06") +# # print(my_quest.datasets["icesat2"].product) +# +# print(my_quest) +# +# # my_quest.search_all() +# # +# # # this one still needs work for IS2 because of auth... +# # my_quest.download_all() + +########## ALL DATASET METHODS TESTS ########## + +# is successful execution enough here? +# each of the query functions should be tested in their respective modules +def test_search_all(quest_instance): + # Search and test all datasets + quest_instance.search_all() + + +def test_download_all(): + # this will require auth in some cases... 
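A sketch of the QUEST workflow these tests exercise, using the same bounding box and date range as the test fixture; it assumes Earthdata credentials are configured, since the ICESat-2 download step requires authentication.

from icepyx.quest.quest import Quest

# Same region and time window as the test fixture above.
quest = Quest(spatial_extent=[-150, 30, -120, 60], date_range=["2022-06-07", "2022-06-14"])

# Register ICESat-2 land-ice heights as one of the datasets to query.
quest.add_icesat2(product="ATL06")

# Search every registered dataset, then download them (ICESat-2 requires Earthdata login).
quest.search_all()
quest.download_all(path="./quest_downloads")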
+ pass From 80c4dc1e77081e6e369fce19b4f409916d4520b2 Mon Sep 17 00:00:00 2001 From: Jessica Scheick Date: Wed, 18 Oct 2023 13:05:10 -0400 Subject: [PATCH 08/21] temporarily disable OpenAltimetry API tests (#459) * add OA API warning * comment out tests that use OA API --------- Co-authored-by: GitHub Action --- .../documentation/classes_dev_uml.svg | 300 +++++++++--------- .../documentation/classes_user_uml.svg | 206 ++++++------ icepyx/core/visualization.py | 9 +- icepyx/tests/test_visualization.py | 3 +- 4 files changed, 263 insertions(+), 255 deletions(-) diff --git a/doc/source/user_guide/documentation/classes_dev_uml.svg b/doc/source/user_guide/documentation/classes_dev_uml.svg index 8e83d4dc1..34e13b41c 100644 --- a/doc/source/user_guide/documentation/classes_dev_uml.svg +++ b/doc/source/user_guide/documentation/classes_dev_uml.svg @@ -4,11 +4,11 @@ - - + + classes_dev_uml - + icepyx.core.auth.AuthenticationError @@ -30,32 +30,38 @@ icepyx.core.auth.EarthdataAuthMixin - -EarthdataAuthMixin - -_auth : NoneType -_s3_initial_ts : NoneType, datetime -_s3login_credentials : NoneType -_session : NoneType -auth -s3login_credentials -session - -__init__(auth) -__str__() -earthdata_login(uid, email, s3token): None + +EarthdataAuthMixin + +_auth : Auth, NoneType +_s3_initial_ts : NoneType, datetime +_s3login_credentials : NoneType, dict +_session : NoneType +auth +s3login_credentials +session + +__init__(auth) +__str__() +earthdata_login(uid, email, s3token): None icepyx.core.query.GenQuery - -GenQuery - -_spatial -_temporal - -__init__(spatial_extent, date_range, start_time, end_time) -__str__() + +GenQuery + +_spatial +_temporal +dates +end_time +spatial +spatial_extent +start_time +temporal + +__init__(spatial_extent, date_range, start_time, end_time) +__str__() @@ -75,38 +81,32 @@ icepyx.core.query.Query - -Query - -CMRparams -_CMRparams -_about_product -_cust_options : dict -_cycles : list -_file_vars -_granules -_order_vars -_prod : NoneType, str -_readable_granule_name : list -_reqparams -_source : str -_subsetparams : NoneType -_tracks : list -_version -cycles -dataset -dates -end_time -file_vars -granules -order_vars -product -product_version -reqparams -spatial -spatial_extent -start_time -temporal + +Query + +CMRparams +_CMRparams +_about_product +_cust_options : dict +_cycles : list +_file_vars +_granules +_order_vars +_prod : NoneType, str +_readable_granule_name : list +_reqparams +_source : str +_subsetparams : NoneType +_tracks : list +_version +cycles +dataset +file_vars +granules +order_vars +product +product_version +reqparams tracks __init__(product, spatial_extent, date_range, start_time, end_time, version, cycles, tracks, files, auth) @@ -125,15 +125,15 @@ icepyx.core.granules.Granules->icepyx.core.query.Query - - + + _granules icepyx.core.granules.Granules->icepyx.core.query.Query - - + + _granules @@ -160,17 +160,17 @@ icepyx.core.exceptions.QueryError - -QueryError - - - + +QueryError + + + icepyx.core.exceptions.NsidcQueryError->icepyx.core.exceptions.QueryError - - + + @@ -195,122 +195,122 @@ icepyx.core.APIformatting.Parameters->icepyx.core.query.Query - - + + _CMRparams icepyx.core.APIformatting.Parameters->icepyx.core.query.Query - - + + _reqparams icepyx.core.APIformatting.Parameters->icepyx.core.query.Query - - + + _subsetparams icepyx.core.APIformatting.Parameters->icepyx.core.query.Query - - + + _subsetparams icepyx.core.query.Query->icepyx.core.auth.EarthdataAuthMixin - - + + icepyx.core.query.Query->icepyx.core.query.GenQuery - - + + icepyx.core.read.Read - 
-Read - -_filelist : NoneType, list -_out_obj : Dataset -_pattern : str -_prod : str -_read_vars -_source_type : str -data_source -vars - -__init__(data_source, product, filename_pattern, catalog, out_obj_type) -_add_vars_to_ds(is2ds, ds, grp_path, wanted_groups_tiered, wanted_dict) -_build_dataset_template(file) -_build_single_file_dataset(file, groups_list) -_check_source_for_pattern(source, filename_pattern) -_combine_nested_vars(is2ds, ds, grp_path, wanted_dict) -_read_single_grp(file, grp_path) -load() + +Read + +_filelist : NoneType, list +_out_obj : Dataset +_pattern : str +_prod : str +_read_vars +_source_type : str +data_source +vars + +__init__(data_source, product, filename_pattern, catalog, out_obj_type) +_add_vars_to_ds(is2ds, ds, grp_path, wanted_groups_tiered, wanted_dict) +_build_dataset_template(file) +_build_single_file_dataset(file, groups_list) +_check_source_for_pattern(source, filename_pattern) +_combine_nested_vars(is2ds, ds, grp_path, wanted_dict) +_read_single_grp(file, grp_path) +load() icepyx.core.spatial.Spatial - -Spatial - -_ext_type : str -_gdf_spat : GeoDataFrame -_geom_file : NoneType -_spatial_ext -_xdateln -extent -extent_as_gdf -extent_file -extent_type - -__init__(spatial_extent) -__str__() -fmt_for_CMR() -fmt_for_EGI() + +Spatial + +_ext_type : str +_gdf_spat : GeoDataFrame +_geom_file : NoneType +_spatial_ext +_xdateln +extent +extent_as_gdf +extent_file +extent_type + +__init__(spatial_extent) +__str__() +fmt_for_CMR() +fmt_for_EGI() icepyx.core.spatial.Spatial->icepyx.core.query.GenQuery - - -_spatial + + +_spatial icepyx.core.spatial.Spatial->icepyx.core.query.GenQuery - - -_spatial + + +_spatial icepyx.core.temporal.Temporal - -Temporal - -_end : datetime -_start : datetime -end -start - -__init__(date_range, start_time, end_time) -__str__() + +Temporal + +_end : datetime +_start : datetime +end +start + +__init__(date_range, start_time, end_time) +__str__() icepyx.core.temporal.Temporal->icepyx.core.query.GenQuery - - -_temporal + + +_temporal @@ -339,36 +339,36 @@ icepyx.core.variables.Variables->icepyx.core.auth.EarthdataAuthMixin - - + + icepyx.core.variables.Variables->icepyx.core.query.Query - - + + _order_vars icepyx.core.variables.Variables->icepyx.core.query.Query - - -_order_vars + + +_order_vars icepyx.core.variables.Variables->icepyx.core.query.Query - - -_file_vars + + +_file_vars icepyx.core.variables.Variables->icepyx.core.read.Read - - -_read_vars + + +_read_vars diff --git a/doc/source/user_guide/documentation/classes_user_uml.svg b/doc/source/user_guide/documentation/classes_user_uml.svg index 1c9184379..640f76815 100644 --- a/doc/source/user_guide/documentation/classes_user_uml.svg +++ b/doc/source/user_guide/documentation/classes_user_uml.svg @@ -4,11 +4,11 @@ - - + + classes_user_uml - + icepyx.core.auth.AuthenticationError @@ -30,23 +30,29 @@ icepyx.core.auth.EarthdataAuthMixin - -EarthdataAuthMixin - -auth -s3login_credentials -session - -earthdata_login(uid, email, s3token): None + +EarthdataAuthMixin + +auth +s3login_credentials +session + +earthdata_login(uid, email, s3token): None icepyx.core.query.GenQuery - -GenQuery - - - + +GenQuery + +dates +end_time +spatial +spatial_extent +start_time +temporal + + @@ -64,24 +70,18 @@ icepyx.core.query.Query - -Query - -CMRparams -cycles -dataset -dates -end_time -file_vars -granules -order_vars -product -product_version -reqparams -spatial -spatial_extent -start_time -temporal + +Query + +CMRparams +cycles +dataset +file_vars +granules +order_vars +product +product_version 
+reqparams tracks avail_granules(ids, cycles, tracks, cloud) @@ -98,15 +98,15 @@ icepyx.core.granules.Granules->icepyx.core.query.Query - - + + _granules icepyx.core.granules.Granules->icepyx.core.query.Query - - + + _granules @@ -132,17 +132,17 @@ icepyx.core.exceptions.QueryError - -QueryError - - - + +QueryError + + + icepyx.core.exceptions.NsidcQueryError->icepyx.core.exceptions.QueryError - - + + @@ -161,99 +161,99 @@ icepyx.core.APIformatting.Parameters->icepyx.core.query.Query - - + + _CMRparams icepyx.core.APIformatting.Parameters->icepyx.core.query.Query - - + + _reqparams icepyx.core.APIformatting.Parameters->icepyx.core.query.Query - - + + _subsetparams icepyx.core.APIformatting.Parameters->icepyx.core.query.Query - - + + _subsetparams icepyx.core.query.Query->icepyx.core.auth.EarthdataAuthMixin - - + + icepyx.core.query.Query->icepyx.core.query.GenQuery - - + + icepyx.core.read.Read - -Read - -data_source -vars - -load() + +Read + +data_source +vars + +load() icepyx.core.spatial.Spatial - -Spatial - -extent -extent_as_gdf -extent_file -extent_type - -fmt_for_CMR() -fmt_for_EGI() + +Spatial + +extent +extent_as_gdf +extent_file +extent_type + +fmt_for_CMR() +fmt_for_EGI() icepyx.core.spatial.Spatial->icepyx.core.query.GenQuery - - -_spatial + + +_spatial icepyx.core.spatial.Spatial->icepyx.core.query.GenQuery - - -_spatial + + +_spatial icepyx.core.temporal.Temporal - -Temporal - -end -start - - + +Temporal + +end +start + + icepyx.core.temporal.Temporal->icepyx.core.query.GenQuery - - -_temporal + + +_temporal @@ -273,35 +273,35 @@ icepyx.core.variables.Variables->icepyx.core.auth.EarthdataAuthMixin - - + + icepyx.core.variables.Variables->icepyx.core.query.Query - - + + _order_vars icepyx.core.variables.Variables->icepyx.core.query.Query - - -_order_vars + + +_order_vars icepyx.core.variables.Variables->icepyx.core.query.Query - - -_file_vars + + +_file_vars icepyx.core.variables.Variables->icepyx.core.read.Read - - + + _read_vars diff --git a/icepyx/core/visualization.py b/icepyx/core/visualization.py index c6bef2333..a2b8fe5dc 100644 --- a/icepyx/core/visualization.py +++ b/icepyx/core/visualization.py @@ -4,6 +4,7 @@ import concurrent.futures import datetime import re +import warnings import backoff import dask.array as da @@ -332,7 +333,13 @@ def request_OA_data(self, paras) -> da.array: A dask array containing the ICESat-2 elevation data. 
""" - base_url = "https://openaltimetry.org/data/api/icesat2/level3a" + warnings.warn( + "NOTICE: visualizations requiring the OpenAltimetry API are currently (October 2023) ", + "unavailable while hosting of OpenAltimetry transitions from UCSD to NSIDC.", + "A ticket has been issued to restore programmatic API access.", + ) + + base_url = "http://openaltimetry.earthdatacloud.nasa.gov/data/api/icesat2" trackId, Date, cycle, bbox, product = paras # Generate API diff --git a/icepyx/tests/test_visualization.py b/icepyx/tests/test_visualization.py index 8056a453f..0a1f2fa43 100644 --- a/icepyx/tests/test_visualization.py +++ b/icepyx/tests/test_visualization.py @@ -70,7 +70,7 @@ def test_gran_paras(filename, expect): # 2023-01-27: for the commented test below, r (in visualization line 444) is returning None even though I can see OA data there via a browser - +""" @pytest.mark.parametrize( "product, date_range, bbox, expect", [ @@ -112,3 +112,4 @@ def test_visualization_orbits(product, bbox, cycles, tracks, expect): data_size = region_viz.parallel_request_OA().size assert data_size == expect +""" From 8ba390b6b620bba8e0b145a195a641f0c588c225 Mon Sep 17 00:00:00 2001 From: Jessica Scheick Date: Wed, 18 Oct 2023 13:23:05 -0400 Subject: [PATCH 09/21] fix spot number calculation (#458) --------- Co-authored-by: GitHub Action --- icepyx/core/is2ref.py | 23 ++++++++++++++--------- icepyx/core/visualization.py | 6 +++--- icepyx/tests/test_is2ref.py | 16 ++++++++-------- 3 files changed, 25 insertions(+), 20 deletions(-) diff --git a/icepyx/core/is2ref.py b/icepyx/core/is2ref.py index 52cf0e3a1..a3a0311bb 100644 --- a/icepyx/core/is2ref.py +++ b/icepyx/core/is2ref.py @@ -265,8 +265,11 @@ def _default_varlists(product): return common_list -# dev goal: check and test this function def gt2spot(gt, sc_orient): + warnings.warn( + "icepyx versions 0.8.0 and earlier used an incorrect spot number calculation." + "As a result, computations depending on spot number may be incorrect and should be redone." 
+ ) assert gt in [ "gt1l", @@ -280,12 +283,13 @@ def gt2spot(gt, sc_orient): gr_num = np.uint8(gt[2]) gr_lr = gt[3] + # spacecraft oriented forward if sc_orient == 1: if gr_num == 1: if gr_lr == "l": - spot = 2 + spot = 6 elif gr_lr == "r": - spot = 1 + spot = 5 elif gr_num == 2: if gr_lr == "l": spot = 4 @@ -293,16 +297,17 @@ def gt2spot(gt, sc_orient): spot = 3 elif gr_num == 3: if gr_lr == "l": - spot = 6 + spot = 2 elif gr_lr == "r": - spot = 5 + spot = 1 + # spacecraft oriented backward elif sc_orient == 0: if gr_num == 1: if gr_lr == "l": - spot = 5 + spot = 1 elif gr_lr == "r": - spot = 6 + spot = 2 elif gr_num == 2: if gr_lr == "l": spot = 3 @@ -310,9 +315,9 @@ def gt2spot(gt, sc_orient): spot = 4 elif gr_num == 3: if gr_lr == "l": - spot = 1 + spot = 5 elif gr_lr == "r": - spot = 2 + spot = 6 if "spot" not in locals(): raise ValueError("Could not compute the spot number.") diff --git a/icepyx/core/visualization.py b/icepyx/core/visualization.py index a2b8fe5dc..32c81e3e7 100644 --- a/icepyx/core/visualization.py +++ b/icepyx/core/visualization.py @@ -334,9 +334,9 @@ def request_OA_data(self, paras) -> da.array: """ warnings.warn( - "NOTICE: visualizations requiring the OpenAltimetry API are currently (October 2023) ", - "unavailable while hosting of OpenAltimetry transitions from UCSD to NSIDC.", - "A ticket has been issued to restore programmatic API access.", + "NOTICE: visualizations requiring the OpenAltimetry API are currently (October 2023) " + "unavailable while hosting of OpenAltimetry transitions from UCSD to NSIDC." + "A ticket has been issued to restore programmatic API access." ) base_url = "http://openaltimetry.earthdatacloud.nasa.gov/data/api/icesat2" diff --git a/icepyx/tests/test_is2ref.py b/icepyx/tests/test_is2ref.py index 8d50568fe..b22709c98 100644 --- a/icepyx/tests/test_is2ref.py +++ b/icepyx/tests/test_is2ref.py @@ -556,12 +556,12 @@ def test_unsupported_default_varlist(): def test_gt2spot_sc_orient_1(): # gt1l obs = is2ref.gt2spot("gt1l", 1) - expected = 2 + expected = 6 assert obs == expected # gt1r obs = is2ref.gt2spot("gt1r", 1) - expected = 1 + expected = 5 assert obs == expected # gt2l @@ -576,24 +576,24 @@ def test_gt2spot_sc_orient_1(): # gt3l obs = is2ref.gt2spot("gt3l", 1) - expected = 6 + expected = 2 assert obs == expected # gt3r obs = is2ref.gt2spot("gt3r", 1) - expected = 5 + expected = 1 assert obs == expected def test_gt2spot_sc_orient_0(): # gt1l obs = is2ref.gt2spot("gt1l", 0) - expected = 5 + expected = 1 assert obs == expected # gt1r obs = is2ref.gt2spot("gt1r", 0) - expected = 6 + expected = 2 assert obs == expected # gt2l @@ -608,10 +608,10 @@ def test_gt2spot_sc_orient_0(): # gt3l obs = is2ref.gt2spot("gt3l", 0) - expected = 1 + expected = 5 assert obs == expected # gt3r obs = is2ref.gt2spot("gt3r", 0) - expected = 2 + expected = 6 assert obs == expected From 9727e3e3877e818c338c0e347c003f571d64b219 Mon Sep 17 00:00:00 2001 From: Whyjay Zheng Date: Thu, 19 Oct 2023 01:39:02 +0800 Subject: [PATCH 10/21] Fix a broken link in IS2_data_access.ipynb (#456) --- doc/source/example_notebooks/IS2_data_access.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/source/example_notebooks/IS2_data_access.ipynb b/doc/source/example_notebooks/IS2_data_access.ipynb index d9d50cdc0..0b4a12244 100644 --- a/doc/source/example_notebooks/IS2_data_access.ipynb +++ b/doc/source/example_notebooks/IS2_data_access.ipynb @@ -79,7 +79,7 @@ "\n", "There are three required inputs, depending on how you want to search for data. 
Two are required in all cases:\n", "- `short_name` = the data product of interest, known as its \"short name\".\n", - "See https://nsidc.org/data/icesat-2/data-sets for a list of the available data products.\n", + "See https://nsidc.org/data/icesat-2/products for a list of the available data products.\n", "- `spatial extent` = a region of interest to search within. This can be entered as a bounding box, polygon vertex coordinate pairs, or a polygon geospatial file (currently shp, kml, and gpkg are supported).\n", " - bounding box: Given in decimal degrees for the lower left longitude, lower left latitude, upper right longitude, and upper right latitude\n", " - polygon vertices: Given as longitude, latitude coordinate pairs of decimal degrees with the last entry a repeat of the first.\n", From e591d83ed92d67ca368f10a3af709d5c9ff3cb80 Mon Sep 17 00:00:00 2001 From: Rachel Wegener <35503632+rwegener2@users.noreply.github.com> Date: Wed, 18 Oct 2023 14:02:15 -0400 Subject: [PATCH 11/21] update Read input arguments (#444) * add filelist and product properties to Read object * deprecate filename_pattern and product class Read inputs * transition to data_source input as a string (including glob string) or list * update tutorial with changes and user guidance for using glob --------- Co-authored-by: Jessica Scheick --- .../example_notebooks/IS2_data_read-in.ipynb | 182 +++++++++++----- .../documentation/classes_dev_uml.svg | 122 +++++------ .../documentation/classes_user_uml.svg | 21 +- doc/source/user_guide/documentation/read.rst | 2 + icepyx/core/is2ref.py | 5 +- icepyx/core/read.py | 200 +++++++++++++----- icepyx/tests/test_is2ref.py | 4 +- 7 files changed, 353 insertions(+), 183 deletions(-) diff --git a/doc/source/example_notebooks/IS2_data_read-in.ipynb b/doc/source/example_notebooks/IS2_data_read-in.ipynb index 115c63044..9bbac368b 100644 --- a/doc/source/example_notebooks/IS2_data_read-in.ipynb +++ b/doc/source/example_notebooks/IS2_data_read-in.ipynb @@ -63,9 +63,8 @@ "metadata": {}, "outputs": [], "source": [ - "path_root = '/full/path/to/your/data/'\n", - "pattern = \"processed_ATL{product:2}_{datetime:%Y%m%d%H%M%S}_{rgt:4}{cycle:2}{orbitsegment:2}_{version:3}_{revision:2}.h5\"\n", - "reader = ipx.Read(path_root, \"ATL06\", pattern) # or ipx.Read(filepath, \"ATLXX\") if your filenames match the default pattern" + "path_root = '/full/path/to/your/ATL06_data/'\n", + "reader = ipx.Read(path_root)" ] }, { @@ -111,10 +110,9 @@ "\n", "Reading in ICESat-2 data with icepyx happens in a few simple steps:\n", "1. Let icepyx know where to find your data (this might be local files or urls to data in cloud storage)\n", - "2. Tell icepyx how to interpret the filename format\n", - "3. Create an icepyx `Read` object\n", - "4. Make a list of the variables you want to read in (does not apply for gridded products)\n", - "5. Load your data into memory (or read it in lazily, if you're using Dask)\n", + "2. Create an icepyx `Read` object\n", + "3. Make a list of the variables you want to read in (does not apply for gridded products)\n", + "4. Load your data into memory (or read it in lazily, if you're using Dask)\n", "\n", "We go through each of these steps in more detail in this notebook." ] @@ -168,21 +166,18 @@ { "cell_type": "markdown", "id": "e8da42c1", - "metadata": {}, + "metadata": { + "user_expressions": [] + }, "source": [ "### Step 1: Set data source path\n", "\n", "Provide a full path to the data to be read in (i.e. 
opened).\n", "Currently accepted inputs are:\n", - "* a directory\n", - "* a single file\n", - "\n", - "All files to be read in *must* have a consistent filename pattern.\n", - "If a directory is supplied as the data source, all files in any subdirectories that match the filename pattern will be included.\n", - "\n", - "S3 bucket data access is currently under development, and requires you are registered with NSIDC as a beta tester for cloud-based ICESat-2 data.\n", - "icepyx is working to ensure a smooth transition to working with remote files.\n", - "We'd love your help exploring and testing these features as they become available!" + "* a string path to directory - all files from the directory will be opened\n", + "* a string path to single file - one file will be opened\n", + "* a list of filepaths - all files in the list will be opened\n", + "* a glob string (see [glob](https://docs.python.org/3/library/glob.html)) - any files matching the glob pattern will be opened" ] }, { @@ -208,86 +203,147 @@ { "cell_type": "code", "execution_count": null, - "id": "e683ebf7", + "id": "fac636c2-e0eb-4e08-adaa-8f47623e46a1", "metadata": {}, "outputs": [], "source": [ - "# urlpath = 's3://nsidc-cumulus-prod-protected/ATLAS/ATL03/004/2019/11/30/ATL03_20191130221008_09930503_004_01.h5'" + "# list_of_files = ['/my/data/ATL06/processed_ATL06_20190226005526_09100205_006_02.h5', \n", + "# '/my/other/data/ATL06/processed_ATL06_20191202102922_10160505_006_01.h5']" ] }, { "cell_type": "markdown", - "id": "92743496", + "id": "ba3ebeb0-3091-4712-b0f7-559ddb95ca5a", "metadata": { "user_expressions": [] }, "source": [ - "### Step 2: Create a filename pattern for your data files\n", + "#### Glob Strings\n", + "\n", + "[glob](https://docs.python.org/3/library/glob.html) is a Python library which allows users to list files in their file systems whose paths match a given pattern. Icepyx uses the glob library to give users greater flexibility over their input file lists.\n", + "\n", + "glob works using `*` and `?` as wildcard characters, where `*` matches any number of characters and `?` matches a single character. For example:\n", "\n", - "Files provided by NSIDC typically match the format `\"ATL{product:2}_{datetime:%Y%m%d%H%M%S}_{rgt:4}{cycle:2}{orbitsegment:2}_{version:3}_{revision:2}.h5\"` where the parameters in curly brackets indicate a parameter name (left of the colon) and character length or format (right of the colon).\n", - "Some of this information is used during data opening to help correctly read and label the data within the data structure, particularly when multiple files are opened simultaneously.\n", + "* `/this/path/*.h5`: refers to all `.h5` files in the `/this/path` folder (Example matches: \"/this/path/processed_ATL03_20191130221008_09930503_006_01.h5\" or \"/this/path/myfavoriteicsat-2file.h5\")\n", + "* `/this/path/*ATL07*.h5`: refers to all `.h5` files in the `/this/path` folder that have ATL07 in the filename. 
(Example matches: \"/this/path/ATL07-02_20221012220720_03391701_005_01.h5\" or \"/this/path/processed_ATL07.h5\")\n", + "* `/this/path/ATL??/*.h5`: refers to all `.h5` files that are in a subfolder of `/this/path` and a subdirectory of `ATL` followed by any 2 characters (Example matches: \"/this/path/ATL03/processed_ATL03_20191130221008_09930503_006_01.h5\", \"/this/path/ATL06/myfile.h5\")\n", "\n", - "By default, icepyx will assume your filenames follow the default format.\n", - "However, you can easily read in other ICESat-2 data files by supplying your own filename pattern.\n", - "For instance, `pattern=\"ATL{product:2}-{datetime:%Y%m%d%H%M%S}-Sample.h5\"`. A few example patterns are provided below." + "See the glob documentation or other online explainer tutorials for more in depth explanation, or advanced glob paths such as character classes and ranges." ] }, { - "cell_type": "code", - "execution_count": null, - "id": "7318abd0", - "metadata": {}, - "outputs": [], + "cell_type": "markdown", + "id": "20286c76-5632-4420-b2c9-a5a6b1952672", + "metadata": { + "user_expressions": [] + }, + "source": [ + "#### Recursive Directory Search" + ] + }, + { + "cell_type": "markdown", + "id": "632bd1ce-2397-4707-a63f-9d5d2fc02fbc", + "metadata": { + "user_expressions": [] + }, + "source": [ + "glob will not by default search all of the subdirectories for matching filepaths, but it has the ability to do so.\n", + "\n", + "If you would like to search recursively, you can achieve this by either:\n", + "1. passing the `recursive` argument into `glob_kwargs` and including `\\**\\` in your filepath\n", + "2. using glob directly to create a list of filepaths\n", + "\n", + "Each of these two methods are shown below." + ] + }, + { + "cell_type": "markdown", + "id": "da0cacd8-9ddc-4c31-86b6-167d850b989e", + "metadata": { + "user_expressions": [] + }, "source": [ - "# pattern = 'ATL06-{datetime:%Y%m%d%H%M%S}-Sample.h5'\n", - "# pattern = 'ATL{product:2}-{datetime:%Y%m%d%H%M%S}-Sample.h5'" + "Method 1: passing the `recursive` argument into `glob_kwargs`" ] }, { "cell_type": "code", "execution_count": null, - "id": "f43e8664", + "id": "e276b876-9ec7-4991-8520-05c97824b896", "metadata": {}, "outputs": [], "source": [ - "# pattern = \"ATL{product:2}_{datetime:%Y%m%d%H%M%S}_{rgt:4}{cycle:2}{orbitsegment:2}_{version:3}_{revision:2}.h5\"" + "ipx.Read('/path/to/**/folder', glob_kwargs={'recursive': True})" + ] + }, + { + "cell_type": "markdown", + "id": "f5a1e85e-fc4a-405f-9710-0cb61b827f2c", + "metadata": { + "user_expressions": [] + }, + "source": [ + "You can use `glob_kwargs` for any additional argument to Python's builtin `glob.glob` that you would like to pass in via icepyx." 
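A brief sketch combining the two approaches described above, a glob string handled by icepyx itself and a file list built with `glob` directly; the directory layout is hypothetical.

import glob
import icepyx as ipx

# Glob string passed straight to Read: every ATL06 granule under /my/data/ATL06/.
reader = ipx.Read("/my/data/ATL06/*ATL06*.h5")

# Equivalent list built with glob, searching subdirectories recursively.
files = glob.glob("/my/data/**/*ATL06*.h5", recursive=True)
reader = ipx.Read(files)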
+ ] + }, + { + "cell_type": "markdown", + "id": "76de9539-710c-49f6-9e9e-238849382c33", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Method 2: using glob directly to create a list of filepaths" ] }, { "cell_type": "code", "execution_count": null, - "id": "992a77fb", + "id": "be79b0dd-efcf-4d50-bdb0-8e3ae8e8e38c", "metadata": {}, "outputs": [], "source": [ - "# grid_pattern = \"ATL{product:2}_GL_0311_{res:3}m_{version:3}_{revision:2}.nc\"" + "import glob" ] }, { "cell_type": "code", "execution_count": null, - "id": "6aec1a70", - "metadata": {}, + "id": "5d088571-496d-479a-9fb7-833ed7e98676", + "metadata": { + "tags": [] + }, "outputs": [], "source": [ - "pattern = \"processed_ATL{product:2}_{datetime:%Y%m%d%H%M%S}_{rgt:4}{cycle:2}{orbitsegment:2}_{version:3}_{revision:2}.h5\"" + "list_of_files = glob.glob('/path/to/**/folder', recursive=True)\n", + "ipx.Read(list_of_files)" ] }, { "cell_type": "markdown", - "id": "4275b04c", + "id": "08df2874-7c54-4670-8f37-9135ea296ff5", "metadata": { "user_expressions": [] }, "source": [ - "### Step 3: Create an icepyx read object\n", + "```{admonition} Read Module Update\n", + "Previously, icepyx required two additional conditions: 1) a `product` argument and 2) that your files either matched the default `filename_pattern` or that the user provided their own `filename_pattern`. These two requirements have been removed. `product` is now read directly from the file metadata (the root group's `short_name` attribute). Flexibility to specify multiple files via the `filename_pattern` has been replaced with the [glob string](https://docs.python.org/3/library/glob.html) feature, and by allowing a list of filepaths as an argument.\n", "\n", - "The `Read` object has two required inputs:\n", - "- `path` = a string with the full file path or full directory path to your hdf5 (.h5) format files.\n", - "- `product` = the data product you're working with, also known as the \"short name\".\n", + "The `product` and `filename_pattern` arguments have been maintained for backwards compatibility, but will be fully removed in icepyx version 1.0.0.\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "4275b04c", + "metadata": { + "user_expressions": [] + }, + "source": [ + "### Step 2: Create an icepyx read object\n", "\n", - "The `Read` object also accepts the optional keyword input:\n", - "- `pattern` = a formatted string indicating the filename pattern required for Intake's path_as_pattern argument." + "Using the `data_source` described in Step 1, we can create our Read object." ] }, { @@ -299,7 +355,17 @@ }, "outputs": [], "source": [ - "reader = ipx.Read(data_source=path_root, product=\"ATL06\", filename_pattern=pattern) # or ipx.Read(filepath, \"ATLXX\") if your filenames match the default pattern" + "reader = ipx.Read(data_source=path_root)" + ] + }, + { + "cell_type": "markdown", + "id": "7b2acfdb-75eb-4c64-b583-2ab19326aaee", + "metadata": { + "user_expressions": [] + }, + "source": [ + "The Read object now contains the list of matching files that will eventually be loaded into Python. You can inspect its properties, such as the files that were located or the identified product, directly on the Read object." 
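+    "\n",
+    "As noted in the admonition above, the product is read from each file's root group `short_name` attribute. If you ever want to check that attribute yourself, a rough equivalent (using a hypothetical file path) with `h5py` is:\n",
+    "\n",
+    "```python\n",
+    "import h5py\n",
+    "\n",
+    "# print the product short name stored in the file's root-level metadata\n",
+    "with h5py.File('/path/to/data/processed_ATL06_20190226005526_09100205_006_02.h5', 'r') as f:\n",
+    "    print(f.attrs['short_name'].decode())  # e.g. 'ATL06'\n",
+    "```"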
] }, { @@ -309,7 +375,17 @@ "metadata": {}, "outputs": [], "source": [ - "reader._filelist" + "reader.filelist" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7455ee3f-f9ab-486e-b4c7-2fa2314d4084", + "metadata": {}, + "outputs": [], + "source": [ + "reader.product" ] }, { @@ -319,7 +395,7 @@ "user_expressions": [] }, "source": [ - "### Step 4: Specify variables to be read in\n", + "### Step 3: Specify variables to be read in\n", "\n", "To load your data into memory or prepare it for analysis, icepyx needs to know which variables you'd like to read in.\n", "If you've used icepyx to download data from NSIDC with variable subsetting (which is the default), then you may already be familiar with the icepyx `Variables` module and how to create and modify lists of variables.\n", @@ -426,7 +502,7 @@ "user_expressions": [] }, "source": [ - "### Step 5: Loading your data\n", + "### Step 4: Loading your data\n", "\n", "Now that you've set up all the options, you're ready to read your ICESat-2 data into memory!" ] @@ -541,9 +617,9 @@ ], "metadata": { "kernelspec": { - "display_name": "general", + "display_name": "icepyx-dev", "language": "python", - "name": "general" + "name": "icepyx-dev" }, "language_info": { "codemirror_mode": { diff --git a/doc/source/user_guide/documentation/classes_dev_uml.svg b/doc/source/user_guide/documentation/classes_dev_uml.svg index 34e13b41c..0cd08c9e9 100644 --- a/doc/source/user_guide/documentation/classes_dev_uml.svg +++ b/doc/source/user_guide/documentation/classes_dev_uml.svg @@ -4,11 +4,11 @@ - + classes_dev_uml - + icepyx.core.auth.AuthenticationError @@ -139,38 +139,38 @@ icepyx.core.icesat2data.Icesat2Data - -Icesat2Data - - -__init__() + +Icesat2Data + + +__init__() icepyx.core.exceptions.NsidcQueryError - -NsidcQueryError - -errmsg -msgtxt : str - -__init__(errmsg, msgtxt) -__str__() + +NsidcQueryError + +errmsg +msgtxt : str + +__init__(errmsg, msgtxt) +__str__() icepyx.core.exceptions.QueryError - -QueryError - - - + +QueryError + + + icepyx.core.exceptions.NsidcQueryError->icepyx.core.exceptions.QueryError - - + + @@ -235,24 +235,24 @@ icepyx.core.read.Read - -Read - -_filelist : NoneType, list -_out_obj : Dataset -_pattern : str -_prod : str -_read_vars -_source_type : str -data_source -vars - -__init__(data_source, product, filename_pattern, catalog, out_obj_type) -_add_vars_to_ds(is2ds, ds, grp_path, wanted_groups_tiered, wanted_dict) -_build_dataset_template(file) -_build_single_file_dataset(file, groups_list) -_check_source_for_pattern(source, filename_pattern) -_combine_nested_vars(is2ds, ds, grp_path, wanted_dict) + +Read + +_filelist : NoneType, list +_out_obj : Dataset +_product : NoneType, str +_read_vars +filelist +product +vars + +__init__(data_source, product, filename_pattern, catalog, glob_kwargs, out_obj_type) +_add_vars_to_ds(is2ds, ds, grp_path, wanted_groups_tiered, wanted_dict) +_build_dataset_template(file) +_build_single_file_dataset(file, groups_list) +_check_source_for_pattern(source, filename_pattern) +_combine_nested_vars(is2ds, ds, grp_path, wanted_dict) +_extract_product(filepath) _read_single_grp(file, grp_path) load() @@ -366,30 +366,30 @@ icepyx.core.variables.Variables->icepyx.core.read.Read - - -_read_vars + + +_read_vars icepyx.core.visualization.Visualize - -Visualize - -bbox : list -cycles : NoneType -date_range : NoneType -product : NoneType, str -tracks : NoneType - -__init__(query_obj, product, spatial_extent, date_range, cycles, tracks) -generate_OA_parameters(): list -grid_bbox(binsize): list 
-make_request(base_url, payload) -parallel_request_OA(): da.array -query_icesat2_filelist(): tuple -request_OA_data(paras): da.array -viz_elevation(): (hv.DynamicMap, hv.Layout) + +Visualize + +bbox : list +cycles : NoneType +date_range : NoneType +product : NoneType, str +tracks : NoneType + +__init__(query_obj, product, spatial_extent, date_range, cycles, tracks) +generate_OA_parameters(): list +grid_bbox(binsize): list +make_request(base_url, payload) +parallel_request_OA(): da.array +query_icesat2_filelist(): tuple +request_OA_data(paras): da.array +viz_elevation(): (hv.DynamicMap, hv.Layout) diff --git a/doc/source/user_guide/documentation/classes_user_uml.svg b/doc/source/user_guide/documentation/classes_user_uml.svg index 640f76815..a9c116469 100644 --- a/doc/source/user_guide/documentation/classes_user_uml.svg +++ b/doc/source/user_guide/documentation/classes_user_uml.svg @@ -201,13 +201,14 @@ icepyx.core.read.Read - -Read - -data_source -vars - -load() + +Read + +filelist +product +vars + +load() @@ -300,9 +301,9 @@ icepyx.core.variables.Variables->icepyx.core.read.Read - - -_read_vars + + +_read_vars diff --git a/doc/source/user_guide/documentation/read.rst b/doc/source/user_guide/documentation/read.rst index a5beedf4e..68da03b1d 100644 --- a/doc/source/user_guide/documentation/read.rst +++ b/doc/source/user_guide/documentation/read.rst @@ -19,6 +19,8 @@ Attributes .. autosummary:: :toctree: ../../_icepyx/ + Read.filelist + Read.product Read.vars diff --git a/icepyx/core/is2ref.py b/icepyx/core/is2ref.py index a3a0311bb..5faaef110 100644 --- a/icepyx/core/is2ref.py +++ b/icepyx/core/is2ref.py @@ -15,6 +15,7 @@ def _validate_product(product): """ Confirm a valid ICESat-2 product was specified """ + error_msg = "A valid product string was not provided. Check user input, if given, or file metadata." if isinstance(product, str): product = str.upper(product) assert product in [ @@ -40,9 +41,9 @@ def _validate_product(product): "ATL20", "ATL21", "ATL23", - ], "Please enter a valid product" + ], error_msg else: - raise TypeError("Please enter a product string") + raise TypeError(error_msg) return product diff --git a/icepyx/core/read.py b/icepyx/core/read.py index 627395be2..5ef1867f2 100644 --- a/icepyx/core/read.py +++ b/icepyx/core/read.py @@ -1,7 +1,9 @@ import fnmatch +import glob import os import warnings +import h5py import numpy as np import xarray as xr @@ -10,8 +12,6 @@ from icepyx.core.variables import Variables as Variables from icepyx.core.variables import list_of_dict_vals -# from icepyx.core.query import Query - def _make_np_datetime(df, keyword): """ @@ -266,24 +266,28 @@ class Read: Parameters ---------- - data_source : string - A string with a full file path or full directory path to ICESat-2 hdf5 (.h5) format files. - Files within a directory must have a consistent filename pattern that includes the "ATL??" data product name. - Files must all be within a single directory. + data_source : string, List + A string or list which specifies the files to be read. The string can be either: 1) the path of a single file 2) the path to a directory or 3) a [glob string](https://docs.python.org/3/library/glob.html). + The List must be a list of strings, each of which is the path of a single file. product : string ICESat-2 data product ID, also known as "short name" (e.g. ATL03). Available data products can be found at: https://nsidc.org/data/icesat-2/data-sets + **Deprecation warning:** This argument is no longer required and will be deprecated in version 1.0.0. 
The dataset product is read from the file metadata.
 
-    filename_pattern : string, default 'ATL{product:2}_{datetime:%Y%m%d%H%M%S}_{rgt:4}{cycle:2}{orbitsegment:2}_{version:3}_{revision:2}.h5'
-        String that shows the filename pattern as required for Intake's path_as_pattern argument.
+    filename_pattern : string, default None
+        String that shows the filename pattern as previously required for Intake's path_as_pattern argument.
         The default describes files downloaded directly from NSIDC (subsetted and non-subsetted) for most products (e.g. ATL06).
         The ATL11 filename pattern from NSIDC is: 'ATL{product:2}_{rgt:4}{orbitsegment:2}_{cycles:4}_{version:3}_{revision:2}.h5'.
+        **Deprecation warning:** This argument is no longer required and will be deprecated in version 1.0.0.
 
     catalog : string, default None
         Full path to an Intake catalog for reading in data.
         If you still need to create a catalog, leave as default.
-        **Deprecation warning:** This argument has been depreciated. Please use the data_source argument to pass in valid data.
+        **Deprecation warning:** This argument has been deprecated. Please use the data_source argument to pass in valid data.
+
+    glob_kwargs : dict, default {}
+        Additional arguments to be passed into the [glob.glob()](https://docs.python.org/3/library/glob.html#glob.glob) function
 
     out_obj_type : object, default xarray.Dataset
         The desired format for the data to be read in.
@@ -296,6 +300,21 @@ class Read:
 
     Examples
     --------
+    Reading a single file
+    >>> ipx.Read('/path/to/data/processed_ATL06_20190226005526_09100205_006_02.h5') # doctest: +SKIP
+
+    Reading all files in a directory
+    >>> ipx.Read('/path/to/data/') # doctest: +SKIP
+
+    Reading files that match a particular pattern (here, all .h5 files that start with `processed_ATL06_`).
+    >>> ipx.Read('/path/to/data/processed_ATL06_*.h5') # doctest: +SKIP
+
+    Reading a specific list of files
+    >>> list_of_files = [
+    ...     '/path/to/data/processed_ATL06_20190226005526_09100205_006_02.h5',
+    ...     '/path/to/more/data/processed_ATL06_20191202102922_10160505_006_01.h5',
+    ... ]
+    >>> ipx.Read(list_of_files) # doctest: +SKIP
 
     """
 
@@ -306,11 +325,12 @@ def __init__(
         self,
         data_source=None,
         product=None,
-        filename_pattern="ATL{product:2}_{datetime:%Y%m%d%H%M%S}_{rgt:4}{cycle:2}{orbitsegment:2}_{version:3}_{revision:2}.h5",
+        filename_pattern=None,
         catalog=None,
+        glob_kwargs={},
         out_obj_type=None,  # xr.Dataset,
     ):
-        # Raise error for depreciated argument
+        # Raise error for deprecated argument
         if catalog:
             raise DeprecationError(
                 "The `catalog` argument has been deprecated and intake is no longer supported. "
             )
 
         if data_source is None:
-            raise ValueError("Please provide a data source.")
-        else:
-            self._source_type = _check_datasource(data_source)
-            self.data_source = data_source
+            raise ValueError("data_source is a required argument")
 
-        if product is None:
-            raise ValueError(
-                "Please provide the ICESat-2 data product of your file(s)."
+        # Raise warnings for deprecated arguments
+        if filename_pattern:
+            warnings.warn(
+                "The `filename_pattern` argument is deprecated. 
Instead please provide a " + "string, list, or glob string to the `data_source` argument.", + stacklevel=2, ) - else: - self._prod = is2ref._validate_product(product) - pattern_ck, filelist = Read._check_source_for_pattern( - data_source, filename_pattern - ) - assert pattern_ck - # Note: need to check if this works for subset and non-subset NSIDC files (processed_ prepends the former) - self._pattern = filename_pattern - - # this is a first pass at getting rid of mixed product types and warning the user. - # it takes an approach assuming the product name is in the filename, but needs reworking if we let multiple products be loaded - # one way to handle this would be bring in the product info during the loading step and fill in product there instead of requiring it from the user - filtered_filelist = [file for file in filelist if self._prod in file] - if len(filtered_filelist) == 0: + + if product: + product = is2ref._validate_product(product) warnings.warn( - "Your filenames do not contain a product identifier (e.g. ATL06). " - "You will likely need to manually merge your dataframes." + "The `product` argument is no longer required. If the `data_source` argument given " + "contains files with multiple products the `product` argument will be used " + "to filter that list. In all other cases the product argument is ignored. " + "The recommended approach is to not include a `product` argument and instead " + "provide a `data_source` with files of only a single product type`.", + stacklevel=2, ) + + # Create the filelist from the `data_source` argument + if filename_pattern: + # maintained for backward compatibility + pattern_ck, filelist = Read._check_source_for_pattern( + data_source, filename_pattern + ) + assert pattern_ck self._filelist = filelist - elif len(filtered_filelist) < len(filelist): - warnings.warn( - "Some files matching your filename pattern were removed as they were not the specified product." + elif isinstance(data_source, list): + self._filelist = data_source + elif os.path.isdir(data_source): + data_source = os.path.join(data_source, "*") + self._filelist = glob.glob(data_source, **glob_kwargs) + else: + self._filelist = glob.glob(data_source, **glob_kwargs) + # Remove any directories from the list + self._filelist = [f for f in self._filelist if not os.path.isdir(f)] + + # Create a dictionary of the products as read from the metadata + product_dict = {} + for file_ in self._filelist: + product_dict[file_] = self._extract_product(file_) + + # Raise warnings or errors for multiple products or products not matching the user-specified product + all_products = list(set(product_dict.values())) + if len(all_products) > 1: + if product: + warnings.warn( + f"Multiple products found in list of files: {product_dict}. Files that " + "do not match the user specified product will be removed from processing.\n" + "Filtering files using a `product` argument is deprecated. Please use the " + "`data_source` argument to specify a list of files with the same product.", + stacklevel=2, + ) + self._filelist = [] + for key, value in product_dict.items(): + if value == product: + self._filelist.append(key) + if len(self._filelist) == 0: + raise TypeError( + "No files found in the file list matching the user-specified " + "product type" + ) + # Use the cleaned filelist to assign a product + self._product = product + else: + raise TypeError( + f"Multiple product types were found in the file list: {product_dict}." 
+                    " Please provide a valid `data_source` parameter indicating files of a single "
+                    "product"
+                )
+        elif len(all_products) == 0:
+            raise TypeError(
+                "No files found matching the specified `data_source`. Check your glob "
+                "string or file list."
             )
-            self._filelist = filtered_filelist
         else:
-            self._filelist = filelist
-
-        # after validation, use the notebook code and code outline to start implementing the rest of the class
+            # Assign the identified product to the property
+            self._product = all_products[0]
+            # Raise a warning if the metadata-located product differs from the user-specified product
+            if product and self._product != product:
+                warnings.warn(
+                    f"User specified product {product} does not match the product from the file"
+                    f" metadata {self._product}",
+                    stacklevel=2,
+                )
 
         if out_obj_type is not None:
             print(
@@ -387,14 +457,43 @@ def vars(self):
 
         if not hasattr(self, "_read_vars"):
             self._read_vars = Variables(
-                "file", path=self._filelist[0], product=self._prod
+                "file", path=self.filelist[0], product=self.product
             )
 
         return self._read_vars
 
+    @property
+    def filelist(self):
+        """
+        Return the list of files represented by this Read object.
+        """
+        return self._filelist
+
+    @property
+    def product(self):
+        """
+        Return the product associated with the Read object.
+        """
+        return self._product
+
     # ----------------------------------------------------------------------
     # Methods
 
+    @staticmethod
+    def _extract_product(filepath):
+        """
+        Read the product type from the metadata of the file. Return the product as a string.
+        """
+        with h5py.File(filepath, "r") as f:
+            try:
+                product = f.attrs["short_name"].decode()
+                product = is2ref._validate_product(product)
+            except KeyError:
+                raise AttributeError(
+                    "Unable to extract the product name from file metadata."
+                )
+        return product
+
     @staticmethod
     def _check_source_for_pattern(source, filename_pattern):
         """
@@ -651,7 +750,7 @@ def load(self):
         # However, this led to errors when I tried to combine two identical datasets because the single dimension was equal.
         # In these situations, xarray recommends manually controlling the merge/concat process yourself.
         # While unlikely to be a broad issue, I've heard of multiple matching timestamps causing issues for combining multiple IS2 datasets.
-        for file in self._filelist:
+        for file in self.filelist:
             all_dss.append(
                 self._build_single_file_dataset(file, groups_list)
             )  # wanted_groups, vgrp.keys()))
@@ -686,7 +785,7 @@ def _build_dataset_template(self, file):
                 gran_idx=[np.uint64(999999)],
                 source_file=(["gran_idx"], [file]),
             ),
-            attrs=dict(data_product=self._prod),
+            attrs=dict(data_product=self.product),
         )
         return is2ds
 
@@ -734,20 +833,11 @@ def _build_single_file_dataset(self, file, groups_list):
         -------
         Xarray Dataset
         """
-        file_product = self._read_single_grp(file, "/").attrs["identifier_product_type"]
-        assert (
-            file_product == self._prod
-        ), "Your product specification does not match the product specification within your files."
-        # I think the below method might NOT read the file into memory as the above might?
- # import h5py - # with h5py.File(filepath,'r') as h5pt: - # prod_id = h5pt.attrs["identifier_product_type"] - # DEVNOTE: if and elif does not actually apply wanted variable list, and has not been tested for merging multiple files into one ds # if a gridded product # TODO: all products need to be tested, and quicklook products added or explicitly excluded # Level 3b, gridded (netcdf): ATL14, 15, 16, 17, 18, 19, 20, 21 - if self._prod in [ + if self.product in [ "ATL14", "ATL15", "ATL16", @@ -761,7 +851,7 @@ def _build_single_file_dataset(self, file, groups_list): is2ds = xr.open_dataset(file) # Level 3b, hdf5: ATL11 - elif self._prod in ["ATL11"]: + elif self.product in ["ATL11"]: is2ds = self._build_dataset_template(file) # returns the wanted groups as a single list of full group path strings diff --git a/icepyx/tests/test_is2ref.py b/icepyx/tests/test_is2ref.py index b22709c98..fb8d16cad 100644 --- a/icepyx/tests/test_is2ref.py +++ b/icepyx/tests/test_is2ref.py @@ -8,14 +8,14 @@ def test_num_product(): dsnum = 6 - ermsg = "Please enter a product string" + ermsg = "A valid product string was not provided. Check user input, if given, or file metadata." with pytest.raises(TypeError, match=ermsg): is2ref._validate_product(dsnum) def test_bad_product(): wrngds = "atl-6" - ermsg = "Please enter a valid product" + ermsg = "A valid product string was not provided. Check user input, if given, or file metadata." with pytest.raises(AssertionError, match=ermsg): is2ref._validate_product(wrngds) From 96fe05d686ae4b8d503bde43bfab0c06b57e72d8 Mon Sep 17 00:00:00 2001 From: Jessica Scheick Date: Thu, 19 Oct 2023 18:13:18 -0400 Subject: [PATCH 12/21] enable QUEST kwarg handling (#452) * add kwarg acceptance for data queries and download_all in quest * Add QUEST dataset page to RTD --------- Co-authored-by: zachghiaccio --- doc/source/index.rst | 1 + icepyx/quest/quest.py | 99 ++++++++++++++++++++++++++++---------- icepyx/tests/test_quest.py | 31 ++++++++++-- 3 files changed, 102 insertions(+), 29 deletions(-) diff --git a/doc/source/index.rst b/doc/source/index.rst index 719f528b2..586c8810f 100644 --- a/doc/source/index.rst +++ b/doc/source/index.rst @@ -146,6 +146,7 @@ ICESat-2 datasets to enable scientific discovery. contributing/contribution_guidelines contributing/how_to_contribute contributing/icepyx_internals + contributing/quest-available-datasets contributing/attribution_link contributing/development_plan contributing/release_guide diff --git a/icepyx/quest/quest.py b/icepyx/quest/quest.py index c54e49b73..fe3039a39 100644 --- a/icepyx/quest/quest.py +++ b/icepyx/quest/quest.py @@ -59,7 +59,7 @@ def __init__( date_range=None, start_time=None, end_time=None, - proj="Default", + proj="default", ): """ Tells QUEST to initialize data given the user input spatiotemporal data. @@ -94,9 +94,23 @@ def add_icesat2( tracks=None, files=None, **kwargs, - ): + ) -> None: """ Adds ICESat-2 datasets to QUEST structure. + + Parameters + ---------- + + For details on inputs, see the Query documentation. + + Returns + ------- + None + + See Also + -------- + icepyx.core.GenQuery + icepyx.core.Query """ query = Query( @@ -122,41 +136,76 @@ def add_icesat2( # ---------------------------------------------------------------------- # Methods (on all datasets) - # error handling? what happens when one of i fails... - def search_all(self): + # error handling? what happens when the user tries to re-query? + def search_all(self, **kwargs): """ Searches for requred dataset within platform (i.e. 
ICESat-2, Argo) of interest.
+
+        Parameters
+        ----------
+        **kwargs : default None
+            Optional passing of keyword arguments to supply additional search constraints per dataset.
+            Each key must match the dataset name (e.g. "icesat2", "argo") as in quest.datasets.keys(),
+            and the value is a dictionary of acceptable keyword arguments
+            and values allowable for the `search_data()` function for that dataset.
+            For instance: `icesat2 = {"IDs":True}, argo = {"presRange":"10,500"}`.
         """
 
         print("\nSearching all datasets...")
-        for i in self.datasets.values():
+        for k, v in self.datasets.items():
             print()
             try:
-                # querying ICESat-2 data
-                if isinstance(i, Query):
+                if isinstance(v, Query):
                     print("---ICESat-2---")
-                    msg = i.avail_granules()
+                    try:
+                        msg = v.avail_granules(kwargs[k])
+                    except KeyError:
+                        msg = v.avail_granules()
                     print(msg)
-                else:  # querying all other data sets
-                    print(i)
-                    i.search_data()
+                else:
+                    print(k)
+                    try:
+                        v.search_data(kwargs[k])
+                    except KeyError:
+                        v.search_data()
             except:
-                dataset_name = type(i).__name__
+                dataset_name = type(v).__name__
                 print("Error querying data from {0}".format(dataset_name))
 
-    # error handling? what happens when one of i fails...
-    def download_all(self, path=""):
-        '''Downloads requested dataset(s).'''
+    # error handling? what happens if the user tries to re-download?
+    def download_all(self, path="", **kwargs):
+        """
+        Downloads requested dataset(s).
+
+        Parameters
+        ----------
+        **kwargs : default None
+            Optional passing of keyword arguments to supply additional download options per dataset.
+            Each key must match the dataset name (e.g. "icesat2", "argo") as in quest.datasets.keys(),
+            and the value is a dictionary of acceptable keyword arguments
+            and values allowable for the download function for that dataset
+            (`download_granules()` for ICESat-2, `download()` for other datasets).
+            For instance: `icesat2 = {"verbose":True}, argo = {"keep_existing":True}`.
+        """
 
         print("\nDownloading all datasets...")
-        for i in self.datasets.values():
+        for k, v in self.datasets.items():
             print()
-            if isinstance(i, Query):
-                print("---ICESat-2---")
-                msg = i.download_granules(path)
-                print(msg)
-            else:
-                i.download()
-                print(i)
-
-    # DEVNOTE: see colocated data branch and phyto team files for code that expands quest functionality
+            try:
+
+                if isinstance(v, Query):
+                    print("---ICESat-2---")
+                    try:
+                        msg = v.download_granules(path, kwargs[k])
+                    except KeyError:
+                        msg = v.download_granules(path)
+                    print(msg)
+                else:
+                    print(k)
+                    try:
+                        msg = v.download(kwargs[k])
+                    except KeyError:
+                        msg = v.download()
+                    print(msg)
+            except:
+                dataset_name = type(v).__name__
+                print("Error downloading data from {0}".format(dataset_name))
diff --git a/icepyx/tests/test_quest.py b/icepyx/tests/test_quest.py
index 043ee159e..f50b1bea2 100644
--- a/icepyx/tests/test_quest.py
+++ b/icepyx/tests/test_quest.py
@@ -68,13 +68,36 @@ def test_add_is2(quest_instance):
 ########## ALL DATASET METHODS TESTS ##########
 
 
-# is successful execution enough here?
 # each of the query functions should be tested in their respective modules
 def test_search_all(quest_instance):
     # Search and test all datasets
     quest_instance.search_all()
 
 
-def test_download_all():
-    # this will require auth in some cases...
- pass +@pytest.mark.parametrize( + "kwargs", + [ + {"icesat2": {"IDs": True}}, + # {"argo":{"presRange":"10,500"}}, + # {"icesat2":{"IDs":True}, "argo":{"presRange":"10,500"}} + ], +) +def test_search_all_kwargs(quest_instance, kwargs): + quest_instance.search_all(**kwargs) + + +# TESTS NOT IMPLEMENTED +# def test_download_all(): +# # this will require auth in some cases... +# pass + +# @pytest.mark.parametrize( +# "kwargs", +# [ +# {"icesat2": {"verbose":True}}, +# # {"argo":{"keep_existing":True}, +# # {"icesat2":{"verbose":True}, "argo":{"keep_existing":True} +# ], +# ) +# def test_download_all_kwargs(quest_instance, kwargs): +# pass From c2a875e6cd195ff50ddbb82011b66fd6ffa0f962 Mon Sep 17 00:00:00 2001 From: "allcontributors[bot]" <46447321+allcontributors[bot]@users.noreply.github.com> Date: Thu, 26 Oct 2023 11:41:03 -0400 Subject: [PATCH 13/21] docs: add rwegener2 as a contributor for bug, code, and 6 more (#460) Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com> Co-authored-by: Jessica Scheick --- .all-contributorsrc | 19 ++++++++++++++++++- CONTRIBUTORS.rst | 11 ++++++----- 2 files changed, 24 insertions(+), 6 deletions(-) diff --git a/.all-contributorsrc b/.all-contributorsrc index 8f9a076e4..3b321715a 100644 --- a/.all-contributorsrc +++ b/.all-contributorsrc @@ -422,6 +422,22 @@ "contributions": [ "review" ] + }, + { + "login": "rwegener2", + "name": "Rachel Wegener", + "avatar_url": "https://avatars.githubusercontent.com/u/35503632?v=4", + "profile": "https://rwegener2.github.io/", + "contributions": [ + "bug", + "code", + "doc", + "ideas", + "maintenance", + "review", + "test", + "tutorial" + ] } ], "contributorsPerLine": 7, @@ -430,5 +446,6 @@ "repoType": "github", "repoHost": "https://github.com", "skipCi": true, - "commitConvention": "angular" + "commitConvention": "angular", + "commitType": "docs" } diff --git a/CONTRIBUTORS.rst b/CONTRIBUTORS.rst index c6b0c84f5..be362bb28 100644 --- a/CONTRIBUTORS.rst +++ b/CONTRIBUTORS.rst @@ -31,41 +31,42 @@ Thanks goes to these wonderful people (`emoji key Nicole Abib
Nicole Abib
💻 🤔 + Rachel Wegener
Rachel Wegener

🐛 💻 📖 🤔 🚧 👀 ⚠️ Raphael Hagen
Raphael Hagen

📖 🎨 💻 🚇 👀 Romina Piunno
Romina Piunno

💻 🤔 🧑‍🏫 👀 Sarah Hall
Sarah Hall

🐛 💻 📖 🚧 ⚠️ Scott Henderson
Scott Henderson

🚧 Sebastian Alvis
Sebastian Alvis

📖 🚇 Shashank Bhushan
Shashank Bhushan

💡 - Tian Li
Tian Li

🐛 💻 📖 💡 🤔 👀 ⚠️ 🔧 + Tian Li
Tian Li

🐛 💻 📖 💡 🤔 👀 ⚠️ 🔧 Tom Johnson
Tom Johnson

📖 🚇 Tyler Sutterley
Tyler Sutterley

📖 💻 🤔 💬 🛡️ ⚠️ Wei Ji
Wei Ji

🐛 💻 📖 💡 🤔 🚇 🚧 🧑‍🏫 💬 👀 ⚠️ 📢 Wilson Sauthoff
Wilson Sauthoff

👀 Zach Fair
Zach Fair

🐛 💻 📖 🤔 💬 👀 alexdibella
alexdibella

🐛 🤔 💻 - bidhya
bidhya

💡 + bidhya
bidhya

💡 learn2phoenix
learn2phoenix

💻 liuzheng-arctic
liuzheng-arctic

📖 🐛 💻 🤔 👀 🔧 💡 nitin-ravinder
nitin-ravinder

🐛 👀 ravindraK08
ravindraK08

👀 smithb
smithb

🤔 tedmaksym
tedmaksym

🤔 - trevorskaggs
trevorskaggs

🐛 💻 + trevorskaggs
trevorskaggs

🐛 💻 trey-stafford
trey-stafford

💻 🤔 🚧 👀 💬 - + - + This project follows the `all-contributors `_ specification. Contributions of any kind welcome! From c888ea1ff3a8a4a04e401f6a414c77ebc2d1ce7f Mon Sep 17 00:00:00 2001 From: "allcontributors[bot]" <46447321+allcontributors[bot]@users.noreply.github.com> Date: Thu, 26 Oct 2023 12:07:55 -0400 Subject: [PATCH 14/21] docs: add jpswinski as a contributor for review (#461) Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com> Co-authored-by: Jessica Scheick --- .all-contributorsrc | 3 ++- CONTRIBUTORS.rst | 2 +- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/.all-contributorsrc b/.all-contributorsrc index 3b321715a..85b5486b9 100644 --- a/.all-contributorsrc +++ b/.all-contributorsrc @@ -382,7 +382,8 @@ "avatar_url": "https://avatars.githubusercontent.com/u/54070345?v=4", "profile": "https://github.com/jpswinski", "contributions": [ - "code" + "code", + "review" ] }, { diff --git a/CONTRIBUTORS.rst b/CONTRIBUTORS.rst index be362bb28..337ff6661 100644 --- a/CONTRIBUTORS.rst +++ b/CONTRIBUTORS.rst @@ -23,7 +23,7 @@ Thanks goes to these wonderful people (`emoji key Fernando Perez
Fernando Perez

🎨 💼 🤔 - JP Swinski
JP Swinski

💻 + JP Swinski
JP Swinski

💻 👀 Jessica
Jessica

🐛 💻 🖋 📖 🎨 💡 🤔 🚧 🧑‍🏫 📆 💬 👀 Joachim Meyer
Joachim Meyer

🧑‍🏫 🚧 Kelsey Bisson
Kelsey Bisson

🐛 💻 📖 🤔 💡 🤔 🧑‍🏫 💬 👀 From bd0176185bd11ef52e1a125aad90b08ae06583b3 Mon Sep 17 00:00:00 2001 From: "allcontributors[bot]" <46447321+allcontributors[bot]@users.noreply.github.com> Date: Fri, 27 Oct 2023 14:13:58 -0400 Subject: [PATCH 15/21] docs: add whyjz as a contributor for tutorial (#462) Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com> Co-authored-by: Jessica Scheick --- .all-contributorsrc | 9 +++++++++ CONTRIBUTORS.rst | 5 +++-- 2 files changed, 12 insertions(+), 2 deletions(-) diff --git a/.all-contributorsrc b/.all-contributorsrc index 85b5486b9..6b24eac03 100644 --- a/.all-contributorsrc +++ b/.all-contributorsrc @@ -439,6 +439,15 @@ "test", "tutorial" ] + }, + { + "login": "whyjz", + "name": "Whyjay Zheng", + "avatar_url": "https://avatars.githubusercontent.com/u/19339926?v=4", + "profile": "https://whyjz.github.io/", + "contributions": [ + "tutorial" + ] } ], "contributorsPerLine": 7, diff --git a/CONTRIBUTORS.rst b/CONTRIBUTORS.rst index 337ff6661..1fd8bab42 100644 --- a/CONTRIBUTORS.rst +++ b/CONTRIBUTORS.rst @@ -44,20 +44,21 @@ Thanks goes to these wonderful people (`emoji key Tom Johnson
Tom Johnson

📖 🚇 Tyler Sutterley
Tyler Sutterley

📖 💻 🤔 💬 🛡️ ⚠️ Wei Ji
Wei Ji

🐛 💻 📖 💡 🤔 🚇 🚧 🧑‍🏫 💬 👀 ⚠️ 📢 + Whyjay Zheng
Whyjay Zheng

Wilson Sauthoff
Wilson Sauthoff

👀 Zach Fair
Zach Fair

🐛 💻 📖 🤔 💬 👀 - alexdibella
alexdibella

🐛 🤔 💻 + alexdibella
alexdibella

🐛 🤔 💻 bidhya
bidhya

💡 learn2phoenix
learn2phoenix

💻 liuzheng-arctic
liuzheng-arctic

📖 🐛 💻 🤔 👀 🔧 💡 nitin-ravinder
nitin-ravinder

🐛 👀 ravindraK08
ravindraK08

👀 smithb
smithb

🤔 - tedmaksym
tedmaksym

🤔 + tedmaksym
tedmaksym

🤔 trevorskaggs
trevorskaggs

🐛 💻 trey-stafford
trey-stafford

💻 🤔 🚧 👀 💬 From 9378cecd906ef2592f1d0df376cf681c239bfb57 Mon Sep 17 00:00:00 2001 From: Jessica Scheick Date: Thu, 2 Nov 2023 10:04:30 -0400 Subject: [PATCH 16/21] add newest icepyx citations (#455) --- doc/source/tracking/citations.rst | 2 ++ doc/source/tracking/icepyx_pubs.bib | 24 ++++++++++++++++++++++++ 2 files changed, 26 insertions(+) diff --git a/doc/source/tracking/citations.rst b/doc/source/tracking/citations.rst index b31132be8..bf5672587 100644 --- a/doc/source/tracking/citations.rst +++ b/doc/source/tracking/citations.rst @@ -49,6 +49,8 @@ Research that utilizes icepyx for ICESat-2 data .. bibliography:: icepyx_pubs.bib :style: mystyle + Freer2023 + Idestrom2023 Shean2023 Eidam2022 Leeuwen:2022 diff --git a/doc/source/tracking/icepyx_pubs.bib b/doc/source/tracking/icepyx_pubs.bib index a1d945c01..d13c9653f 100644 --- a/doc/source/tracking/icepyx_pubs.bib +++ b/doc/source/tracking/icepyx_pubs.bib @@ -183,6 +183,30 @@ @inProceedings{Fernando:2021 } +@Article{Freer2023, +AUTHOR = {Freer, B. I. D. and Marsh, O. J. and Hogg, A. E. and Fricker, H. A. and Padman, L.}, +TITLE = {Modes of {Antarctic} tidal grounding line migration revealed by {Ice, Cloud, and land Elevation Satellite-2 (ICESat-2)} laser altimetry}, +JOURNAL = {The Cryosphere}, +VOLUME = {17}, +YEAR = {2023}, +NUMBER = {9}, +PAGES = {4079--4101}, +URL = {https://tc.copernicus.org/articles/17/4079/2023/}, +DOI = {10.5194/tc-17-4079-2023} +} + + +@mastersthesis{Idestrom2023, + author = {Petter Idestr\"{o}m}, + title = {Remote Sensing of Cryospheric Surfaces: Small Scale Surface Roughness Signatures in Satellite Altimetry Data}, + school = {Ume\aa University}, + year = {2023}, + address = {Sweden}, + month = {Sept.}, + url = {https://www.diva-portal.org/smash/get/diva2:1801057/FULLTEXT01.pdf} +} + + @misc{Leeuwen:2022, author = {van Leeuwen, Gijs}, title = {The automated retrieval of supraglacial lake depth and extent from {ICESat-2} photon clouds leveraging {DBSCAN} clustering}, From 142b7ab4d7ea15d2107b60495d7cc0273dfb2fd5 Mon Sep 17 00:00:00 2001 From: Rachel Wegener <35503632+rwegener2@users.noreply.github.com> Date: Tue, 7 Nov 2023 07:17:42 -0500 Subject: [PATCH 17/21] Variables as an independent class (#451) Refactor Variables class to be user facing functionality --- .../IS2_data_access2-subsetting.ipynb | 42 +- .../IS2_data_variables.ipynb | 351 ++++++++++++- .../documentation/classes_dev_uml.svg | 497 +++++++++--------- .../documentation/classes_user_uml.svg | 33 +- .../user_guide/documentation/components.rst | 8 - .../user_guide/documentation/icepyx.rst | 1 + .../documentation/packages_user_uml.svg | 60 ++- .../user_guide/documentation/variables.rst | 25 + icepyx/__init__.py | 1 + icepyx/core/is2ref.py | 53 +- icepyx/core/query.py | 51 +- icepyx/core/read.py | 59 ++- icepyx/core/variables.py | 160 +++--- 13 files changed, 880 insertions(+), 461 deletions(-) create mode 100644 doc/source/user_guide/documentation/variables.rst diff --git a/doc/source/example_notebooks/IS2_data_access2-subsetting.ipynb b/doc/source/example_notebooks/IS2_data_access2-subsetting.ipynb index 89247de5f..3803b9fd6 100644 --- a/doc/source/example_notebooks/IS2_data_access2-subsetting.ipynb +++ b/doc/source/example_notebooks/IS2_data_access2-subsetting.ipynb @@ -51,7 +51,9 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "user_expressions": [] + }, "source": [ "Create a query object and log in to Earthdata\n", "\n", @@ -83,7 +85,9 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "user_expressions": [] 
+ }, "source": [ "## Discover Subsetting Options\n", "\n", @@ -108,7 +112,9 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "user_expressions": [] + }, "source": [ "By default, spatial and temporal subsetting based on your initial inputs is applied to your order unless you specify `subset=False` to `order_granules()` or `download_granules()` (which calls `order_granules` under the hood if you have not already placed your order) functions.\n", "Additional subsetting options must be specified as keyword arguments to the order/download functions.\n", @@ -118,7 +124,9 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "user_expressions": [] + }, "source": [ "### _Why do I have to provide spatial bounds to icepyx even if I don't use them to subset my data order?_\n", "\n", @@ -132,7 +140,9 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "user_expressions": [] + }, "source": [ "## About Data Variables in a query object\n", "\n", @@ -145,7 +155,9 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "user_expressions": [] + }, "source": [ "### Determine what variables are available for your data product\n", "There are multiple ways to get a complete list of available variables.\n", @@ -159,7 +171,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "region_a.order_vars.avail()" @@ -167,7 +181,9 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "user_expressions": [] + }, "source": [ "By passing the boolean `options=True` to the `avail` method, you can obtain lists of unique possible variable inputs (var_list inputs) and path subdirectory inputs (keyword_list and beam_list inputs) for your data product. These can be helpful for building your wanted variable list." 
] @@ -175,7 +191,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "region_a.order_vars.avail(options=True)" @@ -353,9 +371,9 @@ ], "metadata": { "kernelspec": { - "display_name": "Python 3 (ipykernel)", + "display_name": "icepyx-dev", "language": "python", - "name": "python3" + "name": "icepyx-dev" }, "language_info": { "codemirror_mode": { @@ -367,7 +385,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.12" + "version": "3.11.4" } }, "nbformat": 4, diff --git a/doc/source/example_notebooks/IS2_data_variables.ipynb b/doc/source/example_notebooks/IS2_data_variables.ipynb index 3ac1f99fe..78a250789 100644 --- a/doc/source/example_notebooks/IS2_data_variables.ipynb +++ b/doc/source/example_notebooks/IS2_data_variables.ipynb @@ -2,7 +2,9 @@ "cells": [ { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "user_expressions": [] + }, "source": [ "# ICESat-2's Nested Variables\n", "\n", @@ -13,10 +15,10 @@ "\n", "A given ICESat-2 product may have over 200 variable + path combinations.\n", "icepyx includes a custom `Variables` module that is \"aware\" of the ATLAS sensor and how the ICESat-2 data products are stored.\n", - "The module can be accessed independently, but is optimally used as a component of a `Query` object (Case 1) or `Read` object (Case 2).\n", + "The module can be accessed independently, and can also be accessed as a component of a `Query` object or `Read` object.\n", "\n", - "This notebook illustrates in detail how the `Variables` module behaves using a `Query` data access example.\n", - "However, module usage is analogous through an icepyx ICESat-2 `Read` object.\n", + "This notebook illustrates in detail how the `Variables` module behaves. We use the module independently and also show how powerful it is directly in the icepyx workflow using a `Query` data access example.\n", + "Module usage using `Query` is analogous through an icepyx ICESat-2 `Read` object.\n", "More detailed example workflows specifically for the [query](https://icepyx.readthedocs.io/en/latest/example_notebooks/IS2_data_access.html) and [read](https://icepyx.readthedocs.io/en/latest/example_notebooks/IS2_data_read-in.html) tools within icepyx are available as separate Jupyter Notebooks.\n", "\n", "Questions? Be sure to check out the FAQs throughout this notebook, indicated as italic headings." @@ -24,11 +26,15 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "user_expressions": [] + }, "source": [ "### _Why do ICESat-2 products need a custom variable manager?_\n", "\n", "_It can be confusing and cumbersome to comb through the 200+ variable and path combinations contained in ICESat-2 data products._\n", + "_An hdf5 file is built like a folder with files in it. Opening an ICESat-2 file can be like opening a new folder with over 200 files in it and manually searching for only ones you want!_\n", + "\n", "_The icepyx `Variables` module makes it easier for users to quickly find and extract the specific variables they would like to work with across multiple beams, keywords, and variables and provides reader-friendly formatting to browse variables._\n", "_A future development goal for `icepyx` includes developing an interactive widget to further improve the user experience._\n", "_For data read-in, additional tools are available to target specific beam characteristics (e.g. 
strong versus weak beams)._" @@ -38,35 +44,245 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "#### Some technical details about the Variables module\n", - "For those eager to push the limits or who want to know more implementation details...\n", + "Import packages, including icepyx" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import icepyx as ipx\n", + "from pprint import pprint" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "## Creating or Accessing ICESat-2 Variables" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "There are three ways to create or access an ICESat-2 Variables object in icepyx:\n", + "1. Access via the `.order_vars` property of a Query object\n", + "2. Access via the `.vars` property of a Read object\n", + "3. Create a stand-alone ICESat-2 Variables object using a local file or a product name\n", "\n", - "The only required input to the `Variables` module is `vartype`.\n", - "`vartype` has two acceptible string values, 'order' and 'file'.\n", - "If you use the module as shown in icepyx examples (namely through a `Read` or `Query` object), then this flag will be passed automatically.\n", - "It simply tells the software how to generate the list of possible variable values - either by pinging NSIDC for a list of available variables (`query`) or from the user-supplied file (`read`)." + "An example of each of these is shown below." ] }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "user_expressions": [] + }, "source": [ - "Import packages, including icepyx" + "### 1. Access `Variables` via the `.order_vars` property of a Query object" ] }, { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ - "import icepyx as ipx\n", - "from pprint import pprint" + "region_a = ipx.Query('ATL06',[-55, 68, -48, 71],['2019-02-22','2019-02-28'], \\\n", + " start_time='00:00:00', end_time='23:59:59')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Accessing Variables\n", + "region_a.order_vars" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Showing the variable paths\n", + "region_a.order_vars.avail()" ] }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "### 2. Access via the `.vars` property of a Read object" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "path_root = '/full/path/to/your/data/'\n", + "reader = ipx.Read(path_root)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Accessing Variables\n", + "reader.vars" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Showing the variable paths\n", + "# reader.vars.avail()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "### 3. Create a stand-alone Variables object\n", + "\n", + "You can also generate an independent Variables object. This can be done using either:\n", + "1. 
The filepath to a file you'd like a variables list for\n", + "2. The product name (and optionally version) of a an ICESat-2 product" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Create a variables object from a filepath:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "filepath = '/full/path/to/your/data.h5'\n", + "v = ipx.Variables(path=filepath)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# v.avail()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Create a variables object from a product. The version argument is optional." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "v = ipx.Variables(product='ATL03')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# v.avail()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "v = ipx.Variables(product='ATL03', version='004')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# v.avail()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Now that you know how to create or access Variables the remainder of this notebook showcases the functions availble for building and modifying variables lists. Remember, the example shown below uses a Query object, but the same methods are available if you are using a Read object or a Variables object." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, "source": [ "## Interacting with ICESat-2 Data Variables\n", "\n", @@ -88,7 +304,9 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "user_expressions": [] + }, "source": [ "Create a query object and log in to Earthdata\n", "\n", @@ -134,7 +352,9 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "user_expressions": [] + }, "source": [ "### ICESat-2 data variables\n", "\n", @@ -157,7 +377,9 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "user_expressions": [] + }, "source": [ "To increase readability, you can use built in functions to show the 200+ variable + path combinations as a dictionary where the keys are variable names and the values are the paths to that variable.\n", "`region_a.order_vars.parse_var_list(region_a.order_vars.avail())` will return a dictionary of variable:paths key:value pairs." @@ -174,7 +396,9 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "user_expressions": [] + }, "source": [ "By passing the boolean `options=True` to the `avail` method, you can obtain lists of unique possible variable inputs (var_list inputs) and path subdirectory inputs (keyword_list and beam_list inputs) for your data product. These can be helpful for building your wanted variable list." ] @@ -188,6 +412,30 @@ "region_a.order_vars.avail(options=True)" ] }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "```{admonition} Remember\n", + "You can run these same methods no matter how you created or accessed your ICESat-2 Variables. 
So the methods in this section could be equivalently be accessed using a Read object, or by directly accessing a file on your computer:\n", + "\n", + "```\n", + "```python\n", + "# Using a Read object\n", + "reader.vars.avail()\n", + "reader.vars.parse_var_list(reader.vars.avail())\n", + "reader.vars.avail(options=True)\n", + "\n", + "# Using a file on your computer\n", + "v = Variables(path='/my/file.h5')\n", + "v.avail()\n", + "v.parse_var_list(v.avail())\n", + "v.avail(options=True)\n", + "```\n" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -228,7 +476,9 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "user_expressions": [] + }, "source": [ "The keywords available for this product are shown in the error message upon entering a blank keyword_list, as seen in the next cell." ] @@ -745,13 +995,62 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "user_expressions": [] + }, "source": [ "#### With a `Read` object\n", "Calling the `load()` method on your `Read` object will automatically look for your wanted variable list and use it.\n", "Please see the [read-in example Jupyter Notebook](https://icepyx.readthedocs.io/en/latest/example_notebooks/IS2_data_read-in.html) for a complete example of this usage.\n" ] }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "#### With a local filepath\n", + "\n", + "One of the benefits of using a local filepath in variables is that it allows you to easily inspect the variables that are available in your file. Once you have a variable of interest from the `avail` list, you could read that variable in with another library, such as xarray. The example below demonstrates this assuming an ATL06 ICESat-2 file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "filepath = '/full/path/to/my/ATL06_file.h5'\n", + "v = ipx.Variables(path=filepath)\n", + "v.avail()\n", + "# Browse paths and decide you need `gt1l/land_ice_segments/`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "import xarray as xr\n", + "\n", + "xr.open_dataset(filepath, group='gt1l/land_ice_segments/', engine='h5netcdf')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "You'll notice in this workflow you are limited to viewing data only within a particular group. Icepyx also provides functionality for merging variables within or even across files. See the [read-in example Jupyter Notebook](https://icepyx.readthedocs.io/en/latest/example_notebooks/IS2_data_read-in.html) for more details about these features of icepyx." 
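+    "\n",
+    "As a rough sketch of that Read-based workflow (the directory path below is a placeholder, and the variable names assume an ATL06 product):\n",
+    "\n",
+    "```python\n",
+    "import icepyx as ipx\n",
+    "\n",
+    "reader = ipx.Read('/full/path/to/your/ATL06/data/')\n",
+    "# add the variables of interest to the wanted list, then read them from every matching file\n",
+    "reader.vars.append(var_list=['h_li', 'latitude', 'longitude'])\n",
+    "ds = reader.load()\n",
+    "```"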
+ ] + }, { "cell_type": "markdown", "metadata": {}, @@ -763,9 +1062,9 @@ ], "metadata": { "kernelspec": { - "display_name": "Python 3 (ipykernel)", + "display_name": "icepyx-dev", "language": "python", - "name": "python3" + "name": "icepyx-dev" }, "language_info": { "codemirror_mode": { @@ -777,7 +1076,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.12" + "version": "3.11.4" } }, "nbformat": 4, diff --git a/doc/source/user_guide/documentation/classes_dev_uml.svg b/doc/source/user_guide/documentation/classes_dev_uml.svg index 0cd08c9e9..765e0d531 100644 --- a/doc/source/user_guide/documentation/classes_dev_uml.svg +++ b/doc/source/user_guide/documentation/classes_dev_uml.svg @@ -4,328 +4,329 @@ - - + + classes_dev_uml - + icepyx.core.auth.AuthenticationError - -AuthenticationError - - - + +AuthenticationError + + + icepyx.core.exceptions.DeprecationError - -DeprecationError - - - + +DeprecationError + + + icepyx.core.auth.EarthdataAuthMixin - -EarthdataAuthMixin - -_auth : Auth, NoneType -_s3_initial_ts : NoneType, datetime -_s3login_credentials : NoneType, dict -_session : NoneType -auth -s3login_credentials -session - -__init__(auth) -__str__() -earthdata_login(uid, email, s3token): None + +EarthdataAuthMixin + +_auth : NoneType +_s3_initial_ts : NoneType, datetime +_s3login_credentials : NoneType +_session : NoneType +auth +s3login_credentials +session + +__init__(auth) +__str__() +earthdata_login(uid, email, s3token): None icepyx.core.query.GenQuery - -GenQuery - -_spatial -_temporal -dates -end_time -spatial -spatial_extent -start_time -temporal - -__init__(spatial_extent, date_range, start_time, end_time) -__str__() + +GenQuery + +_spatial +_temporal +dates +end_time +spatial +spatial_extent +start_time +temporal + +__init__(spatial_extent, date_range, start_time, end_time) +__str__() icepyx.core.granules.Granules - -Granules - -avail : list -orderIDs : list - -__init__ -() -download(verbose, path, session, restart) -get_avail(CMRparams, reqparams, cloud) -place_order(CMRparams, reqparams, subsetparams, verbose, subset, session, geom_filepath) + +Granules + +avail : list +orderIDs : list + +__init__ +() +download(verbose, path, session, restart) +get_avail(CMRparams, reqparams, cloud) +place_order(CMRparams, reqparams, subsetparams, verbose, subset, session, geom_filepath) icepyx.core.query.Query - -Query - -CMRparams -_CMRparams -_about_product -_cust_options : dict -_cycles : list -_file_vars -_granules -_order_vars -_prod : NoneType, str -_readable_granule_name : list -_reqparams -_source : str -_subsetparams : NoneType -_tracks : list -_version -cycles -dataset -file_vars -granules -order_vars -product -product_version -reqparams -tracks - -__init__(product, spatial_extent, date_range, start_time, end_time, version, cycles, tracks, files, auth) -__str__() -avail_granules(ids, cycles, tracks, cloud) -download_granules(path, verbose, subset, restart) -latest_version() -order_granules(verbose, subset, email) -product_all_info() -product_summary_info() -show_custom_options(dictview) -subsetparams() -visualize_elevation() -visualize_spatial_extent() + +Query + +CMRparams +_CMRparams +_about_product +_cust_options : dict +_cycles : list +_file_vars +_granules +_order_vars +_prod : NoneType, str +_readable_granule_name : list +_reqparams +_source : str +_subsetparams : NoneType +_tracks : list +_version +cycles +dataset +file_vars +granules +order_vars +product +product_version +reqparams +tracks + +__init__(product, spatial_extent, 
date_range, start_time, end_time, version, cycles, tracks, files, auth) +__str__() +avail_granules(ids, cycles, tracks, cloud) +download_granules(path, verbose, subset, restart) +latest_version() +order_granules(verbose, subset, email) +product_all_info() +product_summary_info() +show_custom_options(dictview) +subsetparams() +visualize_elevation() +visualize_spatial_extent() icepyx.core.granules.Granules->icepyx.core.query.Query - - -_granules + + +_granules icepyx.core.granules.Granules->icepyx.core.query.Query - - -_granules + + +_granules icepyx.core.icesat2data.Icesat2Data - -Icesat2Data - - -__init__() + +Icesat2Data + + +__init__() icepyx.core.exceptions.NsidcQueryError - -NsidcQueryError - -errmsg -msgtxt : str - -__init__(errmsg, msgtxt) -__str__() + +NsidcQueryError + +errmsg +msgtxt : str + +__init__(errmsg, msgtxt) +__str__() icepyx.core.exceptions.QueryError - -QueryError - - - + +QueryError + + + icepyx.core.exceptions.NsidcQueryError->icepyx.core.exceptions.QueryError - - + + icepyx.core.APIformatting.Parameters - -Parameters - -_fmted_keys : NoneType, dict -_poss_keys : dict -_reqtype : NoneType, str -fmted_keys -partype -poss_keys - -__init__(partype, values, reqtype) -_check_valid_keys() -_get_possible_keys() -build_params() -check_req_values() -check_values() + +Parameters + +_fmted_keys : NoneType, dict +_poss_keys : dict +_reqtype : NoneType, str +fmted_keys +partype +poss_keys + +__init__(partype, values, reqtype) +_check_valid_keys() +_get_possible_keys() +build_params() +check_req_values() +check_values() icepyx.core.APIformatting.Parameters->icepyx.core.query.Query - - -_CMRparams + + +_CMRparams icepyx.core.APIformatting.Parameters->icepyx.core.query.Query - - -_reqparams + + +_reqparams icepyx.core.APIformatting.Parameters->icepyx.core.query.Query - - -_subsetparams + + +_subsetparams icepyx.core.APIformatting.Parameters->icepyx.core.query.Query - - -_subsetparams + + +_subsetparams icepyx.core.query.Query->icepyx.core.auth.EarthdataAuthMixin - - + + icepyx.core.query.Query->icepyx.core.query.GenQuery - - + + icepyx.core.read.Read - -Read - -_filelist : NoneType, list -_out_obj : Dataset -_product : NoneType, str -_read_vars -filelist -product -vars - -__init__(data_source, product, filename_pattern, catalog, glob_kwargs, out_obj_type) -_add_vars_to_ds(is2ds, ds, grp_path, wanted_groups_tiered, wanted_dict) -_build_dataset_template(file) -_build_single_file_dataset(file, groups_list) -_check_source_for_pattern(source, filename_pattern) -_combine_nested_vars(is2ds, ds, grp_path, wanted_dict) -_extract_product(filepath) -_read_single_grp(file, grp_path) -load() + +Read + +_filelist : NoneType, list +_out_obj : Dataset +_product : NoneType, str +_read_vars +filelist +product +vars + +__init__(data_source, product, filename_pattern, catalog, glob_kwargs, out_obj_type) +_add_vars_to_ds(is2ds, ds, grp_path, wanted_groups_tiered, wanted_dict) +_build_dataset_template(file) +_build_single_file_dataset(file, groups_list) +_check_source_for_pattern(source, filename_pattern) +_combine_nested_vars(is2ds, ds, grp_path, wanted_dict) +_read_single_grp(file, grp_path) +load() icepyx.core.spatial.Spatial - -Spatial - -_ext_type : str -_gdf_spat : GeoDataFrame -_geom_file : NoneType -_spatial_ext -_xdateln -extent -extent_as_gdf -extent_file -extent_type - -__init__(spatial_extent) -__str__() -fmt_for_CMR() -fmt_for_EGI() + +Spatial + +_ext_type : str +_gdf_spat : GeoDataFrame +_geom_file : NoneType +_spatial_ext +_xdateln +extent +extent_as_gdf +extent_file +extent_type + 
+__init__(spatial_extent) +__str__() +fmt_for_CMR() +fmt_for_EGI() icepyx.core.spatial.Spatial->icepyx.core.query.GenQuery - - -_spatial + + +_spatial icepyx.core.spatial.Spatial->icepyx.core.query.GenQuery - - -_spatial + + +_spatial icepyx.core.temporal.Temporal - -Temporal - -_end : datetime -_start : datetime -end -start - -__init__(date_range, start_time, end_time) -__str__() + +Temporal + +_end : datetime +_start : datetime +end +start + +__init__(date_range, start_time, end_time) +__str__() icepyx.core.temporal.Temporal->icepyx.core.query.GenQuery - - -_temporal + + +_temporal icepyx.core.variables.Variables - -Variables - -_avail : NoneType, list -_vartype -_version : NoneType -path : NoneType -product : NoneType -wanted : NoneType, dict + +Variables + +_avail : NoneType, list +_path : NoneType +_product : NoneType, str +_version +path +product +version +wanted : NoneType, dict -__init__(vartype, avail, wanted, product, version, path, auth) +__init__(vartype, path, product, version, avail, wanted, auth) _check_valid_lists(vgrp, allpaths, var_list, beam_list, keyword_list) _get_combined_list(beam_list, keyword_list) _get_sum_varlist(var_list, all_vars, defaults) @@ -339,57 +340,57 @@ icepyx.core.variables.Variables->icepyx.core.auth.EarthdataAuthMixin - - + + icepyx.core.variables.Variables->icepyx.core.query.Query - - -_order_vars + + +_order_vars icepyx.core.variables.Variables->icepyx.core.query.Query - - -_order_vars + + +_order_vars icepyx.core.variables.Variables->icepyx.core.query.Query - - -_file_vars + + +_file_vars icepyx.core.variables.Variables->icepyx.core.read.Read - - -_read_vars + + +_read_vars icepyx.core.visualization.Visualize - -Visualize - -bbox : list -cycles : NoneType -date_range : NoneType -product : NoneType, str -tracks : NoneType - -__init__(query_obj, product, spatial_extent, date_range, cycles, tracks) -generate_OA_parameters(): list -grid_bbox(binsize): list -make_request(base_url, payload) -parallel_request_OA(): da.array -query_icesat2_filelist(): tuple -request_OA_data(paras): da.array -viz_elevation(): (hv.DynamicMap, hv.Layout) + +Visualize + +bbox : list +cycles : NoneType +date_range : NoneType +product : NoneType, str +tracks : NoneType + +__init__(query_obj, product, spatial_extent, date_range, cycles, tracks) +generate_OA_parameters(): list +grid_bbox(binsize): list +make_request(base_url, payload) +parallel_request_OA(): da.array +query_icesat2_filelist(): tuple +request_OA_data(paras): da.array +viz_elevation(): (hv.DynamicMap, hv.Layout) diff --git a/doc/source/user_guide/documentation/classes_user_uml.svg b/doc/source/user_guide/documentation/classes_user_uml.svg index a9c116469..59b8e8e6f 100644 --- a/doc/source/user_guide/documentation/classes_user_uml.svg +++ b/doc/source/user_guide/documentation/classes_user_uml.svg @@ -259,49 +259,50 @@ icepyx.core.variables.Variables - -Variables - -path : NoneType -product : NoneType -wanted : NoneType, dict - -append(defaults, var_list, beam_list, keyword_list) -avail(options, internal) -parse_var_list(varlist, tiered, tiered_vars) -remove(all, var_list, beam_list, keyword_list) + +Variables + +path +product +version +wanted : NoneType, dict + +append(defaults, var_list, beam_list, keyword_list) +avail(options, internal) +parse_var_list(varlist, tiered, tiered_vars) +remove(all, var_list, beam_list, keyword_list) icepyx.core.variables.Variables->icepyx.core.auth.EarthdataAuthMixin - + icepyx.core.variables.Variables->icepyx.core.query.Query - + _order_vars 
icepyx.core.variables.Variables->icepyx.core.query.Query - + _order_vars icepyx.core.variables.Variables->icepyx.core.query.Query - + _file_vars icepyx.core.variables.Variables->icepyx.core.read.Read - + _read_vars diff --git a/doc/source/user_guide/documentation/components.rst b/doc/source/user_guide/documentation/components.rst index b4b658385..dea41a970 100644 --- a/doc/source/user_guide/documentation/components.rst +++ b/doc/source/user_guide/documentation/components.rst @@ -67,14 +67,6 @@ validate\_inputs :undoc-members: :show-inheritance: -variables ---------- - -.. automodule:: icepyx.core.variables - :members: - :undoc-members: - :show-inheritance: - visualize --------- diff --git a/doc/source/user_guide/documentation/icepyx.rst b/doc/source/user_guide/documentation/icepyx.rst index 56ff7f496..a8a9a6f8e 100644 --- a/doc/source/user_guide/documentation/icepyx.rst +++ b/doc/source/user_guide/documentation/icepyx.rst @@ -23,4 +23,5 @@ Diagrams are updated automatically after a pull request (PR) is approved and bef query read quest + variables components diff --git a/doc/source/user_guide/documentation/packages_user_uml.svg b/doc/source/user_guide/documentation/packages_user_uml.svg index 44a041c77..8d8cf0dc9 100644 --- a/doc/source/user_guide/documentation/packages_user_uml.svg +++ b/doc/source/user_guide/documentation/packages_user_uml.svg @@ -4,11 +4,11 @@ - + packages_user_uml - + icepyx.core @@ -24,14 +24,14 @@ icepyx.core.auth - -icepyx.core.auth + +icepyx.core.auth icepyx.core.exceptions - -icepyx.core.exceptions + +icepyx.core.exceptions @@ -42,14 +42,14 @@ icepyx.core.icesat2data - -icepyx.core.icesat2data + +icepyx.core.icesat2data icepyx.core.is2ref - -icepyx.core.is2ref + +icepyx.core.is2ref @@ -60,8 +60,8 @@ icepyx.core.query->icepyx.core.auth - - + + @@ -96,44 +96,50 @@ icepyx.core.read - -icepyx.core.read + +icepyx.core.read icepyx.core.read->icepyx.core.exceptions - - + + icepyx.core.read->icepyx.core.variables - - + + icepyx.core.spatial - -icepyx.core.spatial + +icepyx.core.spatial icepyx.core.temporal - -icepyx.core.temporal + +icepyx.core.temporal icepyx.core.validate_inputs - -icepyx.core.validate_inputs + +icepyx.core.validate_inputs icepyx.core.variables->icepyx.core.auth - - + + + + + +icepyx.core.variables->icepyx.core.exceptions + + diff --git a/doc/source/user_guide/documentation/variables.rst b/doc/source/user_guide/documentation/variables.rst new file mode 100644 index 000000000..e147bfd64 --- /dev/null +++ b/doc/source/user_guide/documentation/variables.rst @@ -0,0 +1,25 @@ +Variables Class +================= + +.. currentmodule:: icepyx + + +Constructor +----------- + +.. autosummary:: + :toctree: ../../_icepyx/ + + Variables + + +Methods +------- + +.. 
autosummary:: + :toctree: ../../_icepyx/ + + Variables.avail + Variables.parse_var_list + Variables.append + Variables.remove diff --git a/icepyx/__init__.py b/icepyx/__init__.py index 3d92e2e60..40ea9e1ec 100644 --- a/icepyx/__init__.py +++ b/icepyx/__init__.py @@ -1,5 +1,6 @@ from icepyx.core.query import Query, GenQuery from icepyx.core.read import Read from icepyx.quest.quest import Quest +from icepyx.core.variables import Variables from _icepyx_version import version as __version__ diff --git a/icepyx/core/is2ref.py b/icepyx/core/is2ref.py index 5faaef110..a90c8fafa 100644 --- a/icepyx/core/is2ref.py +++ b/icepyx/core/is2ref.py @@ -1,3 +1,4 @@ +import h5py import json import numpy as np import requests @@ -110,7 +111,11 @@ def _get_custom_options(session, product, version): # reformatting formats = [Format.attrib for Format in root.iter("Format")] format_vals = [formats[i]["value"] for i in range(len(formats))] - format_vals.remove("") + try: + format_vals.remove("") + except KeyError: + # ATL23 does not have an empty value + pass cust_options.update({"fileformats": format_vals}) # reprojection only applicable on ICESat-2 L3B products. @@ -324,3 +329,49 @@ def gt2spot(gt, sc_orient): raise ValueError("Could not compute the spot number.") return np.uint8(spot) + +def latest_version(product): + """ + Determine the most recent version available for the given product. + + Examples + -------- + >>> latest_version('ATL03') + '006' + """ + _about_product = about_product(product) + return max( + [entry["version_id"] for entry in _about_product["feed"]["entry"]] + ) + +def extract_product(filepath): + """ + Read the product type from the metadata of the file. Return the product as a string. + """ + with h5py.File(filepath, 'r') as f: + try: + product = f.attrs['short_name'] + if isinstance(product, bytes): + # For most products the short name is stored in a bytes string + product = product.decode() + elif isinstance(product, np.ndarray): + # ATL14 saves the short_name as an array ['ATL14'] + product = product[0] + product = _validate_product(product) + except KeyError: + raise 'Unable to parse the product name from file metadata' + return product + +def extract_version(filepath): + """ + Read the version from the metadata of the file. Return the version as a string. + """ + with h5py.File(filepath, 'r') as f: + try: + version = f['METADATA']['DatasetIdentification'].attrs['VersionID'] + if isinstance(version, np.ndarray): + # ATL14 stores the version as an array ['00x'] + version = version[0] + except KeyError: + raise 'Unable to parse the version from file metadata' + return version diff --git a/icepyx/core/query.py b/icepyx/core/query.py index 3459fd132..8700d5655 100644 --- a/icepyx/core/query.py +++ b/icepyx/core/query.py @@ -12,6 +12,7 @@ import icepyx.core.APIformatting as apifmt from icepyx.core.auth import EarthdataAuthMixin import icepyx.core.granules as granules + # QUESTION: why doesn't from granules import Granules work, since granules=icepyx.core.granules? 
from icepyx.core.granules import Granules import icepyx.core.is2ref as is2ref @@ -432,7 +433,7 @@ def __init__( super().__init__(spatial_extent, date_range, start_time, end_time, **kwargs) - self._version = val.prod_version(self.latest_version(), version) + self._version = val.prod_version(is2ref.latest_version(self._prod), version) # build list of available CMR parameters if reducing by cycle or RGT # or a list of explicitly named files (full or partial names) @@ -448,6 +449,7 @@ def __init__( # initialize authentication properties EarthdataAuthMixin.__init__(self) + # ---------------------------------------------------------------------- # Properties @@ -646,6 +648,27 @@ def subsetparams(self, **kwargs): if self._subsetparams == None and not kwargs: return {} else: + # If the user has supplied a subset list of variables, append the + # icepyx required variables to the Coverage dict + if "Coverage" in kwargs.keys(): + var_list = [ + "orbit_info/sc_orient", + "orbit_info/sc_orient_time", + "ancillary_data/atlas_sdp_gps_epoch", + "orbit_info/cycle_number", + "orbit_info/rgt", + "ancillary_data/data_start_utc", + "ancillary_data/data_end_utc", + "ancillary_data/granule_start_utc", + "ancillary_data/granule_end_utc", + "ancillary_data/start_delta_time", + "ancillary_data/end_delta_time", + ] + # Add any variables from var_list to Coverage that are not already included + for var in var_list: + if var not in kwargs["Coverage"].keys(): + kwargs["Coverage"][var.split("/")[-1]] = [var] + if self._subsetparams == None: self._subsetparams = apifmt.Parameters("subset") if self._spatial._geom_file is not None: @@ -688,17 +711,16 @@ def order_vars(self): # DevGoal: check for active session here if hasattr(self, "_cust_options"): self._order_vars = Variables( - self._source, - auth = self.auth, product=self.product, + version=self._version, avail=self._cust_options["variables"], + auth=self.auth, ) else: self._order_vars = Variables( - self._source, - auth=self.auth, product=self.product, version=self._version, + auth=self.auth, ) # I think this is where property setters come in, and one should be used here? Right now order_vars.avail is only filled in @@ -722,17 +744,18 @@ def file_vars(self): Examples -------- >>> reg_a = ipx.Query('ATL06',[-55, 68, -48, 71],['2019-02-20','2019-02-28']) # doctest: +SKIP - + >>> reg_a.file_vars # doctest: +SKIP """ if not hasattr(self, "_file_vars"): if self._source == "file": - self._file_vars = Variables(self._source, - auth=self.auth, - product=self.product, - ) + self._file_vars = Variables( + auth=self.auth, + product=self.product, + version=self._version, + ) return self._file_vars @@ -815,6 +838,8 @@ def product_all_info(self): def latest_version(self): """ + A reference function to is2ref.latest_version. + Determine the most recent version available for the given product. 
Examples @@ -823,11 +848,7 @@ def latest_version(self): >>> reg_a.latest_version() '006' """ - if not hasattr(self, "_about_product"): - self._about_product = is2ref.about_product(self._prod) - return max( - [entry["version_id"] for entry in self._about_product["feed"]["entry"]] - ) + return is2ref.latest_version(self.product) def show_custom_options(self, dictview=False): """ diff --git a/icepyx/core/read.py b/icepyx/core/read.py index 5ef1867f2..b62e5d2fe 100644 --- a/icepyx/core/read.py +++ b/icepyx/core/read.py @@ -320,10 +320,10 @@ class Read: # ---------------------------------------------------------------------- # Constructors - + def __init__( self, - data_source=None, + data_source=None, # DevNote: Make this a required arg when catalog is removed product=None, filename_pattern=None, catalog=None, @@ -336,10 +336,9 @@ def __init__( "The `catalog` argument has been deprecated and intake is no longer supported. " "Please use the `data_source` argument to specify your dataset instead." ) - + if data_source is None: raise ValueError("data_source is a required arguemnt") - # Raise warnings for deprecated arguments if filename_pattern: warnings.warn( @@ -380,7 +379,7 @@ def __init__( # Create a dictionary of the products as read from the metadata product_dict = {} for file_ in self._filelist: - product_dict[file_] = self._extract_product(file_) + product_dict[file_] = is2ref.extract_product(file_) # Raise warnings or errors for multiple products or products not matching the user-specified product all_products = list(set(product_dict.values())) @@ -456,12 +455,9 @@ def vars(self): """ if not hasattr(self, "_read_vars"): - self._read_vars = Variables( - "file", path=self.filelist[0], product=self.product - ) - + self._read_vars = Variables(path=self.filelist[0]) return self._read_vars - + @property def filelist(self): """ @@ -478,22 +474,6 @@ def product(self): # ---------------------------------------------------------------------- # Methods - - @staticmethod - def _extract_product(filepath): - """ - Read the product type from the metadata of the file. Return the product as a string. - """ - with h5py.File(filepath, "r") as f: - try: - product = f.attrs["short_name"].decode() - product = is2ref._validate_product(product) - except KeyError: - raise AttributeError( - f"Unable to extract the product name from file metadata." - ) - return product - @staticmethod def _check_source_for_pattern(source, filename_pattern): """ @@ -739,8 +719,33 @@ def load(self): # so to get a combined dataset, we need to keep track of spots under the hood, open each group, and then combine them into one xarray where the spots are IDed somehow (or only the strong ones are returned) # this means we need to get/track from each dataset we open some of the metadata, which we include as mandatory variables when constructing the wanted list + if not self.vars.wanted: + raise AttributeError( + 'No variables listed in self.vars.wanted. Please use the Variables class ' + 'via self.vars to search for desired variables to read and self.vars.append(...) ' + 'to add variables to the wanted variables list.' 
+ ) + + # Append the minimum variables needed for icepyx to merge the datasets + # Skip products which do not contain required variables + if self.product not in ['ATL14', 'ATL15', 'ATL23']: + var_list=[ + "sc_orient", + "atlas_sdp_gps_epoch", + "cycle_number", + "rgt", + "data_start_utc", + "data_end_utc", + ] + + # Adjust the nec_varlist for individual products + if self.product == "ATL11": + var_list.remove("sc_orient") + + self.vars.append(defaults=False, var_list=var_list) + try: - groups_list = list_of_dict_vals(self._read_vars.wanted) + groups_list = list_of_dict_vals(self.vars.wanted) except AttributeError: pass diff --git a/icepyx/core/variables.py b/icepyx/core/variables.py index d46561f46..94645ca94 100644 --- a/icepyx/core/variables.py +++ b/icepyx/core/variables.py @@ -1,9 +1,13 @@ import numpy as np import os import pprint +import warnings from icepyx.core.auth import EarthdataAuthMixin import icepyx.core.is2ref as is2ref +from icepyx.core.exceptions import DeprecationError +import icepyx.core.validate_inputs as val +import icepyx.core as ipxc # DEVGOAL: use h5py to simplify some of these tasks, if possible! @@ -25,11 +29,21 @@ class Variables(EarthdataAuthMixin): contained in ICESat-2 products. Parameters - ---------- + ---------- vartype : string + This argument is deprecated. The vartype will be inferred from data_source. One of ['order', 'file'] to indicate the source of the input variables. This field will be auto-populated when a variable object is created as an attribute of a query object. + path : string, default None + The path to a local Icesat-2 file. The variables list will contain the variables + present in this file. Either path or product are required input arguments. + product : string, default None + Properly formatted string specifying a valid ICESat-2 product. The variables list will + contain all available variables for this product. Either product or path are required + input arguments. + version : string, default None + Properly formatted string specifying a valid version of the ICESat-2 product. avail : dictionary, default None Dictionary (key:values) of available variable names (keys) and paths (values). wanted : dictionary, default None @@ -38,47 +52,72 @@ class Variables(EarthdataAuthMixin): A session object authenticating the user to download data using their Earthdata login information. The session object will automatically be passed from the query object if you have successfully logged in there. - product : string, default None - Properly formatted string specifying a valid ICESat-2 product - version : string, default None - Properly formatted string specifying a valid version of the ICESat-2 product - path : string, default None - For vartype file, a path to a directory of or single input data file (not yet implemented) + """ def __init__( self, - vartype, - avail=None, - wanted=None, + vartype=None, + path=None, product=None, version=None, - path=None, + avail=None, + wanted=None, auth=None, ): - - assert vartype in ["order", "file"], "Please submit a valid variables type flag" + # Deprecation error + if vartype in ['order', 'file']: + raise DeprecationError( + 'It is no longer required to specify the variable type `vartype`. Instead please ', + 'provide either the path to a local file (arg: `path`) or the product you would ', + 'like variables for (arg: `product`).' + ) + + if path and product: + raise TypeError( + 'Please provide either a filepath or a product. If a filepath is provided ', + 'variables will be read from the file. 
If a product is provided all available ', + 'variables for that product will be returned.' + ) + # Set the product and version from either the input args or the file + if path: + self._path = path + self._product = is2ref.extract_product(self._path) + self._version = is2ref.extract_version(self._path) + elif product: + # Check for valid product string + self._product = is2ref._validate_product(product) + # Check for valid version string + # If version is not specified by the user assume the most recent version + self._version = val.prod_version(is2ref.latest_version(self._product), version) + else: + raise TypeError('Either a filepath or a product need to be given as input arguments.') + # initialize authentication properties EarthdataAuthMixin.__init__(self, auth=auth) - self._vartype = vartype - self.product = product self._avail = avail self.wanted = wanted # DevGoal: put some more/robust checks here to assess validity of inputs - - if self._vartype == "order": - if self._avail == None: - self._version = version - elif self._vartype == "file": - # DevGoal: check that the list or string are valid dir/files - self.path = path - - # @property - # def wanted(self): - # return self._wanted + + @property + def path(self): + if self._path: + path = self._path + else: + path = None + return path + + @property + def product(self): + return self._product + + @property + def version(self): + return self._version + def avail(self, options=False, internal=False): """ @@ -97,16 +136,14 @@ def avail(self, options=False, internal=False): . 'quality_assessment/gt3r/signal_selection_source_fraction_3'] """ - # if hasattr(self, '_avail'): - # return self._avail - # else: + if not hasattr(self, "_avail") or self._avail == None: - if self._vartype == "order": + if not hasattr(self, 'path'): self._avail = is2ref._get_custom_options( - self.session, self.product, self._version + self.session, self.product, self.version )["variables"] - - elif self._vartype == "file": + else: + # If a path was given, use that file to read the variables import h5py self._avail = [] @@ -446,53 +483,14 @@ def append(self, defaults=False, var_list=None, beam_list=None, keyword_list=Non and keyword_list == None ), "You must enter parameters to add to a variable subset list. If you do not want to subset by variable, ensure your is2.subsetparams dictionary does not contain the key 'Coverage'." 
- req_vars = {} + final_vars = {} - # if not hasattr(self, 'avail') or self.avail==None: self.get_avail() - # vgrp, paths = self.parse_var_list(self.avail) - # allpaths = [] - # [allpaths.extend(np.unique(np.array(paths[p]))) for p in range(len(paths))] vgrp, allpaths = self.avail(options=True, internal=True) - self._check_valid_lists(vgrp, allpaths, var_list, beam_list, keyword_list) - # add the mandatory variables to the data object - if self._vartype == "order": - nec_varlist = [ - "sc_orient", - "sc_orient_time", - "atlas_sdp_gps_epoch", - "data_start_utc", - "data_end_utc", - "granule_start_utc", - "granule_end_utc", - "start_delta_time", - "end_delta_time", - ] - elif self._vartype == "file": - nec_varlist = [ - "sc_orient", - "atlas_sdp_gps_epoch", - "cycle_number", - "rgt", - "data_start_utc", - "data_end_utc", - ] - - # Adjust the nec_varlist for individual products - if self.product == "ATL11": - nec_varlist.remove("sc_orient") - - try: - self._check_valid_lists(vgrp, allpaths, var_list=nec_varlist) - except ValueError: - # Assume gridded product since user input lists were previously validated - nec_varlist = [] - + # Instantiate self.wanted to an empty dictionary if it doesn't exist if not hasattr(self, "wanted") or self.wanted == None: - for varid in nec_varlist: - req_vars[varid] = vgrp[varid] - self.wanted = req_vars + self.wanted = {} # DEVGOAL: add a secondary var list to include uncertainty/error information for lower level data if specific data variables have been specified... @@ -501,21 +499,21 @@ def append(self, defaults=False, var_list=None, beam_list=None, keyword_list=Non # Case only variables (but not keywords or beams) are specified if beam_list == None and keyword_list == None: - req_vars.update(self._iter_vars(sum_varlist, req_vars, vgrp)) + final_vars.update(self._iter_vars(sum_varlist, final_vars, vgrp)) # Case a beam and/or keyword list is specified (with or without variables) else: - req_vars.update( - self._iter_paths(sum_varlist, req_vars, vgrp, beam_list, keyword_list) + final_vars.update( + self._iter_paths(sum_varlist, final_vars, vgrp, beam_list, keyword_list) ) # update the data object variables - for vkey in req_vars.keys(): + for vkey in final_vars.keys(): # add all matching keys and paths for new variables if vkey not in self.wanted.keys(): - self.wanted[vkey] = req_vars[vkey] + self.wanted[vkey] = final_vars[vkey] else: - for vpath in req_vars[vkey]: + for vpath in final_vars[vkey]: if vpath not in self.wanted[vkey]: self.wanted[vkey].append(vpath) From e8e12e66109a2de08310209b42100d702ea0f49f Mon Sep 17 00:00:00 2001 From: Jessica Scheick Date: Mon, 27 Nov 2023 13:10:17 -0500 Subject: [PATCH 18/21] Update read module coordinate dimension manipulations to use new xarray index (#473) --- icepyx/core/read.py | 28 ++++++++++++---------------- 1 file changed, 12 insertions(+), 16 deletions(-) diff --git a/icepyx/core/read.py b/icepyx/core/read.py index b62e5d2fe..e136a1d64 100644 --- a/icepyx/core/read.py +++ b/icepyx/core/read.py @@ -320,7 +320,7 @@ class Read: # ---------------------------------------------------------------------- # Constructors - + def __init__( self, data_source=None, # DevNote: Make this a required arg when catalog is removed @@ -336,7 +336,7 @@ def __init__( "The `catalog` argument has been deprecated and intake is no longer supported. " "Please use the `data_source` argument to specify your dataset instead." 
) - + if data_source is None: raise ValueError("data_source is a required arguemnt") # Raise warnings for deprecated arguments @@ -457,7 +457,7 @@ def vars(self): if not hasattr(self, "_read_vars"): self._read_vars = Variables(path=self.filelist[0]) return self._read_vars - + @property def filelist(self): """ @@ -591,13 +591,11 @@ def _add_vars_to_ds(is2ds, ds, grp_path, wanted_groups_tiered, wanted_dict): .assign_coords( { spot_dim_name: (spot_dim_name, [spot]), - "delta_time": ("delta_time", photon_ids), + "photon_idx": ("delta_time", photon_ids), } ) .assign({spot_var_name: (("gran_idx", spot_dim_name), [[track_str]])}) - .rename_dims({"delta_time": "photon_idx"}) - .rename({"delta_time": "photon_idx"}) - # .set_index("photon_idx") + .swap_dims({"delta_time": "photon_idx"}) ) # handle cases where the delta time is 2d due to multiple cycles in that group @@ -605,8 +603,6 @@ def _add_vars_to_ds(is2ds, ds, grp_path, wanted_groups_tiered, wanted_dict): ds = ds.assign_coords( {"delta_time": (("photon_idx", "cycle_number"), hold_delta_times)} ) - else: - ds = ds.assign_coords({"delta_time": ("photon_idx", hold_delta_times)}) # for ATL11 if "ref_pt" in ds.coords: @@ -721,15 +717,15 @@ def load(self): if not self.vars.wanted: raise AttributeError( - 'No variables listed in self.vars.wanted. Please use the Variables class ' - 'via self.vars to search for desired variables to read and self.vars.append(...) ' - 'to add variables to the wanted variables list.' + "No variables listed in self.vars.wanted. Please use the Variables class " + "via self.vars to search for desired variables to read and self.vars.append(...) " + "to add variables to the wanted variables list." ) - + # Append the minimum variables needed for icepyx to merge the datasets # Skip products which do not contain required variables - if self.product not in ['ATL14', 'ATL15', 'ATL23']: - var_list=[ + if self.product not in ["ATL14", "ATL15", "ATL23"]: + var_list = [ "sc_orient", "atlas_sdp_gps_epoch", "cycle_number", @@ -743,7 +739,7 @@ def load(self): var_list.remove("sc_orient") self.vars.append(defaults=False, var_list=var_list) - + try: groups_list = list_of_dict_vals(self.vars.wanted) except AttributeError: From d26422e49e0395be934777cdfda090bdbc7c5dc7 Mon Sep 17 00:00:00 2001 From: Rachel Wegener <35503632+rwegener2@users.noreply.github.com> Date: Mon, 27 Nov 2023 16:25:04 -0500 Subject: [PATCH 19/21] Expand Variables class to read s3 urls (#464) * expand extract_product and extract_version to check for s3 url * add cloud notes to variables notebook --------- Co-authored-by: Jessica Scheick --- .../IS2_data_variables.ipynb | 13 ++- icepyx/core/is2ref.py | 110 +++++++++++++----- icepyx/core/query.py | 4 + icepyx/core/validate_inputs.py | 13 +++ icepyx/core/variables.py | 33 +++--- 5 files changed, 122 insertions(+), 51 deletions(-) diff --git a/doc/source/example_notebooks/IS2_data_variables.ipynb b/doc/source/example_notebooks/IS2_data_variables.ipynb index 78a250789..c66445731 100644 --- a/doc/source/example_notebooks/IS2_data_variables.ipynb +++ b/doc/source/example_notebooks/IS2_data_variables.ipynb @@ -15,7 +15,7 @@ "\n", "A given ICESat-2 product may have over 200 variable + path combinations.\n", "icepyx includes a custom `Variables` module that is \"aware\" of the ATLAS sensor and how the ICESat-2 data products are stored.\n", - "The module can be accessed independently, and can also be accessed as a component of a `Query` object or `Read` object.\n", + "The module can be accessed independently and can also be accessed 
as a component of a `Query` object or `Read` object.\n",
     "\n",
     "This notebook illustrates in detail how the `Variables` module behaves. We use the module independently and also show how powerful it is directly in the icepyx workflow using a `Query` data access example.\n",
     "Module usage using `Query` is analogous through an icepyx ICESat-2 `Read` object.\n",
@@ -75,7 +75,7 @@
     "There are three ways to create or access an ICESat-2 Variables object in icepyx:\n",
     "1. Access via the `.order_vars` property of a Query object\n",
     "2. Access via the `.vars` property of a Read object\n",
-    "3. Create a stand-alone ICESat-2 Variables object using a local file or a product name\n",
+    "3. Create a stand-alone ICESat-2 Variables object using a local file, cloud file, or a product name\n",
     "\n",
     "An example of each of these is shown below."
    ]
   },
@@ -180,8 +180,11 @@
     "### 3. Create a stand-alone Variables object\n",
     "\n",
     "You can also generate an independent Variables object. This can be done using either:\n",
-    "1. The filepath to a file you'd like a variables list for\n",
-    "2. The product name (and optionally version) of a an ICESat-2 product"
+    "1. The filepath to a local or cloud file you'd like a variables list for\n",
+    "2. The product name (and optionally version) of an ICESat-2 product\n",
+    "\n",
+    "*Note: Cloud data access requires a valid Earthdata login; \n",
+    "you will be prompted to log in if you are not already authenticated.*"
    ]
   },
@@ -255,7 +258,7 @@
    },
    "outputs": [],
    "source": [
-    "v = ipx.Variables(product='ATL03', version='004')"
+    "v = ipx.Variables(product='ATL03', version='006')"
    ]
   },
diff --git a/icepyx/core/is2ref.py b/icepyx/core/is2ref.py
index a90c8fafa..d49d15f04 100644
--- a/icepyx/core/is2ref.py
+++ b/icepyx/core/is2ref.py
@@ -5,11 +5,10 @@
 import warnings
 from xml.etree import ElementTree as ET
 
+import earthaccess
 
-import icepyx
 
 # ICESat-2 specific reference functions
-# options to get customization options for ICESat-2 data (though could be used generally)
 
 
 def _validate_product(product):
@@ -48,9 +47,6 @@ def _validate_product(product):
     return product
 
 
-# DevGoal: See if there's a way to dynamically get this list so it's automatically updated
-
-
 def _validate_OA_product(product):
     """
     Confirm a valid ICESat-2 product was specified
@@ -87,6 +83,7 @@ def about_product(prod):
 
 
 # DevGoal: use a mock of this output to test later functions, such as displaying options and widgets, etc.
+# options to get customization options for ICESat-2 data (though could be used generally)
 def _get_custom_options(session, product, version):
     """
     Get lists of what customization options are available for the product from NSIDC.
@@ -330,6 +327,7 @@ def gt2spot(gt, sc_orient):
 
     return np.uint8(spot)
 
+
 def latest_version(product):
     """
     Determine the most recent version available for the given product.
@@ -340,38 +338,86 @@ def latest_version(product):
     '006'
     """
     _about_product = about_product(product)
-    return max(
-        [entry["version_id"] for entry in _about_product["feed"]["entry"]]
-    )
+    return max([entry["version_id"] for entry in _about_product["feed"]["entry"]])
 
-def extract_product(filepath):
+
+def extract_product(filepath, auth=None):
    """
-    Read the product type from the metadata of the file. Return the product as a string.
+    Read the product type from the metadata of the file. Valid for local or s3 files, but must
+    provide an auth object if reading from s3. Return the product as a string.
+
+    Parameters
+    ----------
+    filepath: string
+        local or remote location of a file. 
Could be a local string or an s3 filepath + auth: earthaccess.auth.Auth, default None + An earthaccess authentication object. Optional, but necessary if accessing data in an + s3 bucket. """ - with h5py.File(filepath, 'r') as f: - try: - product = f.attrs['short_name'] - if isinstance(product, bytes): - # For most products the short name is stored in a bytes string - product = product.decode() - elif isinstance(product, np.ndarray): - # ATL14 saves the short_name as an array ['ATL14'] - product = product[0] - product = _validate_product(product) - except KeyError: - raise 'Unable to parse the product name from file metadata' + # Generate a file reader object relevant for the file location + if filepath.startswith("s3"): + if not auth: + raise AttributeError( + "Must provide credentials to `auth` if accessing s3 data" + ) + # Read the s3 file + s3 = earthaccess.get_s3fs_session(daac="NSIDC", provider=auth) + f = h5py.File(s3.open(filepath, "rb")) + else: + # Otherwise assume a local filepath. Read with h5py. + f = h5py.File(filepath, "r") + + # Extract the product information + try: + product = f.attrs["short_name"] + if isinstance(product, bytes): + # For most products the short name is stored in a bytes string + product = product.decode() + elif isinstance(product, np.ndarray): + # ATL14 saves the short_name as an array ['ATL14'] + product = product[0] + product = _validate_product(product) + except KeyError: + raise "Unable to parse the product name from file metadata" + # Close the file reader + f.close() return product -def extract_version(filepath): + +def extract_version(filepath, auth=None): """ - Read the version from the metadata of the file. Return the version as a string. + Read the version from the metadata of the file. Valid for local or s3 files, but must + provide an auth object if reading from s3. Return the version as a string. + + Parameters + ---------- + filepath: string + local or remote location of a file. Could be a local string or an s3 filepath + auth: earthaccess.auth.Auth, default None + An earthaccess authentication object. Optional, but necessary if accessing data in an + s3 bucket. """ - with h5py.File(filepath, 'r') as f: - try: - version = f['METADATA']['DatasetIdentification'].attrs['VersionID'] - if isinstance(version, np.ndarray): - # ATL14 stores the version as an array ['00x'] - version = version[0] - except KeyError: - raise 'Unable to parse the version from file metadata' + # Generate a file reader object relevant for the file location + if filepath.startswith("s3"): + if not auth: + raise AttributeError( + "Must provide credentials to `auth` if accessing s3 data" + ) + # Read the s3 file + s3 = earthaccess.get_s3fs_session(daac="NSIDC", provider=auth) + f = h5py.File(s3.open(filepath, "rb")) + else: + # Otherwise assume a local filepath. Read with h5py. + f = h5py.File(filepath, "r") + + # Read the version information + try: + version = f["METADATA"]["DatasetIdentification"].attrs["VersionID"] + if isinstance(version, np.ndarray): + # ATL14 stores the version as an array ['00x'] + version = version[0] + except KeyError: + raise "Unable to parse the version from file metadata" + # Close the file reader + f.close() return version diff --git a/icepyx/core/query.py b/icepyx/core/query.py index 8700d5655..4ffe4c241 100644 --- a/icepyx/core/query.py +++ b/icepyx/core/query.py @@ -350,6 +350,10 @@ class Query(GenQuery, EarthdataAuthMixin): reference ground tracks are used. Example: "0594" files : string, default None A placeholder for future development. 
Not used for any purposes yet. + auth : earthaccess.auth.Auth, default None + An earthaccess authentication object. Available as an argument so an existing + earthaccess.auth.Auth object can be used for authentication. If not given, a new auth + object will be created whenever authentication is needed. Returns ------- diff --git a/icepyx/core/validate_inputs.py b/icepyx/core/validate_inputs.py index c7ba55a6d..d74768eea 100644 --- a/icepyx/core/validate_inputs.py +++ b/icepyx/core/validate_inputs.py @@ -104,3 +104,16 @@ def tracks(track): warnings.warn("Listed Reference Ground Track is not available") return track_list + +def check_s3bucket(path): + """ + Check if the given path is an s3 path. Raise a warning if the data being referenced is not + in the NSIDC bucket + """ + split_path = path.split('/') + if split_path[0] == 's3:' and split_path[2] != 'nsidc-cumulus-prod-protected': + warnings.warn( + 's3 data being read from outside the NSIDC data bucket. Icepyx can ' + 'read this data, but available data lists may not be accurate.', stacklevel=2 + ) + return path diff --git a/icepyx/core/variables.py b/icepyx/core/variables.py index 94645ca94..4c52003df 100644 --- a/icepyx/core/variables.py +++ b/icepyx/core/variables.py @@ -48,11 +48,10 @@ class Variables(EarthdataAuthMixin): Dictionary (key:values) of available variable names (keys) and paths (values). wanted : dictionary, default None As avail, but for the desired list of variables - session : requests.session object - A session object authenticating the user to download data using their Earthdata login information. - The session object will automatically be passed from the query object if you - have successfully logged in there. - + auth : earthaccess.auth.Auth, default None + An earthaccess authentication object. Available as an argument so an existing + earthaccess.auth.Auth object can be used for authentication. If not given, a new auth + object will be created whenever authentication is needed. """ def __init__( @@ -75,16 +74,25 @@ def __init__( if path and product: raise TypeError( - 'Please provide either a filepath or a product. If a filepath is provided ', + 'Please provide either a path or a product. If a path is provided ', 'variables will be read from the file. If a product is provided all available ', 'variables for that product will be returned.' 
) + + # initialize authentication properties + EarthdataAuthMixin.__init__(self, auth=auth) # Set the product and version from either the input args or the file if path: - self._path = path - self._product = is2ref.extract_product(self._path) - self._version = is2ref.extract_version(self._path) + self._path = val.check_s3bucket(path) + # Set up auth + if self._path.startswith('s3'): + auth = self.auth + else: + auth = None + # Read the product and version from the file + self._product = is2ref.extract_product(self._path, auth=auth) + self._version = is2ref.extract_version(self._path, auth=auth) elif product: # Check for valid product string self._product = is2ref._validate_product(product) @@ -92,10 +100,7 @@ def __init__( # If version is not specified by the user assume the most recent version self._version = val.prod_version(is2ref.latest_version(self._product), version) else: - raise TypeError('Either a filepath or a product need to be given as input arguments.') - - # initialize authentication properties - EarthdataAuthMixin.__init__(self, auth=auth) + raise TypeError('Either a path or a product need to be given as input arguments.') self._avail = avail self.wanted = wanted @@ -138,7 +143,7 @@ def avail(self, options=False, internal=False): """ if not hasattr(self, "_avail") or self._avail == None: - if not hasattr(self, 'path'): + if not hasattr(self, 'path') or self.path.startswith('s3'): self._avail = is2ref._get_custom_options( self.session, self.product, self.version )["variables"] From 84f76aba3cb16a718a610398228d2e69763ab158 Mon Sep 17 00:00:00 2001 From: Jessica Scheick Date: Thu, 7 Dec 2023 12:10:08 -0500 Subject: [PATCH 20/21] add argo functionality to QUEST (#427) - add argo.py dataset functionality and implementation through QUEST - demonstrate QUEST usage via example notebook - add save to QUEST DataSet class template Co-authored-by: Kelsey Bisson <48059682+kelseybisson@users.noreply.github.com> Co-authored-by: Romina Co-authored-by: zachghiaccio --- .../contributing/quest-available-datasets.rst | 16 +- .../QUEST_argo_data_access.ipynb | 626 ++++++++++++++++++ doc/source/index.rst | 3 +- icepyx/quest/__init__.py | 0 icepyx/quest/dataset_scripts/__init__.py | 2 +- icepyx/quest/dataset_scripts/argo.py | 515 ++++++++++++++ icepyx/quest/dataset_scripts/dataset.py | 10 +- icepyx/quest/quest.py | 84 ++- icepyx/tests/test_quest.py | 78 +-- icepyx/tests/test_quest_argo.py | 247 +++++++ 10 files changed, 1509 insertions(+), 72 deletions(-) create mode 100644 doc/source/example_notebooks/QUEST_argo_data_access.ipynb create mode 100644 icepyx/quest/__init__.py create mode 100644 icepyx/quest/dataset_scripts/argo.py create mode 100644 icepyx/tests/test_quest_argo.py diff --git a/doc/source/contributing/quest-available-datasets.rst b/doc/source/contributing/quest-available-datasets.rst index 91a6283a0..86901f7ed 100644 --- a/doc/source/contributing/quest-available-datasets.rst +++ b/doc/source/contributing/quest-available-datasets.rst @@ -9,10 +9,13 @@ On this page, we outline the datasets that are supported by the QUEST module. Cl List of Datasets ---------------- -* `Argo `_ - * The Argo mission involves a series of floats that are designed to capture vertical ocean profiles of temperature, salinity, and pressure down to ~2000 m. Some floats are in support of BGC-Argo, which also includes data relevant for biogeochemical applications: oxygen, nitrate, chlorophyll, backscatter, and solar irradiance. 
- * (Link Kelsey's paper here) - * (Link to example workbook here) +`Argo `_ +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +The Argo mission involves a series of floats that are designed to capture vertical ocean profiles of temperature, salinity, and pressure down to ~2000 m. Some floats are in support of BGC-Argo, which also includes data relevant for biogeochemical applications: oxygen, nitrate, chlorophyll, backscatter, and solar irradiance. + +A paper outlining the Argo extension to QUEST is currently in preparation, with a citable preprint available in the near future. + +:ref:`Argo Workflow Example` Adding a Dataset to QUEST @@ -20,6 +23,7 @@ Adding a Dataset to QUEST Want to add a new dataset to QUEST? No problem! QUEST includes a template script (``dataset.py``) that may be used to create your own querying module for a dataset of interest. -Guidelines on how to construct your dataset module may be found here: (link to be added) +Once you have developed a script with the template, you may request for the module to be added to QUEST via GitHub. +Please see the How to Contribute page :ref:`dev_guide_label` for instructions on how to contribute to icepyx. -Once you have developed a script with the template, you may request for the module to be added to QUEST via Github. Please see the How to Contribute page :ref:`dev_guide_label` for instructions on how to contribute to icepyx. \ No newline at end of file +Detailed guidelines on how to construct your dataset module are currently a work in progress. diff --git a/doc/source/example_notebooks/QUEST_argo_data_access.ipynb b/doc/source/example_notebooks/QUEST_argo_data_access.ipynb new file mode 100644 index 000000000..1bdb5fd0c --- /dev/null +++ b/doc/source/example_notebooks/QUEST_argo_data_access.ipynb @@ -0,0 +1,626 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "16806722-f5bb-4063-bd4b-60c8b0d24d2a", + "metadata": { + "user_expressions": [] + }, + "source": [ + "# QUEST Example: Finding Argo and ICESat-2 data\n", + "\n", + "In this notebook, we are going to find Argo and ICESat-2 data over a region of the Pacific Ocean. Normally, we would require multiple data portals or Python packages to accomplish this. However, thanks to the [QUEST (Query, Unify, Explore SpatioTemporal) module](https://icepyx.readthedocs.io/en/latest/contributing/quest-available-datasets.html), we can use icepyx to find both!" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ed25d839-4114-41db-9166-8c027368686c", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Basic packages\n", + "import geopandas as gpd\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from pprint import pprint\n", + "\n", + "# icepyx and QUEST\n", + "import icepyx as ipx" + ] + }, + { + "cell_type": "markdown", + "id": "5c35f5df-b4fb-4a36-8d6f-d20f1552767a", + "metadata": { + "user_expressions": [] + }, + "source": [ + "## Define the Quest Object\n", + "\n", + "QUEST builds off of the general querying process originally designed for ICESat-2, but makes it applicable to other datasets.\n", + "\n", + "Just like the ICESat-2 Query object, we begin by defining our Quest object. We provide the following bounding parameters:\n", + "* `spatial_extent`: Data is constrained to the given box over the Pacific Ocean.\n", + "* `date_range`: Only grab data from April 18-19, 2022 (to keep download sizes small for this example)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c5d0546d-f0b8-475d-9fd4-62ace696e316", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Spatial bounds, given as SW/NE corners\n", + "spatial_extent = [-154, 30, -143, 37]\n", + "\n", + "# Start and end dates, in YYYY-MM-DD format\n", + "date_range = ['2022-04-18', '2022-04-19']\n", + "\n", + "# Initialize the QUEST object\n", + "reg_a = ipx.Quest(spatial_extent=spatial_extent, date_range=date_range)\n", + "\n", + "print(reg_a)" + ] + }, + { + "cell_type": "markdown", + "id": "8732bf56-1d44-4182-83f7-4303a87d231a", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Notice that we have defined our spatial and temporal domains, but we do not have any datasets in our QUEST object. The next section leads us through that process." + ] + }, + { + "cell_type": "markdown", + "id": "1598bbca-3dcb-4b63-aeb1-81c27d92a1a2", + "metadata": { + "user_expressions": [] + }, + "source": [ + "## Getting the data\n", + "\n", + "Let's first query the ICESat-2 data. If we want to extract information about the water column, the ATL03 product is likely the desired choice.\n", + "* `short_name`: ATL03" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "309a7b26-cfc3-46fc-a683-43e154412074", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# ICESat-2 product\n", + "short_name = 'ATL03'\n", + "\n", + "# Add ICESat-2 to QUEST datasets\n", + "reg_a.add_icesat2(product=short_name)\n", + "print(reg_a)" + ] + }, + { + "cell_type": "markdown", + "id": "ad4bbcfe-3199-4a28-8739-c930d1572538", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Let's see the available files over this region." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a2b4e56f-ceff-45e7-b52c-e7725dc6c812", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "pprint(reg_a.datasets['icesat2'].avail_granules(ids=True))" + ] + }, + { + "cell_type": "markdown", + "id": "7a081854-dae4-4e99-a550-02c02a71b6de", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Note the ICESat-2 functions shown here are the same as those used for direct icepyx queries. The user is referred to other [example workbooks](https://icepyx.readthedocs.io/en/latest/example_notebooks/IS2_data_access.html) for detailed explanations about icepyx functionality.\n", + "\n", + "Accessing ICESat-2 data requires Earthdata login credentials. When running the `download_all()` function below, an authentication check will be passed when attempting to download the ICESat-2 files." + ] + }, + { + "cell_type": "markdown", + "id": "8264515a-00f1-4f57-b927-668a71294079", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Now let's grab Argo data using the same constraints. This is as simple as using the below function." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c857fdcc-e271-4960-86a9-02f693cc13fe", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Add argo to the desired QUEST datasets\n", + "reg_a.add_argo()" + ] + }, + { + "cell_type": "markdown", + "id": "7bade19e-5939-410a-ad54-363636289082", + "metadata": { + "user_expressions": [] + }, + "source": [ + "When accessing Argo data, the variables of interest will be organized as vertical profiles as a function of pressure. By default, only temperature is queried, so the user should supply a list of desired parameters using the code below. 
The user may also limit the pressure range of the returned data by passing `presRange=\"0,200\"`.\n",
+    "\n",
+    "*Note: Our example shows only physical Argo float parameters, but the process is identical for including BGC float parameters.*"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6739c3aa-1a88-4d8e-9fd8-479528c20e97",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "# Customized variable query to retrieve salinity instead of temperature\n",
+    "reg_a.add_argo(params=['salinity'])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2d06436c-2271-4229-8196-9f5180975ab1",
+   "metadata": {
+    "user_expressions": []
+   },
+   "source": [
+    "Additionally, a user may view or update the list of requested Argo and Argo-BGC parameters at any time through `reg_a.datasets['argo'].params`. If a user submits an invalid parameter (\"temp\" instead of \"temperature\", for example), an `AssertionError` will be raised. `reg_a.datasets['argo'].presRange` behaves analogously for limiting the pressure range of Argo data."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "e34756b8",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# update the list of argo parameters\n",
+    "reg_a.datasets['argo'].params = ['temperature','salinity']\n",
+    "\n",
+    "# show the current list\n",
+    "reg_a.datasets['argo'].params"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "453900c1-cd62-40c9-820c-0615f63f17f5",
+   "metadata": {
+    "user_expressions": []
+   },
+   "source": [
+    "As for ICESat-2 data, the user can interact directly with the Argo data object (`reg_a.datasets['argo']`) to search or download data outside of the `Quest.search_all()` and `Quest.download_all()` functionality shown below.\n",
+    "\n",
+    "The approach to directly search or download Argo data is to use `reg_a.datasets['argo'].search_data()`, and `reg_a.datasets['argo'].download()`. In both cases, the existing parameters and pressure ranges are used unless the user passes new `params` and/or `presRange` kwargs, respectively, which will directly update those values (stored attributes)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3f55be4e-d261-49c1-ac14-e19d8e0ff828",
+   "metadata": {
+    "user_expressions": []
+   },
+   "source": [
+    "With our current setup, let's see what Argo parameters we will get."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "435a1243",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# see what argo parameters will be searched for or downloaded\n",
+    "reg_a.datasets['argo'].params"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c15675df",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "reg_a.datasets['argo'].search_data()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "70d36566-0d3c-4781-a199-09bb11dad975",
+   "metadata": {
+    "user_expressions": []
+   },
+   "source": [
+    "Now we can access the data for both Argo and ICESat-2! The below function will do this for us.\n",
+    "\n",
+    "**Important**: The Argo data will be compiled into a Pandas DataFrame, which must be manually saved by the user as demonstrated below. The ICESat-2 data is saved as processed HDF-5 files to the directory provided."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a818c5d7-d69a-4aad-90a2-bc670a54c3a7", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "path = './quest/downloaded-data/'\n", + "\n", + "# Access Argo and ICESat-2 data simultaneously\n", + "reg_a.download_all(path=path)" + ] + }, + { + "cell_type": "markdown", + "id": "ad29285e-d161-46ea-8a57-95891fa2b237", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "We now have one available Argo profile, containing `temperature` and `pressure`, in a Pandas DataFrame. BGC Argo is also available through QUEST, so we could add more variables to this list.\n", + "\n", + "If the user wishes to add more profiles, parameters, and/or pressure ranges to a pre-existing DataFrame, then they should use `reg_a.datasets['argo'].download(keep_existing=True)` to retain previously downloaded data and have the new data added." + ] + }, + { + "cell_type": "markdown", + "id": "6970f0ad-9364-4732-a5e6-f93cf3fc31a3", + "metadata": { + "user_expressions": [] + }, + "source": [ + "The `reg_a.download_all()` function also provided a file containing ICESat-2 ATL03 data. Recall that because these data files are very large, we focus on only one file for this example.\n", + "\n", + "The below workflow uses the icepyx Read module to quickly load ICESat-2 data into an Xarray DataSet. To read in multiple files, see the [icepyx Read tutorial](https://icepyx.readthedocs.io/en/latest/example_notebooks/IS2_data_read-in.html) for how to change your input source." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "88f4b1b0-8c58-414c-b6a8-ce1662979943", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "filename = 'processed_ATL03_20220419002753_04111506_006_02.h5'\n", + "\n", + "reader = ipx.Read(data_source=path+filename)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "665d79a7-7360-4846-99c2-222b34df2a92", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# decide which portions of the file to read in\n", + "reader.vars.append(beam_list=['gt2l'], \n", + " var_list=['h_ph', \"lat_ph\", \"lon_ph\", 'signal_conf_ph'])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e7158814-50f0-4940-980c-9bb800360982", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "ds = reader.load()\n", + "ds" + ] + }, + { + "cell_type": "markdown", + "id": "1040438c-d806-4964-b4f0-1247da9f3f1f", + "metadata": { + "user_expressions": [] + }, + "source": [ + "To make the data more easily plottable, let's convert the data into a Pandas DataFrame. Note that this method is memory-intensive for ATL03 data, so users are suggested to look at small spatial domains to prevent the notebook from crashing. Here, since we only have data from one granule and ground track, we have sped up the conversion to a dataframe by first removing extra data dimensions we don't need for our plots. Several of the other steps completed below using Pandas have analogous operations in Xarray that would further reduce memory requirements and computation times." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "50d23a8e", + "metadata": {}, + "outputs": [], + "source": [ + "is2_pd =(ds.squeeze()\n", + " .reset_coords()\n", + " .drop_vars([\"source_file\",\"data_start_utc\",\"data_end_utc\",\"gran_idx\"])\n", + " .to_dataframe()\n", + " )" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "01bb5a12", + "metadata": {}, + "outputs": [], + "source": [ + "is2_pd" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fc67e039-338c-4348-acaf-96f605cf0030", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Create a new dataframe with only \"ocean\" photons, as indicated by the \"ds_surf_type\" flag\n", + "is2_pd = is2_pd.reset_index(level=[0,1])\n", + "is2_pd_ocean = is2_pd[is2_pd.ds_surf_type==1].drop(columns=\"photon_idx\")\n", + "is2_pd_ocean" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "976ed530-1dc9-412f-9d2d-e51abd28c564", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Set Argo data as its own DataFrame\n", + "argo_df = reg_a.datasets['argo'].argodata" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f9a3b8cf-f3b9-4522-841b-bf760672e37f", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Convert both DataFrames into GeoDataFrames\n", + "is2_gdf = gpd.GeoDataFrame(is2_pd_ocean, \n", + " geometry=gpd.points_from_xy(is2_pd_ocean['lon_ph'], is2_pd_ocean['lat_ph']),\n", + " crs='EPSG:4326'\n", + ")\n", + "argo_gdf = gpd.GeoDataFrame(argo_df, \n", + " geometry=gpd.points_from_xy(argo_df.lon, argo_df.lat),\n", + " crs='EPSG:4326'\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "86cb8463-dc14-4c1d-853e-faf7bf4300a5", + "metadata": { + "user_expressions": [] + }, + "source": [ + "To view the relative locations of ICESat-2 and Argo, the below cell uses the `explore()` function from GeoPandas. The time variables cause errors in the function, so we will drop those variables first. \n", + "\n", + "Note that for large datasets like ICESat-2, loading the map might take a while." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7178fecc-6ca1-42a1-98d4-08f57c050daa", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Drop time variables that would cause errors in explore() function\n", + "is2_gdf = is2_gdf.drop(['delta_time','atlas_sdp_gps_epoch'], axis=1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5ff40f7b-3a0f-4e32-8187-322a5b7cb44d", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Plot ICESat-2 track (medium/high confidence photons only) on a map\n", + "m = is2_gdf[is2_gdf['signal_conf_ph']>=3].explore(column='rgt', tiles='Esri.WorldImagery',\n", + " name='ICESat-2')\n", + "\n", + "# Add Argo float locations to map\n", + "argo_gdf.explore(m=m, name='Argo', marker_kwds={\"radius\": 6}, color='red')" + ] + }, + { + "cell_type": "markdown", + "id": "8b7063ec-a2f8-4509-a7ce-5b0482b48682", + "metadata": { + "user_expressions": [] + }, + "source": [ + "While we're at it, let's plot temperature and pressure profiles for each of the Argo floats in the area." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "da2748b7-b174-4abb-a44a-bd73d1d36eba", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Plot vertical profile of temperature vs. 
pressure for all of the floats\n", + "fig, ax = plt.subplots(figsize=(12, 6))\n", + "for pid in np.unique(argo_df['profile_id']):\n", + " argo_df[argo_df['profile_id']==pid].plot(ax=ax, x='temperature', y='pressure', label=pid)\n", + "plt.gca().invert_yaxis()\n", + "plt.xlabel('Temperature [$\\degree$C]')\n", + "plt.ylabel('Pressure [hPa]')\n", + "plt.ylim([750, -10])\n", + "plt.tight_layout()" + ] + }, + { + "cell_type": "markdown", + "id": "08481fbb-2298-432b-bd50-df2e1ca45cf5", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Lastly, let's look at some near-coincident ICESat-2 and Argo data in a multi-panel plot." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1269de3c-c15d-4120-8284-3b072069d5ee", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Only consider ICESat-2 signal photons\n", + "is2_pd_signal = is2_pd_ocean[is2_pd_ocean['signal_conf_ph']>=0]\n", + "\n", + "## Multi-panel plot showing ICESat-2 and Argo data\n", + "\n", + "# Calculate Extent\n", + "lons = [-154, -143, -143, -154, -154]\n", + "lats = [30, 30, 37, 37, 30]\n", + "lon_margin = (max(lons) - min(lons)) * 0.1\n", + "lat_margin = (max(lats) - min(lats)) * 0.1\n", + "\n", + "# Create Plot\n", + "fig,([ax1,ax2],[ax3,ax4]) = plt.subplots(2, 2, figsize=(12, 6))\n", + "\n", + "# Plot Relative Global View\n", + "world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))\n", + "world.plot(ax=ax1, color='0.8', edgecolor='black')\n", + "argo_df.plot.scatter(ax=ax1, x='lon', y='lat', s=25.0, c='green', zorder=3, alpha=0.3)\n", + "is2_pd_signal.plot.scatter(ax=ax1, x='lon_ph', y='lat_ph', s=10.0, zorder=2, alpha=0.3)\n", + "ax1.plot(lons, lats, linewidth=1.5, color='orange', zorder=2)\n", + "ax1.set_xlim(-160,-100)\n", + "ax1.set_ylim(20,50)\n", + "ax1.set_aspect('equal', adjustable='box')\n", + "ax1.set_xlabel('Longitude', fontsize=18)\n", + "ax1.set_ylabel('Latitude', fontsize=18)\n", + "\n", + "# Plot Zoomed View of Ground Tracks\n", + "argo_df.plot.scatter(ax=ax2, x='lon', y='lat', s=50.0, c='green', zorder=3, alpha=0.3)\n", + "is2_pd_signal.plot.scatter(ax=ax2, x='lon_ph', y='lat_ph', s=10.0, zorder=2, alpha=0.3)\n", + "ax2.plot(lons, lats, linewidth=1.5, color='orange', zorder=1)\n", + "ax2.set_xlim(min(lons) - lon_margin, max(lons) + lon_margin)\n", + "ax2.set_ylim(min(lats) - lat_margin, max(lats) + lat_margin)\n", + "ax2.set_aspect('equal', adjustable='box')\n", + "ax2.set_xlabel('Longitude', fontsize=18)\n", + "ax2.set_ylabel('Latitude', fontsize=18)\n", + "\n", + "# Plot ICESat-2 along-track vertical profile. A dotted line notes the location of a nearby Argo float\n", + "is2 = ax3.scatter(is2_pd_signal['lat_ph'], is2_pd_signal['h_ph']+13.1, s=0.1)\n", + "ax3.axvline(34.43885, linestyle='--', linewidth=3, color='black')\n", + "ax3.set_xlim([34.3, 34.5])\n", + "ax3.set_ylim([-20, 5])\n", + "ax3.set_xlabel('Latitude', fontsize=18)\n", + "ax3.set_ylabel('Approx. 
IS-2 Depth [m]', fontsize=16)\n", + "ax3.set_yticklabels(['15', '10', '5', '0', '-5'])\n", + "\n", + "# Plot vertical ocean profile of the nearby Argo float\n", + "argo_df.plot(ax=ax4, x='temperature', y='pressure', linewidth=3)\n", + "# ax4.set_yscale('log')\n", + "ax4.invert_yaxis()\n", + "ax4.get_legend().remove()\n", + "ax4.set_xlabel('Temperature [$\\degree$C]', fontsize=18)\n", + "ax4.set_ylabel('Argo Pressure', fontsize=16)\n", + "\n", + "plt.tight_layout()\n", + "\n", + "# Save figure\n", + "#plt.savefig('/icepyx/quest/figures/is2_argo_figure.png', dpi=500)" + ] + }, + { + "cell_type": "markdown", + "id": "37720c79", + "metadata": {}, + "source": [ + "Recall that the Argo data must be saved manually.\n", + "The dataframe associated with the Quest object can be saved using `reg_a.save_all(path)`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9b6548e2-0662-4c8b-a251-55ca63aff99b", + "metadata": {}, + "outputs": [], + "source": [ + "reg_a.save_all(path)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.10" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/source/index.rst b/doc/source/index.rst index 586c8810f..612af6adc 100644 --- a/doc/source/index.rst +++ b/doc/source/index.rst @@ -128,6 +128,7 @@ ICESat-2 datasets to enable scientific discovery. example_notebooks/IS2_data_visualization example_notebooks/IS2_data_read-in example_notebooks/IS2_cloud_data_access + example_notebooks/QUEST_argo_data_access .. toctree:: :maxdepth: 2 @@ -145,9 +146,9 @@ ICESat-2 datasets to enable scientific discovery. contributing/contributors_link contributing/contribution_guidelines contributing/how_to_contribute + contributing/attribution_link contributing/icepyx_internals contributing/quest-available-datasets - contributing/attribution_link contributing/development_plan contributing/release_guide contributing/code_of_conduct_link diff --git a/icepyx/quest/__init__.py b/icepyx/quest/__init__.py new file mode 100644 index 000000000..e69de29bb diff --git a/icepyx/quest/dataset_scripts/__init__.py b/icepyx/quest/dataset_scripts/__init__.py index c7b28ee49..7834127ff 100644 --- a/icepyx/quest/dataset_scripts/__init__.py +++ b/icepyx/quest/dataset_scripts/__init__.py @@ -1 +1 @@ -from .dataset import * \ No newline at end of file +from .dataset import * diff --git a/icepyx/quest/dataset_scripts/argo.py b/icepyx/quest/dataset_scripts/argo.py new file mode 100644 index 000000000..8c614d301 --- /dev/null +++ b/icepyx/quest/dataset_scripts/argo.py @@ -0,0 +1,515 @@ +import os.path + +import numpy as np +import pandas as pd +import requests + +from icepyx.core.spatial import geodataframe +from icepyx.quest.dataset_scripts.dataset import DataSet + + +class Argo(DataSet): + """ + Initialises an Argo Dataset object via a Quest object. + Used to query physical and BGC Argo profiles. + + Parameters + --------- + aoi : + area of interest supplied via the spatial parameter of the QUEST object + toi : + time period of interest supplied via the temporal parameter of the QUEST object + params : list of str, default ["temperature"] + A list of strings, where each string is a requested parameter. 
+ Only metadata for profiles with the requested parameters are returned. + To search for all parameters, use `params=["all"]`; + be careful using all for floats with BGC data, as this may be result in a large download. + presRange : str, default None + The pressure range (which correllates with depth) to search for data within. + Input as a "shallow-limit,deep-limit" string. + + See Also + -------- + DataSet + """ + + # Note: it looks like ArgoVis now accepts polygons, not just bounding boxes + def __init__(self, aoi, toi, params=["temperature"], presRange=None): + self._params = self._validate_parameters(params) + self._presRange = presRange + self._spatial = aoi + self._temporal = toi + # todo: verify that this will only work with a bounding box (I think our code can accept arbitrary polygons) + assert self._spatial._ext_type == "bounding_box" + self.argodata = None + self._apikey = "92259861231b55d32a9c0e4e3a93f4834fc0b6fa" + + def __str__(self): + if self.presRange is None: + prange = "All" + else: + prange = str(self.presRange) + + if self.argodata is None: + df = "No data yet" + else: + df = "\n" + str(self.argodata.head()) + s = ( + "---Argo---\n" + "Parameters: {0}\n" + "Pressure range: {1}\n" + "Dataframe head: {2}".format(self.params, prange, df) + ) + + return s + + # ---------------------------------------------------------------------- + # Properties + + @property + def params(self) -> list: + """ + User's list of Argo parameters to search (query) and download. + + The user may modify this list directly. + """ + + return self._params + + @params.setter + def params(self, value): + """ + Validate the input list of parameters. + """ + + self._params = list(set(self._validate_parameters(value))) + + @property + def presRange(self) -> str: + """ + User's pressure range to search (query) and download. + + The user may modify this string directly. + """ + + return self._presRange + + @presRange.setter + def presRange(self, value): + """ + Update the presRange based on the user input + """ + + self._presRange = value + + # ---------------------------------------------------------------------- + # Formatting API Inputs + + def _fmt_coordinates(self) -> str: + """ + Convert spatial extent into string format needed by argovis API + i.e. list of polygon coords [[[lat1,lon1],[lat2,lon2],...]] + """ + + gdf = geodataframe(self._spatial._ext_type, self._spatial._spatial_ext) + coordinates_array = np.asarray(gdf.geometry[0].exterior.coords) + x = "" + for i in coordinates_array: + coord = "[{0},{1}]".format(i[0], i[1]) + if x == "": + x = coord + else: + x += "," + coord + + x = "[" + x + "]" + return x + + # ---------------------------------------------------------------------- + # Validation + + def _valid_params(self) -> list: + """ + A list of valid Argo measurement parameters (including BGC). + + To get a list of valid parameters, comment out the validation line in `search_data` herein, + submit a search with an invalid parameter, and get the list from the response. 
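# Illustrative note (not part of the patch): for the bounding box [-154, 30, -143, 37] used in
# the test suite below, _fmt_coordinates() returns the closed polygon as lon/lat pairs:
#   "[[-143.0,30.0],[-143.0,37.0],[-154.0,37.0],[-154.0,30.0],[-143.0,30.0]]"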
+ """ + + valid_params = [ + # all argo + "pressure", + "pressure_argoqc", + "salinity", + "salinity_argoqc", + "salinity_sfile", + "salinity_sfile_argoqc", + "temperature", + "temperature_argoqc", + "temperature_sfile", + "temperature_sfile_argoqc", + # BGC params + "bbp470", + "bbp470_argoqc", + "bbp532", + "bbp532_argoqc", + "bbp700", + "bbp700_argoqc", + "bbp700_2", + "bbp700_2_argoqc", + "bisulfide", + "bisulfide_argoqc", + "cdom", + "cdom_argoqc", + "chla", + "chla_argoqc", + "cndc", + "cndc_argoqc", + "cndx", + "cndx_argoqc", + "cp660", + "cp660_argoqc", + "down_irradiance380", + "down_irradiance380_argoqc", + "down_irradiance412", + "down_irradiance412_argoqc", + "down_irradiance442", + "down_irradiance442_argoqc", + "down_irradiance443", + "down_irradiance443_argoqc", + "down_irradiance490", + "down_irradiance490_argoqc", + "down_irradiance555", + "down_irradiance555_argoqc", + "down_irradiance670", + "down_irradiance670_argoqc", + "downwelling_par", + "downwelling_par_argoqc", + "doxy", + "doxy_argoqc", + "doxy2", + "doxy2_argoqc", + "doxy3", + "doxy3_argoqc", + "molar_doxy", + "molar_doxy_argoqc", + "nitrate", + "nitrate_argoqc", + "ph_in_situ_total", + "ph_in_situ_total_argoqc", + "turbidity", + "turbidity_argoqc", + "up_radiance412", + "up_radiance412_argoqc", + "up_radiance443", + "up_radiance443_argoqc", + "up_radiance490", + "up_radiance490_argoqc", + "up_radiance555", + "up_radiance555_argoqc", + # all params + "all", + ] + return valid_params + + def _validate_parameters(self, params) -> list: + """ + Checks that the list of user requested parameters are valid. + + Returns + ------- + The list of valid parameters + """ + + if "all" in params: + params = ["all"] + else: + valid_params = self._valid_params() + # checks that params are valid + for i in params: + assert ( + i in valid_params + ), "Parameter '{0}' is not valid. Valid parameters are {1}".format( + i, valid_params + ) + + return list(set(params)) + + # ---------------------------------------------------------------------- + # Querying and Getting Data + + def search_data(self, params=None, presRange=None, printURL=False) -> str: + """ + Query for available argo profiles given the spatio temporal criteria + and other params specific to the dataset. + Searches will automatically use the parameter and pressure range inputs + supplied when the `quest.argo` object was created unless replacement arguments + are added here. + + Parameters + --------- + params : list of str, default None + A list of strings, where each string is a requested parameter. + This kwarg is used to replace the existing list in `self.params`. + Do not submit this kwarg if you would like to use the existing `self.params` list. + Only metadata for profiles with the requested parameters are returned. + To search for all parameters, use `params=["all"]`; + be careful using all for floats with BGC data, as this may be result in a large download. + presRange : str, default None + The pressure range (which correllates with depth) to search for data within. + This kwarg is used to replace the existing pressure range in `self.presRange`. + Do not submit this kwarg if you would like to use the existing `self.presRange` values. + Input as a "shallow-limit,deep-limit" string. + printURL : boolean, default False + Print the URL of the data request. Useful for debugging and when no data is returned. 
+ + Returns + ------ + str : message on the success status of the search + """ + + # if search is called with replaced parameters or presRange + if not params is None: + self.params = params + + if not presRange is None: + self.presRange = presRange + + # builds URL to be submitted + baseURL = "https://argovis-api.colorado.edu/argo" + payload = { + "startDate": self._temporal._start.strftime("%Y-%m-%dT%H:%M:%S.%fZ"), + "endDate": self._temporal._end.strftime("%Y-%m-%dT%H:%M:%S.%fZ"), + "polygon": [self._fmt_coordinates()], + "data": self.params, + } + + if self.presRange is not None: + payload["presRange"] = self.presRange + + # submit request + resp = requests.get( + baseURL, headers={"x-argokey": self._apikey}, params=payload + ) + + if printURL: + print(resp.url) + + selectionProfiles = resp.json() + + # Consider any status other than 2xx an error + if not resp.status_code // 100 == 2: + # check for the existence of profiles from query + if selectionProfiles == []: + msg = ( + "Warning: Query returned no profiles\n" + "Please try different search parameters" + ) + print(msg) + return msg + + else: + msg = "Error: Unexpected response {}".format(resp) + print(msg) + return msg + + # record the profile ids for the profiles that contain the requested parameters + prof_ids = [] + for i in selectionProfiles: + prof_ids.append(i["_id"]) + # should we be doing a set/duplicates check here?? + self.prof_ids = prof_ids + + msg = "{0} valid profiles have been identified".format(len(prof_ids)) + print(msg) + return msg + + def _download_profile( + self, + profile_number, + printURL=False, + ) -> dict: + """ + Download available argo data for a particular profile_ID. + + Parameters + --------- + profile_number: str + String containing the argo profile ID of the data being downloaded. + printURL: boolean, default False + Print the URL of the data request. Useful for debugging and when no data is returned. + + Returns + ------ + dict : json formatted dictionary of the profile data + """ + + # builds URL to be submitted + baseURL = "https://argovis-api.colorado.edu/argo" + payload = { + "id": profile_number, + "data": self.params, + } + + if self.presRange: + payload["presRange"] = self.presRange + + # submit request + resp = requests.get( + baseURL, headers={"x-argokey": self._apikey}, params=payload + ) + + if printURL: + print(resp.url) + + # Consider any status other than 2xx an error + if not resp.status_code // 100 == 2: + return "Error: Unexpected response {}".format(resp) + profile = resp.json() + return profile + + def _parse_into_df(self, profile_data) -> pd.DataFrame: + """ + Parses downloaded data from a single profile into dataframe. + Appends data to any existing profile data stored in the `argodata` property. + + Parameters + ---------- + profile_data: dict + The downloaded profile data. + The data is contained in the requests response and converted into a json formatted dictionary + by `_download_profile` before being passed into this function. 
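# Illustrative sketch (not part of the patch) of the raw ArgoVis request that search_data()
# assembles above; the API key and dates below are placeholders, and presRange is optional.
#
#   import requests
#   resp = requests.get(
#       "https://argovis-api.colorado.edu/argo",
#       headers={"x-argokey": "YOUR-ARGOVIS-KEY"},
#       params={
#           "startDate": "2022-04-12T00:00:00.000000Z",
#           "endDate": "2022-04-26T00:00:00.000000Z",
#           "polygon": ["[[-143.0,30.0],[-143.0,37.0],[-154.0,37.0],[-154.0,30.0],[-143.0,30.0]]"],
#           "data": ["temperature"],
#       },
#   )
#   profiles = resp.json()  # list of dicts; each "_id" is what _download_profile() fetches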
+ + Returns + ------- + pd.DataFrame : DataFrame of profile data + """ + + profileDf = pd.DataFrame( + np.transpose(profile_data["data"]), columns=profile_data["data_info"][0] + ) + + # this block tries to catch changes to the ArgoVis API that will break the dataframe creation + try: + profileDf["profile_id"] = profile_data["_id"] + # there's also a geolocation field that provides the geospatial info as shapely points + profileDf["lat"] = profile_data["geolocation"]["coordinates"][1] + profileDf["lon"] = profile_data["geolocation"]["coordinates"][0] + profileDf["date"] = profile_data["timestamp"] + except KeyError as err: + msg = "We cannot automatically parse your profile into a dataframe due to {0}".format( + err + ) + print(msg) + return msg + + profileDf.replace("None", np.nan, inplace=True, regex=True) + + return profileDf + + def download(self, params=None, presRange=None, keep_existing=True) -> pd.DataFrame: + """ + Downloads the requested data for a list of profile IDs (stored under .prof_ids) and returns it in a DataFrame. + + Data is also stored in self.argodata. + Note that if new inputs (`params` or `presRange`) are supplied and `keep_existing=True`, + the existing data will not be limited to the new input parameters. + + Parameters + ---------- + params : list of str, default None + A list of strings, where each string is a requested parameter. + This kwarg is used to replace the existing list in `self.params`. + Do not submit this kwarg if you would like to use the existing `self.params` list. + Only metadata for profiles with the requested parameters are returned. + To search for all parameters, use `params=["all"]`. + For a list of available parameters, see: `reg._valid_params` + presRange : str, default None + The pressure range (which correllates with depth) to search for data within. + This kwarg is used to replace the existing pressure range in `self.presRange`. + Do not submit this kwarg if you would like to use the existing `self.presRange` values. + Input as a "shallow-limit,deep-limit" string. + keep_existing : boolean, default True + Provides the option to clear any existing downloaded data before downloading more. 
+ + Returns + ------- + pd.DataFrame : DataFrame of requested data + """ + + # TODO: do some basic testing of this block and how the dataframe merging actually behaves + if keep_existing == False: + print( + "Your previously stored data in reg.argodata", + "will be deleted before new data is downloaded.", + ) + self.argodata = None + elif keep_existing == True and hasattr(self, "argodata"): + print( + "The data requested by running this line of code\n", + "will be added to previously downloaded data.", + ) + + # if download is called with replaced parameters or presRange + if not params is None: + self.params = params + + if not presRange is None: + self.presRange = presRange + + # Add qc data for each of the parameters requested + if self.params == ["all"]: + pass + else: + for p in self.params: + if p.endswith("_argoqc") or (p + "_argoqc" in self.params): + pass + else: + self.params.append(p + "_argoqc") + + # intentionally resubmit search to reset prof_ids, in case the user requested different parameters + self.search_data() + + # create a dataframe for each profile and merge it with the rest of the profiles from this set of parameters being downloaded + merged_df = pd.DataFrame(columns=["profile_id"]) + for i in self.prof_ids: + print("processing profile", i) + try: + profile_data = self._download_profile(i) + profile_df = self._parse_into_df(profile_data[0]) + merged_df = pd.concat([merged_df, profile_df], sort=False) + except: + print("\tError processing profile {0}. Skipping.".format(i)) + + # now that we have a df from this round of downloads, we can add it to any existing dataframe + # note that if a given column has previously been added, update needs to be used to replace nans (merge will not replace the nan values) + if not self.argodata is None: + self.argodata = self.argodata.merge(merged_df, how="outer") + else: + self.argodata = merged_df + + self.argodata.reset_index(inplace=True, drop=True) + + return self.argodata + + def save(self, filepath): + """ + Saves the argo dataframe to a csv at the specified location + + Parameters + ---------- + filepath : str + String containing complete filepath and name of file + Any extension will be removed and replaced with csv. + Also appends '_argo.csv' to filename + e.g. /path/to/file/my_data(_argo.csv) + """ + + # create the directory if it doesn't exist + path, file = os.path.split(filepath) + if not os.path.exists(path): + os.mkdir(path) + + # remove any file extension + base, ext = os.path.splitext(filepath) + + self.argodata.to_csv(base + "_argo.csv") diff --git a/icepyx/quest/dataset_scripts/dataset.py b/icepyx/quest/dataset_scripts/dataset.py index e76081e08..193fab22e 100644 --- a/icepyx/quest/dataset_scripts/dataset.py +++ b/icepyx/quest/dataset_scripts/dataset.py @@ -11,9 +11,7 @@ class DataSet: All sub-classes must support the following methods for use via the QUEST class. """ - def __init__( - self, spatial_extent=None, date_range=None, start_time=None, end_time=None - ): + def __init__(self, spatial_extent, date_range, start_time=None, end_time=None): """ Complete any dataset specific initializations (i.e. beyond space and time) required here. For instance, ICESat-2 requires a product, and Argo requires parameters. @@ -70,6 +68,12 @@ def download(self): """ raise NotImplementedError + def save(self, filepath): + """ + Save the downloaded data to a directory on your local machine. 
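# Illustrative sketch (not part of the patch) of the QC-column behavior in Argo.download()
# above: every requested parameter that is not itself a QC field gets a matching
# "<param>_argoqc" column added before the data are fetched.
#
#   params = ["temperature", "salinity"]
#   for p in list(params):
#       if not (p.endswith("_argoqc") or p + "_argoqc" in params):
#           params.append(p + "_argoqc")
#   # params -> ["temperature", "salinity", "temperature_argoqc", "salinity_argoqc"]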
+ """ + raise NotImplementedError + # ---------------------------------------------------------------------- # Working with Data diff --git a/icepyx/quest/quest.py b/icepyx/quest/quest.py index fe3039a39..966b19dca 100644 --- a/icepyx/quest/quest.py +++ b/icepyx/quest/quest.py @@ -2,10 +2,9 @@ from icepyx.core.query import GenQuery, Query -# from icepyx.quest.dataset_scripts.argo import Argo +from icepyx.quest.dataset_scripts.argo import Argo -# todo: implement the subclass inheritance class Quest(GenQuery): """ QUEST - Query Unify Explore SpatioTemporal - object to query, obtain, and perform basic @@ -15,7 +14,6 @@ class Quest(GenQuery): See the doc page for GenQuery for details on temporal and spatial input parameters. - Parameters ---------- proj : proj4 string @@ -55,8 +53,8 @@ class Quest(GenQuery): def __init__( self, - spatial_extent=None, - date_range=None, + spatial_extent, + date_range, start_time=None, end_time=None, proj="default", @@ -64,6 +62,7 @@ def __init__( """ Tells QUEST to initialize data given the user input spatiotemporal data. """ + super().__init__(spatial_extent, date_range, start_time, end_time) self.datasets = {} @@ -86,7 +85,7 @@ def __str__(self): def add_icesat2( self, - product=None, + product, start_time=None, end_time=None, version=None, @@ -100,7 +99,6 @@ def add_icesat2( Parameters ---------- - For details on inputs, see the Query documentation. Returns @@ -128,10 +126,32 @@ def add_icesat2( self.datasets["icesat2"] = query - # def add_argo(self, params=["temperature"], presRange=None): + def add_argo(self, params=["temperature"], presRange=None) -> None: + """ + Adds Argo (including Argo-BGC) to QUEST structure. + + Parameters + ---------- + For details on inputs, see the Argo dataset script documentation. + + Returns + ------- + None + + See Also + -------- + quest.dataset_scripts.argo + icepyx.query.GenQuery + + Examples + -------- + # example with profiles available + >>> reg_a = Quest([-154, 30,-143, 37], ['2022-04-12', '2022-04-26']) + >>> reg_a.add_argo(params=["temperature", "salinity"]) + """ - # argo = Argo(self._spatial, self._temporal, params, presRange) - # self.datasets["argo"] = argo + argo = Argo(self._spatial, self._temporal, params, presRange) + self.datasets["argo"] = argo # ---------------------------------------------------------------------- # Methods (on all datasets) @@ -144,11 +164,11 @@ def search_all(self, **kwargs): Parameters ---------- **kwargs : default None - Optional passing of keyword arguments to supply additional search constraints per datasets. - Each key must match the dataset name (e.g. "icesat2", "argo") as in quest.datasets.keys(), - and the value is a dictionary of acceptable keyword arguments - and values allowable for the `search_data()` function for that dataset. - For instance: `icesat2 = {"IDs":True}, argo = {"presRange":"10,500"}`. + Optional passing of keyword arguments to supply additional search constraints per datasets. + Each key must match the dataset name (e.g. "icesat2", "argo") as in quest.datasets.keys(), + and the value is a dictionary of acceptable keyword arguments + and values allowable for the `search_data()` function for that dataset. + For instance: `icesat2 = {"IDs":True}, argo = {"presRange":"10,500"}`. 
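# Illustrative sketch (not part of the patch) of the per-dataset keyword pattern accepted by
# search_all() here and by download_all() just below, using the kwargs named in their
# docstrings; the download path is a placeholder.
#
#   reg = Quest(spatial_extent=[-154, 30, -143, 37], date_range=["2022-04-12", "2022-04-26"])
#   reg.add_icesat2(product="ATL06")
#   reg.add_argo(params=["temperature"])
#   reg.search_all(icesat2={"IDs": True}, argo={"presRange": "10,500"})
#   reg.download_all(path="./quest-data", argo={"keep_existing": True})
#   reg.save_all("./quest-data")  # ICESat-2 granules are saved at download time; Argo writes a CSV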
""" print("\nSearching all datasets...") @@ -168,6 +188,7 @@ def search_all(self, **kwargs): v.search_data(kwargs[k]) except KeyError: v.search_data() + except: dataset_name = type(v).__name__ print("Error querying data from {0}".format(dataset_name)) @@ -180,18 +201,19 @@ def download_all(self, path="", **kwargs): Parameters ---------- **kwargs : default None - Optional passing of keyword arguments to supply additional search constraints per datasets. - Each key must match the dataset name (e.g. "icesat2", "argo") as in quest.datasets.keys(), - and the value is a dictionary of acceptable keyword arguments - and values allowable for the `search_data()` function for that dataset. - For instance: `icesat2 = {"verbose":True}, argo = {"keep_existing":True}`. + Optional passing of keyword arguments to supply additional search constraints per datasets. + Each key must match the dataset name (e.g. "icesat2", "argo") as in quest.datasets.keys(), + and the value is a dictionary of acceptable keyword arguments + and values allowable for the `search_data()` function for that dataset. + For instance: `icesat2 = {"verbose":True}, argo = {"keep_existing":True}`. """ + print("\nDownloading all datasets...") for k, v in self.datasets.items(): print() - try: + try: if isinstance(v, Query): print("---ICESat-2---") try: @@ -208,4 +230,22 @@ def download_all(self, path="", **kwargs): print(msg) except: dataset_name = type(v).__name__ - print("Error downloading data from {0}".format(dataset_name)) + print("Error downloading data from {0}".format(dataset_name)) + + def save_all(self, path): + """ + Saves all datasets according to their respective `.save()` functionality. + + Parameters + ---------- + path : str + Path at which to save the dataset files. + + """ + + for k, v in self.datasets.items(): + if isinstance(v, Query): + print("ICESat-2 granules are saved during download") + else: + print("Saving " + k) + v.save(path) diff --git a/icepyx/tests/test_quest.py b/icepyx/tests/test_quest.py index f50b1bea2..0ba7325a6 100644 --- a/icepyx/tests/test_quest.py +++ b/icepyx/tests/test_quest.py @@ -15,6 +15,7 @@ def quest_instance(scope="module", autouse=True): ########## PER-DATASET ADDITION TESTS ########## + # Paramaterize these add_dataset tests once more datasets are added def test_add_is2(quest_instance): # Add ATL06 as a test to QUEST @@ -32,44 +33,39 @@ def test_add_is2(quest_instance): assert quest_instance.datasets[exp_key].product == prod -# def test_add_argo(quest_instance): -# params = ["down_irradiance412", "temperature"] -# quest_instance.add_argo(params=params) -# exp_key = "argo" -# exp_type = ipx.quest.dataset_scripts.argo.Argo - -# obs = quest_instance.datasets - -# assert type(obs) == dict -# assert exp_key in obs.keys() -# assert type(obs[exp_key]) == exp_type -# assert quest_instance.datasets[exp_key].params == params - -# def test_add_multiple_datasets(): -# bounding_box = [-150, 30, -120, 60] -# date_range = ["2022-06-07", "2022-06-14"] -# my_quest = Quest(spatial_extent=bounding_box, date_range=date_range) -# -# # print(my_quest.spatial) -# # print(my_quest.temporal) -# -# # my_quest.add_argo(params=["down_irradiance412", "temperature"]) -# # print(my_quest.datasets["argo"].params) -# -# my_quest.add_icesat2(product="ATL06") -# # print(my_quest.datasets["icesat2"].product) -# -# print(my_quest) -# -# # my_quest.search_all() -# # -# # # this one still needs work for IS2 because of auth... 
-# # my_quest.download_all() +def test_add_argo(quest_instance): + params = ["down_irradiance412", "temperature"] + quest_instance.add_argo(params=params) + exp_key = "argo" + exp_type = ipx.quest.dataset_scripts.argo.Argo + + obs = quest_instance.datasets + + assert type(obs) == dict + assert exp_key in obs.keys() + assert type(obs[exp_key]) == exp_type + assert set(quest_instance.datasets[exp_key].params) == set(params) + + +def test_add_multiple_datasets(quest_instance): + quest_instance.add_argo(params=["down_irradiance412", "temperature"]) + # print(quest_instance.datasets["argo"].params) + + quest_instance.add_icesat2(product="ATL06") + # print(quest_instance.datasets["icesat2"].product) + + exp_keys = ["argo", "icesat2"] + assert set(exp_keys) == set(quest_instance.datasets.keys()) + ########## ALL DATASET METHODS TESTS ########## + # each of the query functions should be tested in their respective modules def test_search_all(quest_instance): + quest_instance.add_argo(params=["down_irradiance412", "temperature"]) + quest_instance.add_icesat2(product="ATL06") + # Search and test all datasets quest_instance.search_all() @@ -78,8 +74,8 @@ def test_search_all(quest_instance): "kwargs", [ {"icesat2": {"IDs": True}}, - # {"argo":{"presRange":"10,500"}}, - # {"icesat2":{"IDs":True}, "argo":{"presRange":"10,500"}} + {"argo": {"presRange": "10,500"}}, + {"icesat2": {"IDs": True}, "argo": {"presRange": "10,500"}}, ], ) def test_search_all_kwargs(quest_instance, kwargs): @@ -88,15 +84,19 @@ def test_search_all_kwargs(quest_instance, kwargs): # TESTS NOT IMPLEMENTED # def test_download_all(): -# # this will require auth in some cases... -# pass +# quest_instance.add_argo(params=["down_irradiance412", "temperature"]) +# quest_instance.add_icesat2(product="ATL06") + +# # this will require auth in some cases... +# quest_instance.download_all() + # @pytest.mark.parametrize( # "kwargs", # [ # {"icesat2": {"verbose":True}}, -# # {"argo":{"keep_existing":True}, -# # {"icesat2":{"verbose":True}, "argo":{"keep_existing":True} +# {"argo":{"keep_existing":True}, +# {"icesat2":{"verbose":True}, "argo":{"keep_existing":True} # ], # ) # def test_download_all_kwargs(quest_instance, kwargs): diff --git a/icepyx/tests/test_quest_argo.py b/icepyx/tests/test_quest_argo.py new file mode 100644 index 000000000..a6940fe7b --- /dev/null +++ b/icepyx/tests/test_quest_argo.py @@ -0,0 +1,247 @@ +import os + +import pytest +import re + +from icepyx.quest.quest import Quest + + +# create an Argo instance via quest (Argo is a submodule) +@pytest.fixture(scope="function") +def argo_quest_instance(): + def _argo_quest_instance(bounding_box, date_range): # aka "factories as fixtures" + my_quest = Quest(spatial_extent=bounding_box, date_range=date_range) + my_quest.add_argo() + my_argo = my_quest.datasets["argo"] + + return my_argo + + return _argo_quest_instance + + +# --------------------------------------------------- +# Test Formatting and Validation + + +def test_fmt_coordinates(argo_quest_instance): + reg_a = argo_quest_instance([-154, 30, -143, 37], ["2022-04-12", "2022-04-26"]) + obs = reg_a._fmt_coordinates() + + exp = "[[-143.0,30.0],[-143.0,37.0],[-154.0,37.0],[-154.0,30.0],[-143.0,30.0]]" + + assert obs == exp + + +def test_validate_parameters(argo_quest_instance): + reg_a = argo_quest_instance([-154, 30, -143, 37], ["2022-04-12", "2022-04-26"]) + + invalid_params = ["temp", "temperature_files"] + + ermsg = re.escape( + "Parameter '{0}' is not valid. 
Valid parameters are {1}".format( + "temp", reg_a._valid_params() + ) + ) + + with pytest.raises(AssertionError, match=ermsg): + reg_a._validate_parameters(invalid_params) + + +# --------------------------------------------------- +# Test Setters + + +def test_param_setter(argo_quest_instance): + reg_a = argo_quest_instance([-154, 30, -143, 37], ["2022-04-12", "2022-04-26"]) + + exp = ["temperature"] + assert reg_a.params == exp + + reg_a.params = ["temperature", "salinity"] + + exp = list(set(["temperature", "salinity"])) + assert reg_a.params == exp + + +def test_param_setter_invalid_inputs(argo_quest_instance): + reg_a = argo_quest_instance([-154, 30, -143, 37], ["2022-04-12", "2022-04-26"]) + + exp = ["temperature"] + assert reg_a.params == exp + + ermsg = re.escape( + "Parameter '{0}' is not valid. Valid parameters are {1}".format( + "temp", reg_a._valid_params() + ) + ) + + with pytest.raises(AssertionError, match=ermsg): + reg_a.params = ["temp", "salinity"] + + +def test_presRange_setter(argo_quest_instance): + reg_a = argo_quest_instance([-154, 30, -143, 37], ["2022-04-12", "2022-04-26"]) + + exp = None + assert reg_a.presRange == exp + + reg_a.presRange = "0.5,150" + + exp = "0.5,150" + assert reg_a.presRange == exp + + +def test_presRange_setter_invalid_inputs(argo_quest_instance): + reg_a = argo_quest_instance([-154, 30, -143, 37], ["2022-04-12", "2022-04-26"]) + + exp = None + assert reg_a.presRange == exp + + reg_a.presRange = ( + "0.5, sam" # it looks like the API will take a string with a space + ) + + # this setter doesn't currently have a validation check, so would need to search + obs_msg = reg_a.search_data() + + exp_msg = "Error: Unexpected response " + + assert obs_msg == exp_msg + + +# --------------------------------------------------- +# Test search_data + + +def test_search_data_available_profiles(argo_quest_instance): + reg_a = argo_quest_instance([-154, 30, -143, 37], ["2022-04-12", "2022-04-26"]) + obs_msg = reg_a.search_data() + + exp_msg = "19 valid profiles have been identified" + + assert obs_msg == exp_msg + + +def test_search_data_no_available_profiles(argo_quest_instance): + reg_a = argo_quest_instance([-55, 68, -48, 71], ["2019-02-20", "2019-02-28"]) + obs = reg_a.search_data() + + exp = ( + "Warning: Query returned no profiles\n" "Please try different search parameters" + ) + + assert obs == exp + + +# --------------------------------------------------- +# Test download and df + + +def test_download_parse_into_df(argo_quest_instance): + reg_a = argo_quest_instance([-154, 30, -143, 37], ["2022-04-12", "2022-04-13"]) + reg_a.download() # note: pressure is returned by default + + obs_cols = reg_a.argodata.columns + + exp_cols = [ + "temperature", + "temperature_argoqc", + "pressure", + "profile_id", + "lat", + "lon", + "date", + ] + + assert set(exp_cols) == set(obs_cols) + + assert len(reg_a.argodata) == 2948 + + +# approach for additional testing of df functions: create json files with profiles and store them in test suite +# then use those for the comparison (e.g. 
number of rows in df and json match) + + +def test_save_df_to_csv(argo_quest_instance): + reg_a = argo_quest_instance([-154, 30, -143, 37], ["2022-04-12", "2022-04-13"]) + reg_a.download() # note: pressure is returned by default + + path = os.getcwd() + "test_file" + reg_a.save(path) + + assert os.path.exists(path + "_argo.csv") + os.remove(path + "_argo.csv") + + +def test_merge_df(argo_quest_instance): + reg_a = argo_quest_instance([-150, 30, -120, 60], ["2022-06-07", "2022-06-14"]) + param_list = ["salinity", "temperature", "down_irradiance412"] + + df = reg_a.download(params=param_list) + + assert "down_irradiance412" in df.columns + assert "down_irradiance412_argoqc" in df.columns + + df = reg_a.download(["doxy"], keep_existing=True) + assert "doxy" in df.columns + assert "doxy_argoqc" in df.columns + assert "down_irradiance412" in df.columns + assert "down_irradiance412_argoqc" in df.columns + + +# --------------------------------------------------- +# Test kwargs to replace params and presRange in search and download + + +def test_replace_param_search(argo_quest_instance): + reg_a = argo_quest_instance([-154, 30, -143, 37], ["2022-04-12", "2022-04-26"]) + + obs = reg_a.search_data(params=["doxy"]) + + exp = ( + "Warning: Query returned no profiles\n" "Please try different search parameters" + ) + + assert obs == exp + + +def test_replace_param_download(argo_quest_instance): + reg_a = argo_quest_instance([-154, 30, -143, 37], ["2022-04-12", "2022-04-13"]) + reg_a.download(params=["salinity"]) # note: pressure is returned by default + + obs_cols = reg_a.argodata.columns + + exp_cols = [ + "salinity", + "salinity_argoqc", + "pressure", + "profile_id", + "lat", + "lon", + "date", + ] + + assert set(exp_cols) == set(obs_cols) + + assert len(reg_a.argodata) == 1942 + + +def test_replace_presRange_search(argo_quest_instance): + reg_a = argo_quest_instance([-154, 30, -143, 37], ["2022-04-12", "2022-04-26"]) + obs_msg = reg_a.search_data(presRange="100,600") + + exp_msg = "19 valid profiles have been identified" + + assert obs_msg == exp_msg + + +def test_replace_presRange_download(argo_quest_instance): + reg_a = argo_quest_instance([-154, 30, -143, 37], ["2022-04-12", "2022-04-13"]) + df = reg_a.download(params=["salinity"], presRange="0.2,180") + + assert df["pressure"].min() >= 0.2 + assert df["pressure"].max() <= 180 + assert "salinity" in df.columns + + +# second pres range test where does have a higher max pressure because only the new data was presRange limited? 
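# Illustrative sketch (not part of the patch) of the follow-on check the comment above
# suggests: download without a pressure limit, then add presRange-limited data with
# keep_existing=True, and confirm the merged dataframe keeps the deeper rows from the first
# download. Assumes the unrestricted download for this region/date range reaches below 180 dbar.
#
#   def test_presRange_download_keep_existing(argo_quest_instance):
#       reg_a = argo_quest_instance([-154, 30, -143, 37], ["2022-04-12", "2022-04-13"])
#       full_max = reg_a.download(params=["salinity"])["pressure"].max()
#       df_merged = reg_a.download(params=["salinity"], presRange="0.2,180", keep_existing=True)
#       assert df_merged["pressure"].max() == full_max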
From c7656c8d487965dd3e0e6d96203d2af7e004b820 Mon Sep 17 00:00:00 2001 From: Jessica Scheick Date: Wed, 13 Dec 2023 17:37:36 -0500 Subject: [PATCH 21/21] format all code files using black (#476) --- icepyx/core/APIformatting.py | 1 - icepyx/core/auth.py | 71 +++++++++++++++++------------- icepyx/core/exceptions.py | 3 +- icepyx/core/icesat2data.py | 5 ++- icepyx/core/query.py | 7 ++- icepyx/core/spatial.py | 3 -- icepyx/core/temporal.py | 8 ---- icepyx/core/validate_inputs.py | 10 +++-- icepyx/core/variables.py | 48 ++++++++++---------- icepyx/core/visualization.py | 4 -- icepyx/tests/conftest.py | 1 + icepyx/tests/test_APIformatting.py | 1 + icepyx/tests/test_Earthdata.py | 2 +- icepyx/tests/test_auth.py | 15 ++++--- icepyx/tests/test_query.py | 1 + icepyx/tests/test_read.py | 2 - icepyx/tests/test_spatial.py | 2 - icepyx/tests/test_temporal.py | 1 + icepyx/tests/test_visualization.py | 1 - 19 files changed, 93 insertions(+), 93 deletions(-) diff --git a/icepyx/core/APIformatting.py b/icepyx/core/APIformatting.py index 55d49f84c..b5d31bdfa 100644 --- a/icepyx/core/APIformatting.py +++ b/icepyx/core/APIformatting.py @@ -205,7 +205,6 @@ class Parameters: """ def __init__(self, partype, values=None, reqtype=None): - assert partype in [ "CMR", "required", diff --git a/icepyx/core/auth.py b/icepyx/core/auth.py index 7c36126f9..cf771f420 100644 --- a/icepyx/core/auth.py +++ b/icepyx/core/auth.py @@ -4,14 +4,16 @@ import earthaccess + class AuthenticationError(Exception): - ''' + """ Raised when an error is encountered while authenticating Earthdata credentials - ''' + """ + pass -class EarthdataAuthMixin(): +class EarthdataAuthMixin: """ This mixin class generates the needed authentication sessions and tokens, including for NASA Earthdata cloud access. Authentication is completed using the [earthaccess library](https://nsidc.github.io/earthaccess/). @@ -21,26 +23,27 @@ class EarthdataAuthMixin(): 3. Storing credentials in a .netrc file (not recommended for security reasons) More details on using these methods is available in the [earthaccess documentation](https://nsidc.github.io/earthaccess/tutorials/restricted-datasets/#auth). - This class can be inherited by any other class that requires authentication. For - example, the `Query` class inherits this one, and so a Query object has the + This class can be inherited by any other class that requires authentication. For + example, the `Query` class inherits this one, and so a Query object has the `.session` property. The method `earthdata_login()` is included for backwards compatibility. - + The class can be created without any initialization parameters, and the properties will - be populated when they are called. It can alternately be initialized with an - earthaccess.auth.Auth object, which will then be used to create a session or + be populated when they are called. It can alternately be initialized with an + earthaccess.auth.Auth object, which will then be used to create a session or s3login_credentials as they are called. - + Parameters ---------- auth : earthaccess.auth.Auth, default None Optional parameter to initialize an object with existing credentials. 
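# Illustrative sketch (not part of the patch) of the inheritance pattern this mixin enables;
# the class name below is hypothetical.
#
#   class CloudReader(EarthdataAuthMixin):
#       def __init__(self, auth=None):
#           EarthdataAuthMixin.__init__(self, auth=auth)
#
#       def credentials(self):
#           # .session and .s3login_credentials are created lazily on first access
#           return self.s3login_credentials  # keys: accessKeyId, secretAccessKey, sessionToken, expiration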
- + Examples -------- >>> a = EarthdataAuthMixin() >>> a.session # doctest: +SKIP >>> a.s3login_credentials # doctest: +SKIP """ + def __init__(self, auth=None): self._auth = copy.deepcopy(auth) # initializatin of session and s3 creds is not allowed because those are generated @@ -58,25 +61,27 @@ def __str__(self): @property def auth(self): - ''' - Authentication object returned from earthaccess.login() which stores user authentication. - ''' + """ + Authentication object returned from earthaccess.login() which stores user authentication. + """ # Only login the first time .auth is accessed if self._auth is None: auth = earthaccess.login() # check for a valid auth response if auth.authenticated is False: - raise AuthenticationError('Earthdata authentication failed. Check output for error message') + raise AuthenticationError( + "Earthdata authentication failed. Check output for error message" + ) else: self._auth = auth - + return self._auth @property def session(self): - ''' + """ Earthaccess session object for connecting to Earthdata resources. - ''' + """ # Only generate a session the first time .session is accessed if self._session is None: self._session = self.auth.get_session() @@ -84,24 +89,26 @@ def session(self): @property def s3login_credentials(self): - ''' + """ A dictionary which stores login credentials for AWS s3 access. This property is accessed if using AWS cloud data. - + Because s3 tokens are only good for one hour, this function will automatically check if an hour has elapsed since the last token use and generate a new token if necessary. - ''' - + """ + def set_s3_creds(): - ''' Store s3login creds from `auth`and reset the last updated timestamp''' + """Store s3login creds from `auth`and reset the last updated timestamp""" self._s3login_credentials = self.auth.get_s3_credentials(daac="NSIDC") self._s3_initial_ts = datetime.datetime.now() - + # Only generate s3login_credentials the first time credentials are accessed, or if an hour - # has passed since the last login + # has passed since the last login if self._s3login_credentials is None: set_s3_creds() - elif (datetime.datetime.now() - self._s3_initial_ts) >= datetime.timedelta(hours=1): + elif (datetime.datetime.now() - self._s3_initial_ts) >= datetime.timedelta( + hours=1 + ): set_s3_creds() return self._s3login_credentials @@ -109,7 +116,7 @@ def earthdata_login(self, uid=None, email=None, s3token=None, **kwargs) -> None: """ Authenticate with NASA Earthdata to enable data ordering and download. Credential storage details are described in the EathdataAuthMixin class section. - + **Note:** This method is maintained for backward compatibility. It is no longer required to explicitly run `.earthdata_login()`. Authentication will be performed by the module as needed when `.session` or `.s3login_credentials` are accessed. Parameters @@ -134,12 +141,14 @@ def earthdata_login(self, uid=None, email=None, s3token=None, **kwargs) -> None: """ warnings.warn( - "It is no longer required to explicitly run the `.earthdata_login()` method. Authentication will be performed by the module as needed.", - DeprecationWarning, stacklevel=2 - ) - + "It is no longer required to explicitly run the `.earthdata_login()` method. 
Authentication will be performed by the module as needed.", + DeprecationWarning, + stacklevel=2, + ) + if uid != None or email != None or s3token != None: warnings.warn( "The user id (uid) and/or email keyword arguments are no longer required.", - DeprecationWarning, stacklevel=2 + DeprecationWarning, + stacklevel=2, ) diff --git a/icepyx/core/exceptions.py b/icepyx/core/exceptions.py index a36a1b645..d20bbfe61 100644 --- a/icepyx/core/exceptions.py +++ b/icepyx/core/exceptions.py @@ -2,6 +2,7 @@ class DeprecationError(Exception): """ Class raised for use of functionality that is no longer supported by icepyx. """ + pass @@ -27,5 +28,3 @@ def __init__( def __str__(self): return f"{self.msgtxt}: {self.errmsg}" - - diff --git a/icepyx/core/icesat2data.py b/icepyx/core/icesat2data.py index cebce4160..aa35fd433 100644 --- a/icepyx/core/icesat2data.py +++ b/icepyx/core/icesat2data.py @@ -2,8 +2,9 @@ class Icesat2Data: - def __init__(self,): - + def __init__( + self, + ): warnings.filterwarnings("always") warnings.warn( "DEPRECATED. Please use icepyx.Query to create a download data object (all other functionality is the same)", diff --git a/icepyx/core/query.py b/icepyx/core/query.py index 4ffe4c241..d857bbb3d 100644 --- a/icepyx/core/query.py +++ b/icepyx/core/query.py @@ -351,9 +351,9 @@ class Query(GenQuery, EarthdataAuthMixin): files : string, default None A placeholder for future development. Not used for any purposes yet. auth : earthaccess.auth.Auth, default None - An earthaccess authentication object. Available as an argument so an existing - earthaccess.auth.Auth object can be used for authentication. If not given, a new auth - object will be created whenever authentication is needed. + An earthaccess authentication object. Available as an argument so an existing + earthaccess.auth.Auth object can be used for authentication. If not given, a new auth + object will be created whenever authentication is needed. Returns ------- @@ -411,7 +411,6 @@ def __init__( auth=None, **kwargs, ): - # Check necessary combination of input has been specified if ( (product is None or spatial_extent is None) diff --git a/icepyx/core/spatial.py b/icepyx/core/spatial.py index 7702acdf2..c34e928ed 100644 --- a/icepyx/core/spatial.py +++ b/icepyx/core/spatial.py @@ -80,7 +80,6 @@ def geodataframe(extent_type, spatial_extent, file=False, xdateline=None): # DevGoal: the crs setting and management needs to be improved elif extent_type == "polygon" and file == False: - # if spatial_extent is already a Polygon if isinstance(spatial_extent, Polygon): spatial_extent_geom = spatial_extent @@ -248,7 +247,6 @@ def validate_polygon_pairs(spatial_extent): if (spatial_extent[0][0] != spatial_extent[-1][0]) or ( spatial_extent[0][1] != spatial_extent[-1][1] ): - # Throw a warning warnings.warn( "WARNING: Polygon's first and last point's coordinates differ," @@ -436,7 +434,6 @@ def __init__(self, spatial_extent, **kwarg): # Check if spatial_extent is a list of coordinates (bounding box or polygon) if isinstance(spatial_extent, (list, np.ndarray)): - # bounding box if len(spatial_extent) == 4 and all( isinstance(i, scalar_types) for i in spatial_extent diff --git a/icepyx/core/temporal.py b/icepyx/core/temporal.py index c7e2dda1c..67f59882a 100644 --- a/icepyx/core/temporal.py +++ b/icepyx/core/temporal.py @@ -51,7 +51,6 @@ def convert_string_to_date(date): def check_valid_date_range(start, end): - """ Helper function for checking if a date range is valid. 
@@ -89,7 +88,6 @@ def check_valid_date_range(start, end): def validate_times(start_time, end_time): - """ Validates the start and end times passed into __init__ and returns them as datetime.time objects. @@ -145,7 +143,6 @@ def validate_times(start_time, end_time): def validate_date_range_datestr(date_range, start_time=None, end_time=None): - """ Validates a date range provided in the form of a list of strings. @@ -190,7 +187,6 @@ def validate_date_range_datestr(date_range, start_time=None, end_time=None): def validate_date_range_datetime(date_range, start_time=None, end_time=None): - """ Validates a date range provided in the form of a list of datetimes. @@ -230,7 +226,6 @@ def validate_date_range_datetime(date_range, start_time=None, end_time=None): def validate_date_range_date(date_range, start_time=None, end_time=None): - """ Validates a date range provided in the form of a list of datetime.date objects. @@ -268,7 +263,6 @@ def validate_date_range_date(date_range, start_time=None, end_time=None): def validate_date_range_dict(date_range, start_time=None, end_time=None): - """ Validates a date range provided in the form of a dict with the following keys: @@ -330,7 +324,6 @@ def validate_date_range_dict(date_range, start_time=None, end_time=None): # if is string date elif isinstance(_start_date, str): - _start_date = convert_string_to_date(_start_date) _start_date = dt.datetime.combine(_start_date, start_time) @@ -411,7 +404,6 @@ def __init__(self, date_range, start_time=None, end_time=None): """ if len(date_range) == 2: - # date range is provided as dict of strings, dates, or datetimes if isinstance(date_range, dict): self._start, self._end = validate_date_range_dict( diff --git a/icepyx/core/validate_inputs.py b/icepyx/core/validate_inputs.py index d74768eea..a69f045fb 100644 --- a/icepyx/core/validate_inputs.py +++ b/icepyx/core/validate_inputs.py @@ -105,15 +105,17 @@ def tracks(track): return track_list + def check_s3bucket(path): """ Check if the given path is an s3 path. Raise a warning if the data being referenced is not in the NSIDC bucket """ - split_path = path.split('/') - if split_path[0] == 's3:' and split_path[2] != 'nsidc-cumulus-prod-protected': + split_path = path.split("/") + if split_path[0] == "s3:" and split_path[2] != "nsidc-cumulus-prod-protected": warnings.warn( - 's3 data being read from outside the NSIDC data bucket. Icepyx can ' - 'read this data, but available data lists may not be accurate.', stacklevel=2 + "s3 data being read from outside the NSIDC data bucket. Icepyx can " + "read this data, but available data lists may not be accurate.", + stacklevel=2, ) return path diff --git a/icepyx/core/variables.py b/icepyx/core/variables.py index 4c52003df..4dd5444fe 100644 --- a/icepyx/core/variables.py +++ b/icepyx/core/variables.py @@ -29,7 +29,7 @@ class Variables(EarthdataAuthMixin): contained in ICESat-2 products. Parameters - ---------- + ---------- vartype : string This argument is deprecated. The vartype will be inferred from data_source. One of ['order', 'file'] to indicate the source of the input variables. @@ -49,9 +49,9 @@ class Variables(EarthdataAuthMixin): wanted : dictionary, default None As avail, but for the desired list of variables auth : earthaccess.auth.Auth, default None - An earthaccess authentication object. Available as an argument so an existing - earthaccess.auth.Auth object can be used for authentication. If not given, a new auth - object will be created whenever authentication is needed. + An earthaccess authentication object. 
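# Illustrative sketch (not part of the patch) of which paths the check_s3bucket() helper above
# warns on; bucket and key names other than the NSIDC one are hypothetical.
#
#   check_s3bucket("s3://nsidc-cumulus-prod-protected/ATLAS/ATL03/006/gran.h5")  # no warning
#   check_s3bucket("s3://some-other-bucket/ATL03/gran.h5")  # warns: outside the NSIDC bucket
#   check_s3bucket("/local/data/gran.h5")  # no warning (not an s3 path)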
Available as an argument so an existing + earthaccess.auth.Auth object can be used for authentication. If not given, a new auth + object will be created whenever authentication is needed. """ def __init__( @@ -65,28 +65,28 @@ def __init__( auth=None, ): # Deprecation error - if vartype in ['order', 'file']: + if vartype in ["order", "file"]: raise DeprecationError( - 'It is no longer required to specify the variable type `vartype`. Instead please ', - 'provide either the path to a local file (arg: `path`) or the product you would ', - 'like variables for (arg: `product`).' + "It is no longer required to specify the variable type `vartype`. Instead please ", + "provide either the path to a local file (arg: `path`) or the product you would ", + "like variables for (arg: `product`).", ) - + if path and product: raise TypeError( - 'Please provide either a path or a product. If a path is provided ', - 'variables will be read from the file. If a product is provided all available ', - 'variables for that product will be returned.' + "Please provide either a path or a product. If a path is provided ", + "variables will be read from the file. If a product is provided all available ", + "variables for that product will be returned.", ) # initialize authentication properties EarthdataAuthMixin.__init__(self, auth=auth) - + # Set the product and version from either the input args or the file if path: self._path = val.check_s3bucket(path) # Set up auth - if self._path.startswith('s3'): + if self._path.startswith("s3"): auth = self.auth else: auth = None @@ -98,15 +98,19 @@ def __init__( self._product = is2ref._validate_product(product) # Check for valid version string # If version is not specified by the user assume the most recent version - self._version = val.prod_version(is2ref.latest_version(self._product), version) + self._version = val.prod_version( + is2ref.latest_version(self._product), version + ) else: - raise TypeError('Either a path or a product need to be given as input arguments.') - + raise TypeError( + "Either a path or a product need to be given as input arguments." 
+ ) + self._avail = avail self.wanted = wanted # DevGoal: put some more/robust checks here to assess validity of inputs - + @property def path(self): if self._path: @@ -114,15 +118,14 @@ def path(self): else: path = None return path - + @property def product(self): return self._product - + @property def version(self): return self._version - def avail(self, options=False, internal=False): """ @@ -143,7 +146,7 @@ def avail(self, options=False, internal=False): """ if not hasattr(self, "_avail") or self._avail == None: - if not hasattr(self, 'path') or self.path.startswith('s3'): + if not hasattr(self, "path") or self.path.startswith("s3"): self._avail = is2ref._get_custom_options( self.session, self.product, self.version )["variables"] @@ -628,7 +631,6 @@ def remove(self, all=False, var_list=None, beam_list=None, keyword_list=None): for bkw in beam_list: if bkw in vpath_kws: for kw in keyword_list: - if kw in vpath_kws: self.wanted[vkey].remove(vpath) except TypeError: diff --git a/icepyx/core/visualization.py b/icepyx/core/visualization.py index 32c81e3e7..001ae178e 100644 --- a/icepyx/core/visualization.py +++ b/icepyx/core/visualization.py @@ -142,7 +142,6 @@ def __init__( cycles=None, tracks=None, ): - if query_obj: pass else: @@ -241,7 +240,6 @@ def query_icesat2_filelist(self) -> tuple: is2_file_list = [] for bbox_i in bbox_list: - try: region = ipx.Query( self.product, @@ -364,7 +362,6 @@ def request_OA_data(self, paras) -> da.array: # get data we need (with the correct date) try: - df_series = df.query(expr="date == @Date").iloc[0] beam_data = df_series.beams @@ -483,7 +480,6 @@ def viz_elevation(self) -> (hv.DynamicMap, hv.Layout): return (None,) * 2 else: - cols = ( ["lat", "lon", "elevation", "canopy", "rgt", "cycle"] if self.product == "ATL08" diff --git a/icepyx/tests/conftest.py b/icepyx/tests/conftest.py index fca31847a..9ce8e4081 100644 --- a/icepyx/tests/conftest.py +++ b/icepyx/tests/conftest.py @@ -2,6 +2,7 @@ import pytest from unittest import mock + # PURPOSE: mock environmental variables @pytest.fixture(scope="session", autouse=True) def mock_settings_env_vars(): diff --git a/icepyx/tests/test_APIformatting.py b/icepyx/tests/test_APIformatting.py index 83e88a131..213c1cf8a 100644 --- a/icepyx/tests/test_APIformatting.py +++ b/icepyx/tests/test_APIformatting.py @@ -11,6 +11,7 @@ # CMR temporal and spatial formats --> what's the best way to compare formatted text? character by character comparison of strings? + ########## _fmt_temporal ########## def test_time_fmt(): obs = apifmt._fmt_temporal( diff --git a/icepyx/tests/test_Earthdata.py b/icepyx/tests/test_Earthdata.py index 8ad883e6a..60b92f621 100644 --- a/icepyx/tests/test_Earthdata.py +++ b/icepyx/tests/test_Earthdata.py @@ -8,6 +8,7 @@ import shutil import warnings + # PURPOSE: test different authentication methods @pytest.fixture(scope="module", autouse=True) def setup_earthdata(): @@ -65,7 +66,6 @@ def earthdata_login(uid=None, pwd=None, email=None, s3token=False) -> bool: url = "urs.earthdata.nasa.gov" mock_uid, _, mock_pwd = netrc.netrc(netrc).authenticators(url) except: - mock_uid = os.environ.get("EARTHDATA_USERNAME") mock_pwd = os.environ.get("EARTHDATA_PASSWORD") diff --git a/icepyx/tests/test_auth.py b/icepyx/tests/test_auth.py index 6ac77c864..8507b1e40 100644 --- a/icepyx/tests/test_auth.py +++ b/icepyx/tests/test_auth.py @@ -8,30 +8,35 @@ @pytest.fixture() def auth_instance(): - ''' + """ An EarthdatAuthMixin object for each of the tests. 
Default scope is function level, so a new instance should be created for each of the tests. - ''' + """ return EarthdataAuthMixin() + # Test that .session creates a session def test_get_session(auth_instance): assert isinstance(auth_instance.session, requests.sessions.Session) + # Test that .s3login_credentials creates a dict with the correct keys def test_get_s3login_credentials(auth_instance): assert isinstance(auth_instance.s3login_credentials, dict) - expected_keys = set(['accessKeyId', 'secretAccessKey', 'sessionToken', - 'expiration']) + expected_keys = set( + ["accessKeyId", "secretAccessKey", "sessionToken", "expiration"] + ) assert set(auth_instance.s3login_credentials.keys()) == expected_keys + # Test that earthdata_login generates an auth object def test_login_function(auth_instance): auth_instance.earthdata_login() assert isinstance(auth_instance.auth, earthaccess.auth.Auth) assert auth_instance.auth.authenticated + # Test that earthdata_login raises a warning if email is provided def test_depreciation_warning(auth_instance): with pytest.warns(DeprecationWarning): - auth_instance.earthdata_login(email='me@gmail.com') + auth_instance.earthdata_login(email="me@gmail.com") diff --git a/icepyx/tests/test_query.py b/icepyx/tests/test_query.py index 7738c424a..15eebfcbd 100644 --- a/icepyx/tests/test_query.py +++ b/icepyx/tests/test_query.py @@ -9,6 +9,7 @@ # seem to be adequately covered in docstrings; # may want to focus on testing specific queries + # ------------------------------------ # icepyx-specific tests # ------------------------------------ diff --git a/icepyx/tests/test_read.py b/icepyx/tests/test_read.py index 018435968..d6727607e 100644 --- a/icepyx/tests/test_read.py +++ b/icepyx/tests/test_read.py @@ -21,7 +21,6 @@ def test_check_datasource_type(): ], ) def test_check_datasource(filepath, expect): - source_type = read._check_datasource(filepath) assert source_type == expect @@ -90,7 +89,6 @@ def test_validate_source_str_not_a_dir_or_file(): ], ) def test_check_run_fast_scandir(dir, fn_glob, expect): - (subfolders, files) = read._run_fast_scandir(dir, fn_glob) assert (sorted(subfolders), sorted(files)) == expect diff --git a/icepyx/tests/test_spatial.py b/icepyx/tests/test_spatial.py index 2666d857d..4d6369d9e 100644 --- a/icepyx/tests/test_spatial.py +++ b/icepyx/tests/test_spatial.py @@ -351,7 +351,6 @@ def test_poly_list_auto_close(): def test_poly_file_simple_one_poly(): - poly_from_file = spat.Spatial( str( Path( @@ -391,7 +390,6 @@ def test_bad_poly_inputfile_type_throws_error(): def test_gdf_from_one_bbox(): - obs = spat.geodataframe("bounding_box", [-55, 68, -48, 71]) geom = [Polygon(list(zip([-55, -55, -48, -48, -55], [68, 71, 71, 68, 68])))] exp = gpd.GeoDataFrame(geometry=geom) diff --git a/icepyx/tests/test_temporal.py b/icepyx/tests/test_temporal.py index 83926946e..c93b30a38 100644 --- a/icepyx/tests/test_temporal.py +++ b/icepyx/tests/test_temporal.py @@ -235,6 +235,7 @@ def test_range_str_yyyydoy_dict_time_start_end(): # Date Range Errors + # (The following inputs are bad, testing to ensure the temporal class handles this elegantly) def test_bad_start_time_type(): with pytest.raises(AssertionError): diff --git a/icepyx/tests/test_visualization.py b/icepyx/tests/test_visualization.py index 0a1f2fa43..dfd41116f 100644 --- a/icepyx/tests/test_visualization.py +++ b/icepyx/tests/test_visualization.py @@ -62,7 +62,6 @@ def test_files_in_latest_cycles(n, exp): ], ) def test_gran_paras(filename, expect): - para_list = vis.gran_paras(filename) assert para_list 
== expect