diff --git a/HARMONY_MIGRATION_NOTES.md b/HARMONY_MIGRATION_NOTES.md new file mode 100644 index 000000000..3519cd93f --- /dev/null +++ b/HARMONY_MIGRATION_NOTES.md @@ -0,0 +1,143 @@ +## Assumptions that are different in Harmony + +* We can't use short name and version with Harmony as we do with ECS; we have to use + Concept ID (or DOI). We need to get this from CMR using short name and version. +* Variable subsetting won't be supported on day 1. +* All the ICESat-2 products we currently support will not be supported on day 1. + * +* ECS and CMR shared some parameters. This is not the case with Harmony. + + +## Getting started on development + +### Work so far + +Work in progress is on the `harmony` branch. This depends on the `low-hanging-refactors` +branch being merged. A PR is open. + +In addition to this work, refactoring, type checking, and type annotations have been +added to the codebase to support the migration to Harmony. + + +### Familiarize yourself with Harmony + +* Check out this amazing notebook provided by Amy Steiker and Patrick Quinn: + +* Review the interactive API documentation: + + + +### Getting started replacing ECS with Harmony + +1. Find the `WIP` commit (`ac916d6`) and use `git reset` to restore the changes into the + working tree. There are several breakpoints set, as well as an artificially + introduced exception class to help trace and narrow the code paths during + refactoring. +2. Exercise a specific code path. For example: + + ```python + import icepyx as ipx + import datetime as dt + + q = ipx.Query( + product="ATL06", + version="006", + spatial_extent=[-90, 68, 48, 90], + # "./doc/source/example_notebooks/supporting_files/simple_test_poly.gpkg", + date_range={ + "start_date": dt.datetime(2018, 10, 10, 0, 10, 0), + "end_date": dt.datetime(2018, 10, 18, 14, 45, 30), + # "end_date": '2019-02-28', + } + ) + + q.download_granules("/tmp/icepyx") + ``` + +3. Identify the first query to ECS. 
Queries, except the capabilities query in + `is2ref.py`, are formed from constants in `urls.py`. Continue this practice. Harmony + URLs in this file are placeholders. +4. Determine an equivalent Harmony query. The Harmony Coverages API has an equivalent to + the capabilities query in `is2ref.py`, for example. +5. Raise `RefactoringException` at the top of any functions or methods which currently + speak to ECS. This will help us find and delete those "dead code" functions later, + and prevent them from being inadvertently executed. +6. Write new functions or methods which speak to Harmony instead. It's important to + encapsulate the communication with the Harmony API in a single function. This may + mean replacing one function with several smaller functions during refactoring. +7. Maintain the high standard of documentation in the code. Include examples as doctests + in the new functions. Use Numpy style docstrings. **DO NOT** include type information + in docstrings -- write type annotations instead. They will be automatically + documented by the documentation generator. +8. Repeat from step 3 for the next EGI query. + +### Watch out for broken assumptions + +It's important to note that two major assumptions will require significant refactoring. +The type annotations will help with this process! + +1. Broken assumption: "CMR and EGI share parameter sets". My mental model looks like: + * Current: User passes in parameters to `Query(...)`. Those params are used to generate + separate "CMR parameters" and "reqparams". "CMRparams" are spatial and temporal + parameters compatible with CMR. I'm not sure about the naming of "reqparams", but I + think of them as the EGI parameters (which may include more than the user passed, like + `page_size`) _minus_ the CMR spatial and temporal parameters. The actual queries + submitted to CMR and EGI are based on those generated parameter sets. + * Future: In Harmony-land, the shared parameter assumption is broken. 
CMR and Harmony's + Coverages API have completely different parameter sets. The code can be drastically simplified: + User passes in parameters to `Query(...)`. Those params are used directly to generate + both CMR and Harmony queries without an intervening layer. E.g. +2. Broken assumption: "We can query with only short_name and version number". Harmony + requires a unique identifier (concept ID or DOI). E.g.: + + . + Since we want the user to be able to provide short_name and version, implementing the + concept ID as a `@cached_property` on `Query` which asks CMR for the concept ID makes + sense to me. + + +### Don't forget to enhance along the way + +* Now that we're ripping things apart and changing parameters, I think it's important to + replace the TypedDict annotations we're using with Pydantic models. This will enable us + to better encapsulate validation code that's currently spread around. + + +## Integrating with other ongoing Icepyx work + +Harmony is a major breaking change, so we'll be releasing it in Icepyx v2. + +We know the community wants to break the API in some other ways, so we want to include those in v2 as well! + +* Some of Icepyx's Query functionality is already served by earthaccess; refactor or replace the `Query` class? +* ? + +Jessica is currently determining who can help work on these changes, and what that looks like. *If you, the +Harmony/ECS migration developer, identify opportunities to easily replace portions of Icepyx with _earthaccess_ +or other libraries, take advantage of that opportunity.* + +## FAQ + +### Which API? + +Harmony has two APIs: + +* [OGC Environmental Data Retrieval API](https://harmony.earthdata.nasa.gov/docs/edr-api) +* [OGC Coverages API](https://harmony.earthdata.nasa.gov/docs/api/) + +Which should be used and when and why? + + +#### "Answer" + +Use the [OGC Coverages API](https://harmony.earthdata.nasa.gov/docs/api/)! 
+ +> My take is that we ought to focus on the Coverages API for ICESat-2, since we aren’t +> making use of the new parameters. And this is what they primarily support. But I don’t +> have a good handle on whether we ought to pursue the EDR API at any point. +> +> - Amy Steiker + +See this thread on EOSDIS Slack for more details: + + diff --git a/icepyx/__init__.py b/icepyx/__init__.py index b0cd8095d..a9d61834b 100644 --- a/icepyx/__init__.py +++ b/icepyx/__init__.py @@ -1,17 +1,3 @@ -from warnings import warn - -deprecation_msg = """icepyx v1.x is being deprecated; the back-end systems on which it relies -will be shut down as of late 2024. At that time, upgrade to icepyx v2.x, which uses the -new NASA Harmony back-end, will be required. Please see - for more -information! -""" -# IMPORTANT: This is being done before the other icepyx imports because the imported -# code changes warning filters. If this is done after the imports, the warning won't -# work. -warn(deprecation_msg, FutureWarning, stacklevel=2) - - from _icepyx_version import version as __version__ from icepyx.core.query import GenQuery, Query diff --git a/icepyx/core/cmr.py b/icepyx/core/cmr.py index 0c453c78b..d50b871a7 100644 --- a/icepyx/core/cmr.py +++ b/icepyx/core/cmr.py @@ -1,3 +1,27 @@ from typing import Final +import requests + +from icepyx.core.urls import COLLECTION_SEARCH_BASE_URL + CMR_PROVIDER: Final = "NSIDC_CPRD" + + +def get_concept_id(*, product: str, version: str) -> str: + response = requests.get( + COLLECTION_SEARCH_BASE_URL, + params={ + "short_name": product, + "version": version, + "provider": CMR_PROVIDER, + }, + ) + metadata = response.json()["feed"]["entry"] + + if len(metadata) != 1: + raise RuntimeError(f"Expected 1 result from CMR, received {metadata}") + + return metadata[0]["id"] + + +# TODO: Extract CMR collection query from granules.py diff --git a/icepyx/core/exceptions.py b/icepyx/core/exceptions.py index 085fed8c9..b4598dc2c 100644 --- a/icepyx/core/exceptions.py 
+++ b/icepyx/core/exceptions.py @@ -53,3 +53,11 @@ class ExhaustiveTypeGuardException(TypeGuardException): Used exclusively in cases where the typechecker needs a typeguard to tell it that a check is exhaustive. """ + + +class RefactoringException(Exception): + def __str__(self): + return ( + "This code is being refactored." + " The code after this exception is expected to require major changes." + ) diff --git a/icepyx/core/granules.py b/icepyx/core/granules.py index 119e60bb3..d6a519048 100644 --- a/icepyx/core/granules.py +++ b/icepyx/core/granules.py @@ -25,6 +25,7 @@ EGIRequiredParamsSearch, ) from icepyx.core.urls import DOWNLOAD_BASE_URL, GRANULE_SEARCH_BASE_URL, ORDER_BASE_URL +from icepyx.uat import EDL_ACCESS_TOKEN def info(grans: list[dict]) -> dict[str, Union[int, float]]: @@ -228,7 +229,11 @@ def get_avail( # if not hasattr(self, 'avail'): self.avail = [] - headers = {"Accept": "application/json", "Client-Id": "icepyx"} + headers = { + "Accept": "application/json", + "Client-Id": "icepyx", + "Authorization": f"Bearer {EDL_ACCESS_TOKEN}", + } # note we should also check for errors whenever we ping NSIDC-API - # make a function to check for errors @@ -332,6 +337,7 @@ def place_order( -------- query.Query.order_granules """ + raise icepyx.core.exceptions.RefactoringException self.get_avail(CMRparams, reqparams) @@ -366,6 +372,7 @@ def place_order( total_pages, " is submitting to NSIDC", ) + breakpoint() request_params.update({"page_num": page_num}) request = self.session.get(ORDER_BASE_URL, params=request_params) @@ -523,10 +530,6 @@ def download(self, verbose, path, restart=False): -------- query.Query.download_granules """ - """ - extract : boolean, default False - Unzip the downloaded granules. 
- """ # DevNote: this will replace any existing orderIDs with the saved list # (could create confusion depending on whether download was interrupted or kernel restarted) diff --git a/icepyx/core/harmony.py b/icepyx/core/harmony.py new file mode 100644 index 000000000..8d03ecc5d --- /dev/null +++ b/icepyx/core/harmony.py @@ -0,0 +1,13 @@ +from typing import Any + +import requests + +from icepyx.core.urls import CAPABILITIES_BASE_URL + + +def get_capabilities(concept_id: str) -> dict[str, Any]: + response = requests.get( + CAPABILITIES_BASE_URL, + params={"collectionId": concept_id}, + ) + return response.json() diff --git a/icepyx/core/is2ref.py b/icepyx/core/is2ref.py index ac080dd4f..b598ede26 100644 --- a/icepyx/core/is2ref.py +++ b/icepyx/core/is2ref.py @@ -8,7 +8,8 @@ import numpy as np import requests -from icepyx.core.urls import COLLECTION_SEARCH_BASE_URL, EGI_BASE_URL +from icepyx.core.exceptions import RefactoringException +from icepyx.core.urls import COLLECTION_SEARCH_BASE_URL # ICESat-2 specific reference functions @@ -92,9 +93,9 @@ def about_product(prod: str) -> dict: # DevGoal: use a mock of this output to test later functions, such as displaying options and widgets, etc. # options to get customization options for ICESat-2 data (though could be used generally) def _get_custom_options(session, product, version): - """ - Get lists of what customization options are available for the product from NSIDC. 
- """ + """Get lists of available customization options from Harmony.""" + raise RefactoringException + cust_options = {} if session is None: @@ -102,6 +103,11 @@ def _get_custom_options(session, product, version): "Don't forget to log in to Earthdata using query.earthdata_login()" ) + # concept_id_query_url = f"{COLLECTION_SEARCH_BASE_URL}?short_name={product}&version={version}" + # concept_id = session.get(concept_id_query_url).json()["feed"]["entry"][-1]["id"] + # capability_url = f"{CAPABILITIES_BASE_URL}?collectionId={concept_id}" + # response_json = session.get(capability_url).json() + capability_url = f"{EGI_BASE_URL}/capabilities/{product}.{version}.xml" response = session.get(capability_url) root = ET.fromstring(response.content) @@ -111,6 +117,7 @@ def _get_custom_options(session, product, version): cust_options.update({"options": subagent}) # reformatting + # cust_options.update({"fileformats": response_json["outputFormats"]}) formats = [Format.attrib for Format in root.iter("Format")] format_vals = [formats[i]["value"] for i in range(len(formats))] try: diff --git a/icepyx/core/query.py b/icepyx/core/query.py index 4d0d3015f..573ca8b1b 100644 --- a/icepyx/core/query.py +++ b/icepyx/core/query.py @@ -10,7 +10,8 @@ import icepyx.core.APIformatting as apifmt from icepyx.core.auth import EarthdataAuthMixin -from icepyx.core.exceptions import DeprecationError +from icepyx.core.cmr import get_concept_id +from icepyx.core.exceptions import DeprecationError, RefactoringException import icepyx.core.granules as granules from icepyx.core.granules import Granules import icepyx.core.is2ref as is2ref @@ -464,6 +465,13 @@ def __str__(self) -> str: self.spatial_extent, self.dates, self.product, self.product_version ) + @cached_property + def concept_id(self) -> str: + return get_concept_id( + product=self.product, + version=self.product_version, + ) + @property def dataset(self) -> Never: """ @@ -605,6 +613,7 @@ def reqparams(self) -> EGIRequiredParams: >>> 
reg_a.reqparams # doctest: +SKIP {'short_name': 'ATL06', 'version': '006', 'page_size': 2000, 'page_num': 1, 'request_mode': 'async', 'include_meta': 'Y', 'client_string': 'icepyx'} """ + raise RefactoringException if not hasattr(self, "_reqparams"): self._reqparams = apifmt.Parameters("required", reqtype="search") @@ -641,6 +650,8 @@ def subsetparams(self, **kwargs) -> Union[EGIParamsSubset, dict[Never, Never]]: {'time': '2019-02-20T00:00:00,2019-02-28T23:59:59', 'bbox': '-55.0,68.0,-48.0,71.0'} """ + raise RefactoringException + if not hasattr(self, "_subsetparams"): self._subsetparams = apifmt.Parameters("subset") @@ -977,16 +988,16 @@ def order_granules( Parameters ---------- - verbose : boolean, default False + verbose : Print out all feedback available from the order process. Progress information is automatically printed regardless of the value of verbose. - subset : boolean, default True + subset : Apply subsetting to the data order from the NSIDC, returning only data that meets the subset parameters. Spatial and temporal subsetting based on the input parameters happens by default when subset=True, but additional subsetting options are available. Spatial subsetting returns all data that are within the area of interest (but not complete granules. This eliminates false-positive granules returned by the metadata-level search) - email: boolean, default False + email : Have NSIDC auto-send order status email updates to indicate order status as pending/completed. The emails are sent to the account associated with your Earthdata account. **kwargs : key-value pairs @@ -1013,6 +1024,8 @@ def order_granules( . Retry request status is: complete """ + breakpoint() + raise RefactoringException if not hasattr(self, "reqparams"): self.reqparams @@ -1106,10 +1119,6 @@ def download_granules( See Also -------- granules.download - """ - """ - extract : boolean, default False - Unzip the downloaded granules. 
Examples -------- @@ -1131,6 +1140,8 @@ def download_granules( or len(self.granules.orderIDs) == 0 ): self.order_granules(verbose=verbose, subset=subset, **kwargs) + breakpoint() + raise RefactoringException self.granules.download(verbose, path, restart=restart) diff --git a/icepyx/core/types.py b/icepyx/core/types.py deleted file mode 100644 index e85f8696f..000000000 --- a/icepyx/core/types.py +++ /dev/null @@ -1,111 +0,0 @@ -from __future__ import annotations - -from typing import Literal, TypedDict, Union - -from typing_extensions import NotRequired - -ICESat2ProductShortName = Literal[ - "ATL01", - "ATL02", - "ATL03", - "ATL04", - "ATL06", - "ATL07", - "ATL07QL", - "ATL08", - "ATL09", - "ATL09QL", - "ATL10", - "ATL11", - "ATL12", - "ATL13", - "ATL14", - "ATL15", - "ATL16", - "ATL17", - "ATL19", - "ATL20", - "ATL21", - "ATL23", -] - -CMRParamsBase = TypedDict( - "CMRParamsBase", - { - "temporal": NotRequired[str], - "options[readable_granule_name][pattern]": NotRequired[str], - "options[spatial][or]": NotRequired[str], - "readable_granule_name[]": NotRequired[str], - }, -) - - -class CMRParamsWithBbox(CMRParamsBase): - bounding_box: str - - -class CMRParamsWithPolygon(CMRParamsBase): - polygon: str - - -CMRParams = Union[CMRParamsWithBbox, CMRParamsWithPolygon] - - -class EGIRequiredParamsBase(TypedDict): - """Common parameters for searching, ordering, or downloading from EGI. - - See: https://wiki.earthdata.nasa.gov/display/SDPSDOCS/EGI+Programmatic+Access+Documentation - - EGI shares parameters with CMR, so this data is used in conjunction with CMRParams - to build EGI requests. - - TODO: Validate more strongly (with Pydantic and its annotated types? 
- https://docs.pydantic.dev/latest/concepts/types/#composing-types-via-annotated): - - * version is 3 digits - * 0 < page_size <= 2000 - """ - - short_name: ICESat2ProductShortName # alias: "product" - version: str - page_size: int # default 2000 - page_num: int # default 0 - - -class EGIRequiredParamsSearch(EGIRequiredParamsBase): - """Parameters for interacting with EGI.""" - - -class EGIRequiredParamsDownload(EGIRequiredParamsBase): - """Parameters for ordering from EGI. - - TODO: Validate more strongly (with Pydantic?): page_num >=0. - """ - - request_mode: Literal["sync", "async", "stream"] # default "async" - include_meta: Literal["Y", "N"] # default "Y" - client_string: Literal["icepyx"] # default "icepyx" - # token, email - - -class EGIParamsSubsetBase(TypedDict): - """Parameters for subsetting with EGI.""" - - time: NotRequired[str] - format: NotRequired[str] - projection: NotRequired[str] - projection_parameters: NotRequired[str] - Coverage: NotRequired[str] - - -class EGIParamsSubsetBbox(EGIParamsSubsetBase): - bbox: NotRequired[str] - - -class EGIParamsSubsetBoundingShape(EGIParamsSubsetBase): - Boundingshape: NotRequired[str] - - -EGIParamsSubset = Union[EGIParamsSubsetBbox, EGIParamsSubsetBoundingShape] - -EGIRequiredParams = Union[EGIRequiredParamsSearch, EGIRequiredParamsDownload] diff --git a/icepyx/core/types/__init__.py b/icepyx/core/types/__init__.py new file mode 100644 index 000000000..335474ea9 --- /dev/null +++ b/icepyx/core/types/__init__.py @@ -0,0 +1,28 @@ +from __future__ import annotations + +from typing import Literal + +ICESat2ProductShortName = Literal[ + "ATL01", + "ATL02", + "ATL03", + "ATL04", + "ATL06", + "ATL07", + "ATL07QL", + "ATL08", + "ATL09", + "ATL09QL", + "ATL10", + "ATL11", + "ATL12", + "ATL13", + "ATL14", + "ATL15", + "ATL16", + "ATL17", + "ATL19", + "ATL20", + "ATL21", + "ATL23", +] diff --git a/icepyx/core/types/api.py b/icepyx/core/types/api.py new file mode 100644 index 000000000..b29ba8fb4 --- /dev/null +++ 
b/icepyx/core/types/api.py @@ -0,0 +1,28 @@ +from typing import Literal, TypedDict, Union + +from typing_extensions import NotRequired +from pydantic import BaseModel + +CMRParamsBase = TypedDict( + "CMRParamsBase", + { + "temporal": NotRequired[str], + "options[readable_granule_name][pattern]": NotRequired[str], + "options[spatial][or]": NotRequired[str], + "readable_granule_name[]": NotRequired[str], + }, +) + + +class CMRParamsWithBbox(CMRParamsBase): + bounding_box: str + + +class CMRParamsWithPolygon(CMRParamsBase): + polygon: str + + +CMRParams = Union[CMRParamsWithBbox, CMRParamsWithPolygon] + + +class HarmonyCoverageAPIParamsBase(BaseModel): ...  # placeholder; fields not yet defined diff --git a/icepyx/core/urls.py b/icepyx/core/urls.py index 8c5bc325b..643525cc9 100644 --- a/icepyx/core/urls.py +++ b/icepyx/core/urls.py @@ -4,7 +4,9 @@ GRANULE_SEARCH_BASE_URL: Final = f"{CMR_BASE_URL}/search/granules" COLLECTION_SEARCH_BASE_URL: Final = f"{CMR_BASE_URL}/search/collections.json" -EGI_BASE_URL: Final = "https://n5eil02u.ecs.nsidc.org/egi" -ORDER_BASE_URL: Final = f"{EGI_BASE_URL}/request" - -DOWNLOAD_BASE_URL: Final = "https://n5eil02u.ecs.nsidc.org/esir" +# TODO: the harmony base url and capabilities URL will be handled by +# `harmony-py`: remove these constants. +HARMONY_BASE_URL: Final = "https://harmony.earthdata.nasa.gov" +CAPABILITIES_BASE_URL: Final = f"{HARMONY_BASE_URL}/capabilities" +ORDER_BASE_URL: Final = f"{HARMONY_BASE_URL}/...?" +DOWNLOAD_BASE_URL: Final = f"{HARMONY_BASE_URL}/...?" diff --git a/requirements.txt b/requirements.txt index 6a9659270..1ff3a8824 100644 --- a/requirements.txt +++ b/requirements.txt @@ -10,6 +10,7 @@ holoviews hvplot matplotlib numpy +pydantic>=2.9.2 requests s3fs shapely
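
Follow-up sketch for `icepyx/core/cmr.py`: the new `get_concept_id` performs the HTTP request and parses the response in one step, which makes it hard to cover with the doctests the migration notes ask for. The snippet below is a minimal sketch, assuming the CMR `collections.json` response shape already used in that hunk (`feed`, then `entry`, then `id`), of how the parsing could live in a pure helper that runs without network access. `extract_concept_id` and the sample payload are hypothetical and not part of icepyx or CMR.

```python
from typing import Any


def extract_concept_id(cmr_response: dict[str, Any]) -> str:
    """Return the single collection concept ID from a parsed CMR response.

    Hypothetical helper: assumes the ``feed`` / ``entry`` / ``id`` layout
    that ``get_concept_id`` in this diff reads.
    """
    entries = cmr_response.get("feed", {}).get("entry", [])

    if len(entries) != 1:
        # Report only the count; the full metadata can be very large.
        raise RuntimeError(
            f"Expected 1 collection from CMR, received {len(entries)}"
        )

    return entries[0]["id"]


# Made-up payload, trimmed to the fields the helper reads.
sample_response = {"feed": {"entry": [{"id": "C0000000000-EXAMPLE"}]}}
print(extract_concept_id(sample_response))
```

With a split like this, `get_concept_id` would only need to issue the request and pass `response.json()` to the helper, keeping the CMR communication encapsulated in a single function as the notes recommend.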