Add dataset.services() method to list available services #500

Merged
merged 55 commits into from
Sep 16, 2024
Changes from 54 commits
Commits
3c4d293
List services that are available for a collection
nikki-t Mar 19, 2024
78a6fcb
Define integration test for services functionality
nikki-t Mar 19, 2024
285b0d6
Update imports and fix type annotations
nikki-t Mar 19, 2024
5a40e8a
Update file formatting
nikki-t Mar 19, 2024
8088690
Update changelog and readme to include services functionality
nikki-t Mar 25, 2024
911b954
Update for clarity on services
nikki-t Mar 25, 2024
bc9c2f4
Provide unit test for DataService get function
nikki-t Mar 25, 2024
0bb9f11
Fix formatting of imports
nikki-t Mar 25, 2024
692d3cf
Fix code formatting
nikki-t Mar 25, 2024
9e37fe2
Mock API response to account for changing service records
nikki-t Mar 25, 2024
e8dfc74
Add documentation for services functionality
nikki-t Mar 25, 2024
5404df3
Be more clear about test failures because no tests were collected
mfisher87 Apr 2, 2024
1c41659
Improve the error message when no tests collected
mfisher87 Apr 2, 2024
daf1bb4
Merge branch 'main' into feature/issue-447
mfisher87 Apr 2, 2024
07771e1
Merge branch 'main' of https://github.com/nikki-t/earthaccess into fe…
nikki-t Apr 26, 2024
240c930
Fix import organization
nikki-t Apr 26, 2024
4a710f4
Use VCR for CMR API calls and update unit and integration test for se…
nikki-t Apr 26, 2024
6cf0f64
Merge branch 'feature/issue-447' of https://github.com/nikki-t/eartha…
nikki-t Apr 26, 2024
20d395d
Fix test formatting
nikki-t Apr 26, 2024
44917d9
Merge branch 'main' of https://github.com/nikki-t/earthaccess into fe…
nikki-t May 14, 2024
71973dc
Fix DataService init documentation
nikki-t May 14, 2024
16512d1
Add a HOW-TO on searching for services
nikki-t May 14, 2024
057d7fd
Fix trailing whitespace
nikki-t May 14, 2024
41d18b8
Update docs/howto/search-services.md
nikki-t Jun 3, 2024
14dc3ef
Update earthaccess/services.py
nikki-t Jun 3, 2024
64caf3d
Update earthaccess/services.py
nikki-t Jun 3, 2024
f67365b
Update earthaccess/results.py
nikki-t Jun 3, 2024
0c66c66
Update earthaccess/results.py
nikki-t Jun 3, 2024
9dfa385
Add issue to changelog enhancements
nikki-t Jun 11, 2024
cc24470
Update service architecture to provide cleaner access to service queries.
nikki-t Jun 11, 2024
2362eb8
Factor out get_results to utils._search to be shared by search and re…
nikki-t Jun 11, 2024
d1bebe3
Merge branch 'main' of https://github.com/nikki-t/earthaccess into fe…
nikki-t Jun 11, 2024
0e505ae
Fix code formatting
nikki-t Jun 11, 2024
b437341
Fix reference to expected test data file
nikki-t Jun 11, 2024
8ecacd1
Fix issue with accessing expected test data
nikki-t Jun 11, 2024
b874c01
Test response for different Python version unit tests
nikki-t Jun 11, 2024
61b2e04
Test response for different Python version unit tests
nikki-t Jun 11, 2024
49fb87d
Remove logging of response
nikki-t Jun 11, 2024
c2af6b3
Update fixtures for JSON body
nikki-t Jul 23, 2024
9474b33
Set authentication to false
nikki-t Jul 23, 2024
f6d3776
Merge branch 'main' into feature/issue-447
betolink Jul 23, 2024
c0a3c8a
Fix end of file reference
nikki-t Jul 23, 2024
79319c8
Merge branch 'feature/issue-447' of https://github.com/nikki-t/eartha…
nikki-t Jul 23, 2024
01bad9c
Merge branch 'main' of https://github.com/nikki-t/earthaccess into fe…
nikki-t Sep 10, 2024
2c29d73
Decode compressed VCR response
nikki-t Sep 10, 2024
8dd2402
Update unit test VCR file
nikki-t Sep 10, 2024
a1b64f2
Cache mypy_cache in CI to speedup build
chuckwondo Sep 13, 2024
d99dc3c
Tweak mypy config to drop explicit path args
chuckwondo Sep 13, 2024
134a7b7
Pluralize DataService
chuckwondo Sep 13, 2024
783d996
Add top-level search_services function
chuckwondo Sep 13, 2024
ba0eb8f
Simplify logic
chuckwondo Sep 13, 2024
39f8963
Fix tests failing for response compression handling
chuckwondo Sep 13, 2024
86ab7a8
Fixup changelog
chuckwondo Sep 13, 2024
3971454
Fix mkdocs build, including broken links
chuckwondo Sep 13, 2024
39b86a8
Add mfisher87 and betolink to credits for #447
chuckwondo Sep 16, 2024
10 changes: 8 additions & 2 deletions .github/workflows/test.yml
@@ -33,6 +33,12 @@ jobs:
           poetry config virtualenvs.create true --local
           poetry config virtualenvs.in-project true --local
           poetry self add setuptools
+      - name: Set up mypy cache
+        uses: actions/cache@v4
+        id: mypy-cache
+        with:
+          path: .mypy_cache
+          key: mypy-${{ runner.os }}-${{ steps.full-python-version.outputs.version }}-${{ hashFiles('poetry.lock') }}
       - name: Set up cache
         uses: actions/cache@v4
         id: cache
@@ -44,12 +50,12 @@ jobs:
         run: poetry run pip --version >/dev/null 2>&1 || rm -rf .venv
       - name: Install Dependencies
         if: ${{ !env.ACT }}
-        run: poetry install
+        run: poetry install --quiet
       - name: Install Dependencies
         if: ${{ env.ACT }}
         # When using `act` to run the workflow locally, the `poetry install` command
         # may fail due to network issues when running multiple Docker containers.
-        run: poetry install || poetry install || poetry install
+        run: poetry install --quiet || poetry install --quiet || poetry install --quiet
       - name: Test
         run: poetry run bash scripts/test.sh
       - name: Upload coverage
528 changes: 386 additions & 142 deletions CHANGELOG.md

Large diffs are not rendered by default.

42 changes: 42 additions & 0 deletions docs/howto/search-services.md
@@ -0,0 +1,42 @@
# How to search for services using `earthaccess`

You can search for services associated with a dataset. A service is a back-end
processing workflow that transforms the data in some way (e.g., clipping to a
spatial extent or converting to a different file format).

`earthaccess` retrieves service metadata via the `search_datasets` function.
Each result returned by `search_datasets` is an enhanced Python dictionary with
a `services` method, which returns the metadata for all services associated
with that collection. The service results are returned as a Python dictionary
keyed by service concept ID.

To search for services, import the `earthaccess` library and search by dataset
(you need to know the short name of the dataset, which can be found on the
dataset landing page):

```py
import earthaccess

datasets = earthaccess.search_datasets(
    short_name="MUR-JPL-L4-GLOB-v4.1",
    cloud_hosted=True,
    temporal=("2024-02-27T00:00:00Z", "2024-02-29T23:59:59Z"),
)
```

Parse the service results to return metadata on services available for the dataset.

```py
for dataset in datasets:
    print(dataset.services())
```

Alternatively, you may search directly for services. For example:

```py
services = earthaccess.search_services(provider="POCLOUD", keyword="COG")
```

The keyword arguments supported by the `search_services` function are
constrained to what the NASA CMR allows, as described in the
[Service section of the CMR API](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#service).
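
The dictionary returned by `dataset.services()` maps each associated service's concept ID to a list of UMM JSON records. The following is a minimal sketch of how such a dictionary might be summarized. It runs on a hand-written sample rather than a live query: the concept IDs and record contents are hypothetical (shaped like the `meta`/`umm` layout of CMR UMM JSON items), and `summarize_services` is an illustrative helper, not part of `earthaccess`.

```python
from typing import Any, Dict, List


def summarize_services(
    services: Dict[str, List[Dict[str, Any]]],
) -> List[Dict[str, str]]:
    """Flatten a `services()`-style mapping into one summary row per record."""
    rows = []
    for concept_id, records in services.items():
        for record in records:
            umm = record.get("umm", {})
            rows.append(
                {
                    "concept_id": concept_id,
                    # Missing fields fall back to an empty string.
                    "name": umm.get("Name", ""),
                    "type": umm.get("Type", ""),
                }
            )
    return rows


# Hypothetical sample shaped like the output of `dataset.services()`.
sample = {
    "S0000000000-EXAMPLE": [
        {
            "meta": {"concept-id": "S0000000000-EXAMPLE"},
            "umm": {"Name": "Example Subsetter", "Type": "Harmony"},
        }
    ],
    "S0000000001-EXAMPLE": [],  # an association whose query returned nothing
}

for row in summarize_services(sample):
    print(row)
```

With real results, the same helper could be applied to each `dataset.services()` value in the loop shown earlier, assuming the records carry `Name` and `Type` under `umm`.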
7 changes: 7 additions & 0 deletions docs/user-reference/collections/collections-services.md
@@ -0,0 +1,7 @@
# Documentation for `Collection Services`

::: earthaccess.DataServices
    options:
        inherited_members: true
        show_root_heading: true
        show_source: false
2 changes: 1 addition & 1 deletion docs/user_guide/access.md
@@ -7,7 +7,7 @@
 We are reorganizing and updating the documentation, so not all pages are complete. If you are looking for information about accessing data using earthaccess see the
 HOW-TO pages below.

-* [Quick start](../../quick-start/)
+* [Quick start](../quick-start.md)
 * [How-to download data](../howto/onprem.md)

 ## Downloading data
2 changes: 1 addition & 1 deletion docs/user_guide/authenticate.md
@@ -7,5 +7,5 @@ Introduces the `earthaccess.login` method for managing Earthdata Login and cloud
 We are reorganizing and updating the documentation, so not all pages are complete. If you are looking for information about authenticating using earthaccess see the
 How-Tos and Tutorials in links below.

-* [Quick start](../../quick-start/)
+* [Quick start](../quick-start.md)
 * [How-To Authenticate with earthaccess](../howto/authenticate.md)
2 changes: 1 addition & 1 deletion docs/user_guide/search.md
@@ -5,7 +5,7 @@
 We are reorganizing and updating the documentation, so not all pages are complete. If you are looking for information about authenticating using earthaccess see the
 How-Tos and Tutorials in links below.

-* [Quick start](../../quick-start/)
+* [Quick start](../quick-start.md)
 * [How-To Access Data](../howto/access-data.md)

 ## `search_datasets`
44 changes: 23 additions & 21 deletions earthaccess/__init__.py
@@ -17,10 +17,12 @@
     open,
     search_data,
     search_datasets,
+    search_services,
 )
 from .auth import Auth
 from .kerchunk import consolidate_metadata
 from .search import DataCollections, DataGranules
+from .services import DataServices
 from .store import Store
 from .system import PROD, UAT

@@ -31,6 +33,7 @@
     "login",
     "search_datasets",
     "search_data",
+    "search_services",
     "get_requests_https_session",
     "get_fsspec_https_session",
     "get_s3fs_session",
@@ -45,6 +48,7 @@
     # search.py
     "DataGranules",
     "DataCollections",
+    "DataServices",
     # auth.py
     "Auth",
     # store.py
@@ -70,26 +74,24 @@ def __getattr__(name): # type: ignore
     """
     global _auth, _store

-    if name == "__auth__" or name == "__store__":
-        with _lock:
-            if not _auth.authenticated:
-                for strategy in ["environment", "netrc"]:
-                    try:
-                        _auth.login(strategy=strategy)
-                    except Exception as e:
+    if name not in ["__auth__", "__store__"]:
+        raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
+
+    with _lock:
+        if not _auth.authenticated:
+            for strategy in ["environment", "netrc"]:
+                try:
+                    _auth.login(strategy=strategy)
+
+                    if _auth.authenticated:
+                        _store = Store(_auth)
                         logger.debug(
-                            f"An error occurred during automatic authentication with {strategy=}: {str(e)}"
+                            f"Automatic authentication with {strategy=} was successful"
                         )
-                        continue
-                    else:
-                        if not _auth.authenticated:
-                            continue
-                        else:
-                            _store = Store(_auth)
-                            logger.debug(
-                                f"Automatic authentication with {strategy=} was successful"
-                            )
-                            break
-            return _auth if name == "__auth__" else _store
-    else:
-        raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
+                        break
+                except Exception as e:
+                    logger.debug(
+                        f"An error occurred during automatic authentication with {strategy=}: {str(e)}"
+                    )
+
+    return _auth if name == "__auth__" else _store
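
The refactored `__getattr__` hunk above replaces nested `else` branches with a guard clause followed by a loop that tries each login strategy in order. A self-contained sketch of that loop's control flow follows. `first_successful` and `fake_login` are hypothetical names used only to mimic the pattern; they are not `earthaccess` APIs.

```python
import logging
from typing import Callable, Iterable, Optional

logger = logging.getLogger(__name__)


def first_successful(
    strategies: Iterable[str],
    login: Callable[[str], bool],
) -> Optional[str]:
    """Return the first strategy for which `login` reports success, else None.

    A strategy that raises is logged and skipped, so the next one is tried.
    """
    for strategy in strategies:
        try:
            if login(strategy):
                logger.debug("Authentication with %s was successful", strategy)
                return strategy
        except Exception as e:
            logger.debug("Error during authentication with %s: %s", strategy, e)
    return None


def fake_login(strategy: str) -> bool:
    """Stand-in login: 'environment' raises, 'netrc' succeeds."""
    if strategy == "environment":
        raise RuntimeError("no credentials in environment")
    return strategy == "netrc"


print(first_successful(["environment", "netrc"], fake_login))  # prints: netrc
```

Keeping the success check inside the `try` means a strategy that raises and a strategy that merely fails to authenticate both fall through to the next iteration without an explicit `continue`.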
29 changes: 29 additions & 0 deletions earthaccess/api.py
@@ -6,6 +6,7 @@
 from typing_extensions import Any, Dict, List, Optional, Union, deprecated

 import earthaccess
+from earthaccess.services import DataServices

 from .auth import Auth
 from .results import DataCollection, DataGranule
@@ -130,6 +131,34 @@ def search_data(count: int = -1, **kwargs: Any) -> List[DataGranule]:
     return query.get_all()


+def search_services(count: int = -1, **kwargs: Any) -> List[Any]:
+    """Search the NASA CMR for Services matching criteria.
+
+    See <https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#service>.
+
+    Parameters:
+        count:
+            maximum number of services to fetch (if less than 1, all services
+            matching specified criteria are fetched [default])
+        kwargs:
+            keyword arguments accepted by the CMR for searching services
+
+    Returns:
+        list of services (possibly empty) matching specified criteria, in UMM
+        JSON format
+
+    Examples:
+        ```python
+        services = search_services(provider="POCLOUD", keyword="COG")
+        ```
+    """
+    query = DataServices(auth=earthaccess.__auth__).parameters(**kwargs)
+    hits = query.hits()
+    logger.info(f"Services found: {hits}")
+
+    return query.get(hits if count < 1 else min(count, hits))
+
+
 def login(strategy: str = "all", persist: bool = False, system: System = PROD) -> Auth:
     """Authenticate with Earthdata login (https://urs.earthdata.nasa.gov/).

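
`search_services` derives the number of records to fetch from `count` and the CMR hit total. That arithmetic can be sketched in isolation; `resolve_limit` is a hypothetical helper name used only for illustration and does not appear in the diff.

```python
def resolve_limit(count: int, hits: int) -> int:
    """Return how many results to fetch: all hits if count < 1, else at most count."""
    return hits if count < 1 else min(count, hits)


print(resolve_limit(-1, 37))  # 37: a count below 1 means fetch everything
print(resolve_limit(10, 37))  # 10: capped by count
print(resolve_limit(50, 37))  # 37: capped by the number of hits
```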
13 changes: 13 additions & 0 deletions earthaccess/results.py
@@ -2,7 +2,10 @@
 import uuid
 from typing import Any, Dict, List, Optional, Union

+import earthaccess
+
 from .formatters import _repr_granule_html
+from .services import DataServices


 class CustomDict(dict):
@@ -178,6 +181,16 @@ def s3_bucket(self) -> Dict[str, Any]:
             return self["umm"]["DirectDistributionInformation"]
         return {}

+    def services(self) -> Dict[Any, List[Dict[str, Any]]]:
+        """Return the services for this collection, keyed by service concept ID."""
+        services = self.get("meta", {}).get("associations", {}).get("services", [])
+        queries = (
+            DataServices(auth=earthaccess.__auth__).parameters(concept_id=service)
+            for service in services
+        )
+
+        return {service: query.get_all() for service, query in zip(services, queries)}
+
     def __repr__(self) -> str:
         return json.dumps(
             self.render_dict, sort_keys=False, indent=2, separators=(",", ": ")
49 changes: 1 addition & 48 deletions earthaccess/search.py
@@ -21,61 +21,14 @@
 from .auth import Auth
 from .daac import find_provider, find_provider_by_shortname
 from .results import DataCollection, DataGranule
+from .utils._search import get_results

 logger = logging.getLogger(__name__)

 FloatLike: TypeAlias = Union[str, SupportsFloat]
 PointLike: TypeAlias = Tuple[FloatLike, FloatLike]


-def get_results(
-    session: requests.Session,
-    query: Union[CollectionQuery, GranuleQuery],
-    limit: int = 2000,
-) -> List[Any]:
-    """Get all results up to some limit, even if spanning multiple pages.
-
-    ???+ Tip
-        The default page size is 2000, if the supplied value is greater then the
-        Search-After header will be used to iterate across multiple requests until
-        either the limit has been reached or there are no more results.
-
-    Parameters:
-        limit: The number of results to return
-
-    Returns:
-        query results as a list
-
-    Raises:
-        RuntimeError: The CMR query failed.
-    """
-    page_size = min(limit, 2000)
-    url = query._build_url()
-
-    results: List[Any] = []
-    more_results = True
-    headers = dict(query.headers or {})
-
-    while more_results:
-        response = session.get(url, headers=headers, params={"page_size": page_size})
-
-        if cmr_search_after := response.headers.get("cmr-search-after"):
-            headers["cmr-search-after"] = cmr_search_after
-
-        try:
-            response.raise_for_status()
-        except requests.exceptions.HTTPError as ex:
-            raise RuntimeError(ex.response.text) from ex
-
-        latest = response.json()["items"]
-
-        results.extend(latest)
-
-        more_results = page_size <= len(latest) and len(results) < limit
-
-    return results
-
-
 class DataCollections(CollectionQuery):
     """Placeholder.

47 changes: 47 additions & 0 deletions earthaccess/services.py
@@ -0,0 +1,47 @@
from typing import Any, List, Optional

import requests

from cmr import ServiceQuery

from .auth import Auth
from .utils import _search as search


class DataServices(ServiceQuery):
    """A Service client for NASA CMR that returns data on collection services.

    API: https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#service
    """

    _format = "umm_json"

    def __init__(self, auth: Optional[Auth] = None, *args: Any, **kwargs: Any) -> None:
        """Build an instance of DataServices to query CMR.

        auth is an optional parameter for queries that need authentication,
        e.g. restricted datasets.

        Parameters:
            auth: An authenticated `Auth` instance.
        """
        super().__init__(*args, **kwargs)
        self._debug = False

        # To search, we need the new bearer tokens from NASA Earthdata
        self.session = (
            auth.get_session(bearer_token=True)
            if auth is not None and auth.authenticated
            else requests.sessions.Session()
        )

    def get(self, limit: int = 2000) -> List:
        """Get all service results up to some limit.

        Parameters:
            limit (int): The number of results to return

        Returns:
            Query results as a list
        """
        return search.get_results(self.session, self, limit)
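
`DataServices.get` delegates to the shared `get_results` helper, which pages through results with a search-after token until a short page arrives or the limit is reached. The sketch below reproduces only that stopping rule against an in-memory data source. `collect_results`, `make_fake_fetcher`, and the `max_page_size` parameter are hypothetical stand-ins introduced for illustration; no HTTP requests or CMR calls are made.

```python
from typing import Any, Callable, List, Optional, Tuple

# A page fetcher takes (search_after, page_size) and returns (items, next_token).
PageFetcher = Callable[[Optional[str], int], Tuple[List[Any], str]]


def collect_results(
    fetch_page: PageFetcher,
    limit: int = 2000,
    max_page_size: int = 2000,
) -> List[Any]:
    """Collect pages until a short page arrives or `limit` items are gathered."""
    page_size = min(limit, max_page_size)
    results: List[Any] = []
    search_after: Optional[str] = None
    more_results = True

    while more_results:
        latest, search_after = fetch_page(search_after, page_size)
        results.extend(latest)
        # A short page means the source is exhausted; otherwise stop at the limit.
        more_results = page_size <= len(latest) and len(results) < limit

    return results


def make_fake_fetcher(total: int) -> PageFetcher:
    """Build an in-memory fetcher over `total` integer items (illustrative only)."""
    data = list(range(total))

    def fetch_page(search_after: Optional[str], page_size: int) -> Tuple[List[Any], str]:
        start = int(search_after) if search_after else 0
        items = data[start : start + page_size]
        return items, str(start + len(items))

    return fetch_page


print(collect_results(make_fake_fetcher(5), limit=10, max_page_size=2))
# prints: [0, 1, 2, 3, 4]
```

Because this sketch checks the limit only after a whole page is appended, it can return more than `limit` items when `limit` is not a multiple of the page size (for example, `limit=3` with `max_page_size=2` yields four items here). A caller that needs an exact cap could slice the returned list.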