fixing searching for restricted datasets and accessing ASF on demand data from Opera #443

Merged
merged 13 commits into from
Feb 11, 2024
10 changes: 9 additions & 1 deletion CHANGELOG.md
@@ -1,5 +1,13 @@
# Changelog

## [Unreleased]

* Bug fixes:
* fixed #439 by implementing more trusted domains in the SessionWithRedirection
* fixed #438 by using an authenticated session for hits()
* Enhancements:
* addressing #427 by adding parameters to collection query

## [v0.8.2] 2023-12-06
* Bug fixes:
* Enable AWS check with IMDSv2
@@ -167,7 +175,7 @@
- Add basic classes to interact with NASA CMR, EDL and cloud access.
- Basic object formatting.

[Unreleased]: https://github.com/nsidc/earthaccess/compare/v0.5.2...HEAD
[Unreleased]: https://github.com/nsidc/earthaccess/compare/v0.8.2...HEAD
[v0.5.2]: https://github.com/nsidc/earthaccess/releases/tag/v0.5.2
[v0.5.1]: https://github.com/nsidc/earthaccess/releases/tag/v0.5.1
[v0.5.0]: https://github.com/nsidc/earthaccess/releases/tag/v0.4.0
2 changes: 0 additions & 2 deletions README.md
@@ -65,8 +65,6 @@ With *earthaccess* we can login, search and download data with a few lines of co

The only requirement to use this library is to open a free account with NASA [EDL](https://urs.earthdata.nasa.gov).

<a href="https://urs.earthdata.nasa.gov"><img src="https://auth.ops.maap-project.org/cas/images/urs-logo.png" /></a>


### **Authentication**

2 changes: 1 addition & 1 deletion binder/environment-dev.yml
@@ -4,7 +4,7 @@ channels:
dependencies:
# This environment bootstraps poetry, the actual dev environment
# is installed and managed with poetry
- python=3.9
- python=3.10
Review comment (Member Author):
This one is only for the local dev environment and actually not necessary as we can just pip install the project locally.... I use it to bootstrap the environment and install some conda libs to test notebooks with (cartopy)

- jupyterlab=3
- xarray>=0.19
- ipyleaflet>=0.13
19 changes: 12 additions & 7 deletions earthaccess/auth.py
@@ -18,10 +18,13 @@ class SessionWithHeaderRedirection(requests.Session):
"""
Requests removes auth headers if the redirect happens outside the
original req domain.
This is taken from https://wiki.earthdata.nasa.gov/display/EL/How+To+Access+Data+With+Python
"""

AUTH_HOST = "urs.earthdata.nasa.gov"
AUTH_HOSTS: List[str] = [
"urs.earthdata.nasa.gov",
"cumulus.asf.alaska.edu",
"datapool.asf.alaska.edu",
]

def __init__(
self, username: Optional[str] = None, password: Optional[str] = None
@@ -39,11 +42,13 @@ def rebuild_auth(self, prepared_request: Any, response: Any) -> None:
if "Authorization" in headers:
original_parsed = urlparse(response.request.url)
redirect_parsed = urlparse(url)
if (
(original_parsed.hostname != redirect_parsed.hostname)
and redirect_parsed.hostname != self.AUTH_HOST
and original_parsed.hostname != self.AUTH_HOST
if (original_parsed.hostname != redirect_parsed.hostname) and (
redirect_parsed.hostname not in self.AUTH_HOSTS
or original_parsed.hostname not in self.AUTH_HOSTS
):
logger.debug(
f"Deleting Auth Headers: {original_parsed.hostname} -> {redirect_parsed.hostname}"
)
del headers["Authorization"]
return
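
The new condition in `rebuild_auth` can be read as a standalone predicate. The sketch below is only an illustration of that rule, not the code in `earthaccess/auth.py`: the helper name `should_drop_auth` and its signature are hypothetical, and the host list is copied from the diff above.

```python
from typing import Iterable
from urllib.parse import urlparse

# Host list copied from the diff; the helper below is a hypothetical
# illustration and is not part of earthaccess.
TRUSTED_HOSTS = (
    "urs.earthdata.nasa.gov",
    "cumulus.asf.alaska.edu",
    "datapool.asf.alaska.edu",
)


def should_drop_auth(
    original_url: str,
    redirect_url: str,
    trusted_hosts: Iterable[str] = TRUSTED_HOSTS,
) -> bool:
    """Return True when the Authorization header would be deleted on redirect."""
    trusted = set(trusted_hosts)
    original_host = urlparse(original_url).hostname
    redirect_host = urlparse(redirect_url).hostname

    # A redirect that stays on the same host never loses the header.
    if original_host == redirect_host:
        return False

    # Across hosts, the header is dropped if either side is untrusted.
    return redirect_host not in trusted or original_host not in trusted


if __name__ == "__main__":
    print(
        should_drop_auth(
            "https://urs.earthdata.nasa.gov/oauth/authorize",
            "https://cumulus.asf.alaska.edu/some/granule.h5",
        )
    )
```

Read this way, the header survives a cross-host redirect only when both the original and the redirect host are in the trusted list.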

@@ -210,7 +215,7 @@ def get_session(self, bearer_token: bool = True) -> requests.Session:
Returns:
class Session instance with Auth and bearer token headers
"""
session = requests.Session()
session = SessionWithHeaderRedirection()
if bearer_token and self.authenticated:
# This will avoid the use of the netrc after we are logged in
session.trust_env = False
4 changes: 3 additions & 1 deletion earthaccess/results.py
@@ -228,7 +228,9 @@ def __repr__(self) -> str:
Temporal coverage: {self['umm']['TemporalExtent']}
Size(MB): {self.size()}
Data: {data_links}\n\n
""".strip().replace(" ", "")
""".strip().replace(
Review comment (Member Author):
Black likes this, Ruff does not, which one is right? @mfisher87

Review comment (Collaborator):
I think we should pick one :) Ruff offers a formatter that re-implements Black, but I've never worked with it before. Personally, I'd turn Ruff formatting off and stick with Black due to my lack of experience, but if I had more time I'd try to learn more and switch to Ruff for everything.

Review comment (Collaborator):
Looks like Ruff prefers the compact and black expands. I think Ruff is actually correct here (obvs. subjective).

This doc is helpful for black vs ruff:
https://github.com/astral-sh/ruff/blob/main/docs/formatter/black.md#implicit-string-concatenations-in-attribute-accesses

I don't think it matters which, but I agree that we should pick one -- I'd prefer ruff overall.

@MattF-NSIDC I'm pretty happy with ruff; and the transition from black to ruff is pretty straight forward as they mostly behave the same.

" ", ""
)
return rep_str

def _repr_html_(self) -> str:
44 changes: 43 additions & 1 deletion earthaccess/search.py
@@ -60,7 +60,16 @@ def hits(self) -> int:
Returns:
number of results reported by CMR
"""
return super().hits()
url = self._build_url()

response = self.session.get(url, headers=self.headers, params={"page_size": 0})

try:
response.raise_for_status()
except exceptions.HTTPError as ex:
raise RuntimeError(ex.response.text)

return int(response.headers["CMR-Hits"])
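
The pattern in the new `hits()` body (request zero results, then read the count from the `CMR-Hits` response header) can be sketched in isolation. This is a minimal sketch under assumptions: `count_hits` and the two stub classes are hypothetical names, the URL is a placeholder, and any requests-style session exposing `get()` is assumed to work in place of the stub. Unlike the diff, the sketch lets HTTP errors propagate instead of wrapping them in `RuntimeError`.

```python
from typing import Any, Dict, Optional


def count_hits(
    session: Any, url: str, headers: Optional[Dict[str, str]] = None
) -> int:
    """Ask for zero results and read the total from the CMR-Hits header."""
    response = session.get(url, headers=headers, params={"page_size": 0})
    response.raise_for_status()
    return int(response.headers["CMR-Hits"])


class StubResponse:
    """Hypothetical stand-in for a requests.Response."""

    def __init__(self, headers: Dict[str, str]) -> None:
        self.headers = headers

    def raise_for_status(self) -> None:
        # The stub always succeeds; a real response may raise here.
        return None


class StubSession:
    """Hypothetical stand-in for an authenticated session."""

    def __init__(self) -> None:
        self.calls = []

    def get(self, url, headers=None, params=None):
        # Record the call so the request parameters can be inspected.
        self.calls.append((url, headers, params))
        return StubResponse({"CMR-Hits": "42"})


if __name__ == "__main__":
    # Placeholder URL; the stub never touches the network.
    print(count_hits(StubSession(), "https://example.invalid/search"))
```

Passing the session in explicitly mirrors the point of the fix for #438: the count is fetched through whatever session the query already holds, so an authenticated session is used for the count as well.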

def concept_id(self, IDs: List[str]) -> Type[CollectionQuery]:
"""Filter by concept ID (ex: C1299783579-LPDAAC_ECS or G1327299284-LPDAAC_ECS, S12345678-LPDAAC_ECS)
@@ -106,6 +115,39 @@ def doi(self, doi: str) -> Type[CollectionQuery]:
self.params["doi"] = doi
return self

def instrument(self, instrument: str) -> Type[CollectionQuery]:
"""Search datasets by instrument

???+ Tip
Not all datasets have an associated instrument. This works
only at the dataset level but not the granule (data) level.

Parameters:
instrument (String): instrument of a dataset, e.g. instrument=GEDI
"""
if not isinstance(instrument, str):
raise TypeError("instrument must be of type str")

self.params["instrument"] = instrument
return self

def project(self, project: str) -> Type[CollectionQuery]:
"""Search datasets by associated project

???+ Tip
Not all datasets have an associated project. This works
only at the dataset level but not the granule (data) level.
Will bring datasets across DAACs matching the project.

Parameters:
project (String): associated project of a dataset, e.g. project=EMIT
"""
if not isinstance(project, str):
raise TypeError("project must be of type str")

self.params["project"] = project
return self

def parameters(self, **kwargs: Any) -> Type[CollectionQuery]:
"""Provide query parameters as keyword arguments. The keyword needs to match the name
of the method, and the value should either be the value or a tuple of values.
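
The two new filters follow the same shape: validate the argument type, store it in `params`, and return `self` so calls can be chained. A minimal sketch of that pattern follows, assuming a hypothetical `CollectionQuerySketch` class that stands in for the real query class; the `_set_str_param` helper is an illustration device and does not appear in the diff.

```python
from typing import Any, Dict


class CollectionQuerySketch:
    """Hypothetical stand-in for the collection query class in the diff."""

    def __init__(self) -> None:
        self.params: Dict[str, Any] = {}

    def _set_str_param(self, name: str, value: Any) -> "CollectionQuerySketch":
        # Same check as in the diff: reject anything that is not a string.
        if not isinstance(value, str):
            raise TypeError(f"{name} must be of type str")
        self.params[name] = value
        # Returning self is what makes the calls chainable.
        return self

    def instrument(self, instrument: str) -> "CollectionQuerySketch":
        return self._set_str_param("instrument", instrument)

    def project(self, project: str) -> "CollectionQuerySketch":
        return self._set_str_param("project", project)


if __name__ == "__main__":
    # Example values taken from the docstrings in the diff.
    query = CollectionQuerySketch().instrument("GEDI").project("EMIT")
    print(query.params)
```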