Change direct access handling to support JupyterHub #880

Closed
wants to merge 11 commits into from
16 changes: 16 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,22 @@ and this project uses [Semantic Versioning](https://semver.org/spec/v2.0.0.html)

## [Unreleased]

### Removed
- Remove `earthaccess.__store__.in_region` member variable. There is no accurate way to determine which region
a client is running in. It is better to try direct access and then either raise an exception or attempt HTTPS access
- Remove `in_region`-specific unit tests

### Added
- Add `access` argument to `api.download` and `store.get` functions. Allows you to skip direct access and immediately
attempt HTTPS access
- Add `out_of_region_handling` argument to `api.download` and `store.get` functions. Defaults to `raise`, which raises
an exception if direct access fails. If `handle` is passed and direct access fails, a warning that includes
the encountered exception is issued, and then HTTPS access is attempted. ([#444](https://github.com/nsidc/earthaccess/issues/444))
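
The fallback behavior described in the two entries above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `get_with_fallback`, `direct_fetch`, and `https_fetch` are hypothetical names, and treating every `access` value other than `"external"` as "try direct access first" is an assumption.

```python
import warnings
from typing import Callable, List, Optional


def get_with_fallback(
    direct_fetch: Callable[[], List[str]],
    https_fetch: Callable[[], List[str]],
    access: Optional[str] = None,
    out_of_region_handling: str = "raise",
) -> List[str]:
    """Hypothetical helper: try direct access, optionally fall back to HTTPS."""
    if access == "external":
        # The caller asked to skip direct access and go straight to HTTPS.
        return https_fetch()
    try:
        return direct_fetch()
    except Exception as err:
        if out_of_region_handling == "raise":
            # Default behavior: surface the direct-access failure unchanged.
            raise
        # Any other value ("handle" included) warns, then retries over HTTPS.
        warnings.warn(f"Direct access failed ({err!r}); attempting HTTPS access.")
        return https_fetch()
```

With `out_of_region_handling="handle"`, a caller running outside the bucket's region would see one warning carrying the original exception and still receive the HTTPS result.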

### Changed
- Update `test_download_deferred_failure` and `test_download_immediate_failure` integration tests to test both direct and external access
- Raise an exception in `conftest.py` if Earthdata environment variables are not found
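
As a rough sketch of the `conftest.py` check mentioned above: the `conftest.py` change itself is not shown in this diff, so `require_env` is a hypothetical helper, and the variable names `EARTHDATA_USERNAME` and `EARTHDATA_PASSWORD` are assumed here rather than taken from the PR.

```python
import os
from typing import Mapping, Sequence

# Assumed variable names; adjust to whatever the test suite actually reads.
REQUIRED_VARS = ("EARTHDATA_USERNAME", "EARTHDATA_PASSWORD")


def require_env(
    names: Sequence[str] = REQUIRED_VARS,
    environ: Mapping[str, str] = os.environ,
) -> None:
    """Raise early if any required environment variable is unset or empty."""
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError(
            f"Missing required environment variables: {', '.join(missing)}"
        )
```

Failing at collection time gives a clear message instead of a cascade of authentication errors later in the integration tests.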

## [v0.12.0] - 2024-11-13

### Changed
12 changes: 11 additions & 1 deletion earthaccess/api.py
@@ -206,6 +206,8 @@ def download(
local_path: Optional[Union[Path, str]] = None,
provider: Optional[str] = None,
threads: int = 8,
access: Optional[str] = None,
out_of_region_handling: Optional[str] = "raise",
*,
pqdm_kwargs: Optional[Mapping[str, Any]] = None,
) -> List[str]:
@@ -224,6 +226,9 @@ def download(
of a UUID4 value.
provider: if we download a list of URLs, we need to specify the provider.
threads: parallel number of threads to use to download the files, adjust as necessary, default = 8
access: "direct" to attempt direct S3 access, "external" to use HTTPS.
out_of_region_handling: "raise" to raise an exception if direct access fails (for example, when running
out of region), or "handle" (or any other value) to fall back to HTTPS upon failure.
pqdm_kwargs: Additional keyword arguments to pass to pqdm, a parallel processing library.
See pqdm documentation for available options. Default is to use immediate exception behavior
and the number of jobs specified by the `threads` parameter.
@@ -243,7 +248,12 @@ def download(

try:
return earthaccess.__store__.get(
granules, local_path, provider, threads, pqdm_kwargs=pqdm_kwargs
granules,
local_path,
provider,
threads,
access=access,
pqdm_kwargs=pqdm_kwargs,
)
except AttributeError as err:
logger.error(
5 changes: 5 additions & 0 deletions earthaccess/auth.py
@@ -178,6 +178,11 @@ def get_s3_credentials(
auth_url = self._get_cloud_auth_url(
daac_shortname=daac, provider=provider
)
if not auth_url:
# Show the daac and provider values so possible typos are easy to spot
raise Exception(
f'auth_url not found using daac: "{daac}" and provider: "{provider}"'
)
else:
auth_url = endpoint
if auth_url.startswith("https://"):
45 changes: 16 additions & 29 deletions earthaccess/results.py
@@ -303,50 +303,37 @@ def _derive_s3_link(self, links: List[str]) -> List[str]:
s3_links.append(f's3://{links[0].split("nasa.gov/")[1]}')
return s3_links

def data_links(
self, access: Optional[str] = None, in_region: bool = False
) -> List[str]:
def data_links(self, access: Optional[str] = None) -> List[str]:
"""Placeholder.

Returns the data links from a granule.

Parameters:
access: direct or external.
Direct means in-region access for cloud-hosted collections.
in_region: True if we are running in us-west-2.
It is meant for the store class.

Returns:
The data links for the requested access type.
"""
https_links = self._filter_related_links("GET DATA")
s3_links = self._filter_related_links("GET DATA VIA DIRECT ACCESS")
if in_region:
# we are in us-west-2
if self.cloud_hosted and access in (None, "direct"):
# this is a cloud collection, and we didn't specify the access type
# default to S3 links
if len(s3_links) == 0 and len(https_links) > 0:
# This is guessing the S3 links for some cloud collections that for
# some reason only offered HTTPS links
return self._derive_s3_link(https_links)
else:
# we have the s3 links so we return those
return s3_links

# we assume, perhaps incorrectly, that we are in us-west-2
if self.cloud_hosted and access == "direct":
# this is a cloud collection and direct access was explicitly requested,
# so return S3 links
if len(s3_links) == 0 and len(https_links) > 0:
# This is guessing the S3 links for some cloud collections that for
# some reason only offered HTTPS links
return self._derive_s3_link(https_links)
else:
# Even though we are in us-west-2, the user wants the HTTPS links used in-region.
# They are S3 signed links from TEA.
# <https://github.com/asfadmin/thin-egress-app>
return https_links
else:
# we are not in-region
if access == "direct":
# maybe the user wants to collect S3 links and use them later
# from the cloud
# we have the s3 links so we return those
return s3_links
else:
# we are not in us-west-2, even cloud collections have HTTPS links
return https_links
else:
# Even though we are in us-west-2, the user wants the HTTPS links used in-region.
# They are S3 signed links from TEA.
# <https://github.com/asfadmin/thin-egress-app>
return https_links

def dataviz_links(self) -> List[str]:
"""Placeholder.
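
The simplified link selection in the new `data_links` can be sketched as a standalone function, which makes the three outcomes easier to test in isolation. This is an illustration under stated assumptions, not the method itself: `select_links` and `derive_s3_links` are hypothetical names, and the derivation here maps every HTTPS link, which may differ from what `_derive_s3_link` does.

```python
from typing import List, Optional


def derive_s3_links(https_links: List[str]) -> List[str]:
    """Guess S3 URLs from HTTPS URLs by keeping the path after 'nasa.gov/'."""
    return [
        f"s3://{link.split('nasa.gov/', 1)[1]}"
        for link in https_links
        if "nasa.gov/" in link
    ]


def select_links(
    https_links: List[str],
    s3_links: List[str],
    cloud_hosted: bool,
    access: Optional[str] = None,
) -> List[str]:
    """Return S3 links only when direct access is explicitly requested."""
    if cloud_hosted and access == "direct":
        if not s3_links and https_links:
            # Some cloud collections only advertise HTTPS links; guess the S3 ones.
            return derive_s3_links(https_links)
        return s3_links
    # Every other case (access is None or "external", or not cloud hosted).
    return https_links
```

Note that with `in_region` gone, `access=None` now yields HTTPS links even for cloud-hosted collections, so callers that want S3 links have to pass `access="direct"`.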