Granule Data Field Returns "https" when in region s3 data are available #883

Open
1 task done
meteodave opened this issue Nov 26, 2024 · 2 comments
Labels
type: bug Something isn't working

meteodave commented Nov 26, 2024

Is this issue already tracked somewhere, or is this a new report?

  • I've reviewed existing issues and couldn't find a duplicate for this problem.

Current Behavior

I seem to be having an issue accessing LAADS data with earthaccess 0.12.0. The "Data" field of the granule returns only the "https" link, even though these data are in the cloud according to Earthdata Search.

Expected Behavior

I would expect "Data" in the granule to return the S3 path link.

Steps To Reproduce

In Jupyter Notebook:

import boto3
import earthaccess

# Log in to Earthdata Login and persist the credentials.
auth = earthaccess.login(persist=True)

# Search one day of the cloud-hosted LAADS collection.
granules = earthaccess.search_data(
    concept_id="C2859273114-LAADS",
    temporal=("2019-09-26", "2019-09-27"),
)

# Note: region_name is the region boto3 is configured for.
if boto3.client("s3").meta.region_name == "us-west-2":
    print("found US-West-2")
else:
    print("US-West-2 not found")

print(granules[0])

Output:

found US-West-2
Collection: {'ShortName': 'XAERDT_L2_ABI_G16', 'Version': '1'}
Spatial coverage: {'HorizontalSpatialDomain': {'Geometry': {'GPolygons': [{'Boundary': {'Points': [{'Longitude': -147.0, 'Latitude': -72.0}, {'Longitude': -3.0, 'Latitude': -72.0}, {'Longitude': -3.0, 'Latitude': 72.0}, {'Longitude': -147.0, 'Latitude': 72.0}, {'Longitude': -147.0, 'Latitude': -72.0}]}}]}}}
Temporal coverage: {'RangeDateTime': {'BeginningDateTime': '2019-09-25T23:50:00.000Z', 'EndingDateTime': '2019-09-26T00:00:00.000Z'}}
Size(MB): 49.1438646316528
Data: ['https://data.laadsdaac.earthdatacloud.nasa.gov/prod-lads/XAERDT_L2_ABI_G16/XAERDT_L2_ABI_G16.A2019268.2350.001.2023253054738.nc']

Environment

- OS: Debian GNU/Linux 11
- Python: 3.12.7
- earthaccess: 0.12.0

Additional Context

No response

mfisher87 added the "type: bug" label Nov 26, 2024
Member

asteiker commented Dec 5, 2024

@meteodave Thanks for reporting this. I don't think this is unique to your LAADS example, as I see the same results when searching for an ICESat-2 collection in the cloud. I believe that the search_data results are only grabbing the first data access URL, which would be the HTTPS link in this case. @betolink does that sound right to you? Regardless, the s3 URL should still be found and utilized when using earthaccess.open().

So, I don't know if this is truly a bug versus an enhancement that we need to make to search_data() to provide all data access URLs that exist for the granule results, including s3.

Member

betolink commented Dec 14, 2024

Thanks for stopping by the poster, @meteodave; it was great meeting you in person! Not a full answer, but some clarifications: the data_links() method defaults to "out-of-region" for the representation, which means we'll always see the output you're seeing, and perhaps that is a bug! Internally, however, if we use .download(granules) or .open(granules), it will check whether we are in-region, which is also tricky because some instances and frameworks hide the metadata required to know whether we are in us-west-2.
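
To illustrate the in-region versus out-of-region distinction, here is a minimal sketch that filters a granule's UMM RelatedUrls entries by type. It is not earthaccess's actual implementation: the select_links helper and the sample metadata are hypothetical, and both URLs are placeholders. Only the two type strings, "GET DATA" and "GET DATA VIA DIRECT ACCESS", come from the UMM-G schema.

```python
from typing import Dict, List

# UMM-G RelatedUrls type strings for the two access paths.
HTTPS_TYPE = "GET DATA"
S3_TYPE = "GET DATA VIA DIRECT ACCESS"


def select_links(related_urls: List[Dict[str, str]], in_region: bool = False) -> List[str]:
    """Return S3 links when in_region is True, otherwise HTTPS links (hypothetical helper)."""
    wanted = S3_TYPE if in_region else HTTPS_TYPE
    return [entry["URL"] for entry in related_urls if entry.get("Type") == wanted]


# Hypothetical metadata; both URLs are placeholders, not real granule locations.
sample_related_urls = [
    {"URL": "https://nasa-http-url/granule.nc", "Type": "GET DATA"},
    {"URL": "s3://nasa-s3-url/granule.nc", "Type": "GET DATA VIA DIRECT ACCESS"},
]

print(select_links(sample_related_urls))                  # out-of-region default
print(select_links(sample_related_urls, in_region=True))  # in-region
```

With an out-of-region default like this, the printed representation only ever shows the HTTPS link, even when an S3 link exists in the metadata.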

We are having conversations around what the default should be. The best option so far is to assume that we are in the cloud and try the s3:// links if they are reachable. As for the representation, we may need to change the default to follow the same logic, or even show both, like:

Collection: {'ShortName': 'XAERDT_L2_ABI_G16', 'Version': '1'}
Spatial coverage: {'HorizontalSpatialDomain': {'Geometry': {'GPolygons': [{'Boundary': {'Points': [{'Longitude': -147.0, 'Latitude': -72.0}, {'Longitude': -3.0, 'Latitude': -72.0}, {'Longitude': -3.0, 'Latitude': 72.0}, {'Longitude': -147.0, 'Latitude': 72.0}, {'Longitude': -147.0, 'Latitude': -72.0}]}}]}}}
Temporal coverage: {'RangeDateTime': {'BeginningDateTime': '2019-09-25T23:50:00.000Z', 'EndingDateTime': '2019-09-26T00:00:00.000Z'}}
Size(MB): 49.1438646316528
Data: {
      S3: ['s3://nasa-s3-url/granule.nc'],
      HTTP: ['https://nasa-http-url/granule.nc']
}
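
A rough sketch of that idea, grouping links by protocol and preferring S3 when it is reachable, might look like the following. Everything here is an assumption for illustration: group_links, preferred_links, and the s3_reachable probe are hypothetical names, not part of earthaccess, and the URLs are the same placeholders as above.

```python
from typing import Callable, Dict, List


def group_links(urls: List[str]) -> Dict[str, List[str]]:
    """Group data links by protocol, mirroring the proposed S3/HTTP layout."""
    grouped: Dict[str, List[str]] = {"S3": [], "HTTP": []}
    for url in urls:
        if url.startswith("s3://"):
            grouped["S3"].append(url)
        elif url.startswith(("http://", "https://")):
            grouped["HTTP"].append(url)
    return grouped


def preferred_links(
    grouped: Dict[str, List[str]], s3_reachable: Callable[[str], bool]
) -> List[str]:
    """Use the S3 links if every one passes the probe, otherwise fall back to HTTP."""
    s3_links = grouped["S3"]
    if s3_links and all(s3_reachable(url) for url in s3_links):
        return s3_links
    return grouped["HTTP"]


links = group_links(["s3://nasa-s3-url/granule.nc", "https://nasa-http-url/granule.nc"])
print(links)

# Stand-in probes; a real check would attempt an authenticated S3 request.
print(preferred_links(links, s3_reachable=lambda url: True))
print(preferred_links(links, s3_reachable=lambda url: False))
```

Passing the probe in as a callable keeps the fallback logic separate from the question of how in-region access is detected, which is the tricky part mentioned above.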
