Enhance search_data in EarthAccess to Include Associated XML Paths #367

Open
emanueleromito opened this issue Nov 23, 2023 · 3 comments

emanueleromito commented Nov 23, 2023

I'm currently using the earthaccess library to access MODIS data in my project. In my workflow, I use both HDF paths and the XML paths associated with the HDF files. However, when I use the search_data function from the library, the results only provide the HDF paths.

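import earthaccess

# Assumption: the original snippet did not show how `provider` was set;
# "LPCLOUD" matches the LP DAAC host in the URL below.
provider = "LPCLOUD"

results = earthaccess.search_data(
    provider=provider,
    short_name='MCD12Q1',
    count=10
)

# The original used `granule` without defining it; take the first result.
granule = results[0]
uri = granule.data_links()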

And what I get is:
['https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/MCD12Q1.061/MCD12Q1.A2001001.h01v09.061.2022146025902/MCD12Q1.A2001001.h01v09.061.2022146025902.hdf']

This is fine as far as it goes, but it would be nice to have an option that also returns the .xml file associated with each HDF file, or at least the ability to download that file by passing in the DataGranule for the HDF file.

@mfisher87
Collaborator

mfisher87 commented Nov 23, 2023

Thanks for the report!

I think it'd be awesome if we provided an easily-accessible escape hatch to view the raw CMR results that earthaccess queried for situations like this where our assumptions don't line up with end-users' use cases. Without the escape hatch, users have to wait to use earthaccess until we adapt to support their use case. With the escape hatch, they can begin using earthaccess with a minor "hack" and later on remove it when we support their use case fully.

What do you all think? I don't think we currently support this, but maybe we do, I just didn't find it in the docs and am not planning on source diving today :)

I'm thinking the implementation might be DataGranule having a .raw or .cmr_json attribute/property that contains the parsed JSON from CMR for that granule. Same for collections!
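
To make that idea concrete, here is a minimal sketch of what such a property could look like. Everything in it is hypothetical: `GranuleSketch` is a stand-in class and not the earthaccess `DataGranule`, `cmr_json` is just the name floated above, and the record is made-up sample data.

```python
import copy
from typing import Any, Dict


class GranuleSketch(dict):
    """Hypothetical stand-in for a search result; NOT the earthaccess DataGranule."""

    def __init__(self, cmr_record: Dict[str, Any]):
        super().__init__(cmr_record)
        # Keep an untouched copy of what CMR returned for this granule.
        self._cmr_json = copy.deepcopy(cmr_record)

    @property
    def cmr_json(self) -> Dict[str, Any]:
        """Return a copy of the stored CMR JSON so callers cannot mutate it."""
        return copy.deepcopy(self._cmr_json)


# Made-up record, shaped like the "meta"/"umm" layout of a CMR result.
record = {
    "meta": {"concept-id": "G0000000000-EXAMPLE"},
    "umm": {"RelatedUrls": [{"URL": "https://example.com/granule.hdf.xml"}]},
}
granule = GranuleSketch(record)
raw = granule.cmr_json
```

Returning a copy is one possible design choice: it keeps the "escape hatch" read-only, at the cost of a deep copy on every access.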

@betolink
Member

Hi @emanueleromito,

All that information is still available in the results; earthaccess only surfaces part of it. To get to the XML companion files we can do something like this:

import earthaccess

earthaccess.login()

results = earthaccess.search_data(
    short_name="MCD12Q1",
    count=10
)

for granule in results:
    print(granule["umm"]["RelatedUrls"])

All the granules have "meta" and "umm" dictionaries with all the data we need. If you want to keep only the XML and HDF links, we can filter and download them with:

links = []
for granule in results:
    urls = [
        link["URL"]
        for link in granule["umm"]["RelatedUrls"]
        if link["URL"].startswith("https") and link["URL"].endswith((".xml", ".hdf"))
    ]
    links.extend(urls)

earthaccess.download(links, "./MCD12Q1")

and that's it, let us know if this works for you.
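
The filtering step above can also be wrapped in a small helper that tolerates granules without a "RelatedUrls" entry. This is only a sketch: `select_links` is a hypothetical name, not part of earthaccess, it assumes each result behaves like a plain dictionary (as the loop above suggests), and the sample data uses made-up URLs so it runs without a login.

```python
from typing import Any, Iterable, List, Mapping, Sequence


def select_links(
    granules: Iterable[Mapping[str, Any]],
    suffixes: Sequence[str] = (".xml", ".hdf"),
    scheme: str = "https://",
) -> List[str]:
    """Collect RelatedUrls entries that use `scheme` and end in one of `suffixes`."""
    wanted = tuple(suffixes)  # str.endswith needs a tuple, not a list
    links: List[str] = []
    for granule in granules:
        # Fall back to an empty list when "umm" or "RelatedUrls" is missing.
        related = granule.get("umm", {}).get("RelatedUrls", [])
        for entry in related:
            url = entry.get("URL", "")
            if url.startswith(scheme) and url.endswith(wanted):
                links.append(url)
    return links


# Made-up sample shaped like the "umm" dictionaries described above.
sample = [
    {"umm": {"RelatedUrls": [
        {"URL": "https://example.com/a.hdf"},
        {"URL": "s3://example-bucket/a.hdf"},
        {"URL": "https://example.com/a.hdf.xml"},
        {"URL": "https://example.com/a.jpg"},
    ]}},
    {"umm": {}},  # a granule with no related URLs
]
links = select_links(sample)
```

With real search results, the returned list could be passed to earthaccess.download in the same way as `links` above.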

@MattF-NSIDC

all the granules have a "meta" and a "umm" dictionaries with all the data we need.

Awesome! This does appear to be undocumented. Or perhaps a limitation of search. I'm thinking we could use a how-to on this. Or perhaps we should expose those as properties that will be picked up by our API autodoc setup? Or both. #368

@MattF-NSIDC MattF-NSIDC removed the type: enhancement New feature or request label Nov 27, 2023