Enhance search_data in EarthAccess to Include Associated XML Paths #367

emanueleromito · 2023-11-23T09:34:56Z

I'm currently using the earthaccess library to access MODIS data in my project. In my workflow, I use both HDF paths and the XML paths associated with the HDF files. However, when I use the search_data function from the library, the results only provide the HDF paths.

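```python
import earthaccess

# Assumption: "LPCLOUD" is the CMR provider ID for LP DAAC's cloud-hosted data
provider = "LPCLOUD"

results = earthaccess.search_data(
    provider=provider,
    short_name="MCD12Q1",
    count=10,
)

# data_links() returns only the data-file URLs (the .hdf files here)
granule = results[0]
uri = granule.data_links()
```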

And what I get is:
```
['https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/MCD12Q1.061/MCD12Q1.A2001001.h01v09.061.2022146025902/MCD12Q1.A2001001.h01v09.061.2022146025902.hdf']
```

This works, but it would be nice to have an option that also returns the associated .xml file, or at least a way to download that file by passing the DataGranule for the corresponding HDF file.
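A minimal sketch of what such a per-granule helper could look like. `companion_links` is a hypothetical name, not part of earthaccess. The sketch assumes each search result behaves like a dict that carries the CMR UMM-G record under a `"umm"` key with a `"RelatedUrls"` list of `{"URL": ...}` entries. The mock granule and its URLs are illustrative only.

```python
from typing import Any, Mapping


def companion_links(
    granule: Mapping[str, Any],
    suffixes: tuple[str, ...] = (".xml",),
    scheme: str = "https",
) -> list[str]:
    """Hypothetical helper: return the granule's URLs that use `scheme`
    and end in one of `suffixes`.

    Assumes a UMM-G record under "umm" whose "RelatedUrls" entries
    each have a "URL" key.
    """
    related = granule.get("umm", {}).get("RelatedUrls", [])
    return [
        link["URL"]
        for link in related
        # Entries without a URL fall through: "" does not start with the scheme
        if link.get("URL", "").startswith(scheme) and link["URL"].endswith(suffixes)
    ]


# Mock granule mirroring the UMM-G layout (URLs are illustrative)
granule = {
    "umm": {
        "RelatedUrls": [
            {"URL": "https://example.org/MCD12Q1.A2001001.h01v09.hdf"},
            {"URL": "https://example.org/MCD12Q1.A2001001.h01v09.hdf.xml"},
            {"URL": "s3://example-bucket/MCD12Q1.A2001001.h01v09.hdf.xml"},
        ]
    }
}

print(companion_links(granule))
```

Passing `suffixes=(".xml", ".hdf")` would return both the data file and its metadata companion. The `s3://` entry is skipped because of the `scheme` filter.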


mfisher87 · 2023-11-23T18:44:59Z

Thanks for the report!

I think it'd be awesome if we provided an easily-accessible escape hatch to view the raw CMR results that earthaccess queried for situations like this where our assumptions don't line up with end-users' use cases. Without the escape hatch, users have to wait to use earthaccess until we adapt to support their use case. With the escape hatch, they can begin using earthaccess with a minor "hack" and later on remove it when we support their use case fully.

What do you all think? I don't think we currently support this, but maybe we do; I just didn't find it in the docs and am not planning on source diving today :)

I'm thinking the implementation might be DataGranule having a .raw or .cmr_json attribute/property that contains the parsed JSON from CMR for that granule. Same for collections!
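A minimal sketch of that escape hatch, assuming the result object is itself a dict holding the parsed CMR JSON. `Granule` and `cmr_json` are hypothetical names used for illustration; this is not the earthaccess `DataGranule` class.

```python
import copy
from typing import Any


class Granule(dict):
    """Hypothetical stand-in for a search result that is itself a dict
    holding the parsed CMR JSON ("meta" and "umm" keys)."""

    @property
    def cmr_json(self) -> dict[str, Any]:
        """The full parsed CMR record for this granule.

        Returns a deep copy so callers can explore or modify it
        without mutating the result object itself.
        """
        return copy.deepcopy(dict(self))


g = Granule(meta={"concept-id": "G000-EXAMPLE"}, umm={"RelatedUrls": []})
print(g.cmr_json["meta"]["concept-id"])  # G000-EXAMPLE
```

Returning a copy rather than the object itself is one possible design choice. It keeps the "hack" read-only from the library's point of view, at the cost of copying the record on every access.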

betolink · 2023-11-27T16:16:39Z

Hi @emanueleromito,

All of that information is still available in the results; earthaccess only exposes part of it. To get to the XML companion files, we can do something like this:

```python
import earthaccess

earthaccess.login()

results = earthaccess.search_data(
    short_name="MCD12Q1",
    count=10
)

for granule in results:
    print(granule["umm"]["RelatedUrls"])
```
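The raw `RelatedUrls` list can be long. Grouping the entries by their UMM-G `"Type"` field makes it easier to scan. A minimal sketch on a mock record: `urls_by_type` is a hypothetical helper, and the `Type` values below are illustrative UMM-G terms, not copied from an actual MCD12Q1 granule.

```python
from collections import defaultdict
from typing import Any, Mapping


def urls_by_type(granule: Mapping[str, Any]) -> dict[str, list[str]]:
    """Hypothetical helper: group a granule's RelatedUrls by their "Type" field."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for link in granule.get("umm", {}).get("RelatedUrls", []):
        grouped[link.get("Type", "UNSPECIFIED")].append(link["URL"])
    return dict(grouped)


# Mock record: Type values are illustrative, not taken from MCD12Q1
granule = {
    "umm": {
        "RelatedUrls": [
            {"URL": "https://example.org/a.hdf", "Type": "GET DATA"},
            {"URL": "s3://example-bucket/a.hdf", "Type": "GET DATA VIA DIRECT ACCESS"},
            {"URL": "https://example.org/a.hdf.xml", "Type": "EXTENDED METADATA"},
        ]
    }
}

for link_type, urls in urls_by_type(granule).items():
    print(link_type, urls)
```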

All the granules have "meta" and "umm" dictionaries containing all the data we need. To keep only the XML and HDF links and download them:

```python
links = []
for granule in results:
    # Keep only HTTPS links that point to .xml or .hdf files
    urls = [
        link["URL"]
        for link in granule["umm"]["RelatedUrls"]
        if link["URL"].endswith((".xml", ".hdf")) and link["URL"].startswith("https")
    ]
    links.extend(urls)

earthaccess.download(links, "./MCD12Q1")
```
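If each XML file needs to stay matched to its HDF file, the same filter can be applied per granule and keyed by an identifier. The following is a minimal sketch. `pair_links` is a hypothetical name. The sketch assumes each result exposes the UMM-G `GranuleUR` field under `"umm"`. The mock record and its URLs are illustrative.

```python
from typing import Any, Iterable, Mapping


def pair_links(
    granules: Iterable[Mapping[str, Any]],
) -> dict[str, dict[str, list[str]]]:
    """Hypothetical helper: map each GranuleUR to its HTTPS .hdf and .xml URLs."""
    pairs: dict[str, dict[str, list[str]]] = {}
    for granule in granules:
        umm = granule["umm"]
        entry: dict[str, list[str]] = {".hdf": [], ".xml": []}
        for link in umm.get("RelatedUrls", []):
            url = link["URL"]
            if not url.startswith("https"):
                continue  # skip s3:// and other non-HTTPS links
            for suffix in entry:
                if url.endswith(suffix):
                    entry[suffix].append(url)
        pairs[umm["GranuleUR"]] = entry
    return pairs


# Mock result list (identifiers and URLs are illustrative)
results = [
    {
        "umm": {
            "GranuleUR": "example-granule-1",
            "RelatedUrls": [
                {"URL": "https://example.org/MCD12Q1.A2001001.h01v09.hdf"},
                {"URL": "https://example.org/MCD12Q1.A2001001.h01v09.hdf.xml"},
                {"URL": "s3://example-bucket/MCD12Q1.A2001001.h01v09.hdf"},
            ],
        }
    }
]

for granule_ur, files in pair_links(results).items():
    print(granule_ur, files[".hdf"], files[".xml"])
```

Each per-granule URL list is a plain list of strings, so it can be passed to `earthaccess.download` in the same way as `links` in the snippet above.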

And that's it. Let us know if this works for you.

MattF-NSIDC · 2023-11-27T17:05:18Z

> all the granules have a "meta" and a "umm" dictionaries with all the data we need.

Awesome! This does appear to be undocumented, or perhaps it's a limitation of search. I think a how-to on this would be useful. Alternatively, we could expose those as properties that our API autodoc setup would pick up. Or both. See #368.

github-project-automation bot added this to earthaccess project Nov 23, 2023

github-project-automation bot moved this to 🆕 New in earthaccess project Nov 23, 2023

mfisher87 added the type: enhancement New feature or request label Nov 23, 2023

MattF-NSIDC mentioned this issue Nov 27, 2023: Document DataGranule "umm" and "meta" members #368 (Open)

MattF-NSIDC removed the type: enhancement New feature or request label Nov 27, 2023
