
Adding earthaccess catalog in Intake 2 #352

Open
martindurant opened this issue Nov 10, 2023 · 5 comments

Comments

@martindurant

I have written a little code that lets you call the earthaccess functions from within Intake. The point is that certain queries and dataset results could then be persisted in catalogs without having to keep code snippets around. Users would still need to register and understand what the query parameters mean.

Do people here think this is a useful thing to do, and does the implementation look OK? Am I right in assuming that the DOI is the best unique identifier of a data product?

@MattF-NSIDC

Nice! I haven't used Intake before, but excited to see more integrations :) What would using this look like?

Am I right in assuming that the DOI is the best unique identifier of a data product?

I think collection_concept_id is going to be the "best" unique identifier (as intended by the CMR API, not necessarily easiest-to-use). Under the hood, earthaccess is translating the doi query to a concept_id query by doing a collection search to get the concept_id.

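```python
# earthaccess resolves the DOI to the collection's concept_id before searching granules
collection = DataCollections().doi(doi).get()
if len(collection) > 0:
    concept_id = collection[0].concept_id()
    self.params["concept_id"] = concept_id
```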
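To make the DOI-to-concept_id translation concrete without hitting CMR, here is a minimal offline sketch of the same logic. The collection records, their field names, and the concept IDs below are hypothetical stand-ins for real CMR search results, and the helper functions are illustrative, not part of earthaccess.

```python
from typing import Any, Dict, List, Optional

# Hypothetical stand-ins for CMR collection search results; real records
# come from a CMR collection search and carry much more metadata.
COLLECTIONS: List[Dict[str, Any]] = [
    {"concept_id": "C0000000001-EXAMPLE", "doi": "10.0000/example.a"},
    {"concept_id": "C0000000002-EXAMPLE", "doi": "10.0000/example.b"},
]


def resolve_concept_id(doi: str, collections: List[Dict[str, Any]]) -> Optional[str]:
    """Return the concept_id of the first collection whose DOI matches, else None."""
    matches = [c for c in collections if c.get("doi") == doi]
    if len(matches) > 0:
        return matches[0]["concept_id"]
    return None


def granule_params_from_doi(doi: str, collections: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Translate a DOI query into a concept_id query (hypothetical helper)."""
    params: Dict[str, Any] = {}
    concept_id = resolve_concept_id(doi, collections)
    if concept_id is not None:
        params["concept_id"] = concept_id
    return params
```

For example, granule_params_from_doi("10.0000/example.b", COLLECTIONS) yields {"concept_id": "C0000000002-EXAMPLE"}, while an unknown DOI yields an empty dict, so the subsequent granule search is keyed on the concept_id rather than the DOI.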

@martindurant
Author

martindurant commented Nov 10, 2023

collection_concept_id is going to be the "best" unique identifier

Thanks, I'll use that.

The usage pattern would look like this:

```python
import intake.readers.catalogs

spec = intake.readers.catalogs.EarthdataCatalogReader(temporal=("2002-01-01", "2002-01-02"), ....)
cat = spec.read()
list(cat)  # shows available identifiers, which all have metadata
reader = cat[<identifier>]
ds = reader.read()  # outputs an xr.Dataset
```

The flow is nearly the same as what you already have; the point is that spec and reader, with their parameters, can be saved in catalogs.
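The idea of saving a reader "with its parameters" can be sketched with a hypothetical ReaderSpec class. This is not Intake 2's actual serialization format; it only assumes that a reader is fully described by a dotted path to a callable plus keyword arguments, which can round-trip through JSON and be executed later.

```python
import importlib
import json
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ReaderSpec:
    """Hypothetical minimal record of a reader: which callable to use and with what kwargs."""

    target: str  # dotted path to a callable, e.g. "package.module.function"
    kwargs: Dict[str, Any] = field(default_factory=dict)

    def to_json(self) -> str:
        # Persist only plain data, so the entry can live in a catalog file.
        return json.dumps({"target": self.target, "kwargs": self.kwargs}, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "ReaderSpec":
        data = json.loads(text)
        return cls(target=data["target"], kwargs=data["kwargs"])

    def read(self) -> Any:
        # Resolve the dotted path only at read time, then call it with the saved kwargs.
        module_name, _, attr = self.target.rpartition(".")
        func = getattr(importlib.import_module(module_name), attr)
        return func(**self.kwargs)
```

For instance, ReaderSpec("json.loads", {"s": "[1, 2, 3]"}) survives a to_json/from_json round trip and its read() returns [1, 2, 3]. One design caveat: JSON turns tuples into lists, so a parameter such as temporal=("2002-01-01", "2002-01-02") would come back as a list unless the loader converts it.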

@ebo

ebo commented Dec 6, 2023

I am working with provisional ATL07/10 data and would like to set up access to our local repositories. These are pre-decisional data and cannot be made generally accessible. I have been looking for instructions or tutorials on setting up intake/earthaccess to access local files and repositories, but have not figured it out yet, so I thought I would ask here.

As a note, it has been 5+ years since I last set up any Intake catalogs, so pointers to instructions on setting this up would be helpful. I will be glad to post tutorials and instructions once I get this working, but I will first need permission for public release.

@martindurant
Author

The general Earthdata catalog reader for Intake 2 is here: https://github.com/intake/intake/blob/745ebd42db371aa7d0f5d7d2ca8744103532819d/intake/readers/catalogs.py#L623

This calls earthaccess.search_datasets, so I don't know how you would change it to point to local resources.
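Because earthaccess.search_datasets queries NASA's CMR, local provisional files would need a separate lookup. One possible starting point, purely a sketch: index files on disk by name and filter them by a temporal range, so the result could later back catalog entries pointing at local paths. The filename pattern (product, release number, then a 14-digit acquisition timestamp) is an assumption modeled on ICESat-2 naming, and local_granules is a hypothetical helper, not part of Intake or earthaccess.

```python
import re
from datetime import datetime
from pathlib import Path
from typing import Dict, Optional, Tuple

# Assumed name layout, e.g. ATL07-01_20190101000000_00000201_005_01.h5;
# only the product and the acquisition timestamp are parsed here.
NAME_RE = re.compile(r"^(ATL07|ATL10)-\d{2}_(\d{14})_.*\.h5$")


def local_granules(
    root: Path,
    product: str,
    temporal: Optional[Tuple[str, str]] = None,
) -> Dict[str, Path]:
    """Map file stem -> path for local files of one product, optionally in [start, end)."""
    start = end = None
    if temporal is not None:
        start = datetime.fromisoformat(temporal[0])
        end = datetime.fromisoformat(temporal[1])
    found: Dict[str, Path] = {}
    for path in sorted(root.rglob("*.h5")):
        match = NAME_RE.match(path.name)
        if match is None or match.group(1) != product:
            continue  # skip files that do not follow the assumed naming scheme
        stamp = datetime.strptime(match.group(2), "%Y%m%d%H%M%S")
        if start is not None and not (start <= stamp < end):
            continue  # outside the requested time window
        found[path.stem] = path
    return found
```

With temporal=("2019-01-01", "2019-01-02"), only files stamped on 2019-01-01 are returned. How to expose such a mapping as an Intake 2 catalog is not shown here.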

@ebo

ebo commented Dec 6, 2023

Thanks! This gives me a place to start. I'll post something here if I find a workable solution.

Labels
None yet
Projects
Status: 🆕 New
Development

No branches or pull requests

3 participants