consider alternative names for search_data and search_datasets #770

Open
itcarroll opened this issue Jul 22, 2024 · 11 comments
Labels
type: enhancement New feature or request

@itcarroll
Collaborator

> Maybe we should think about the top-level API; are these good names for these functions? Why not use common terminology like search_collections and search_granules? "Dataset" is often used to refer to a single data file. 🤔

~ @mfisher87 in #769

Since "granules" is not very generic either, an option borrowed from the STAC spec could be "search_collections" vs "search_items".

Seems like a Milestone 1.0 change though ...

@chuckwondo
Collaborator

I agree that we should consider alternative function names. My preference would be to use the prefix find_ instead of search_, but not fussed about it.

@mfisher87
Collaborator

I have no strong preference on search v find, but I am curious what reasoning underlies your preference @chuckwondo?

@mfisher87 mfisher87 added this to the Version 1.0 milestone Jul 22, 2024
@itcarroll itcarroll added the type: enhancement New feature or request label Jul 22, 2024
@chuckwondo
Collaborator

> I have no strong preference on search v find, but I am curious what reasoning underlies your preference @chuckwondo?

Aside from it being 2 letters shorter, in previous contexts, I've often seen DB client APIs using find (or find_by_X) as a naming convention, so it is anecdotally arguably more consistent with other things. However, search may be equally widely used, so again, I don't have any overtly strong preference. It's just a mild preference, perhaps more personal than logical.

@mfisher87
Collaborator

Thanks for expounding!

@betolink
Member

I like the idea of aligning with STAC. Scott suggested that a while ago, and I think it'll be valuable for reducing cognitive load on users. The one thing I'm afraid of is deprecating existing names. I think we should try not to break the API, to the extent possible, while encouraging people to use the new conventions. See: #221
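
One way to keep existing names working while steering users toward new ones is a thin wrapper that warns and then delegates. The sketch below is only an illustration of that idea: `search_collections` is a local stand-in rather than the earthaccess implementation, and the `deprecated_alias` helper is a hypothetical name, not something the library provides.

```python
import functools
import warnings


def deprecated_alias(new_func, old_name):
    """Return a wrapper that warns, then delegates to ``new_func``."""

    @functools.wraps(new_func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name}() is deprecated; use {new_func.__name__}() instead.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        return new_func(*args, **kwargs)

    # Report the old name in tracebacks and help() output.
    wrapper.__name__ = old_name
    wrapper.__qualname__ = old_name
    return wrapper


def search_collections(**query):
    """Stand-in for the renamed function; it just echoes the query."""
    return [{"kind": "collection", "query": query}]


# The old name keeps working, but nudges callers toward the new one.
search_datasets = deprecated_alias(search_collections, "search_datasets")
```

Because the alias delegates to the new function, both names always return the same result, which may ease the worry that users assume they behave differently.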

@asteiker
Member

I know this has caused confusion in the past, even as others at NSIDC have come up to speed on the library, so I fully support updated names here! I like what @itcarroll proposed to align with STAC. We may also want to consider the language most commonly used within the NASA Earthdata ecosystem. You can't have Earthdata Search without "search", for example, so I'd be more keen on using this vs "find". As an aside, this seems like a great use case for #761 too.

@chuckwondo
Collaborator

Just to clarify, is this the current proposal?

  • rename search_datasets to search_collections
  • rename search_data to search_items

If so, +1 from me.

@mfisher87
Collaborator

mfisher87 commented Jul 23, 2024

> As an aside, this seems like a great use case for #761 too.

🚀 💯

> Just to clarify, is this the current proposal?
>
> * rename `search_datasets` to `search_collections`
> * rename `search_data` to `search_items`
>
> If so, +1 from me.

+1

> one thing I'm afraid of is deprecating existing names. I think we should try not to break the API, to the extent possible, while encouraging people to use the new conventions

I do worry that having multiple aliases for common features could lead to confusion, as people might think they do different things. I really like having "one correct way". I do believe a long deprecation period would be in order for top-level API things.

We probably need deeper discussions about how to communicate time-until-deprecation. Should we always include a minimum date in our deprecation messages, e.g. `DeprecationWarning(" ... Obsoletion will occur no sooner than YYYY-MM-DD.")`?
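
To make the minimum-date idea concrete, here is a minimal sketch of a helper that puts the earliest removal date into the warning text. The helper names (`deprecation_message`, `warn_deprecated`) are hypothetical, the stub body stands in for the real search, and the date in the usage example is a placeholder, not a proposed schedule.

```python
import datetime
import warnings


def deprecation_message(old, new, not_before):
    """Build a message that states the earliest possible removal date."""
    return (
        f"{old} is deprecated; use {new} instead. "
        f"Obsoletion will occur no sooner than {not_before.isoformat()}."
    )


def warn_deprecated(old, new, not_before):
    """Emit a DeprecationWarning carrying the minimum removal date."""
    warnings.warn(
        deprecation_message(old, new, not_before),
        DeprecationWarning,
        stacklevel=3,  # skip this helper and the deprecated function itself
    )


def search_data(**query):
    """Stub for a deprecated function (placeholder date, no real search)."""
    warn_deprecated("search_data", "search_items", datetime.date(2030, 1, 31))
    return []
```

Passing a `datetime.date` rather than a string means the date is always rendered in the same ISO `YYYY-MM-DD` form, so messages stay consistent across the codebase.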

Related #766

@andypbarrett
Collaborator

I like the alignment of earthaccess terminology with STAC. collections already aligns in STAC and NASA-speak. However, as a newbie to STAC lingo, I find the usage of items unclear.

@chuckwondo
Collaborator

> I have no strong preference on search v find, but I am curious what reasoning underlies your preference @chuckwondo?
>
> Aside from it being 2 letters shorter, in previous contexts, I've often seen DB client APIs using find (or find_by_X) as a naming convention, so it is anecdotally arguably more consistent with other things. However, search may be equally widely used, so again, I don't have any overtly strong preference. It's just a mild preference, perhaps more personal than logical.

@mfisher87, after looking at the proposed new names again -- search_collections and search_items -- I now have an arguably better reason for preferring find_collections and find_items: The term search_collections is arguably ambiguous in terms of the types of "things" it will find. Does it search the available collections to find things within collections, or does it search for collections?

This analogy might be a bit of a stretch, but consider the case of security procedures at a place/event, where people may be subject to a "bag search." In that context, nobody is searching for bags, they are searching within bags (for banned "items"). Thus, the security folks running a search_bags function don't expect the result to be a "list of bags," but rather a list of "banned items" within given bags.

Thus, I would argue that a "collection search" implemented by a function named search_collections could easily be misinterpreted to mean a search for items within collections, not a search for collections, or to simply cause someone to wonder which interpretation is correct, if they recognize the ambiguity. Thus, the name find_collections arguably eliminates such ambiguity by explicitly stating what we expect to find: collections. (Similarly for find_items.)
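
The bag analogy can be written out as a small sketch. Everything here (`Item`, `Bag`, `search_bags`, `find_bags`) is invented purely to illustrate the two readings and has nothing to do with the earthaccess API.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    name: str
    banned: bool = False


@dataclass
class Bag:
    owner: str
    items: list[Item] = field(default_factory=list)


def search_bags(bags: list[Bag]) -> list[Item]:
    """'Search' reading: look inside each bag and return banned items."""
    return [item for bag in bags for item in bag.items if item.banned]


def find_bags(bags: list[Bag], owner: str) -> list[Bag]:
    """'Find' reading: the noun in the name is what comes back."""
    return [bag for bag in bags if bag.owner == owner]
```

With `find_`, the return type can be read straight off the function name; with `search_`, the reader has to check the signature to learn whether bags or their contents come back.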

> I like the alignment of earthaccess terminology with STAC. collections already aligns in STAC and NASA-speak. However, as a newbie to STAC lingo, I find the usage of items unclear.

@andypbarrett, I agree that "items" is perhaps too generic for many folks. Although "collections" is perhaps no less generic a term, anecdotally, it may feel more specific to most folks dealing with this information. I don't have any particular preference or suggestion for a better term than "items," but if you have any suggestions, please share so we can "vote" on it here.

@mfisher87
Collaborator

> Thus, I would argue that a "collection search" implemented by a function named search_collections could easily be misinterpreted to mean a search for items within collections, not a search for collections, or to simply cause someone to wonder which interpretation is correct, if they recognize the ambiguity. Thus, the name find_collections arguably eliminates such ambiguity by explicitly stating what we expect to find: collections. (Similarly for find_items.)

💯 This is an excellent point. I'm on team find now :)


6 participants