Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect types passed by end-users at runtime are not caught until our code tries to do incompatible things with those objects (e.g. .upper() on an Auth instance) #420

Open
JessicaS11 opened this issue Dec 22, 2023 · 13 comments
Labels
type: bug Something isn't working
Milestone

Comments

@JessicaS11
Copy link
Collaborator

In testing for implementing cloud read-in in icepyx using CryoCloud, @rwegener2 and I discovered something changed between 0.6.2 and 0.8.2 that causes a failure where code ran successfully previously. In particular, the line: s3 = earthaccess.get_s3fs_session(daac="NSIDC", provider=auth), where auth is an Auth instance, fails with an AttributeError when it tries to run the string operation upper() on the auth object (see full traceback here). I haven't dug into recent changes but wanted to report here in case it's already known/being resolved.

@mfisher87
Copy link
Collaborator

mfisher87 commented Dec 23, 2023

Thanks for the report! provider expects a string value, not an Auth instance. I'm not sure why that worked OK in earlier versions of earthaccess, but it probably shouldn't have! https://earthaccess.readthedocs.io/en/latest/user-reference/api/api/#earthaccess.api.get_s3fs_session

I think we need to think more about how this (and other) APIs in earthaccess could catch runtime type errors before they become confusing messages later. Failing early with "Unexpected type (Auth) received for provider. Expected str." would be a big improvement to user experience. Is there like a library we can use where we can throw a decorator on a type-annotated function to also get runtime checking? We do have the type annotations, so users can also use a typechecker to discover such bugs, which is better than nothing 🎉 :)

For reference, this is the change that introduced the new call to .upper() under the valid (but in this specific case, incorrect) assumption that it was operating on a string.

@mfisher87 mfisher87 changed the title function (upper()) unavailable for auth object in api.py Incorrect types passed by end-users at runtime are not caught until our code tries to do incompatible things with those objects (e.g. .upper() on an Auth instance) Dec 23, 2023
@mfisher87 mfisher87 added the type: bug Something isn't working label Dec 23, 2023
@mfisher87
Copy link
Collaborator

Does this new description of the problem work for everyone?

@JessicaS11
Copy link
Collaborator Author

provider expects a string value, not an Auth instance. I'm not sure why that worked OK in earlier versions of earthaccess, but it probably shouldn't have!

We were supplying a valid daac, so given the logic it never tried to use our invalid provider input (then lines 213+, now lines 251+, in store.py):

            elif daac is not None:
                s3_credentials = self.auth.get_s3_credentials(daac=daac)
            elif provider is not None:
                s3_credentials = self.auth.get_s3_credentials(provider=provider)

Thanks for linking to the docs page - since we're already providing a daac, we could just remove the provider.

That said, I'll keep my eye on this for catching runtime type errors. : )

@mfisher87
Copy link
Collaborator

Ahh, that makes sense. Thanks! :)

@rwegener2
Copy link

Thanks to you both for investigating this error!

This feels like an issue of Python as a language not enforcing types. I see earthaccess already provides type hints (🎉). One library I've used on past projects to enforce the hints is pydantic, which worked well and has some nice features like coercing types. A brief google search shows this isn't the only option (Ex. typeguard). These could be ways to automatically raise errors when arguments don't match the specified types.

@mfisher87
Copy link
Collaborator

I love Pydantic!

Haven't use typeguard before though. I love the simplicity of adding @typechecked to a function with annotations and having the runtime type checks added magically 👨‍🍳 🤌 I'm also reading about beartype whose main selling point seems to be speed. Both have had recent releases. beartype seems to default to checking everything, no decorator required. That's kind of nice.

@mfisher87
Copy link
Collaborator

Here's the pydantic feature I think we'd use: https://docs.pydantic.dev/latest/api/validate_call/

@mfisher87
Copy link
Collaborator

This validation could help with #279 as well! Especially if our runtime validation tool can also coerce types.

@mfisher87
Copy link
Collaborator

Looks like Pydantic will do this. We may only want this at the API level, because I feel it could be confusing to debug the code with multiple levels of validation.

https://docs.pydantic.dev/latest/concepts/validation_decorator/#argument-types

As with the rest of Pydantic, types can be coerced by the decorator before they're passed to the actual function

@chuckwondo
Copy link
Collaborator

Given pydantic's massive size, I'd caution against pulling it into the mix, especially if you use it only for the @validate_call decorator. If you simply want runtime type validation of function arguments, then I recommend either typeguard or beartype (as mentioned in previous comments), as these are much smaller libraries, since they have much narrower functionality than pydantic.

If you do want to leverage other capabilities of pydantic, then I'd suggest instead looking at msgspec, which is much more performant and much smaller than pydantic. It doesn't validate function arguments though, so you would still need to use typeguard or beartype for such validation.

@chuckwondo
Copy link
Collaborator

Given the major speed advantage of beartype over typeguard, beartype might be the better choice. It also appears to be the most actively developed and most PEP-compliant of other choices. See https://beartype.readthedocs.io/en/latest/moar/

@asteiker
Copy link
Member

asteiker commented Jan 7, 2025

@JessicaS11 @chuckwondo I'm doing some bug grooming and wanted to check in on this issue. It sounds like this was worked around on the icepyx side, but there is lingering discussion on the use of a validator. Have we integrated anything like this within the past year? If still a relevant thread, I may suggest either moving this to a discussion and/or labeling as needing a decision to move forward.

@mfisher87
Copy link
Collaborator

mfisher87 commented Jan 7, 2025

We haven't yet integrated runtime type checking. I think this is ready for implementation without decision committee. I think whoever implements this can choose the library of their preference (pydantic, beartype, etc., as long as it can read Python type annotations). If we decide during review to try another library, the swap should be easy (i.e. replace a lib and change decorators) and would enable us to do a "show, don't tell" comparison.

This would be a fun hack day activity!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: bug Something isn't working
Projects
Status: 🆕 New
Development

No branches or pull requests

5 participants