Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix CMR-related type hints #510

Merged
merged 12 commits into from
Apr 16, 2024
Merged

Conversation

chuckwondo
Copy link
Collaborator

@chuckwondo chuckwondo commented Apr 8, 2024

There were a number of type hints in search.py and api.py related to CMR queries that were incorrect. These were fixed. In addition, there were a number of other static type errors that were masked because of ignored cmr imports. Added type stubs for python_cmr library to unmask and address these additional type errors.

Limited static type changes as much as possible to only functions and methods dealing with CMR queries and results to keep this PR manageable.

Fixes #508


📚 Documentation preview 📚: https://earthaccess--510.org.readthedocs.build/en/510/

@chuckwondo chuckwondo force-pushed the fix-cmr-type-hints branch from d8d99e1 to fae6b73 Compare April 9, 2024 23:14
@chuckwondo
Copy link
Collaborator Author

pre-commit.ci autofix

@chuckwondo
Copy link
Collaborator Author

pre-commit.ci run

@chuckwondo
Copy link
Collaborator Author

There seems to be some sort of discrepancy between pre-commit ci's ruff hook and locally running ruff because the pre-commit failure was due to import formatting that doesn't cause a problem locally. Specifically, pre-commit ci wanted to remove a blank line on line 5 of tests/unit/test_results.py, but locally running ruff wants to insert that blank line.

No idea why that's the case, but I'm concerned that this might cause a persistent build issue because now that I've pulled the autofix that I triggered above, and run make lint, I now get:

$ make lint
bash scripts/lint.sh
+ mypy earthaccess stubs tests
Success: no issues found in 31 source files
+ ruff check .
tests/unit/test_results.py:1:1: I001 [*] Import block is un-sorted or un-formatted
Found 1 error.
[*] 1 fixable with the `--fix` option.

That's failing locally on the autofix that pre-commit ci applied because the autofix removed a blank line. If I were to "fix" that locally and re-push, then pre-commit ci would again fail again because my local "fix" would insert the blank line.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, the changes in this file do not materially change the tests. This is simply a refactoring of the code to follow the convention used for integrating vcrpy with Unittest: https://vcrpy.readthedocs.io/en/latest/usage.html#unittest-integration

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this ends up renaming the cassettes, but the contents of the cassettes have not changed, although 1 cassette was added that didn't previously exist.

@betolink betolink requested review from jhkennedy and betolink April 12, 2024 15:47
betolink
betolink previously approved these changes Apr 12, 2024
Copy link
Member

@betolink betolink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR cleans up and improves the typing used in earthaccess. The only observation/question I have is if from now on we'll have to write Python interfaces for new classes? (.pyi) Would we need some extra instructions on the "how to contribute" section? Python and typing doesn't feel intuitive for newcomers.

from typing import Dict, List, Mapping, Sequence, Tuple
else:
from builtins import dict as Dict, list as List, tuple as Tuple
from collections.abc import Mapping, Sequence
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of remember an issue with collections.abc in the past, I guess this abstraction fixes it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I pulled all imports from typing, typing_extensions, and collections.abc into this module so that throughout the earthaccess code, we simply import from .typing_ (or earthaccess.typing_) to avoid having to deal with all of the changes that have been made across Python versions.

There are various "collection" types prior to 3.9 that were exposed from the typing module, but as of 3.9, the recommendation is to import them from collections.abc. There's also the transition from using List, Dict, and Tuple to list, dict, and tuple, respectively, along with various other changes that are annoying to keep up with.

Therefore, this typing_ module centralizes all such logic in one place, so throughout the rest of the code base, we can ignore it all and import all types from typing_.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the rest of this module, but I'm not sure how I feel about smoothing over the transition to built-in types like this:

from builtins import dict as Dict, list as List, tuple as Tuple

I think I prefer having the codebase use the style of the oldest version of Python we support, i.e. from typing import List at the moment, then letting Ruff handle autofixing that (using the UP ruleset IIRC) when we drop support for older versions. With this module helping us transition, Ruff won't be able to apply those fixes, and we'll need to remember to come back and clean up parts of this module when we drop support for 3.8 and no longer need the transitional code.

Anyway, I don't care about this deeply enough to request a change here, this is totally fine :) Just a feeling.

earthaccess/api.py Show resolved Hide resolved
stubs/cmr/queries.pyi Show resolved Hide resolved

def point(self, lon: str, lat: str) -> Type[GranuleQuery]:
@override
def point(self, lon: FloatLike, lat: FloatLike) -> Union[Self, Never]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with FloatLike are "1.2" and 1.2 valid values? I forgot if python_cmr will parse the string.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, both are valid values, as python_cmr simply passes the values to the float function/constructor.

Copy link
Collaborator Author

@chuckwondo chuckwondo Apr 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More specifically, the FloatLike type alias (Union[str, SupportsFloat]) indicates that we can pass either an str value (which could cause an exception if it's a string that cannot be parsed as a float), or any instance of a type that implements the __float__ method, which the float function/constructor defers to.

Generally speaking, this means that aside from str (which does not implement __float__, thus the need for the Union above), we'll take an int or a float as a value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we going to have to include interfaces from now on?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean. Do you mean do we have to define interfaces within earthaccess? Or do you mean do we have to include .pyi files? Either way, the answer is no. However, that does remind me that I forgot to add an empty py.typed file to this PR. By adding such a file, we're indicating that our code includes its own type hints, and there is no need for a separate type stub package.

"""
if not isinstance(doi, str):
raise TypeError("doi must be of type str")

self.params["doi"] = doi
return self

def instrument(self, instrument: str) -> Type[CollectionQuery]:
def instrument(self, instrument: str) -> Union[Self, Never]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't seen Never is that the type for not used, None?

Copy link
Collaborator Author

@chuckwondo chuckwondo Apr 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Never (or NoReturn, prior to Python 3.11) indicates that the function/method may throw an exception.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

@mfisher87 mfisher87 Apr 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I understand Something | Never annotation. My understanding is the reason to annotate def foo() -> Never: is to enable mypy to throw an error on f = foo(), since foo can't return. But if a function sometimes returns Something and sometimes returns Never, what additional checking does that enable?

My understanding is as a "bottom type", Something | Never is identical to Something!

I've read an argument that it serves to help document for humans that a possible outcome is an exception, but IMO that's a shortcoming of the type system that shouldn't be documented using the type system, since if you remove the possibility of an exception from the function and forget to also update the annotation, the annotation will be "wrong" (no longer serving its intended purpose) and the typechecker won't know.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, a return type of Something | Never does not indicate that it is possible for a function to return Never. It indicates that the function possibly never returns at all (typically indicative of the possibility of raising an exception instead of returning a value).

However, I agree that Python's type system has shortcomings, so using Never really only serves to help the reader, but in thinking more about this, and seeing that an IDE may not even show Never in the function's return type when it pops up info when hovering your mouse pointer over a function call (at least VS Code doesn't show it, perhaps because it is a bottom type), then it would seem that this perhaps is not even helpful to the reader (unless directly reading the source code where the hint is written).

Further, given that a return type of Something | Never can be very confusing, since in all other cases, a union type indicates the possible types of return values, including Never doesn't actually makes sense since you can never return Never.

Therefore, I'm going to drop the use of Never, but I'll keep the Raises sections that I added to the docstrings, since that indeed does show up in the little mouse-over popups in IDEs, and is helpful to include in generated docs.

if you remove the possibility of an exception from the function and forget to also update the annotation, the annotation will be "wrong" (no longer serving its intended purpose) and the typechecker won't know.

As an aside, this isn't a particularly strong argument for removing Never because this issue is not unique to Never. There are currently plenty of places in the current codebase where mypy doesn't complain, but should, due to incorrect return type hints. In fact, this PR corrected many of them that mypy wasn't complaining about because of the very loose configuration of mypy, so this issue is also a matter of making sure we're also configuring the type checker effectively.

I added a bit of strictness with this PR, but not too much. Only enough to help weed out some of the incorrect hints scoped within the context of this PR. I plan to incrementally increase strictness so that we don't have to touch too many parts of the code at once.

If we don't improve our type hints, then we may be doing more harm than good since incorrect type hints are perhaps worse than no type hints, as with documentation.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Therefore, I'm going to drop the use of Never, but I'll keep the Raises sections that I added to the docstrings, since that indeed does show up in the little mouse-over popups in IDEs, and is helpful to include in generated docs.

That sounds great to me, thanks so much for looking in to IDE behavior around these annotations. These details of the developer experience aren't always paid attention to, and I love to see when they are! ❤️

I added a bit of strictness with this PR, but not too much. Only enough to help weed out some of the incorrect hints scoped within the context of this PR. I plan to incrementally increase strictness so that we don't have to touch too many parts of the code at once.

If we don't improve our type hints, then we may be doing more harm than good since incorrect type hints are perhaps worse than no type hints, as with documentation.

💯 * 💯 Couldn't agree more with the analysis and the approach. We need this and I really appreciate you're spearheading the work!

@mfisher87
Copy link
Collaborator

I want to say I'll review it this weekend, but I have some important paperwork due very soon that I've been postponing for far too long...

Let's see how that goes :)

@chuckwondo
Copy link
Collaborator Author

I want to say I'll review it this weekend, but I have some important paperwork due very soon that I've been postponing for far too long...

Let's see how that goes :)

I appreciate your critical eyes! 👀

chuckwondo and others added 8 commits April 15, 2024 09:41
There were a number of type hints in `search.py` and `api.py` related to
CMR queries that were incorrect. These were fixed. In addition, there
were a number of other static type errors that were masked because of
ignored `cmr` imports. Added type stubs for `python_cmr` library to
unmask and address these additional type errors.

Limited static type changes as much as possible to only functions and
methods dealing with CMR queries and results to keep this PR manageable.

Fixes nsidc#508
- wrap docstrings at 88 characters per ruff configuration
- remove `Never` from return types where method raises only for invalid input types or values
- add missing `Never` return types where method raises for server-side errors
Use of the session during CMR paged queries was accidentally removed
with the introduction of Search-After functionality.
@chuckwondo
Copy link
Collaborator Author

pre-commit.ci autofix

@mfisher87
Copy link
Collaborator

👀

mfisher87
mfisher87 previously approved these changes Apr 15, 2024
Copy link
Collaborator

@mfisher87 mfisher87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks great! One thing I'd like to see is some documentation about the purpose of and process for maintaining the stubs/ directory. It's not immediately obvious to me that these are type stubs for python_cmr, because the import for that library is simply cmr, and my brain doesn't make that jump intuitively or immediately. It would also be good to document the long-term intention to upstream these instead of keeping them around!

Perhaps this could go in CONTRIBUTING.md. As I'm thinking that, I also think we've outgrown this single-file contributing doc, but that's a different issue (#521)

earthaccess/api.py Show resolved Hide resolved
earthaccess/typing_.py Outdated Show resolved Hide resolved
from typing import Dict, List, Mapping, Sequence, Tuple
else:
from builtins import dict as Dict, list as List, tuple as Tuple
from collections.abc import Mapping, Sequence
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the rest of this module, but I'm not sure how I feel about smoothing over the transition to built-in types like this:

from builtins import dict as Dict, list as List, tuple as Tuple

I think I prefer having the codebase use the style of the oldest version of Python we support, i.e. from typing import List at the moment, then letting Ruff handle autofixing that (using the UP ruleset IIRC) when we drop support for older versions. With this module helping us transition, Ruff won't be able to apply those fixes, and we'll need to remember to come back and clean up parts of this module when we drop support for 3.8 and no longer need the transitional code.

Anyway, I don't care about this deeply enough to request a change here, this is totally fine :) Just a feeling.

pyproject.toml Show resolved Hide resolved
pyproject.toml Show resolved Hide resolved
pyproject.toml Show resolved Hide resolved
scripts/lint.sh Show resolved Hide resolved
tests/unit/test_results.py Show resolved Hide resolved
@betolink
Copy link
Member

Should we merge this? @mfisher87 @chuckwondo

@chuckwondo
Copy link
Collaborator Author

I think this looks great! One thing I'd like to see is some documentation about the purpose of and process for maintaining the stubs/ directory. It's not immediately obvious to me that these are type stubs for python_cmr, because the import for that library is simply cmr, and my brain doesn't make that jump intuitively or immediately. It would also be good to document the long-term intention to upstream these instead of keeping them around!

Perhaps this could go in CONTRIBUTING.md. As I'm thinking that, I also think we've outgrown this single-file contributing doc, but that's a different issue (#521)

I'm a bit torn on how we should handle this.

I started adding a section to CONTRIBUTING.md with details on how a contributor should either add a "sidecar" type stubs package, if available (e.g., types-requests for requests), or generate stubs (like I did in this PR for python_cmr), but the more I wrote, the more I thought that perhaps this is too much detail for adding to the guide.

It's starting to feel like we're including information that is documented elsewhere about dealing with type stubs in Python in general, and that we would just be repeating such general documentation (and perhaps not as well). It feels akin to including a section on "how to program in Python" because we use Python in this repo. A contributor should look elsewhere for such information.

If a contributor has difficulty resolving type check failures, they should reach out by asking a question in an issue, discussion, or PR, and one of us more familiar with type checking can assist. This should wash out during PR reviews, just like any other aspect of Python programming in general.

Regarding "promotion" of the included python_cmr stubs to either typeshed, or directly upstream to python_cmr, I suggest that perhaps we simply capture that in a new issue, rather than describing it here.

What say you?

@chuckwondo
Copy link
Collaborator Author

Should we merge this? @mfisher87 @chuckwondo

Not just yet. We're close, but there are a couple more questions open, which I posted in response to a couple of questions from @mfisher87

@mfisher87
Copy link
Collaborator

It's starting to feel like we're including information that is documented elsewhere about dealing with type stubs in Python in general, and that we would just be repeating such general documentation (and perhaps not as well). It feels akin to including a section on "how to program in Python" because we use Python in this repo. A contributor should look elsewhere for such information.

I don't think we should duplicate information (as much as possible), more like a sentence or two that's relevant to our project's more curious contributors. Even if it's simply "We added type stubs for the untype python-cmr library, which we intend to eventually upstream. You shouldn't need to worry about this, but check here and here to learn more."

@chuckwondo
Copy link
Collaborator Author

@mfisher87, I believe I've addressed everything now. The most recent tweaks:

  • added a couple of sentences to the contributing guide on type stubs and the cmr stubs we have included
  • removed the typing_ module I had originally added to this PR with simply using typing_extensions (and also added it as a direct dependency, as we were relying on it as a transitive dep) -- I didn't globally make use of typing_extensions (or even look for other places where typing is used), but only used it in places where I had originally made use of the now-removed typing_ module

Copy link
Collaborator

@mfisher87 mfisher87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks so much Chuck!! 😍


We have included type stubs for the untyped `python-cmr` library, which we
intend to eventually upstream. Since `python-cmr` exposes the `cmr` package,
the stubs appear under `stubs/cmr`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯 to both new paragraphs, thanks!

@chuckwondo
Copy link
Collaborator Author

Thanks @mfisher87! I'm willing to take on the responsibility of having merge privileges, if you'd like to grant them, so I can squash-merge my changes.

@mfisher87
Copy link
Collaborator

Invitation sent!

@chuckwondo chuckwondo merged commit 0e4f392 into nsidc:main Apr 16, 2024
11 checks passed
doug-newman-nasa pushed a commit to doug-newman-nasa/earthaccess that referenced this pull request Apr 19, 2024
There were a number of type hints in `search.py` and `api.py` related to
CMR queries that were incorrect. These were fixed. In addition, there
were a number of other static type errors that were masked because of
ignored `cmr` imports. Added type stubs for `python_cmr` library to
unmask and address these additional type errors.

In addition:

- Aligned vcrpy usage with VCRTestCase as per
  https://vcrpy.readthedocs.io/en/latest/usage.html#unittest-integration
- Restored use of session for CMR paged queries, which was accidentally
  removed with the introduction of Search-After functionality.
- Wrapped a number of docstrings at 88 characters per ruff configuration

Fixes nsidc#508
@chuckwondo chuckwondo deleted the fix-cmr-type-hints branch April 22, 2024 16:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect type hints for CMR queries
3 participants