Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

483 refactor search #494

Merged
merged 30 commits into from
Apr 9, 2024
Merged

Conversation

doug-newman-nasa
Copy link
Contributor

@doug-newman-nasa doug-newman-nasa commented Mar 15, 2024

  • Refactored earthaccess/search.py so that Collection and Granule share a means of executing their query to CMR and iterate through results using the SearchAfter header.
  • Updated documentation to outline how to install and prep for development
  • Added VCR library to allow testing validity of API calls and result parsing without having to rely on
    • consistent CMR state
    • CMR availability

📚 Documentation preview 📚: https://earthaccess--494.org.readthedocs.build/en/494/

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for fleshing this doc out. Over time this has become a sort of blind spot for us. ❤️

earthaccess/search.py Outdated Show resolved Hide resolved
response = get_results(self, limit)

results = list(
DataCollection(collection, self._fields) for collection in response
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

CONTRIBUTING.md Outdated
Comment on lines 53 to 56
```
pytest tests/unit
```

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a Makefile, so this should be make test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get the following error if I use the make command

$ make test
bash scripts/test.sh
+ pytest tests/unit --cov=earthaccess --cov=tests --cov-report=term-missing -s --tb=native --log-cli-level=INFO
ERROR: usage: pytest [options] [file_or_dir] [file_or_dir] [...]
pytest: error: unrecognized arguments: --cov=earthaccess --cov=tests --cov-report=term-missing
  inifile: None
  rootdir: /Users/djnewman/Documents/code/earthaccess

make: *** [test] Error 4

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect this may be a knock-on effect from your poetry problem mentioned elsewhere, such that pytest-cov didn't get installed.

Can you successfully run these commands, as given at the top of the guide (possibly with conda in place of mamba):

mamba env update -f ci/environment-dev.yml
mamba activate earthaccess-dev
poetry install

CONTRIBUTING.md Outdated
Comment on lines 63 to 69
In order to run the following scripts, run the following,
```
pip install mypy
mypy --install-types
pip install ruff
```
Then run these scripts to ensure correctx format, style and convention:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are already installed by poetry install as instructed earlier in this guide. Also, the commands below should be replaced with make lint format, which is what I've done in this PR (in review): #497

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my experience, I was unable to get these things working post poetry install from a fresh build. If we want to consolidate on make <test|lint|format> I'm cool with that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What problem did you encounter, because this works for me, as given at the top of the file (although with conda, not mamba)?

conda env update -f ci/environment-dev.yml
conda activate earthaccess-dev
poetry install

CONTRIBUTING.md Outdated
Comment on lines 84 to 87
2. Create a new virtual environment with conda/virtualenv
3. Activate environment and install earthaccess as editable source
1. `pip install -e .`
2. `poetry install` (note, if we use Poetry with conda we need to tell poetry to not use its own virtual environments)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe these are covered by the commands given earlier in the guide (not exactly these commands, but same result).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my experience, I was unable to get these things working post poetry install from a fresh build. I consulted with @betolink and he provided the above instructions after I couldn't get the unit tests to run

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The steps at the top of the guide work for me, so what specific steps did you follow and what error did you encounter?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can put these instructions on a separate "If you are not using Poetry, here is another way of getting a dev environment". Ideally, we should have a "universal" bash script ala npm install dev that will take care of everything including virtual environment and dev installs. In any case hopefully this PR gets merged soon-ish and we can keep iterating on it!

CONTRIBUTING.md Outdated Show resolved Hide resolved
earthaccess/search.py Outdated Show resolved Hide resolved
earthaccess/search.py Outdated Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
earthaccess/search.py Outdated Show resolved Hide resolved
earthaccess/search.py Outdated Show resolved Hide resolved
@chuckwondo
Copy link
Collaborator

chuckwondo commented Apr 8, 2024

@doug-newman-nasa, I'd like to get your lovely enhancements landed soon. Any chance you can do the following things, which should get past the current build failures?

  • Add vcrpy to ci/environment-mindeps.yaml (unpinned and grouped with the rest of the # test dependencies in that file)
  • Run poetry lock --no-update to regenerate the poetry lock file

Once you make those changes, there may be further failures, but those 2 items are all that are currently visible in the build failures.

@chuckwondo
Copy link
Collaborator

@doug-newman-nasa, it looks like you still need to commit an updated poetry.lock file. Run poetry lock --no-update to update it and then commit it. However, before you push another commit, pull the latest changes from the upstream main branch and either rebase or merge those changes into your branch, as there are currently branch conflicts.

@chuckwondo
Copy link
Collaborator

@doug-newman-nasa, I don't think you pulled in changes from main correctly. I now see a bunch of commits in your PR that are changes that already exist in main from preceding PR merges.

@chuckwondo
Copy link
Collaborator

I see a bunch of unresolved merge conflict markers in the files you changed.

@chuckwondo
Copy link
Collaborator

Now that I've taken a closer look at your PR, I have a few change requests that I'll submit once you resolve the conflict markers.

doug-newman-nasa and others added 10 commits April 8, 2024 15:43
into a single function that utilizes Search-After
Updated documentation for linting.
using page_number parameter and were using
CMR-Search-After header appropriately
Co-authored-by: Chuck Daniels <[email protected]>
Co-authored-by: Chuck Daniels <[email protected]>
doug-newman-nasa and others added 13 commits April 8, 2024 15:49
Co-authored-by: Chuck Daniels <[email protected]>
Co-authored-by: Joseph H Kennedy <[email protected]>
the actual fix added. I suggest we find someone
to install from scratch and document any issues
with the instructions as they go along.
resolve issues with EDL and multiple people
running tests.
earthaccess/search.py Outdated Show resolved Hide resolved
earthaccess/search.py Outdated Show resolved Hide resolved
earthaccess/search.py Outdated Show resolved Hide resolved
earthaccess/search.py Show resolved Hide resolved
earthaccess/search.py Outdated Show resolved Hide resolved
earthaccess/search.py Show resolved Hide resolved
earthaccess/search.py Outdated Show resolved Hide resolved
earthaccess/search.py Show resolved Hide resolved
earthaccess/search.py Outdated Show resolved Hide resolved
earthaccess/search.py Show resolved Hide resolved
Recorded requests due to change
Copy link
Collaborator

@chuckwondo chuckwondo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that you pointed out that we're supporting only umm-json, there are 2 type hints I'd like to see made more specific, as noted in a couple of comments. Other than that, I think this looks good.

earthaccess/search.py Outdated Show resolved Hide resolved
earthaccess/search.py Outdated Show resolved Hide resolved
@chuckwondo
Copy link
Collaborator

@doug-newman-nasa, I'm seeing these test failures (CannotOverwriteExistingCassetteException):

2024-04-08T21:47:42.9829511Z =========================== short test summary info ============================
2024-04-08T21:47:42.9831472Z FAILED tests/unit/test_results.py::TestResults::test_get - vcr.errors.CannotOverwriteExistingCassetteException: Can't overwrite existing cassette ('tests/unit/fixtures/vcr_cassettes/MOD02QKM_2000.yaml') in your current record mode ('once').
2024-04-08T21:47:42.9833330Z No match for the request (<Request (GET) https://cmr.earthdata.nasa.gov/search/granules.umm_json?short_name=MOD02QKM&page_size=0>) was found.
2024-04-08T21:47:42.9834210Z Found 2 similar requests with 1 different matcher(s) :
2024-04-08T21:47:42.9834531Z 
2024-04-08T21:47:42.9835037Z 1 - (<Request (GET) https://cmr.earthdata.nasa.gov/search/granules.umm_json?short_name=MOD02QKM&page_size=0>).
2024-04-08T21:47:42.9836080Z Matchers succeeded : ['method', 'scheme', 'host', 'port', 'path', 'query']
2024-04-08T21:47:42.9836658Z Matchers failed :
2024-04-08T21:47:42.9836953Z headers - assertion failure :
2024-04-08T21:47:42.9838193Z {'User-Agent': 'python-requests/2.31.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'} != {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Connection': 'keep-alive', 'User-Agent': 'earthaccess v0.9.0'}
2024-04-08T21:47:42.9839186Z 
2024-04-08T21:47:42.9839713Z 2 - (<Request (GET) https://cmr.earthdata.nasa.gov/search/granules.umm_json?short_name=MOD02QKM&page_size=2000>).
2024-04-08T21:47:42.9840605Z Matchers succeeded : ['method', 'scheme', 'host', 'port', 'path', 'headers']
2024-04-08T21:47:42.9841093Z Matchers failed :
2024-04-08T21:47:42.9841374Z query - assertion failure :
2024-04-08T21:47:42.9841948Z [('page_size', '0'), ('short_name', 'MOD02QKM')] != [('page_size', '2000'), ('short_name', 'MOD02QKM')]
2024-04-08T21:47:42.9843608Z FAILED tests/unit/test_results.py::TestResults::test_get_all_less_than_2k - vcr.errors.CannotOverwriteExistingCassetteException: Can't overwrite existing cassette ('tests/unit/fixtures/vcr_cassettes/TELLUS_GRAC.yaml') in your current record mode ('once').
2024-04-08T21:47:42.9845473Z No match for the request (<Request (GET) https://cmr.earthdata.nasa.gov/search/granules.umm_json?short_name=TELLUS_GRAC_L3_JPL_RL06_LND_v04&page_size=0>) was found.
2024-04-08T21:47:42.9846408Z Found 2 similar requests with 1 different matcher(s) :
2024-04-08T21:47:42.9846720Z 
2024-04-08T21:47:42.9847312Z 1 - (<Request (GET) https://cmr.earthdata.nasa.gov/search/granules.umm_json?short_name=TELLUS_GRAC_L3_JPL_RL06_LND_v04&page_size=0>).
2024-04-08T21:47:42.9848273Z Matchers succeeded : ['method', 'scheme', 'host', 'port', 'path', 'query']
2024-04-08T21:47:42.9848760Z Matchers failed :
2024-04-08T21:47:42.9849049Z headers - assertion failure :
2024-04-08T21:47:42.9850225Z {'User-Agent': 'python-requests/2.31.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'} != {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Connection': 'keep-alive', 'User-Agent': 'earthaccess v0.9.0'}
2024-04-08T21:47:42.9851237Z 
2024-04-08T21:47:42.9851869Z 2 - (<Request (GET) https://cmr.earthdata.nasa.gov/search/granules.umm_json?short_name=TELLUS_GRAC_L3_JPL_RL06_LND_v04&page_size=2000>).
2024-04-08T21:47:42.9852849Z Matchers succeeded : ['method', 'scheme', 'host', 'port', 'path', 'headers']
2024-04-08T21:47:42.9853322Z Matchers failed :
2024-04-08T21:47:42.9853593Z query - assertion failure :
2024-04-08T21:47:42.9854391Z [('page_size', '0'), ('short_name', 'TELLUS_GRAC_L3_JPL_RL06_LND_v04')] != [('page_size', '2000'), ('short_name', 'TELLUS_GRAC_L3_JPL_RL06_LND_v04')]
2024-04-08T21:47:42.9856273Z FAILED tests/unit/test_results.py::TestResults::test_get_all_more_than_2k - vcr.errors.CannotOverwriteExistingCassetteException: Can't overwrite existing cassette ('tests/unit/fixtures/vcr_cassettes/CYGNSS.yaml') in your current record mode ('once').
2024-04-08T21:47:42.9858122Z No match for the request (<Request (GET) https://cmr.earthdata.nasa.gov/search/granules.umm_json?short_name=CYGNSS_NOAA_L2_SWSP_25KM_V1.2&page_size=0>) was found.
2024-04-08T21:47:42.9859083Z Found 2 similar requests with 1 different matcher(s) :
2024-04-08T21:47:42.9859398Z 
2024-04-08T21:47:42.9860135Z 1 - (<Request (GET) https://cmr.earthdata.nasa.gov/search/granules.umm_json?short_name=CYGNSS_NOAA_L2_SWSP_25KM_V1.2&page_size=0>).
2024-04-08T21:47:42.9861177Z Matchers succeeded : ['method', 'scheme', 'host', 'port', 'path', 'query']
2024-04-08T21:47:42.9861659Z Matchers failed :
2024-04-08T21:47:42.9861951Z headers - assertion failure :
2024-04-08T21:47:42.9863131Z {'User-Agent': 'python-requests/2.31.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'} != {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Connection': 'keep-alive', 'User-Agent': 'earthaccess v0.9.0'}
2024-04-08T21:47:42.9864128Z 
2024-04-08T21:47:42.9864835Z 2 - (<Request (GET) https://cmr.earthdata.nasa.gov/search/granules.umm_json?short_name=CYGNSS_NOAA_L2_SWSP_25KM_V1.2&page_size=2000>).
2024-04-08T21:47:42.9865869Z Matchers succeeded : ['method', 'scheme', 'host', 'port', 'path', 'headers']
2024-04-08T21:47:42.9866370Z Matchers failed :
2024-04-08T21:47:42.9866666Z query - assertion failure :
2024-04-08T21:47:42.9867550Z [('page_size', '0'), ('short_name', 'CYGNSS_NOAA_L2_SWSP_25KM_V1.2')] != [('page_size', '2000'), ('short_name', 'CYGNSS_NOAA_L2_SWSP_25KM_V1.2')]
2024-04-08T21:47:42.9869630Z FAILED tests/unit/test_results.py::TestResults::test_get_more_than_2000 - vcr.errors.CannotOverwriteExistingCassetteException: Can't overwrite existing cassette ('tests/unit/fixtures/vcr_cassettes/MOD02QKM.yaml') in your current record mode ('once').
2024-04-08T21:47:42.9871417Z No match for the request (<Request (GET) https://cmr.earthdata.nasa.gov/search/granules.umm_json?short_name=MOD02QKM&page_size=0>) was found.
2024-04-08T21:47:42.9872316Z Found 2 similar requests with 1 different matcher(s) :
2024-04-08T21:47:42.9872634Z 
2024-04-08T21:47:42.9873152Z 1 - (<Request (GET) https://cmr.earthdata.nasa.gov/search/granules.umm_json?short_name=MOD02QKM&page_size=0>).
2024-04-08T21:47:42.9874007Z Matchers succeeded : ['method', 'scheme', 'host', 'port', 'path', 'query']
2024-04-08T21:47:42.9874503Z Matchers failed :
2024-04-08T21:47:42.9874801Z headers - assertion failure :
2024-04-08T21:47:42.9875982Z {'User-Agent': 'python-requests/2.31.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'} != {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Connection': 'keep-alive', 'User-Agent': 'earthaccess v0.9.0'}
2024-04-08T21:47:42.9876979Z 
2024-04-08T21:47:42.9877499Z 2 - (<Request (GET) https://cmr.earthdata.nasa.gov/search/granules.umm_json?short_name=MOD02QKM&page_size=2000>).
2024-04-08T21:47:42.9878382Z Matchers succeeded : ['method', 'scheme', 'host', 'port', 'path', 'headers']
2024-04-08T21:47:42.9878876Z Matchers failed :
2024-04-08T21:47:42.9879161Z query - assertion failure :
2024-04-08T21:47:42.9879738Z [('page_size', '0'), ('short_name', 'MOD02QKM')] != [('page_size', '2000'), ('short_name', 'MOD02QKM')]
2024-04-08T21:47:42.9880387Z ========================= 4 failed, 29 passed in 5.84s =========================

@betolink
Copy link
Member

betolink commented Apr 9, 2024

I thought @doug-newman-nasa had fixed those by ignoring the authentication headers and session cookies, is that not there Doug? +1 on making the tests less verbose, perhaps only be verbose on failures? (thinking VCR)

doug-newman-nasa added 2 commits April 9, 2024 09:37
from request evaluation in VCR.
pre-commit file updates
@chuckwondo
Copy link
Collaborator

@betolink, Doug and I discovered that User-Agent and Accept-Encoding also had to be ignored. Doug also reduced vcr's verbosity and ran pre-commit locally, so hopefully that's that last of the build errors solved.

@betolink betolink merged commit 493dd03 into nsidc:main Apr 9, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants