-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expand Variables class to read s3 urls #464
Conversation
Co-authored-by: Wei Ji <[email protected]>
Co-authored-by: Jessica Scheick <[email protected]>
Refactor Variables class to be user facing functionality
This comment is to start documenting timing for s3 reads. Below I compare wall time across 3 ways of initializing a Variables instance and listing the available variables. Times
Notes: Timing done using the HighlightsInitializing a Variables object with an s3 path is ~14 times slower than a with a local file, but the I still think we should merge this PRThis is not super hopeful news off the bat, but I still think we should merge this and work on variable reads. The alternate approach to reading the product from the file is to use the filepath, but that had it's own set of dev challenges and also restricts icepyx to only NSIDC s3 data. I'm curious to get a more holistic view of read times before trying to optimize any particular piece. |
Alright @JessicaS11 this is ready for review! My only uncertainty is that Travis is hanging again 😕 |
d5747fa
to
142b7ab
Compare
@rwegener2 This is super close. Can you confirm it all still looks good from your perspective after all the merge conflicts, rebasing, etc.? I think we just need to update the line in our variables example to note that you can create a Variables object from local file, s3 url, or product+version. UPDATE: I randomly tried to read in a file while on this branch, and it forced me to authenticate even though I was reading a local file. Not sure if that's something we inadvertently introduced here, but figured it was the logical place to record it until one of us digs in! |
Thanks for helping move this to the finish line @JessicaS11! I think this looks ready to merge.
TLDR; This was unintended and it is not desired. As a fix I added an if statement in the init of Variables that checks if the path is s3 before authenticating. Good catch. So it seems like one of the design elements I was most proud of in the auth module is working against us right now. We setup The fix I am pushing adds an if statement in the Variables init to check if the path is an s3 path and creates a local variable that is either
It's possible one of these other ones is better, but my feeling right now is to wait on one of these bigger restructures to see if there are other methods that get created as the cloud features of icepyx develop. The needs of those other functions could help influence how we approach these situations. |
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
This makes sense to me, in combination with
It's possible that some decorators could ultimately help us with this kind of issue - especially if the "check" consistently stays as "does this start with "s3"?". I added a few notes in the vars example. It looks like this is ready to go! |
* expand extract_product and extract_version to check for s3 url * add cloud notes to variables notebook --------- Co-authored-by: Jessica Scheick <[email protected]>
This reverts commit 8d7db9c.
Goal
This PR generalizes the Variables class to display available variables from an s3 path. If an s3 path is given the Variables class uses the product and version to display available variables; it does not try to walk through the file and build a the variables tree. If a user tries to request data from outside the NSIDC bucket a warning is raised indicating that available variable lists may not be accurate.
How it was done
So simply! I'm excited about how easy this was, because it shows how worth it the efforts were to 1) create an auth Mixin and 2) make Variables an independent class 🙂
The biggest changes were in
is2ref.py
, in which theextract_product
andextract_version
functions were generalized to read from s3. They now accept an optionalauth
argument.How it can be tested
As a template for anyone ever writing future tests, the extract_product and extract_version functions should also be tested, for example as: