Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read arguments #444

Merged
merged 37 commits into from
Oct 18, 2023
Merged

Read arguments #444

merged 37 commits into from
Oct 18, 2023

Conversation

rwegener2
Copy link
Contributor

@rwegener2 rwegener2 commented Sep 5, 2023

What was Done

This PR changes the input parameters for the Read module in two ways:

  1. combine the filepath_pattern and data_source arguments into a single parameter (maintained the name data_source).
  2. we are no longer going to require the product argument from the user. It will still be optional, both for the purposes of deprecation, and also to be used in the event that a user provides a list of files that do not all contain the same product.

The options for the new version of thedata_source parameter are:

  • a string to a single local file
  • a string to a directory
  • a glob string
  • a list of files

Unsupported options for data_source are:

  • user provides a list of directories
  • user provides a Query object (a question about this in the Notes section below)

How It was Done

The majority of the changes are all in the __init__ method. The overall flow of the method is now:

  1. raise warnings for depreciated arguments
  2. create a filelist from whatever the data_source input parameter is
  3. do a bunch of checks and raise warnings for lists with multiple products or a user-specified product that doesn't match the file metadata
  4. assign a product to the object

Other notable changes:

  • product and filelist are now public properties

Depreciation comments

product

It complicates the logic quite a bit to continue to allow the user to provide a product argument. I do think we should maintain it for the moment because it allows for backwards compatibility, but in the future it would be a lot simpler if the user were just required to provide a path to files of only a single product type. We give them lots of options for flexibility via glob strings.

Notes & Questions

  1. We discussed allowing for a Query object as a data_source parameter type. It seems like this only makes sense for the cloud reading, since the Query object returns the paths of cloud data. Is this correct, or am I missing something?
  2. If a user provides a directory with data in it, do we want to recursively search the directory for sub-directories with data? My inclination is no, and that if a user wants that they will need to use the glob module on their own with the recursive=True argument.
  3. In warning messages my tone is that giving the product argument is discouraged, but I'm open to hearing other opinions on this. (See note above in Depreciation Comment. I'm arguing from a dev perspective, but I may not be doing a complete job of seeing the user side. So, well, teamwork 🙂)

Todo

  • use a mock filesystem to test filelist creation
  • docstrings

@github-actions
Copy link

github-actions bot commented Sep 5, 2023

Binder 👈 Launch a binder notebook on this branch for commit 612662e

I will automatically update this comment whenever this PR is modified

Binder 👈 Launch a binder notebook on this branch for commit c16a003

Binder 👈 Launch a binder notebook on this branch for commit 7648078

Binder 👈 Launch a binder notebook on this branch for commit 4cfbfdb

Binder 👈 Launch a binder notebook on this branch for commit 203f3ad

Binder 👈 Launch a binder notebook on this branch for commit 10d1591

Binder 👈 Launch a binder notebook on this branch for commit 6f5bead

Binder 👈 Launch a binder notebook on this branch for commit 035ee5a

Binder 👈 Launch a binder notebook on this branch for commit 903c351

Binder 👈 Launch a binder notebook on this branch for commit d842bde

Binder 👈 Launch a binder notebook on this branch for commit 5e06de9

Binder 👈 Launch a binder notebook on this branch for commit 9ca29f1

Binder 👈 Launch a binder notebook on this branch for commit e8e35ad

Binder 👈 Launch a binder notebook on this branch for commit 4bcc518

Binder 👈 Launch a binder notebook on this branch for commit b2c2735

Binder 👈 Launch a binder notebook on this branch for commit 6b953f9

Binder 👈 Launch a binder notebook on this branch for commit 45704a4

Binder 👈 Launch a binder notebook on this branch for commit 2bf2808

Binder 👈 Launch a binder notebook on this branch for commit 1242881

Binder 👈 Launch a binder notebook on this branch for commit 5f8589a

@rwegener2 rwegener2 changed the title Read arguments WIP Read arguments Sep 5, 2023
@rwegener2
Copy link
Contributor Author

Ok @JessicaS11 and/or @weiji14, this PR is ready for comments. I am still working on 1) updating docstrings and 2) writing a few tests, but in the meantime I'm interested in initial feedback. Specifically, I left 3 questions in the Notes & Questions section of the writeup.

Thanks!

@JessicaS11
Copy link
Member

It complicates the logic quite a bit to continue to allow the user to provide a product argument. I do think we should maintain it for the moment because it allows for backwards compatibility, but in the future it would be a lot simpler if the user were just required to provide a path to files of only a single product type. We give them lots of options for flexibility via glob strings.

You make a compelling argument, and if we provide a few good glob examples (or links to them) in the notebook then we are hopefully showing users what they'd need to know. I think then I would be okay with deprecating the user-input product argument.

We discussed allowing for a Query object as a data_source parameter type. It seems like this only makes sense for the cloud reading, since the Query object returns the paths of cloud data. Is this correct, or am I missing something?

Correct! So if we're adding cloud support specifically in a next step, then we can bookmark adding this there.

If a user provides a directory with data in it, do we want to recursively search the directory for sub-directories with data? My inclination is no, and that if a user wants that they will need to use the glob module on their own with the recursive=True argument.

See comment above - this would be an example glob use case I'd want to include.

In warning messages my tone is that giving the product argument is discouraged, but I'm open to hearing other opinions on this. (See note above in Depreciation Comment. I'm arguing from a dev perspective, but I may not be doing a complete job of seeing the user side. So, well, teamwork 🙂)

Deprecated = discouraged in my book. Wondering if we should put a projected deprecation date in the message (or at the very least add a comment for ourselves on the date and last version to use the keyword) so that we can set a standard for actually removing deprecated things from the code.

@rwegener2
Copy link
Contributor Author

rwegener2 commented Sep 8, 2023

Thanks for the comments @JessicaS11!

Wondering if we should put a projected deprecation date in the message (or at the very least add a comment for ourselves on the date and last version to use the keyword) so that we can set a standard for actually removing deprecated things from the code.

I did some quick reading about depreciation recommendations. What you mention here, of informing users of in what version the feature will be depreciated, is considered best practice. To the "how long should you wait to depreciate" question, I noticed that people seem to mark time not by number of months but by versions. One SO comment summarized "Major version releases are a good time to remove deprecated methods. Minor releases should typically not contain breaking changes." Other common themes included communicating with users, ex. in warnings (I think icepyx does this well) or blog posts.

Have you considered a major release to v1.0.0 at some point? Either the integration of cloud reading or the Argo project that you're working on (or both) seem like big changes that could perhaps prompt a major release. The big changes you made to authentication could also be part of what gets announced. The major version change could then be a good time to remove many features that have been maintained only for backward compatibility.

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@rwegener2 rwegener2 marked this pull request as ready for review September 8, 2023 15:29
@rwegener2 rwegener2 changed the title WIP Read arguments Read arguments Sep 8, 2023
@rwegener2
Copy link
Contributor Author

rwegener2 commented Sep 8, 2023

Ok, this PR is ready for review! Since the last comment I updated the documentation and also added glob_kwargs parameter that can be used to recursively search directories. The thing I haven't been able to figure out is mocking a filesystem for testing with h5py reads. I've spent a few hours trying to get something set up to no avail. This morning I posted a question on Stack Overflow, so if that works out I'll follow up.

Looking forward to a review!

@rwegener2 rwegener2 requested a review from JessicaS11 September 8, 2023 15:35
@@ -63,9 +63,8 @@
"metadata": {},
Copy link
Member

@JessicaS11 JessicaS11 Oct 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion:

glob will not by default search all of the subdirectories for matching filepaths, but it has the ability to do so. If you would like to search recursively, you can achieve this by:

  1. passing the recursive=True argument into glob_kwargs (shown below)
  2. use /**/ in the filepath to match any level of nested folders (not shown below)
  3. using glob directly to create a list of filepaths (shown below)


Reply via ReviewNB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I streamlined this explanation, like your comment suggests. Let me know if it still isn't clear!

icepyx/core/read.py Outdated Show resolved Hide resolved
icepyx/core/read.py Outdated Show resolved Hide resolved
icepyx/core/read.py Outdated Show resolved Hide resolved
icepyx/core/read.py Outdated Show resolved Hide resolved
@rwegener2
Copy link
Contributor Author

All comments addressed. I'm looking for a 👍🏻 about how the glob docs section was phrased. Then we are just waiting to see what happens with the visualization errors caused by the OpenAltimetry API (cc @JessicaS11).

@JessicaS11
Copy link
Member

Looks great, @rwegener2. I'll hold off on approving until #454 is fixed so there's not an accidental merge, but this is good to go!

@JessicaS11
Copy link
Member

@all-contributors
please add @rwegener2 for bug, code, doc, ideas, maintenance, review, test, tutorial
please add @jpswinski for review
please add @whyjz for tutorial

@allcontributors
Copy link
Contributor

@JessicaS11

I've put up a pull request to add @rwegener2! 🎉

I've put up a pull request to add @jpswinski! 🎉

I've put up a pull request to add @whyjz! 🎉

@JessicaS11 JessicaS11 merged commit d86cc9e into development Oct 18, 2023
1 check passed
@JessicaS11 JessicaS11 deleted the read_arguments branch October 18, 2023 18:02
JessicaS11 pushed a commit that referenced this pull request Nov 15, 2023
* add filelist and product properties to Read object
* deprecate filename_pattern and product class Read inputs
* transition to data_source input as a string (including glob string) or list
* update tutorial with changes and user guidance for using glob

---------
Co-authored-by: Jessica Scheick <[email protected]>
JessicaS11 pushed a commit that referenced this pull request Jan 5, 2024
* add filelist and product properties to Read object
* deprecate filename_pattern and product class Read inputs
* transition to data_source input as a string (including glob string) or list
* update tutorial with changes and user guidance for using glob

---------
Co-authored-by: Jessica Scheick <[email protected]>
JessicaS11 added a commit that referenced this pull request Jan 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants