Read arguments #444

rwegener2 · 2023-09-05T17:52:13Z

What was Done

This PR changes the input parameters for the Read module in two ways:

combine the filename_pattern and data_source arguments into a single parameter (keeping the name data_source).
stop requiring the product argument from the user. It remains optional, both for backward compatibility during deprecation and for the case where a user provides a list of files that do not all contain the same product.

The supported options for the new version of the data_source parameter are:

a string path to a single local file
a string path to a directory
a glob string
a list of file paths

Unsupported options for data_source are:

user provides a list of directories
user provides a Query object (a question about this in the Notes section below)
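To make the supported and unsupported data_source types concrete, here is a minimal sketch of how a file list could be built from each one. It is not the code in this PR: the function name build_filelist, the `*.h5` filter for directories, and the exception types are assumptions chosen for illustration.

```python
import glob
import os


def build_filelist(data_source, glob_kwargs=None):
    """Hypothetical sketch: turn a data_source value into a sorted list of file paths."""
    glob_kwargs = glob_kwargs or {}

    if isinstance(data_source, list):
        # A list may contain file paths only; a list of directories is unsupported
        dirs = [p for p in data_source if os.path.isdir(p)]
        if dirs:
            raise ValueError(f"data_source lists may only contain files, got directories: {dirs}")
        return sorted(data_source)

    if not isinstance(data_source, str):
        # e.g. a Query object, which is not supported for local reading
        raise TypeError(f"Unsupported data_source type: {type(data_source).__name__}")

    if os.path.isfile(data_source):
        # A single local file
        return [data_source]

    if os.path.isdir(data_source):
        # A directory: top-level files only (assumed *.h5), no recursion
        filelist = sorted(glob.glob(os.path.join(data_source, "*.h5")))
    else:
        # Anything else is treated as a glob string; glob_kwargs are forwarded
        filelist = sorted(glob.glob(data_source, **glob_kwargs))

    if not filelist:
        raise FileNotFoundError(f"No files found for data_source={data_source!r}")
    return filelist
```

In this sketch a directory is searched at the top level only, and recursion is available only through a glob string plus glob_kwargs.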

How It was Done

Most of the changes are in the __init__ method. The overall flow of the method is now:

raise warnings for deprecated arguments
create a filelist from whatever the data_source input parameter is
run checks and raise warnings if the file list contains multiple products or if a user-specified product doesn't match the file metadata
assign a product to the object

Other notable changes:

product and filelist are now public properties
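The __init__ flow and the public properties described above could be sketched roughly as follows. ReadSketch and product_from_filename are hypothetical names. The real Read class gets the product from the file metadata, whereas this sketch parses it from the file name as a stand-in, and it assumes data_source is already a list of paths. Raising ValueError when no product argument is given and the files hold zero or several products is also an assumption.

```python
import re
import warnings


def product_from_filename(path):
    """Hypothetical stand-in for reading the product from file metadata.

    Returns e.g. "ATL06" for "processed_ATL06_20190226005526_09100205_006_02.h5".
    """
    match = re.search(r"ATL\d{2}", path)
    return match.group(0) if match else None


class ReadSketch:
    """Hypothetical, simplified version of the Read.__init__ flow."""

    def __init__(self, data_source, product=None, filename_pattern=None):
        # 1. Warn about deprecated arguments
        if filename_pattern is not None:
            warnings.warn(
                "filename_pattern is deprecated; pass a glob string as data_source.",
                DeprecationWarning,
                stacklevel=2,
            )
        if product is not None:
            warnings.warn(
                "The product argument is deprecated; the product is read from the files.",
                DeprecationWarning,
                stacklevel=2,
            )

        # 2. Create the file list (simplified: data_source is already a list of paths)
        self._filelist = sorted(data_source)

        # 3. Compare the products found in the files with any user-supplied product
        found = sorted({product_from_filename(f) for f in self._filelist} - {None})
        if product is None:
            if len(found) != 1:
                raise ValueError(
                    f"Found products {found}; pass files of a single product "
                    "or a product argument."
                )
            product = found[0]
        elif found != [product]:
            warnings.warn(
                f"product={product!r} does not match the products in the files "
                f"({found}); using {product!r}.",
                UserWarning,
                stacklevel=2,
            )

        # 4. Assign the product to the object
        self._product = product

    @property
    def product(self):
        """Read-only public property."""
        return self._product

    @property
    def filelist(self):
        """Read-only public property."""
        return self._filelist
```

Exposing product and filelist as properties without setters lets users inspect what was matched without being able to overwrite it.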

Deprecation comments

`product`

It complicates the logic quite a bit to continue to allow the user to provide a product argument. I do think we should maintain it for the moment because it allows for backwards compatibility, but in the future it would be a lot simpler if the user were just required to provide a path to files of only a single product type. We give them lots of options for flexibility via glob strings.

Notes & Questions

We discussed allowing for a Query object as a data_source parameter type. It seems like this only makes sense for the cloud reading, since the Query object returns the paths of cloud data. Is this correct, or am I missing something?
If a user provides a directory with data in it, do we want to recursively search the directory for sub-directories with data? My inclination is no, and that if a user wants that they will need to use the glob module on their own with the recursive=True argument.
In warning messages my tone is that giving the product argument is discouraged, but I'm open to hearing other opinions on this. (See the note above in Deprecation comments. I'm arguing from a dev perspective, but I may not be doing a complete job of seeing the user side. So, well, teamwork 🙂)

Todo

use a mock filesystem to test filelist creation
docstrings
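For the first Todo item, one way to test filelist creation without touching disk is to patch glob.glob with unittest.mock, so the "filesystem" is just the list the fake returns. This is a sketch under assumptions: expand_pattern is a hypothetical helper, and the approach only covers path handling, not reading files with h5py.

```python
import glob
import unittest
from unittest import mock


def expand_pattern(pattern, glob_kwargs=None):
    """Hypothetical helper: expand a glob string, forwarding any glob_kwargs."""
    filelist = sorted(glob.glob(pattern, **(glob_kwargs or {})))
    if not filelist:
        raise FileNotFoundError(f"No files matched {pattern!r}")
    return filelist


class TestExpandPattern(unittest.TestCase):
    @mock.patch("glob.glob")
    def test_sorted_and_kwargs_forwarded(self, fake_glob):
        # The fake "filesystem": glob returns two paths in reverse order
        fake_glob.return_value = ["/data/ATL06_b.h5", "/data/ATL06_a.h5"]
        result = expand_pattern("/data/**/ATL06_*.h5", glob_kwargs={"recursive": True})
        self.assertEqual(result, ["/data/ATL06_a.h5", "/data/ATL06_b.h5"])
        # glob_kwargs must reach glob.glob unchanged
        fake_glob.assert_called_once_with("/data/**/ATL06_*.h5", recursive=True)

    @mock.patch("glob.glob", return_value=[])
    def test_no_match_raises(self, fake_glob):
        with self.assertRaises(FileNotFoundError):
            expand_pattern("/nowhere/*.h5")
```

Saved as a test_*.py file, it can be collected by pytest or by `python -m unittest`.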

Co-authored-by: Wei Ji <[email protected]>

github-actions · 2023-09-05T17:52:27Z

👈 Launch a binder notebook on this branch for commit 612662e

I will automatically update this comment whenever this PR is modified

rwegener2 · 2023-09-05T22:31:29Z

Ok @JessicaS11 and/or @weiji14, this PR is ready for comments. I am still working on 1) updating docstrings and 2) writing a few tests, but in the meantime I'm interested in initial feedback. Specifically, I left 3 questions in the Notes & Questions section of the writeup.

Thanks!

JessicaS11 · 2023-09-07T17:57:01Z

It complicates the logic quite a bit to continue to allow the user to provide a product argument. I do think we should maintain it for the moment because it allows for backwards compatibility, but in the future it would be a lot simpler if the user were just required to provide a path to files of only a single product type. We give them lots of options for flexibility via glob strings.

You make a compelling argument, and if we provide a few good glob examples (or links to them) in the notebook, we are hopefully showing users what they need to know. In that case I would be okay with deprecating the user-input product argument.

We discussed allowing for a Query object as a data_source parameter type. It seems like this only makes sense for the cloud reading, since the Query object returns the paths of cloud data. Is this correct, or am I missing something?

Correct! So if we're adding cloud support as a separate next step, we can bookmark adding this there.

If a user provides a directory with data in it, do we want to recursively search the directory for sub-directories with data? My inclination is no, and that if a user wants that they will need to use the glob module on their own with the recursive=True argument.

See comment above - this would be an example glob use case I'd want to include.

In warning messages my tone is that giving the product argument is discouraged, but I'm open to hearing other opinions on this. (See the note above in Deprecation comments. I'm arguing from a dev perspective, but I may not be doing a complete job of seeing the user side. So, well, teamwork 🙂)

Deprecated = discouraged in my book. Wondering if we should put a projected deprecation date in the message (or at the very least add a comment for ourselves on the date and last version to use the keyword) so that we can set a standard for actually removing deprecated things from the code.

rwegener2 · 2023-09-08T13:01:47Z

Thanks for the comments @JessicaS11!

Wondering if we should put a projected deprecation date in the message (or at the very least add a comment for ourselves on the date and last version to use the keyword) so that we can set a standard for actually removing deprecated things from the code.

I did some quick reading about deprecation recommendations. What you mention here, telling users in which version a deprecated feature will be removed, is considered best practice. On the question of how long to wait before removing a deprecated feature, people seem to count versions rather than months. One Stack Overflow comment summarized: "Major version releases are a good time to remove deprecated methods. Minor releases should typically not contain breaking changes." Other common themes included communicating with users, e.g. in warnings (I think icepyx does this well) or blog posts.

Have you considered a major release to v1.0.0 at some point? Integrating cloud reading, the Argo project you're working on, or both seem like big enough changes to prompt a major release. The big changes you made to authentication could also be part of what gets announced. A major version bump could then be a good time to remove the many features that have been kept only for backward compatibility.
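Following the suggestion to name a removal version in the warning, a deprecation helper might look like the sketch below. The helper names and the "1.0.0" target are placeholders, not decided icepyx policy. The overdue_for_removal check could back a unit test that starts failing once the package version reaches the announced removal version, which would enforce actually removing the deprecated argument.

```python
import warnings


def warn_deprecated(arg_name, removed_in, alternative):
    """Hypothetical helper: emit a DeprecationWarning naming the removal version."""
    warnings.warn(
        f"The '{arg_name}' argument is deprecated and will be removed in "
        f"icepyx v{removed_in}. {alternative}",
        DeprecationWarning,
        # Skip this helper and the function that called it, so the warning
        # points at the user's code (e.g. their Read(...) call)
        stacklevel=3,
    )


def parse_version(version):
    """Turn a plain 'X.Y.Z' string into a tuple of ints (no pre-release tags)."""
    return tuple(int(part) for part in version.split("."))


def overdue_for_removal(current_version, removed_in):
    """True once the current version has reached the announced removal version."""
    return parse_version(current_version) >= parse_version(removed_in)
```

parse_version only handles plain X.Y.Z strings; for versions with pre-release tags, the third-party packaging library's `packaging.version.Version` is a more robust choice.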

rwegener2 · 2023-09-08T15:35:43Z

Ok, this PR is ready for review! Since my last comment I updated the documentation and added a glob_kwargs parameter that can be used to search directories recursively. The one thing I haven't been able to figure out is how to mock a filesystem for tests that read files with h5py. I spent a few hours trying to get something set up, to no avail. This morning I posted a question on Stack Overflow, so if that works out I'll follow up.

Looking forward to a review!

JessicaS11 · 2023-10-09T19:32:48Z

doc/source/example_notebooks/IS2_data_read-in.ipynb

@@ -63,9 +63,8 @@
   "metadata": {},


Suggestion:
glob will not by default search all of the subdirectories for matching filepaths, but it has the ability to do so. If you would like to search recursively, you can achieve this by:

passing the recursive=True argument into glob_kwargs (shown below)
using /**/ in the filepath to match any level of nested folders, which glob only honors when recursive=True is also passed (not shown below)
using glob directly to create a list of filepaths (shown below)
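One subtlety worth showing next to that list: Python's glob only treats `**` as "zero or more nested folders" when recursive=True is passed; otherwise `**` behaves like a single `*`. The stdlib-only demo below builds a throwaway directory tree (the file names are made up) to show the difference.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Throwaway tree: root/top.h5 and root/nested/deeper/inner.h5
    os.makedirs(os.path.join(root, "nested", "deeper"))
    for rel in ("top.h5", os.path.join("nested", "deeper", "inner.h5")):
        open(os.path.join(root, rel), "w").close()

    pattern = os.path.join(root, "**", "*.h5")

    # Default: ** acts like *, so it matches exactly one folder level below root
    shallow_rel = sorted(os.path.relpath(p, root) for p in glob.glob(pattern))
    # recursive=True: ** matches zero or more folder levels
    deep_rel = sorted(
        os.path.relpath(p, root) for p in glob.glob(pattern, recursive=True)
    )

print("without recursive:", shallow_rel)
print("with recursive:   ", deep_rel)
```

Without recursive=True the pattern only looks exactly one folder below root, and this tree has no .h5 file at that depth, so nothing matches; with it, both top.h5 and the deeply nested file are found.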

I streamlined this explanation, as your comment suggested. Let me know if it still isn't clear!

icepyx/core/read.py

Co-authored-by: Jessica Scheick <[email protected]>

rwegener2 · 2023-10-10T16:09:36Z

All comments addressed. I'm looking for a 👍🏻 about how the glob docs section was phrased. Then we are just waiting to see what happens with the visualization errors caused by the OpenAltimetry API (cc @JessicaS11).

JessicaS11 · 2023-10-12T20:41:17Z

Looks great, @rwegener2. I'll hold off on approving until #454 is fixed so there's not an accidental merge, but this is good to go!

JessicaS11 · 2023-10-18T17:45:57Z

@all-contributors
please add @rwegener2 for bug, code, doc, ideas, maintenance, review, test, tutorial
please add @jpswinski for review
please add @whyjz for tutorial

allcontributors · 2023-10-18T17:49:56Z

@JessicaS11

I've put up a pull request to add @rwegener2! 🎉

I've put up a pull request to add @jpswinski! 🎉

I've put up a pull request to add @whyjz! 🎉

* add filelist and product properties to Read object
* deprecate filename_pattern and product class Read inputs
* transition to data_source input as a string (including glob string) or list
* update tutorial with changes and user guidance for using glob

---------

Co-authored-by: Jessica Scheick <[email protected]>

This reverts commit bae2d89.

rwegener2 and others added 15 commits August 1, 2023 17:07

mvp remove intake from Read

9d09ff9

Merge branch 'development' into refactor_intake

e5458a1

delete is2cat and references

24f6a42

remove extra comments

b13b847

update doc strings

0779b80

update tests

1cfbf72

update documentation for removing intake

de61d87

update approach paragraph

9f06611

remove one more instance of catalog from the docs

d019b9a

clear jupyter history

156ea89

Update icepyx/core/read.py

b26ca4e

Co-authored-by: Wei Ji <[email protected]>

remove intake and related modules

ce1ca76

Merge branch 'development' into read_arguments

fd00aeb

mvp with new read parameters

431af78

clean up remainder of file and remove extraneous comments

612662e

rwegener2 changed the title ~~Read arguments~~ WIP Read arguments Sep 5, 2023

rwegener2 added 2 commits September 5, 2023 21:06

maintain backward compatibility and combine arguments

c16a003

update to new error message

7648078

update docs

4cfbfdb

rwegener2 added 2 commits September 8, 2023 14:32

glob kwargs and list error

f7f823b

formatting updates

203f3ad

rwegener2 marked this pull request as ready for review September 8, 2023 15:29

rwegener2 changed the title ~~WIP Read arguments~~ Read arguments Sep 8, 2023

rwegener2 requested a review from JessicaS11 September 8, 2023 15:35

rwegener2 and others added 3 commits September 14, 2023 13:42

Merge branch 'development' into read_arguments

9ca29f1

Merge branch 'development' into read_arguments

e8e35ad

Merge branch 'development' into read_arguments

4bcc518