Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support selecting steps by package name for strun #202

Merged
merged 3 commits into from
Oct 25, 2024

Conversation

braingram
Copy link
Collaborator

If a user installs both romancal and jwst and runs the following:

strun resample some_file

strun will always find the resample step in jwst (even though romancal provides a resample step with the class alias "resample").

This PR adds the ability to filter steps found by strun by providing the package name as a "scope. With this PR a user could call:

strun romancal::resample some_file

to run resample from romancal or

strun jwst::resample some_file

to run resample from jwst.

Documentation in both packages will need to be updated minimally with a description that providing a package name may be required if multiple strun supported packages are provided.

Tasks

  • update or add relevant tests
  • update relevant docstrings and / or docs/ page
  • Does this PR change any API used downstream? (if not, label with no-changelog-entry-needed)
    • write news fragment(s) in changes/: echo "changed something" > changes/<PR#>.<changetype>.rst (see below for change types)
    • run regression tests with this branch installed ("git+https://github.com/<fork>/stpipe@<branch>")
news fragment change types...
  • changes/<PR#>.feature.rst: new feature
  • changes/<PR#>.bugfix.rst: fixes an issue
  • changes/<PR#>.doc.rst: documentation change
  • changes/<PR#>.removal.rst: deprecation or removal of public API
  • changes/<PR#>.misc.rst: infrastructure or miscellaneous change

Copy link

codecov bot commented Oct 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.57%. Comparing base (558c952) to head (fd68d87).
Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #202      +/-   ##
==========================================
- Coverage   94.62%   94.57%   -0.05%     
==========================================
  Files          37       37              
  Lines        3125     3172      +47     
==========================================
+ Hits         2957     3000      +43     
- Misses        168      172       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -21,7 +21,7 @@ def main():
sys.exit(0)

try:
Step.from_cmdline(sys.argv[1:])
cmdline.step_from_cmdline(sys.argv[1:])
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is not required by this PR (I'm fine removing it) but during debugging it was confusing that this interface calls Step.from_cmdline which then just calls cmdline.step_from_cmdline.

stpipe/src/stpipe/step.py

Lines 170 to 191 in 799be65

@staticmethod
def from_cmdline(args):
"""
Create a step from a configuration file.
Parameters
----------
args : list of str
Commandline arguments
Returns
-------
step : Step instance
If the config file has a ``class`` parameter, the return
value will be as instance of that class.
Any parameters found in the config file will be set
as member variables on the returned `Step` instance.
"""
from . import cmdline
return cmdline.step_from_cmdline(args)

@@ -1,7 +1,7 @@
import warnings
from collections import namedtuple

from importlib_metadata import entry_points
import importlib_metadata
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I needed to change this import to allow the monkeypatching of "entry_points" in the unit test to work.

@braingram
Copy link
Collaborator Author

Pinging @tapastro and @schlafly for comments on this PR before running regression tests. What do you both think of the new way of disambiguating steps when both jwst and romancal are installed?

@braingram braingram marked this pull request as ready for review October 24, 2024 15:49
@braingram braingram requested a review from a team as a code owner October 24, 2024 15:49
@zacharyburnett
Copy link
Collaborator

the only concern I have is if some user's shell uses :: as an operator, but in that case they could just surround the argument with quotes

@braingram
Copy link
Collaborator Author

the only concern I have is if some user's shell uses :: as an operator, but in that case they could just surround the argument with quotes

Good point. Is there a shell that uses that? I don't think bash does and I'm too lazy to try the fancy "new" ones xD

@zacharyburnett
Copy link
Collaborator

the only shell I can think of that uses colons is nushell but if you're using nushell then you're probably quoting all your arguments anyways just to be safe haha

@melanieclarke
Copy link
Collaborator

Question - isn't it already possible to disambiguate by specifying the package path for the step, e.g. 'strun jwst.resample.ResampleStep'? Is the idea just to add another convenient alias?

@braingram
Copy link
Collaborator Author

I mean maybe we dip into unicode... strun jwst😍resample

@zacharyburnett
Copy link
Collaborator

Question - isn't it already possible to disambiguate by specifying the package path for the step, e.g. 'strun jwst.resample.ResampleStep'? Is the idea just to add another convenient alias?

I think the intent is to reduce unexpected behavior, if a user forgets they have both installed and just calls resample. However, this won't prevent the user from calling resample by itself right? So the problem persists.

@braingram
Copy link
Collaborator Author

Question - isn't it already possible to disambiguate by specifying the package path for the step, e.g. 'strun jwst.resample.ResampleStep'? Is the idea just to add another convenient alias?

Yeah it's possible to either use the full step class or the class alias. For jwst these are defined here and for romancal here. Some "class aliases" (the second item in each tuple) are common which leads to issues when calling strun resample ....

@braingram
Copy link
Collaborator Author

Question - isn't it already possible to disambiguate by specifying the package path for the step, e.g. 'strun jwst.resample.ResampleStep'? Is the idea just to add another convenient alias?

I think the intent is to reduce unexpected behavior, if a user forgets they have both installed and just calls resample. However, this won't prevent the user from calling resample by itself right? So the problem persists.

We could also add a warning or exception if multiple classes are matched. I think that's worth doing (maybe even part of this PR).

To provide a bit more context. The "class alias" is also used for setting the "meta.cal_step" status when the step is skipped.

stpipe/src/stpipe/step.py

Lines 484 to 486 in 799be65

setattr(
model.meta.cal_step, self.class_alias, "SKIPPED"
)

Which is contributing to spacetelescope/roman_datamodels#343 where if a step is skipped in a pipeline for romancal it's cal_step status is not set to "SKIPPED". This is somewhat a separate issue but as romancal is going to define more class aliases to fix the skip issue it seems worth having a way to disambiguate the alias provided to strun since we will soon have more conflicting aliases (and both pipelines document that using an alias is possible).

@schlafly
Copy link

This seems like a good design to me, thanks!

@melanieclarke
Copy link
Collaborator

I see. I thought the original intent was that the class alias should be globally unique for stpipe packages, but it does seem silly to make romancal think of a synonym for "resample".

@tapastro
Copy link
Collaborator

Agreed that this seems like a reasonable approach - I think a warning or exception is a useful, if not necessary, addition.

I am curious who the "typical" user would be who would encounter this issue, and in what environment this overlap would occur. I am not that familiar with what a typical Roman user is expected to do wrt. processing, locally vs. in the cloud, but I kind of expected processing for the two observatories to require different environments, if not AWS instances vs. local processing. Not that this should prevent a solution, just some speculation!

@schlafly
Copy link

I agree that especially as currently Roman isn't operational there aren't many people with both the Webb and Roman pipelines running. But it seemed bad form for the two packages using stpipe to step on one anothers' toes. I think we do want people to be able to have both pipelines running side-by-side in the same environment.

@nden
Copy link
Collaborator

nden commented Oct 24, 2024

Roman data will be processed locally as well. There's no reason to expect people won't be interested in both JWST and Roman data.

@braingram
Copy link
Collaborator Author

Agreed that this seems like a reasonable approach - I think a warning or exception is a useful, if not necessary, addition.

I am curious who the "typical" user would be who would encounter this issue, and in what environment this overlap would occur. I am not that familiar with what a typical Roman user is expected to do wrt. processing, locally vs. in the cloud, but I kind of expected processing for the two observatories to require different environments, if not AWS instances vs. local processing. Not that this should prevent a solution, just some speculation!

Thanks! I'll update this PR with an error (and we'll see if we need to dial it back to a warning).

I doubt this will come up for the specific combination of roman and jwst (although @nden objects to saying "you can't have jwst and romancal installed" and I agree). I think the issue is more likely to come up for things like Eureka (which extends the jwst pipeline with additional steps) and as more projects adopt stpipe (we've so far heard of one group working on 1 possibly 2 pipelines).

@braingram
Copy link
Collaborator Author

I added an error in fd68d87

Testing with jwst and romancal installed if I run strun resample ... I get:

ValueError: class alias resample matched more than 1 step. Please provide the package name along with the step name. One of:
  jwst::resample
  romancal::resample

With that environment I can run strun jwst::resample ... and it finds and executes the jwst resample step and run strun romancal::resample ... and it finds and executes the romancal resample step.

Copy link
Collaborator

@zacharyburnett zacharyburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks like a good solution to me, and scaleable to more packages in the future. I think it should remain an error, to enforce muscle habit of specifying the package name before shared aliases. To that end, should all aliases require the package name?

@braingram
Copy link
Collaborator Author

this looks like a good solution to me, and scaleable to more packages in the future. I think it should remain an error, to enforce muscle habit of specifying the package name before shared aliases. To that end, should all aliases require the package name?

Thanks! I'll run regtests to make sure the error isn't a problem (although I doubt anything will fail given the test setups).

I'm ok leaving aliases out when strun can unambiguously sort out the class name. If this goes in I'll work on PRs for jwst and romancal RTD to mention the need for a package name if >1 stpipe supporting package provides the same step names.

@braingram
Copy link
Collaborator Author

braingram commented Oct 25, 2024

@braingram braingram merged commit 3ed69b7 into spacetelescope:main Oct 25, 2024
17 of 18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants