Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add suffix to Step.spec #1347

Closed
wants to merge 22 commits into from
Closed

Conversation

braingram
Copy link
Collaborator

@braingram braingram commented Aug 1, 2024

This PR adds suffix to the Step.spec for (almost all) steps.

This allows removal of code like the following found in many steps:

if self.save_results:
try:
self.suffix = "assignwcs"
except AttributeError:
self["suffix"] = "assignwcs"

and like the following found in pipelines:
self.flux.suffix = "flux"

There are 2 steps that do not have suffixes added to their specs as I was unsure from the usage what was expected:

  • sourcedetection
  • tweakreg

I'm happy to update this PR with suffixes for those steps if there is a decided suffix.

Regression tests ran at: https://github.com/spacetelescope/RegressionTests/actions/runs/10206477558

shows 1 failure:

romancal/regtest/test_resample.py::test_resample_single_file - FileNotFoundError: [Errno 2] No such file or directory: '/runner/_work/_temp/pytest_basetemp/popen-gw3/test_resample_single_file0/mosaic_resamplestep.asdf'

This is due to the test expecting a mosaic_resamplestep.asdf file to be produced:

output_data = "mosaic_resamplestep.asdf"

with this PR the produced file is mosaic_i2d.asdf (since i2d is the default suffix for resample). This can be seen in the run log which shows:

INFO     stpipe.ResampleStep:log_config.py:142 Saved model in mosaic_i2d.asdf

With the changes in this PR I'd consider the production of mosaic_i2d.asdf an improvement and the regression test files can be updated to reflect this. If this requires updating the script to generate files please let me know and I can update this PR.

This PR could also allow some cleanup to occur for:
https://roman-pipeline.readthedocs.io/en/latest/roman/data_products/product_types.html
which lists several suffixes for steps. This could be reduced to a single suffix for step with the changes in this PR. Let me know if you'd like me to add those changes to this PR.

Closes #1344

Checklist

  • added entry in CHANGES.rst under the corresponding subsection
  • updated relevant tests
  • updated relevant documentation
  • updated relevant milestone(s)
  • added relevant label(s)
  • ran regression tests, post a link to the Jenkins job below. How to run regression tests on a PR

Copy link

codecov bot commented Aug 1, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 78.90%. Comparing base (482ef5c) to head (1688dad).
Report is 165 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1347      +/-   ##
==========================================
+ Coverage   78.61%   78.90%   +0.29%     
==========================================
  Files         117      117              
  Lines        7856     7809      -47     
==========================================
- Hits         6176     6162      -14     
+ Misses       1680     1647      -33     
Flag Coverage Δ *Carryforward flag
nightly 62.31% <ø> (+0.07%) ⬆️ Carriedforward from 482ef5c

*This pull request uses carry forward flags. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@braingram braingram changed the title Spec suffix Add suffix to Step.spec Aug 2, 2024
@braingram braingram marked this pull request as ready for review August 2, 2024 17:03
@braingram braingram requested a review from a team as a code owner August 2, 2024 17:03
@braingram

This comment was marked as outdated.

'--suffix="suffix_test"',
]
RomanStep.from_cmdline(args)
output = "Test_darkcurrent.asdf"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose to delete this test as it appears to check that calling the DarkCurrentStep and providing an expected output suffix of suffix_test produces a file with a suffix darkcurrent.

This isn't the expected behavior for stpipe (the output file should have the suffix provided on the commandline, suffix_test not darkcurrent).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ddavis-stsci mentioned that this test was intended to test if a user provided suffix is used.

It could be made to pass by:

  • updating the expected output from Test_darkcurrent.asdf to Test_suffix_test.asdf (which includes the user-provided suffix)
  • update the regression test file to have the matching Test_suffix_test.asdf name

One question might be, why isn't the expected output Test_dark_suffix_text.asdf since the user provided --output_file="Test_dark" and --suffix="suffix_test"? This is due to dark being listed as a supported suffix.

@schlafly
Copy link
Collaborator

schlafly commented Aug 5, 2024

I agree there's a question with resample & tweakreg what the suffix should be; as these are the last steps in the pipeline, i2d & cal make sense. OTOH, if the steps are run in isolation, a more step-centric same (e.g., resample, tweakreg) make sense. In the ELP in your PR you make the ELP produce cal files. It seems reasonable to me to retain that for resample and tweakreg (i.e., using suffixes like resample and tweakreg).

sourcedetection is an annoying step. The sourcedetection step mostly makes a separate _cat suffixed file. In the L2 pipeline, we also update the image metadata to reference this file so that it can be used later in tweakreg. It feels a little much to make this a separate kind of file with a separate "sourcedetection" suffix, but maybe that's the easiest option.

For L3 we don't change the L3 image metadata and this suffix doesn't make as much sense, but I don't think the L3 pipeline can make these files so maybe we don't need to think about them.

I guess I'm happy having the default suffix here be "sourcedetection."

For my two cents I agree that either removing the dark suffix test or updating it to look for the new suffix is appropriate; the current behavior is surprising.

I missed your point about the duplicate docs.

@braingram
Copy link
Collaborator Author

Thanks for taking a look!

I'm going to try and continue the discussion but in a different order :)

I missed your point about the duplicate docs.

Currently the output of resample has 2 listed suffixes:

  • i2d
  • resamplestep

With this PR makes resample always produce an i2d file. I think it's easiest to understand if a step always produces a file with the same suffix. tweakreg also lists cal and tweakregstep (but as noted above this PR does not yet address the tweakreg suffix by giving the step a default). I think based on:

I agree there's a question with resample & tweakreg what the suffix should be; as these are the last steps in the pipeline, i2d & cal make sense. OTOH, if the steps are run in isolation, a more step-centric same (e.g., resample, tweakreg) make sense. In the ELP in your PR you make the ELP produce cal files. It seems reasonable to me to retain that for resample and tweakreg (i.e., using suffixes like resample and tweakreg).

that cal would be ok for tweakreg. However that wouldn't address your comment about pipeline vs individual step runs.

The pipeline suffix is trickier (as you noted). On one hand, it makes sense to me that running each step in a pipeline separately should produce a file that is identical (including in suffix) to a file produced by the pipeline. This would mean elp (for example) should use whatever suffix is used by the last step in the pipeline (dealing with skipped steps is something that this PR does not address. If a user skips tweakreg, should the output still be a cal file?).

I also think it makes sense to have save_results outputs for a pipeline run use the same suffix as the individual steps. For example, running elp with flat_field.save_results==True would produce a .flat file (in addition to the output of the pipeline).

sourcedetection is an annoying step. The sourcedetection step mostly makes a separate _cat suffixed file. In the L2 pipeline, we also update the image metadata to reference this file so that it can be used later in tweakreg. It feels a little much to make this a separate kind of file with a separate "sourcedetection" suffix, but maybe that's the easiest option.

For L3 we don't change the L3 image metadata and this suffix doesn't make as much sense, but I don't think the L3 pipeline can make these files so maybe we don't need to think about them.

I guess I'm happy having the default suffix here be "sourcedetection."

It sounds like there are a few things to discuss about sourcedetection. Before we dive in, I'm failing to find a pipeline that runs source detection :-/ How is this step used?

@schlafly
Copy link
Collaborator

schlafly commented Aug 5, 2024

I was reading those tables differently; the first and last rows give the total input and output of the various pipelines, and then the per-step rows give the per-step outputs.

I hear you that it would be nice for the last step of the pipeline to naturally create a product that has the suffix that is the suffix for the whole pipeline, and I can be convinced. But "cal" me carries with it the meaning that we've run a whole pipeline on it, and while "tweakregstep" is usually last it's a little awkward if it makes a cal file when run alone on a minimally calibrated product. I take your point that you could run the whole pipeline and skip all the steps and get a cal file too, but that feels like a less typical case than someone wanting to run tweakreg in isolation.

I agree that it's good if the step defaults are the same as the ELP defaults for the pipeline runs.

Sorry, I was confusing source_detection & source_catalog! We have in this build deprecated source_detection in favor of source_catalog and do not run source_detection any longer (though we haven't actually removed the code & docs yet).

@braingram
Copy link
Collaborator Author

I was reading those tables differently; the first and last rows give the total input and output of the various pipelines, and then the per-step rows give the per-step outputs.

Thanks for the clarification! That makes much more sense now that I know that.

I hear you that it would be nice for the last step of the pipeline to naturally create a product that has the suffix that is the suffix for the whole pipeline, and I can be convinced. But "cal" me carries with it the meaning that we've run a whole pipeline on it, and while "tweakregstep" is usually last it's a little awkward if it makes a cal file when run alone on a minimally calibrated product. I take your point that you could run the whole pipeline and skip all the steps and get a cal file too, but that feels like a less typical case than someone wanting to run tweakreg in isolation.

That works for me and I think is a easier approach to implement (as it's mostly what is done now). I think the only changes are:

  • add a suffix for tweakreg, does tweakreg work?
  • add a suffix for the mos pipeline, at the moment this is i2d which is the same as resample (alternatively resample could be given a different perhaps resample suffix).

I agree that it's good if the step defaults are the same as the ELP defaults for the pipeline runs.

I think that should be the case now with this PR.

Sorry, I was confusing source_detection & source_catalog! We have in this build deprecated source_detection in favor of source_catalog and do not run source_detection any longer (though we haven't actually removed the code & docs yet).

Good to know! For this PR what do you suggest we do? We could just ignore it (and it will use the default sourcedetectionstep suffix).

@schlafly
Copy link
Collaborator

schlafly commented Aug 5, 2024

Let's just not do anything for sourcedetection for this PR. We will delete it.

For resample, I would have said that the current default was "resamplestep."

cp mosaic_resamplestep.asdf $outdir/roman-pipeline/dev/truth/WFI/image/

unless that code has gone out of date. Changing that to resample sounds more idiomatic to me but would require renaming the regression test files, or leave it as resamplestep for now and avoid that.

Yes, in my head "resamplestep" was the current default for resample and "i2d" was the default for the pipeline; you're likely right if you think something different, but it seems to me that that's effectively what's been happening in the code.

For tweakreg, I agree that "tweakreg" would be the obvious suffix. Currently to me it looks like it is "tweakregstep"---similar scenario as to resample.

cp ${basename}_shift_tweakregstep.asdf $outdir/roman-pipeline/dev/truth/WFI/image/

@braingram
Copy link
Collaborator Author

Let's just not do anything for sourcedetection for this PR. We will delete it.

Thanks. Will do.

For resample, I would have said that the current default was "resamplestep."

On main it depends on how resample is run. As a stand-alone step it will use resamplestep (any step that doesn't define a suffix will use the lower cased step class name because of Step.default_suffix). However when run (with results saved) as part of the mos pipeline the suffix will be i2d:

self.resample.suffix = "i2d"

unless that code has gone out of date. Changing that to resample sounds more idiomatic to me but would require renaming the regression test files, or leave it as resamplestep for now and avoid that.

Yes, in my head "resamplestep" was the current default for resample and "i2d" was the default for the pipeline; you're likely right if you think something different, but it seems to me that that's effectively what's been happening in the code.

For tweakreg, I agree that "tweakreg" would be the obvious suffix. Currently to me it looks like it is "tweakregstep"---similar scenario as to resample.

cp ${basename}_shift_tweakregstep.asdf $outdir/roman-pipeline/dev/truth/WFI/image/

Exactly. Those 2 regression test files would need to be updated if we change:

  • resamplestep to resample
  • tweakregstep to tweakreg

I don't feel strongly one way or the other and am happy to update this PR to set resamplestep as the default suffix for resample (and similarly for tweakregstep for tweakreg). I think it's still worth adding the suffix to the spec for those steps since it appears in the documentation (here's an example with this PR for the flux step):
https://roman-pipeline--1347.org.readthedocs.build/en/1347/api/romancal.flux.FluxStep.html#romancal.flux.FluxStep.spec

Does the following todo look good for finishing up this PR:

  • set resample step suffix to resamplestep
  • set tweakreg step suffix to tweakregstep
  • sort out what to do about the dark unit test

I'm also ok with using resample and tweakreg instead of resamplestep and tweakregstep and if we go that route I'll update the script in this PR (I will have questions about how to run it though). Let me know what works for you.

@ddavis-stsci
Copy link
Collaborator

I think we should try and follow the naming convention of the https://roman-pipeline.readthedocs.io/en/latest/roman/data_products/product_types.html page and not try and make things up as we go along. This is based on the JWST naming conventions when possible.

We can certainly update the page if we think that is necessary but I think we should at least start from there.

@schlafly
Copy link
Collaborator

schlafly commented Aug 5, 2024

I think Brett's current proposal is to do exactly that.

I actually would prefer his other suggestion, which would be to update the the two steps that have "step" in their name and don't follow the pattern of the other suffixes (resample, tweakreg), replacing their suffixes with "resample" and "tweakreg". But I don't feel strongly; @ddavis-stsci , if you'd rather keep to exactly what's on that page, that's less effort for Brett but looks a little goofy.

There are also cases like outlier detection and source detection which we no longer use where I would propose we do nothing.

Brett, I agree that it's worth adding the suffix to the specs for resample & tweakreg even if they end up using the stpipe defaults.

@ddavis-stsci
Copy link
Collaborator

Also this is not honoring the save_results option,
strun --steps.source_catalog.save_results=True --steps.tweakreg.skip=True roman_elp $input_filename
should save the catalog step. I'm only seeing the _cal.asdf file.

@ddavis-stsci
Copy link
Collaborator

Also the tweakreg step should be listed as skipped but I'm seeing,
...
saturation COMPLETE
skymatch INCOMPLETE
tweakreg INCOMPLETE

@braingram
Copy link
Collaborator Author

Also this is not honoring the save_results option, strun --steps.source_catalog.save_results=True --steps.tweakreg.skip=True roman_elp $input_filename should save the catalog step. I'm only seeing the _cal.asdf file.

Thanks for taking a look. Is there a regtest for this?

I tested this locally running:

strun --steps.source_catalog.save_results=True --steps.tweakreg.skip=True roman_elp /Users/bgraham/test_bigdata/roman-pipeline/dev/WFI/image/r0000101001001001001_01101_0001_WFI01_uncal.asdf

This produced 4 asdf files:

  • r0000101001001001001_01101_0001_WFI01_cal.asdf
  • r0000101001001001001_01101_0001_WFI01_cat.asdf
  • r0000101001001001001_01101_0001_WFI01_segm.asdf
  • r0000101001001001001_01101_0001_WFI01_sourcecatalog.asdf

Also the tweakreg step should be listed as skipped but I'm seeing,
...
saturation COMPLETE
skymatch INCOMPLETE
tweakreg INCOMPLETE

The same occurs on main see spacetelescope/roman_datamodels#343

@braingram
Copy link
Collaborator Author

There are also cases like outlier detection and source detection which we no longer use where I would propose we do nothing.

Is outlier detection no longer used?

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Aug 5, 2024
@ddavis-stsci
Copy link
Collaborator

ddavis-stsci commented Aug 5, 2024 via email

@braingram
Copy link
Collaborator Author

This is a recent change then...

Not that I'm aware of. I opened the issue when I found it in April.

Also stpipe thinks the step is skipped but just not recording it?

Yes, the issue is with recording the skip. The step is still skipped.

In general the step is appended when stpipe cannot determine a rule for the suffix.

stpipe uses a lower cased representation of Step.name:
https://github.com/spacetelescope/stpipe/blob/fc937a231920e1ae737e42cd8ea670dc23d2feb0/src/stpipe/step.py#L716
Step.name defaults to Step.__class__.__name__ when not provided:
https://github.com/spacetelescope/stpipe/blob/fc937a231920e1ae737e42cd8ea670dc23d2feb0/src/stpipe/step.py#L353C24-L353C43
This leads to a suffix that differs between step runs in isolation vs as part of a pipeline.
To use RefPixStep as an example. When run in isolation (with strun roman.step.RefPixStep) the Step.name will be RefPixStep (since no name was provided when the step was created). Without a suffix in the spec the suffix for the produced file will be refpixstep.
However, when run as part of elp, Step.name will be refpix (because of the step defs) and the produced file will have a suffix refpix.

So it is confusing, at least to me, to have the default the same as if the rule is unknown.

Including the suffix in the spec allows it to have a consistent name for pipeline and non-pipeline runs. The default rule(s) do not (and are honestly quite confusing).

Comment on lines +54 to +56
- Provide suffix in Step.spec for steps to allow default suffix use
within and outside Pipeline runs [#1347]

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Provide suffix in Step.spec for steps to allow default suffix use
within and outside Pipeline runs [#1347]

now that #1375 is merged (switching change log handling to towncrier) this change log entry should be a file in changes/ instead:

echo "Provide suffix in Step.spec for steps to allow default suffix use within and outside Pipeline runs" > changes/1347.general.rst

(this is just because #1375 was merged while active PRs existed; new PRs will have these instructions in their checklist)

@braingram
Copy link
Collaborator Author

Closing this as stale.

@braingram braingram closed this Nov 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Romancal steps and suffix treatment
4 participants