Add `suffix` to `Step.spec` #1347

braingram · 2024-08-01T21:05:15Z

This PR adds suffix to the Step.spec for (almost all) steps.

This allows removal of code like the following found in many steps:

romancal/romancal/assign_wcs/assign_wcs_step.py

Lines 48 to 52 in d837289

    
           if self.save_results: 
        
               try: 
        
                   self.suffix = "assignwcs" 
        
               except AttributeError: 
        
                   self["suffix"] = "assignwcs"

and like the following found in pipelines:

romancal/romancal/pipeline/mosaic_pipeline.py

Line 74 in ebc1730

self.flux.suffix = "flux"

There are 2 steps that do not have suffixes added to their specs as I was unsure from the usage what was expected:

sourcedetection
tweakreg

I'm happy to update this PR with suffixes for those steps if there is a decided suffix.

Regression tests ran at: https://github.com/spacetelescope/RegressionTests/actions/runs/10206477558

shows 1 failure:

romancal/regtest/test_resample.py::test_resample_single_file - FileNotFoundError: [Errno 2] No such file or directory: '/runner/_work/_temp/pytest_basetemp/popen-gw3/test_resample_single_file0/mosaic_resamplestep.asdf'

This is due to the test expecting a mosaic_resamplestep.asdf file to be produced:

romancal/romancal/regtest/test_resample.py

Line 19 in ebc1730

output_data = "mosaic_resamplestep.asdf"

with this PR the produced file is mosaic_i2d.asdf (since i2d is the default suffix for resample). This can be seen in the run log which shows:

INFO     stpipe.ResampleStep:log_config.py:142 Saved model in mosaic_i2d.asdf

With the changes in this PR I'd consider the production of mosaic_i2d.asdf an improvement and the regression test files can be updated to reflect this. If this requires updating the script to generate files please let me know and I can update this PR.

This PR could also allow some cleanup to occur for:
https://roman-pipeline.readthedocs.io/en/latest/roman/data_products/product_types.html
which lists several suffixes for steps. This could be reduced to a single suffix for step with the changes in this PR. Let me know if you'd like me to add those changes to this PR.

Closes #1344

Checklist

added entry in CHANGES.rst under the corresponding subsection
updated relevant tests
updated relevant documentation
updated relevant milestone(s)
added relevant label(s)
ran regression tests, post a link to the Jenkins job below. How to run regression tests on a PR

codecov · 2024-08-01T22:06:41Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 78.90%. Comparing base (482ef5c) to head (1688dad).
Report is 165 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1347      +/-   ##
==========================================
+ Coverage   78.61%   78.90%   +0.29%     
==========================================
  Files         117      117              
  Lines        7856     7809      -47     
==========================================
- Hits         6176     6162      -14     
+ Misses       1680     1647      -33

Flag	Coverage Δ		*Carryforward flag
nightly	`62.31% <ø> (+0.07%)`	⬆️	Carriedforward from 482ef5c

*This pull request uses carry forward flags. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

braingram · 2024-08-02T17:09:52Z

romancal/regtest/test_dark_current.py

-        '--suffix="suffix_test"',
-    ]
-    RomanStep.from_cmdline(args)
-    output = "Test_darkcurrent.asdf"


I chose to delete this test as it appears to check that calling the DarkCurrentStep and providing an expected output suffix of suffix_test produces a file with a suffix darkcurrent.

This isn't the expected behavior for stpipe (the output file should have the suffix provided on the commandline, suffix_test not darkcurrent).

@ddavis-stsci mentioned that this test was intended to test if a user provided suffix is used.

It could be made to pass by:

updating the expected output from Test_darkcurrent.asdf to Test_suffix_test.asdf (which includes the user-provided suffix)

update the regression test file to have the matching Test_suffix_test.asdf name

One question might be, why isn't the expected output Test_dark_suffix_text.asdf since the user provided --output_file="Test_dark" and --suffix="suffix_test"? This is due to dark being listed as a supported suffix.

schlafly · 2024-08-05T14:29:21Z

I agree there's a question with resample & tweakreg what the suffix should be; as these are the last steps in the pipeline, i2d & cal make sense. OTOH, if the steps are run in isolation, a more step-centric same (e.g., resample, tweakreg) make sense. In the ELP in your PR you make the ELP produce cal files. It seems reasonable to me to retain that for resample and tweakreg (i.e., using suffixes like resample and tweakreg).

sourcedetection is an annoying step. The sourcedetection step mostly makes a separate _cat suffixed file. In the L2 pipeline, we also update the image metadata to reference this file so that it can be used later in tweakreg. It feels a little much to make this a separate kind of file with a separate "sourcedetection" suffix, but maybe that's the easiest option.

For L3 we don't change the L3 image metadata and this suffix doesn't make as much sense, but I don't think the L3 pipeline can make these files so maybe we don't need to think about them.

I guess I'm happy having the default suffix here be "sourcedetection."

For my two cents I agree that either removing the dark suffix test or updating it to look for the new suffix is appropriate; the current behavior is surprising.

I missed your point about the duplicate docs.

braingram · 2024-08-05T14:51:46Z

Thanks for taking a look!

I'm going to try and continue the discussion but in a different order :)

I missed your point about the duplicate docs.

Currently the output of resample has 2 listed suffixes:

i2d
resamplestep

With this PR makes resample always produce an i2d file. I think it's easiest to understand if a step always produces a file with the same suffix. tweakreg also lists cal and tweakregstep (but as noted above this PR does not yet address the tweakreg suffix by giving the step a default). I think based on:

I agree there's a question with resample & tweakreg what the suffix should be; as these are the last steps in the pipeline, i2d & cal make sense. OTOH, if the steps are run in isolation, a more step-centric same (e.g., resample, tweakreg) make sense. In the ELP in your PR you make the ELP produce cal files. It seems reasonable to me to retain that for resample and tweakreg (i.e., using suffixes like resample and tweakreg).

that cal would be ok for tweakreg. However that wouldn't address your comment about pipeline vs individual step runs.

The pipeline suffix is trickier (as you noted). On one hand, it makes sense to me that running each step in a pipeline separately should produce a file that is identical (including in suffix) to a file produced by the pipeline. This would mean elp (for example) should use whatever suffix is used by the last step in the pipeline (dealing with skipped steps is something that this PR does not address. If a user skips tweakreg, should the output still be a cal file?).

I also think it makes sense to have save_results outputs for a pipeline run use the same suffix as the individual steps. For example, running elp with flat_field.save_results==True would produce a .flat file (in addition to the output of the pipeline).

sourcedetection is an annoying step. The sourcedetection step mostly makes a separate _cat suffixed file. In the L2 pipeline, we also update the image metadata to reference this file so that it can be used later in tweakreg. It feels a little much to make this a separate kind of file with a separate "sourcedetection" suffix, but maybe that's the easiest option.

For L3 we don't change the L3 image metadata and this suffix doesn't make as much sense, but I don't think the L3 pipeline can make these files so maybe we don't need to think about them.

I guess I'm happy having the default suffix here be "sourcedetection."

It sounds like there are a few things to discuss about sourcedetection. Before we dive in, I'm failing to find a pipeline that runs source detection :-/ How is this step used?

schlafly · 2024-08-05T15:18:33Z

I was reading those tables differently; the first and last rows give the total input and output of the various pipelines, and then the per-step rows give the per-step outputs.

I hear you that it would be nice for the last step of the pipeline to naturally create a product that has the suffix that is the suffix for the whole pipeline, and I can be convinced. But "cal" me carries with it the meaning that we've run a whole pipeline on it, and while "tweakregstep" is usually last it's a little awkward if it makes a cal file when run alone on a minimally calibrated product. I take your point that you could run the whole pipeline and skip all the steps and get a cal file too, but that feels like a less typical case than someone wanting to run tweakreg in isolation.

I agree that it's good if the step defaults are the same as the ELP defaults for the pipeline runs.

Sorry, I was confusing source_detection & source_catalog! We have in this build deprecated source_detection in favor of source_catalog and do not run source_detection any longer (though we haven't actually removed the code & docs yet).

braingram · 2024-08-05T16:12:31Z

I was reading those tables differently; the first and last rows give the total input and output of the various pipelines, and then the per-step rows give the per-step outputs.

Thanks for the clarification! That makes much more sense now that I know that.

I hear you that it would be nice for the last step of the pipeline to naturally create a product that has the suffix that is the suffix for the whole pipeline, and I can be convinced. But "cal" me carries with it the meaning that we've run a whole pipeline on it, and while "tweakregstep" is usually last it's a little awkward if it makes a cal file when run alone on a minimally calibrated product. I take your point that you could run the whole pipeline and skip all the steps and get a cal file too, but that feels like a less typical case than someone wanting to run tweakreg in isolation.

That works for me and I think is a easier approach to implement (as it's mostly what is done now). I think the only changes are:

add a suffix for tweakreg, does tweakreg work?
add a suffix for the mos pipeline, at the moment this is i2d which is the same as resample (alternatively resample could be given a different perhaps resample suffix).

I agree that it's good if the step defaults are the same as the ELP defaults for the pipeline runs.

I think that should be the case now with this PR.

Sorry, I was confusing source_detection & source_catalog! We have in this build deprecated source_detection in favor of source_catalog and do not run source_detection any longer (though we haven't actually removed the code & docs yet).

Good to know! For this PR what do you suggest we do? We could just ignore it (and it will use the default sourcedetectionstep suffix).

schlafly · 2024-08-05T16:49:03Z

Let's just not do anything for sourcedetection for this PR. We will delete it.

For resample, I would have said that the current default was "resamplestep."

romancal/romancal/scripts/make_regtestdata.sh

Line 63 in ebc1730

cp mosaic_resamplestep.asdf $outdir/roman-pipeline/dev/truth/WFI/image/

unless that code has gone out of date. Changing that to resample sounds more idiomatic to me but would require renaming the regression test files, or leave it as resamplestep for now and avoid that.

Yes, in my head "resamplestep" was the current default for resample and "i2d" was the default for the pipeline; you're likely right if you think something different, but it seems to me that that's effectively what's been happening in the code.

For tweakreg, I agree that "tweakreg" would be the obvious suffix. Currently to me it looks like it is "tweakregstep"---similar scenario as to resample.

romancal/romancal/scripts/make_regtestdata.sh

Line 166 in ebc1730

    
           cp ${basename}_shift_tweakregstep.asdf $outdir/roman-pipeline/dev/truth/WFI/image/

braingram · 2024-08-05T17:22:19Z

Let's just not do anything for sourcedetection for this PR. We will delete it.

Thanks. Will do.

For resample, I would have said that the current default was "resamplestep."

On main it depends on how resample is run. As a stand-alone step it will use resamplestep (any step that doesn't define a suffix will use the lower cased step class name because of Step.default_suffix). However when run (with results saved) as part of the mos pipeline the suffix will be i2d:

romancal/romancal/pipeline/mosaic_pipeline.py

Line 140 in ebc1730

self.resample.suffix = "i2d"

unless that code has gone out of date. Changing that to resample sounds more idiomatic to me but would require renaming the regression test files, or leave it as resamplestep for now and avoid that.

Yes, in my head "resamplestep" was the current default for resample and "i2d" was the default for the pipeline; you're likely right if you think something different, but it seems to me that that's effectively what's been happening in the code.

For tweakreg, I agree that "tweakreg" would be the obvious suffix. Currently to me it looks like it is "tweakregstep"---similar scenario as to resample.

romancal/romancal/scripts/make_regtestdata.sh

Line 166 in ebc1730

cp ${basename}_shift_tweakregstep.asdf $outdir/roman-pipeline/dev/truth/WFI/image/

Exactly. Those 2 regression test files would need to be updated if we change:

resamplestep to resample
tweakregstep to tweakreg

I don't feel strongly one way or the other and am happy to update this PR to set resamplestep as the default suffix for resample (and similarly for tweakregstep for tweakreg). I think it's still worth adding the suffix to the spec for those steps since it appears in the documentation (here's an example with this PR for the flux step):
https://roman-pipeline--1347.org.readthedocs.build/en/1347/api/romancal.flux.FluxStep.html#romancal.flux.FluxStep.spec

Does the following todo look good for finishing up this PR:

set resample step suffix to resamplestep
set tweakreg step suffix to tweakregstep
sort out what to do about the dark unit test

I'm also ok with using resample and tweakreg instead of resamplestep and tweakregstep and if we go that route I'll update the script in this PR (I will have questions about how to run it though). Let me know what works for you.

ddavis-stsci · 2024-08-05T17:51:31Z

I think we should try and follow the naming convention of the https://roman-pipeline.readthedocs.io/en/latest/roman/data_products/product_types.html page and not try and make things up as we go along. This is based on the JWST naming conventions when possible.

We can certainly update the page if we think that is necessary but I think we should at least start from there.

schlafly · 2024-08-05T18:42:41Z

I think Brett's current proposal is to do exactly that.

I actually would prefer his other suggestion, which would be to update the the two steps that have "step" in their name and don't follow the pattern of the other suffixes (resample, tweakreg), replacing their suffixes with "resample" and "tweakreg". But I don't feel strongly; @ddavis-stsci , if you'd rather keep to exactly what's on that page, that's less effort for Brett but looks a little goofy.

There are also cases like outlier detection and source detection which we no longer use where I would propose we do nothing.

Brett, I agree that it's worth adding the suffix to the specs for resample & tweakreg even if they end up using the stpipe defaults.

ddavis-stsci · 2024-08-05T18:42:55Z

Also this is not honoring the save_results option,
strun --steps.source_catalog.save_results=True --steps.tweakreg.skip=True roman_elp $input_filename
should save the catalog step. I'm only seeing the _cal.asdf file.

ddavis-stsci · 2024-08-05T18:54:13Z

Also the tweakreg step should be listed as skipped but I'm seeing,
...
saturation COMPLETE
skymatch INCOMPLETE
tweakreg INCOMPLETE

braingram · 2024-08-05T19:00:02Z

Also this is not honoring the save_results option, strun --steps.source_catalog.save_results=True --steps.tweakreg.skip=True roman_elp $input_filename should save the catalog step. I'm only seeing the _cal.asdf file.

Thanks for taking a look. Is there a regtest for this?

I tested this locally running:

strun --steps.source_catalog.save_results=True --steps.tweakreg.skip=True roman_elp /Users/bgraham/test_bigdata/roman-pipeline/dev/WFI/image/r0000101001001001001_01101_0001_WFI01_uncal.asdf

This produced 4 asdf files:

r0000101001001001001_01101_0001_WFI01_cal.asdf
r0000101001001001001_01101_0001_WFI01_cat.asdf
r0000101001001001001_01101_0001_WFI01_segm.asdf
r0000101001001001001_01101_0001_WFI01_sourcecatalog.asdf

Also the tweakreg step should be listed as skipped but I'm seeing,
...
saturation COMPLETE
skymatch INCOMPLETE
tweakreg INCOMPLETE

The same occurs on main see spacetelescope/roman_datamodels#343

braingram · 2024-08-05T19:15:55Z

There are also cases like outlier detection and source detection which we no longer use where I would propose we do nothing.

Is outlier detection no longer used?

ddavis-stsci · 2024-08-05T19:30:09Z

In general the step is appended when stpipe cannot determine a rule for the suffix. So it takes the step name and just appends step to it for a suffix, so "rampfit" + "step" So it is confusing, at least to me, to have the default the same as if the rule is unknown. outlier detection is used in the mosaic pipeline. Source detection is being dropped in favor of source catalog. I've kept the _sourcedetection suffix since that is what SDF is looking for. I can change this but we need to coordinate with SDF so they can update their scripts and ingest process.

…

On 8/5/24 2:43 PM, Eddie Schlafly wrote: I think Brett's current proposal is to do exactly that. I actually would prefer his other suggestion, which would be to update the the two steps that have "step" in their name and don't follow the pattern of the other suffixes (resample, tweakreg), replacing their suffixes with "resample" and "tweakreg". But I don't feel strongly; @ddavis-stsci <https://urldefense.com/v3/__https://github.com/ddavis-stsci__;!!CrWY41Z8OgsX0i-WU-0LuAcUu2o!wxebIRdV302vS299Dk5L7-sh_vzQeVpv5yLuoa-4lH6c8j-wOIl5zbSh5Oua0OzgSTBxp5YDczDzLdHoZlKgFb4z$> , if you'd rather keep to exactly what's on that page, that's less effort for Brett but looks a little goofy. There are also cases like outlier detection and source detection which we no longer use where I would propose we do nothing. Brett, I agree that it's worth adding the suffix to the specs for resample & tweakreg even if they end up using the stpipe defaults. — Reply to this email directly, view it on GitHub <https://urldefense.com/v3/__https://github.com/spacetelescope/romancal/pull/1347*issuecomment-2269683468__;Iw!!CrWY41Z8OgsX0i-WU-0LuAcUu2o!wxebIRdV302vS299Dk5L7-sh_vzQeVpv5yLuoa-4lH6c8j-wOIl5zbSh5Oua0OzgSTBxp5YDczDzLdHoZgyw4Bw-$>, or unsubscribe <https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/ALXCXWJJNAACU4NTF7MGQMDZP7BTPAVCNFSM6AAAAABL3KOBEKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDENRZGY4DGNBWHA__;!!CrWY41Z8OgsX0i-WU-0LuAcUu2o!wxebIRdV302vS299Dk5L7-sh_vzQeVpv5yLuoa-4lH6c8j-wOIl5zbSh5Oua0OzgSTBxp5YDczDzLdHoZuuWREUc$>. You are receiving this because you were mentioned.Message ID: ***@***.***>

braingram · 2024-08-05T19:43:11Z

This is a recent change then...

Not that I'm aware of. I opened the issue when I found it in April.

Also stpipe thinks the step is skipped but just not recording it?

Yes, the issue is with recording the skip. The step is still skipped.

In general the step is appended when stpipe cannot determine a rule for the suffix.

stpipe uses a lower cased representation of Step.name:
https://github.com/spacetelescope/stpipe/blob/fc937a231920e1ae737e42cd8ea670dc23d2feb0/src/stpipe/step.py#L716
Step.name defaults to Step.__class__.__name__ when not provided:
https://github.com/spacetelescope/stpipe/blob/fc937a231920e1ae737e42cd8ea670dc23d2feb0/src/stpipe/step.py#L353C24-L353C43
This leads to a suffix that differs between step runs in isolation vs as part of a pipeline.
To use RefPixStep as an example. When run in isolation (with strun roman.step.RefPixStep) the Step.name will be RefPixStep (since no name was provided when the step was created). Without a suffix in the spec the suffix for the produced file will be refpixstep.
However, when run as part of elp, Step.name will be refpix (because of the step defs) and the produced file will have a suffix refpix.

So it is confusing, at least to me, to have the default the same as if the rule is unknown.

Including the suffix in the spec allows it to have a consistent name for pipeline and non-pipeline runs. The default rule(s) do not (and are honestly quite confusing).

zacharyburnett · 2024-09-06T13:45:04Z

CHANGES.rst

+- Provide suffix in Step.spec for steps to allow default suffix use
+  within and outside Pipeline runs [#1347]
+


Suggested change

- Provide suffix in Step.spec for steps to allow default suffix use

within and outside Pipeline runs [#1347]

now that #1375 is merged (switching change log handling to towncrier) this change log entry should be a file in changes/ instead:

echo "Provide suffix in Step.spec for steps to allow default suffix use within and outside Pipeline runs" > changes/1347.general.rst

(this is just because #1375 was merged while active PRs existed; new PRs will have these instructions in their checklist)

braingram · 2024-11-05T22:07:22Z

Closing this as stale.

github-actions bot added Dark Current Photom jump Saturation linearity pipeline assign_wcs dq_init flatfield labels Aug 1, 2024

github-actions bot added the regression_testing label Aug 2, 2024

braingram changed the title ~~Spec suffix~~ Add suffix to Step.spec Aug 2, 2024

braingram marked this pull request as ready for review August 2, 2024 17:03

braingram requested a review from a team as a code owner August 2, 2024 17:03

This comment was marked as outdated.

Sign in to view

braingram commented Aug 2, 2024

View reviewed changes

braingram force-pushed the spec_suffix branch from 45d6d07 to 86d8327 Compare August 2, 2024 17:13

github-actions bot added the documentation Improvements or additions to documentation label Aug 5, 2024

braingram force-pushed the spec_suffix branch from 8fc4608 to 16e19ce Compare August 5, 2024 19:35

braingram force-pushed the spec_suffix branch from 16e19ce to f307160 Compare August 8, 2024 11:37

braingram added 22 commits August 23, 2024 15:27

add suffix to elp spec

7ee7b4f

add suffix to assign_wcs spec

6ccbeb4

add suffix to dark current spec

c2ed15e

add suffix to dqinit spec

1802676

add suffix to flat spec

009b76c

add suffix to flux step

19c6b08

add suffix to jump spec

b8ab132

add suffix to linearity spec

f79cabb

add suffix to outlier spec

065f1c9

add suffix to photom spec

e78e648

add suffix to refpix spec

34571f2

add suffix to resample spec

d5e4cb6

add suffix to saturation spec

36292a5

add suffix to skymatch spec

d8cbbc6

remove step suffix setting in pipelines

1949dd0

fix typo in dark spec update

adca03d

remove incorrect dark current test

3f1611b

add suffix to mosaic pipeline spec

c5c01f6

update changelog

4140b58

add suffix to resample and tweakreg specs

45b1c61

update make_regtestdata.sh for new filenames

bba9bbd

update data products table

1688dad

braingram force-pushed the spec_suffix branch from f307160 to 1688dad Compare August 23, 2024 19:27

zacharyburnett requested changes Sep 6, 2024

View reviewed changes

braingram closed this Nov 5, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add `suffix` to `Step.spec` #1347

Add `suffix` to `Step.spec` #1347

braingram commented Aug 1, 2024 •

edited

Loading

codecov bot commented Aug 1, 2024 •

edited

Loading

This comment was marked as outdated.

braingram Aug 2, 2024

braingram Aug 2, 2024

schlafly commented Aug 5, 2024

braingram commented Aug 5, 2024

schlafly commented Aug 5, 2024

braingram commented Aug 5, 2024

schlafly commented Aug 5, 2024

braingram commented Aug 5, 2024

ddavis-stsci commented Aug 5, 2024

schlafly commented Aug 5, 2024

ddavis-stsci commented Aug 5, 2024

ddavis-stsci commented Aug 5, 2024

braingram commented Aug 5, 2024

braingram commented Aug 5, 2024

ddavis-stsci commented Aug 5, 2024 via email

braingram commented Aug 5, 2024

zacharyburnett Sep 6, 2024

braingram commented Nov 5, 2024

	if self.save_results:
	try:
	self.suffix = "assignwcs"
	except AttributeError:
	self["suffix"] = "assignwcs"

		- Provide suffix in Step.spec for steps to allow default suffix use
		within and outside Pipeline runs [#1347]

Add suffix to Step.spec #1347

Add suffix to Step.spec #1347

Conversation

braingram commented Aug 1, 2024 • edited Loading

codecov bot commented Aug 1, 2024 • edited Loading

Codecov Report

This comment was marked as outdated.

braingram Aug 2, 2024

Choose a reason for hiding this comment

braingram Aug 2, 2024

Choose a reason for hiding this comment

schlafly commented Aug 5, 2024

braingram commented Aug 5, 2024

schlafly commented Aug 5, 2024

braingram commented Aug 5, 2024

schlafly commented Aug 5, 2024

braingram commented Aug 5, 2024

ddavis-stsci commented Aug 5, 2024

schlafly commented Aug 5, 2024

ddavis-stsci commented Aug 5, 2024

ddavis-stsci commented Aug 5, 2024

braingram commented Aug 5, 2024

braingram commented Aug 5, 2024

ddavis-stsci commented Aug 5, 2024 via email

braingram commented Aug 5, 2024

zacharyburnett Sep 6, 2024

Choose a reason for hiding this comment

braingram commented Nov 5, 2024

Add `suffix` to `Step.spec` #1347

Add `suffix` to `Step.spec` #1347

braingram commented Aug 1, 2024 •

edited

Loading

codecov bot commented Aug 1, 2024 •

edited

Loading