DM-39842: Add new Task exceptions and error annotation #400
Conversation
(force-pushed from 2608301 to ba4549f)
python/lsst/pipe/base/_status.py
Outdated
```
that makes that detector entirely non-recoverable.

Do not raise this unless we are convinced that the data cannot be
processed, even by a better algorithm.
```
I think "even by a better algorithm" may be too strong, but I'm not sure what to say instead. The RFC may be a good place to workshop the language on this.
python/lsst/pipe/base/_status.py
Outdated
```python
This can only be called from within an except block that has caught
the given exception.
"""
# NOTE: can't test this in pipe_base because afw is not a dependency!
```
You could get some coverage with a test class that has a `getMetadata()` that returns a `TaskMetadata`, since `TaskMetadata` behaves very much like `PropertyList`. But it's probably worthwhile to add a comment on the CalibrateImageTask tests that they're providing important coverage here.
Once we have this kind of annotation in place for the kind of astrometry failure that happens when the fit nominally succeeds but the scatter is too large, we can also test the persisted form of the annotation with less mocking in ci_hsc, since we intentionally trigger one of those failures there.
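Not part of the PR itself, but a minimal sketch of the kind of test double suggested above; the class name is hypothetical and it only assumes `TaskMetadata` from `lsst.pipe.base`:

```python
from lsst.pipe.base import TaskMetadata


class FakeExposure:
    """Hypothetical stand-in for an afw object.

    It exposes ``getMetadata()`` backed by a ``TaskMetadata`` so the
    annotation code path can be exercised in pipe_base tests without an afw
    dependency.
    """

    def __init__(self) -> None:
        self._metadata = TaskMetadata()

    def getMetadata(self) -> TaskMetadata:
        return self._metadata
```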
It seemed like there were some subtle differences between `TaskMetadata` and `PropertyList` in what the result looked like, but maybe I was just confused about the `TaskMetadata` nested API. I was also surprised by the fact that there wasn't a `metadata` property on Exposure/Catalog. Those both make me worried about a mock-like test in this package.
TaskMetadata is not meant to exactly match PropertySet behavior (not PropertyList since that is not hierarchical). It exists solely so that pipe_base didn't have to depend on daf_base but people could still use similar APIs as they always had when creating the output task metadata files. I think I need to re-read the RFC because I'm confused as to why PropertyList/Set is relevant to error annotations and task exception tracking.
`AnnotatedPartialOutputsError.annotate` annotates the task outputs and calls `task.annotate_failure` to annotate the task metadata itself. Those are `PropertyList` and `TaskMetadata`, respectively.
I'm now really confused because middleware cannot import anything from daf_base and `runQuantum` can't know anything about `PropertyList`.
These exact questions are causing me trouble with the type annotation in this file: what type do I annotate for `exposures` and `catalogs`, given that afw is not (and should not be) a dependency here?
I hadn't appreciated the dependency issue until @timj raised it, and I do agree it's a problem. An implicit dependency is not nearly as bad as a direct one, but it'd be much better to formalize a dependency in the right direction by having pipe_base declare an interface that afw objects can implement, and that will solve your typing problem. Given that the afw objects are C++ things that can't inherit from Python ABCs, this is a job for `typing.Protocol`. I'm thinking something like this (probably best put into `_task_metadata.py`):
```python
from __future__ import annotations

from collections.abc import Set
from typing import Protocol


class MetadataMapping(Protocol):
    """Interface satisfied by `PropertyList`, `PropertySet`, `TaskMetadata`,
    and any nested `collections.abc.Mapping` with `str` keys and
    sufficiently simple value types.

    Objects that satisfy this interface may be assigned as values to
    `MetadataContainer` objects.
    """

    def keys(self) -> Set[str]:
        ...

    def __getitem__(self, key: str) -> int | str | float | MetadataMapping:
        ...


class MetadataContainer(MetadataMapping, Protocol):
    """Interface satisfied by `PropertyList`, `PropertySet`, and `TaskMetadata`."""

    def keys(self) -> Set[str]:
        ...

    def __getitem__(self, key: str) -> int | str | float | MetadataContainer:
        ...

    # The whole reason `MetadataMapping` exists is to capture the fact that
    # you can pass `dict` here.
    def __setitem__(self, key: str, value: int | str | float | MetadataMapping) -> None:
        ...

    # one more method; see comment thread on PropertyList/TaskMetadata differences.
```
We could add more methods (and inheritance from another `Protocol`, `PropertySetLike`, which is defined in `_task_metadata.py`) for future use, but I think this is all this branch requires.

Once we have that, we can add another `Protocol` that `Task`, `Exposure`, and `SourceCatalog` could all satisfy:
```python
class HasMetadata(Protocol):
    metadata: MetadataContainer
```
If `Exposure` and `SourceCatalog` do not have read-write `metadata` properties they should, and I consider adding them better than putting `getMetadata` and `setMetadata` into a `Protocol` in `lsst.pipe.base`.
I understand if this is rather more typing-fu than you'd want to include on this branch (especially since the above snippets are untested first drafts); if you want to make me another ticket I can try to get that stuff in shape for you to rebase on.
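As a rough illustration only (this signature is not the PR's API), here is how code might consume those draft protocols, assuming the `HasMetadata` and `MetadataContainer` sketches above exist:

```python
def annotate_failure(exception: Exception, *items: HasMetadata | None) -> None:
    # Anything structurally satisfying HasMetadata (Task, Exposure,
    # SourceCatalog, ...) can be passed; None stands for an output that was
    # never produced and is skipped.
    for item in items:
        if item is None:
            continue
        item.metadata["failure_message"] = str(exception)
        item.metadata["failure_type"] = type(exception).__name__
```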
If this means that the new interface doesn't require someone to call `PropertyList`, `getMetadata` and `setMetadata` inside `runQuantum` then that sounds great.
@TallJimbo, done: DM-42928
I've put some new methods and `Protocol`s up on #403 (with matching methods in daf_base and afw), but they're not quite what I wrote above. Instead, I've just added `get_dict` and `set_dict` methods that set and get true `dict` objects with arbitrary nesting in a consistent way on all three types (`TaskMetadata`, `PropertySet`, and `PropertyList`). The idea is that you'd package up everything you want into one nested `dict` (exception message, type, and error metadata), and stuff all of that into the `.metadata` attribute at once. Something like this:
```python
from __future__ import annotations

import abc

from lsst.utils import introspection

from ._task_metadata import NestedMetadataDict


class RepeatableQuantumError(RuntimeError, abc.ABC):

    @property
    @abc.abstractmethod
    def metadata(self) -> NestedMetadataDict | None:
        """Metadata from the raising `~lsst.pipe.base.Task` with more
        information about the failure. The contents of the dict are
        `~lsst.pipe.base.Task`-dependent, and must have `str` keys and `str`,
        `int`, `float`, `bool`, or nested-dictionary (with the same key and
        value types) values.
        """
        raise NotImplementedError()


class AnnotatedPartialOutputsError(RepeatableQuantumError):
    ...

    @classmethod
    def annotate(
        cls, exception: Exception, *args: GetSetDictMetadataHolder
    ) -> AnnotatedPartialOutputsError:
        # GetSetDictMetadataHolder is the holder protocol from #403: anything
        # with a `.metadata` attribute that supports `set_dict`.
        failure_info = {
            "message": str(exception),
            "type": introspection.get_full_type_name(exception),
        }
        if other := getattr(exception, "metadata", None):
            failure_info["metadata"] = other
        for item in args:
            # Some outputs may not exist, so we cannot set metadata on them;
            # ignore those here for convenience.
            if item is None or item.metadata is None:
                continue
            item.metadata.set_dict("failure", failure_info)
        return cls("Task completed with only partial outputs: see chained exception for details.")
```
One nice thing about the new methods and properties is that they let you treat tasks, exposures, and catalogs identically, so there's no need for separate kwargs that prejudge the kinds of things we might want to annotate.
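For orientation, a hedged sketch of how a task's `runQuantum` might use that draft `annotate`; the exception type and connection names are invented for illustration, and it assumes the `.metadata` properties discussed above exist on the annotated outputs:

```python
from lsst.pipe.base import Struct


class HypotheticalFitError(Exception):
    """Invented error type standing in for a real algorithmic failure."""


# Sketch of a PipelineTask.runQuantum method body, not a standalone function.
def runQuantum(self, butlerQC, inputRefs, outputRefs):
    inputs = butlerQC.get(inputRefs)
    exposure = inputs["exposure"]
    try:
        catalog = self.run(exposure=exposure).catalog
    except HypotheticalFitError as original:
        # Attach the failure message/type to the partially processed exposure
        # and this task's metadata, write what we have, and re-raise.
        annotated = AnnotatedPartialOutputsError.annotate(original, self, exposure)
        butlerQC.put(Struct(exposure=exposure), outputRefs)
        raise annotated from original
    butlerQC.put(Struct(exposure=exposure, catalog=catalog), outputRefs)
```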
tests/test_task.py
Outdated
```python
self.assertEqual(task.metadata["failure_message"], msg)
result = "test_task.TaskTestCase.test_annotate_failure_task_exception.<locals>.TestError"
self.assertEqual(task.metadata["failure_type"], result)
# TODO: can we make this work without it being in "`scalars`?
```
I think having it in scalars may be a sign that it won't work on `PropertyList` and hence FITS headers. I thought `TaskMetadata` would flatten out that dictionary itself, but maybe not? In any case, I think the important thing to do is some one-off testing with `PropertyList` (or use a downstream unit test), and add a public method somewhere in pipe_base to extract these annotations so we can hide whatever messiness we might need behind it.
TaskMetadata is trying to be a bit like PropertyList but in Python. The internal pydantic model separates list items from scalar items but you aren't really meant to be looking inside that (but pydantic doesn't let me call them private properties).
Why isn't the test doing `self.assertEqual(task.metadata["failure_metadata"]["something"], 12345)`?
Also, if you want to write TaskMetadata to FITS headers then copying to PropertyList should work fine. I'm not sure what the use case is.
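If that use case does come up, a minimal sketch of such a copy (assuming `TaskMetadata.to_dict()` and daf_base's `PropertyList`; the helper name is made up):

```python
from lsst.daf.base import PropertyList
from lsst.pipe.base import TaskMetadata


def task_metadata_to_property_list(md: TaskMetadata) -> PropertyList:
    pl = PropertyList()
    for key, value in md.to_dict().items():
        # Assigning a dict value flattens it into "a.b"-style keys, as
        # discussed further down this thread.
        pl[key] = value
    return pl
```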
@timj: I didn't know what the API for nested metadata was here; that makes the test cleaner, thank you.

I'd suggest that nested metadata like this not print like this:

```
ipdb> task.metadata["failure_metadata"]
TaskMetadata(scalars={'something': 12345}, arrays={}, metadata={})
```

since that doesn't make it obvious to me that you can just access the scalars by name.
That's the default stringification of the Pydantic model (effectively `repr`). I think a `__str__` implementation that returned a dict-like thing should be fine. TaskMetadata is designed to match PropertyList without the C++.
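A tiny sketch of what that dict-like rendering could look like, written as a free function here rather than a real `__str__`, and assuming `TaskMetadata.to_dict()`:

```python
from lsst.pipe.base import TaskMetadata


def format_metadata(md: TaskMetadata) -> str:
    """Render metadata as a plain nested dict instead of the Pydantic repr."""
    return str(md.to_dict())


md = TaskMetadata()
md["failure_metadata"] = {"something": 12345}
print(format_metadata(md))  # {'failure_metadata': {'something': 12345}}
```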
I just did some experimenting and we do have an important asymmetry between `TaskMetadata` and `PropertyList` that we may need to work around (`PropertySet` behaves like `TaskMetadata` and unlike `PropertyList`):

```python
t = TaskMetadata()
t["a"] = {"b": 3}
assert isinstance(t["a"], TaskMetadata)
assert t["a"]["b"] == 3
```

but with

```python
p = PropertyList()
p["a"] = {"b": 3}
```

accessing `p["a"]` raises `KeyError`, because the assignment flattens the dict out into `p["a.b"] = 3`. @timj, do you know if there is a dedicated method to reconstruct those flattened mappings that works on all of `PropertySet`/`PropertyList`/`TaskMetadata`? If not, I think we should add one.

As an aside, while testing this I also discovered that `TaskMetadata(b=3)` just silently swallows the `b=3` and does nothing with it, probably because of something Pydantic-related.
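No such method exists yet; purely to illustrate the reconstruction it would need to do for `PropertyList`'s flattened keys, here is a toy version operating on a plain dict of dotted keys:

```python
def unflatten(flat: dict[str, object]) -> dict[str, object]:
    """Rebuild a nested mapping from dotted keys like ``"a.b"``."""
    nested: dict[str, object] = {}
    for dotted_key, value in flat.items():
        node = nested
        *parents, leaf = dotted_key.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


assert unflatten({"a.b": 3, "a.c": 4, "d": 5}) == {"a": {"b": 3, "c": 4}, "d": 5}
```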
PropertyList is not hierarchical but PropertySet and TaskMetadata are. PropertyList probably should not inherit from PropertySet. TaskMetadata is trying to do it all in Python but was never designed for PropertyList emulation, partly because Task metadata was historically a PropertySet. PropertyList in afw.Exposure has always been an interesting situation since it's sort of trying to emulate a FITS header without saying it emulates a FITS header (and the array values are problematic in FITS and best avoided in general). Ideally we would read back HIERARCH FITS headers into PropertyList and get a real hierarchy as in PropertySet.
Re the question on flattened mappings: you mean you want PropertyList to work like PropertySet and TaskMetadata so that `p["a"]` will return a PropertyList?

Yes, the TaskMetadata constructor doesn't really want you to pass any arguments into it because the pydantic model gets in the way. It's why the `from_dict` class method exists. `from_metadata` is trying to be the way you convert from a PropertySet to a TaskMetadata.
> PropertyList probably should not inherit from PropertySet.

💯 and I once tried to fix this but it was a nightmare.

> You mean you want PropertyList to work like PropertySet and TaskMetadata so that p["a"] will return a PropertyList?

I want a method that does that on all three classes, but I don't need it to be `__getitem__`, since I'm sure making `__getitem__` do that on `PropertyList` would break lots of stuff. In the other two classes it could just delegate to `__getitem__`.
I'd suggest moving this conversation over to DM-42928, once that's been pushed? I think it's more appropriate for that one (and then this ticket can just use whatever is implemented there).
(force-pushed from ada5ace to 3801e5b)
python/lsst/pipe/base/_status.py
Outdated
```python
# Some outputs may not exist, so we cannot set metadata on them.
if item is None:
    continue
if (metadata := item.getMetadata()) is None:
```
I'd prefer to have the calling code responsible for setting `.metadata` to not-`None` before this code is invoked. Is there a reason we can't assume/require that?
I'm not so keen on having to set an empty `PropertyList` on a `SourceCatalog` before calling `annotate`, during error handling. Could we make the `SourceCatalog` or `SimpleCatalog` constructor (maybe not `BaseTable`, but `SourceCatalog` is the one most commonly used) set `_metadata` to an empty `PropertyList`, so we can assume it will exist? It looks like `ExposureInfo` defaults to an empty `PropertySet`, so there is precedent.

There are other things I'd like to annotate with this new-found power: ArrowAstropy outputs, for example, or `BackgroundList`! Sadly, `astropy.table.Table` has named their equivalent ("OrderedDict of whatever") `meta` instead.
My dream would be to just pass the result `Struct` (or maybe `result.getDict()`?) into this `annotate` and let it annotate all the non-None items. That's probably wishful thinking though.

Part of me is tempted to put this

```python
if isinstance(item, astropy.table.Table):
    item.meta.update(metadata)
```

in the loop, to get closer to that...

I wonder if it wouldn't benefit us to add a `self.meta = self.metadata` to Task, Exposure, Catalog, etc., so as to look more like them, and then make that be the interface here instead?
I'm fine with having the `SourceTable` constructor initialize a `PropertyList` (it'll be `SourceTable` rather than `SourceCatalog` you want to modify, because it's `BaseTable` that holds the actual `shared_ptr<PropertyList>`).

Having a `meta` alias on our objects doesn't actually bring us enough closer to `astropy.table.Table`, because I'm sure their `meta` object does not have the `set_dict` and `get_dict` methods I added to `PropertyList`, `PropertySet`, and `TaskMetadata` on the other ticket.

Maybe an adapter class and context manager would work:

```python
import contextlib
import dataclasses
import astropy.table
from lsst.pipe.base import TaskMetadata

@dataclasses.dataclass
class GetSetDictMetadataAdapter:
    metadata: TaskMetadata

@contextlib.contextmanager
def adapted_astropy_metadata(table: astropy.table.Table):
    metadata = TaskMetadata()
    yield GetSetDictMetadataAdapter(metadata)
    table.meta.update(metadata.to_dict())
```

That doesn't necessarily get you closer to being able to pass in just the struct to `annotate`, but I don't think we want `annotate` to have to do runtime checking to see if each item in the struct actually can hold metadata anyway. I like making what's annotated explicit in the task implementation, too.
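A hedged usage sketch of that adapter (it assumes the context manager above and the `set_dict` method from #403; the values are made up):

```python
import astropy.table

table = astropy.table.Table({"flux": [1.0, 2.0]})
with adapted_astropy_metadata(table) as holder:
    # Anything written to holder.metadata inside the block ends up in
    # table.meta as plain nested dicts when the block exits.
    holder.metadata.set_dict("failure", {"message": "fit diverged", "type": "HypotheticalFitError"})

print(table.meta["failure"]["message"])  # fit diverged
```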
(force-pushed from 3801e5b to 73295c3)
Codecov Report

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main     #400      +/-   ##
==========================================
- Coverage   82.37%   82.36%   -0.01%
==========================================
  Files          93       93
  Lines       10614    10645      +31
  Branches     2010     2021      +11
==========================================
+ Hits         8743     8768      +25
- Misses       1518     1521       +3
- Partials      353      356       +3
```

View full report in Codecov by Sentry.
(force-pushed from c885f56 to 56002e5)
(force-pushed from 8758563 to 780427c)
I'm OK if the linting is also OK
```diff
@@ -371,7 +372,9 @@ def put(
                 " attributes must be passed as the values to put"
             )
         for name, refs in dataset:
-            valuesAttribute = getattr(values, name)
+            valuesAttribute = getattr(values, name, None)
+            if valuesAttribute is None:
```
Is the auto-warning complaining because the syntax here is `if thing is None` instead of `if not thing`?
What "auto-warning" are you talking about? I don't see one in the GitHub CI below.
It's now a walrus, too.
I think the code coverage warning may be the cause of the confusion.
AHA, yes, that was confusing me; I guess it's not a linter warning, but a test coverage warning. Weird that it always seemed to coincide with your usage of `if thing is None` 😆
```python
# dependency; test_calibrateImage.py in pipe_tasks gives more coverage.
for item in args:
    # Some outputs may not exist, so we cannot set metadata on them.
    if item is None:
```
Same Q as above, can this still have the desired behavior with the syntax `if not item`?
This allows "optional" outputs in failure states.
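For reference, a small plain-Python illustration (not PR code) of why `if not item` is not a drop-in replacement: an empty but present output is falsy, yet we would still want to annotate it.

```python
empty_output = []        # stands in for a zero-length catalog that does exist
missing_output = None    # stands in for an output that was never produced

for item in (empty_output, missing_output):
    skipped_by_not = not item            # True for both: empty containers are falsy
    skipped_by_is_none = item is None    # True only for the missing output
    print(item, skipped_by_not, skipped_by_is_none)
```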
(force-pushed from 780427c to eb16be8)
Checklist
- doc/changes