
DM-39842: Add new Task exceptions and error annotation #400

Merged: 3 commits merged into main on Mar 22, 2024

Conversation

parejkoj (Contributor):

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes

python/lsst/pipe/base/_status.py (outdated)
that makes that detector entirely non-recoverable.

Do not raise this unless we are convinced that the data cannot be
processed, even by a better algorithm.
Member:

I think "even by a better algorithm" may be too strong, but I'm not sure what to say instead. The RFC may be a good place to workshop the language on this.

python/lsst/pipe/base/_status.py (outdated)
This can only be called from within an except block that has caught
the given exception.
"""
# NOTE: can't test this in pipe_base because afw is not a dependency!
Member:

You could get some coverage with a test class that has a getMetadata() that returns a TaskMetadata, since TaskMetadata behaves very much like PropertyList. But it's probably worthwhile to add a comment on the CalibrateImageTask tests that they're providing important coverage here.

Once we have this kind of annotation in place for the kind of astrometry failure that happens when the fit nominally succeeds but the scatter is too large, we can also test the persisted form of the annotation with less mocking in ci_hsc, since we intentionally trigger one of those failures there.
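A minimal sketch of the kind of test double being suggested here, assuming only TaskMetadata from lsst.pipe.base (the class and test names are illustrative, not code from this PR):

import unittest

from lsst.pipe.base import TaskMetadata


class FakeExposure:
    """Stand-in for an afw object whose getMetadata() returns something
    PropertyList-like; here we substitute TaskMetadata."""

    def __init__(self) -> None:
        self._metadata = TaskMetadata()

    def getMetadata(self) -> TaskMetadata:
        return self._metadata


class FakeExposureTestCase(unittest.TestCase):
    def test_failure_keys(self) -> None:
        exposure = FakeExposure()
        # Stand-in for what the annotation machinery would write on failure.
        exposure.getMetadata()["failure_message"] = "something went wrong"
        self.assertEqual(exposure.getMetadata()["failure_message"], "something went wrong")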

Contributor Author:

It seemed like there were some subtle differences between TaskMetadata and PropertyList in what the result looked like, but maybe I was just confused by the TaskMetadata nested API. I was also surprised that there isn't a metadata property on Exposure/Catalog. Both of those make me wary of a mock-like test in this package.

Member:

TaskMetadata is not meant to exactly match PropertySet behavior (nor PropertyList, which is not hierarchical). It exists solely so that pipe_base didn't have to depend on daf_base, while still letting people use APIs similar to what they always had when creating the output task metadata files. I think I need to re-read the RFC, because I'm confused as to why PropertyList/Set is relevant to error annotations and task exception tracking.

Contributor Author:

AnnotatedPartialOutputsError.annotate annotates the task outputs and calls task.annotate_failure to annotate the task metadata itself. Those are PropertyList and TaskMetadata, respectively.
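For concreteness, a rough sketch of that flow from the caller's side; all names here are illustrative and the real signatures are still being settled on this PR:

# Hypothetical caller-side flow: the task raises, annotate() records the
# failure in the task metadata (TaskMetadata) and on the in-place-modified
# exposure's metadata (PropertyList), and the wrapper is chained and re-raised.
try:
    task.run(exposure=exposure)
except Exception as error:
    annotated = AnnotatedPartialOutputsError.annotate(error, task, exposure)
    raise annotated from error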

Member:

I'm now really confused, because middleware cannot import anything from daf_base and runQuantum can't know anything about PropertyList.

Contributor Author:

These exact questions are causing me trouble with the type annotation in this file: what type do I annotate for exposures and catalogs, given that afw is not (and should not be) a dependency here?

Member (@TallJimbo, Feb 15, 2024):

I hadn't appreciated the dependency issue until @timj raised it, and I do agree it's a problem. An implicit dependency is not nearly as bad as a direct one, but it'd be much better to formalize a dependency in the right direction by having pipe_base declare an interface that afw objects can implement, and that will solve your typing problem. Given that the afw objects are C++ things that can't inherit from Python ABCs, this is a job for typing.Protocol. I'm thinking something like this (probably best put into _task_metadata.py):

from __future__ import annotations

from collections.abc import Set
from typing import Protocol


class MetadataMapping(Protocol):
    """Interface satisfied by `PropertyList`, `PropertySet`, `TaskMetadata`,
    and any nested `collections.abc.Mapping` with `str` keys and
    sufficiently simple value types.

    Objects that satisfy this interface may be assigned as values to
    `MetadataContainer` objects.
    """

    def keys(self) -> Set[str]:
        ...

    def __getitem__(self, key: str) -> int | str | float | MetadataMapping:
        ...


class MetadataContainer(MetadataMapping, Protocol):
    """Interface satisfied by `PropertyList`, `PropertySet`, and
    `TaskMetadata`.
    """

    def keys(self) -> Set[str]:
        ...

    def __getitem__(self, key: str) -> int | str | float | MetadataContainer:
        ...

    # The whole reason `MetadataMapping` exists is to capture the fact that
    # you can pass a plain `dict` here.
    def __setitem__(self, key: str, value: int | str | float | MetadataMapping) -> None:
        ...

    # One more method may be needed; see the comment thread on
    # PropertyList/TaskMetadata differences.

We could add more methods (and inheritance from another Protocol, PropertySetLike, which is defined in _task_metadata.py) for future use, but I think this is all this branch requires.

Once we have that, we can add another Protocol that Task, Exposure, and SourceCatalog could all satisfy:

class HasMetadata(Protocol):
    metadata: MetadataContainer

If Exposure and SourceCatalog do not have read-write metadata properties, they should; adding them would be better than putting getMetadata and setMetadata into a Protocol in lsst.pipe.base.
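As an illustration of how that Protocol could be used for the typing problem above (a hypothetical helper, not part of this PR):

def annotate_all(failure_info: dict[str, int | str | float], *items: HasMetadata | None) -> None:
    # Anything satisfying HasMetadata (Task, Exposure, SourceCatalog, ...)
    # can be annotated without pipe_base importing afw.
    for item in items:
        if item is None:
            # Optional outputs may be missing; skip them.
            continue
        for key, value in failure_info.items():
            item.metadata[f"failure_{key}"] = value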

I understand if this is rather more typing-fu than you'd want to include on this branch (especially since the above snippets are untested first drafts); if you want to make me another ticket I can try to get that stuff in shape for you to rebase on.

Member:

If this means that the new interface doesn't require someone to call PropertyList, getMetadata and setMetadata inside runQuantum then that sounds great.

Member (@TallJimbo, Feb 21, 2024):

I've put some new methods and Protocols up on #403 (with matching methods in daf_base and afw), but they're not quite what I wrote above. Instead, I've just added get_dict and set_dict methods that set and get true dict objects with arbitrary nesting in a consistent way on all three types (TaskMetadata, PropertySet, and PropertyList). The idea is that you'd package up everything you want into one nested dict (exception message, type, and error metadata), and stuff all of that into the .metadata attribute at once. Something like this:

from __future__ import annotations

import abc

from lsst.utils import introspection

# NestedMetadataDict comes from #403; GetSetDictMetadataHolder is assumed here
# to live alongside it, since its final home isn't settled yet.
from ._task_metadata import GetSetDictMetadataHolder, NestedMetadataDict


class RepeatableQuantumError(RuntimeError, abc.ABC):

    @property
    @abc.abstractmethod
    def metadata(self) -> NestedMetadataDict | None:
        """Metadata from the raising `~lsst.pipe.base.Task` with more
        information about the failure. The contents of the dict are
        `~lsst.pipe.base.Task`-dependent, and must have `str` keys and `str`,
        `int`, `float`, `bool`, or nested-dictionary (with the same key and
        value types) values.
        """
        raise NotImplementedError()


class AnnotatedPartialOutputsError(RepeatableQuantumError):
    ...

    @classmethod
    def annotate(
        cls, exception: Exception, *args: GetSetDictMetadataHolder | None
    ) -> AnnotatedPartialOutputsError:
        failure_info: NestedMetadataDict = {
            "message": str(exception),
            "type": introspection.get_full_type_name(exception),
        }
        if other := getattr(exception, "metadata", None):
            failure_info["metadata"] = other
        for item in args:
            # Some outputs may not exist, so we cannot set metadata on them;
            # ignore those here for convenience.
            if item is None or item.metadata is None:
                continue
            item.metadata.set_dict("failure", failure_info)
        return cls("Task completed with only partial outputs: see chained exception for details.")

One nice thing about the new methods and properties is that they let you treat tasks, exposures, and catalogs identically, so there's no need for separate kwargs that prejudge the kinds of things we might want to annotate.
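A small round-trip illustration of the get_dict/set_dict idea; these methods come from #403, so treat the exact names as assumptions, and the values here are made up:

from lsst.pipe.base import TaskMetadata

meta = TaskMetadata()
meta.set_dict("failure", {"message": "fit failed", "type": "SomeTaskError", "metadata": {"n_stars": 3}})
failure = meta.get_dict("failure")
assert failure["message"] == "fit failed"
assert failure["metadata"]["n_stars"] == 3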

python/lsst/pipe/base/_status.py (outdated, resolved)
python/lsst/pipe/base/task.py (outdated, resolved)
self.assertEqual(task.metadata["failure_message"], msg)
result = "test_task.TaskTestCase.test_annotate_failure_task_exception.<locals>.TestError"
self.assertEqual(task.metadata["failure_type"], result)
# TODO: can we make this work without it being in `scalars`?
Member:

I think having it in scalars may be a sign that it won't work on PropertyList and hence FITS headers. I thought TaskMetadata would flatten out that dictionary itself, but maybe not? In any case, I think the important thing to do is some one-off testing with PropertyList (or use a downstream unit test), and add a public method somewhere in pipe_base to extract these annotations so we can hide whatever messiness we might need behind it.
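A hypothetical sketch of such an extraction helper, keyed off the failure_* entries in the test above; the name and key layout are assumptions, not an existing pipe_base API:

def get_failure_annotation(metadata):
    """Return the failure annotation recorded by annotate_failure, or None if
    the task did not record a failure."""
    try:
        return {
            "message": str(metadata["failure_message"]),
            "type": str(metadata["failure_type"]),
        }
    except KeyError:
        return None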

Member:

TaskMetadata is trying to be a bit like PropertyList, but in Python. The internal pydantic model separates list items from scalar items, but you aren't really meant to be looking inside that (pydantic doesn't let me make them private properties).

Member:

Why isn't the test doing:

self.assertEqual(task.metadata["failure_metadata"]["something"], 12345)

?

Member:

Also, if you want to write TaskMetadata to FITS headers then copying to PropertyList should work fine. I'm not sure what the use case is.

Contributor Author:

@timj : I didn't know what the API for nested metadata was here; that makes the test cleaner, thank you.

I'd suggest that nested metadata like this not print as:

ipdb> task.metadata["failure_metadata"]
TaskMetadata(scalars={'something': 12345}, arrays={}, metadata={})

since that doesn't make it clear that you can just access the scalars by name.

Member:

That's the default stringification of the Pydantic model (effectively repr). I think a __str__ implementation that returned a dict-like representation should be fine. TaskMetadata is designed to match PropertyList without the C++.

Member:

I just did some experimenting and we do have an important asymmetry between TaskMetadata and PropertyList that we may need to work around (PropertySet behaves like TaskMetadata and unlike PropertyList):

t = TaskMetadata()
t["a"] = {"b": 3}
assert isinstance(t["a"], TaskMetadata)
assert t["a"]["b"] == 3

but with

p = PropertyList()
p["a"] = {"b": 3}

accessing p["a"] raises KeyError, because the assignment flattens the dict out into p["a.b"] = 3. @timj , do you know if there is a dedicated method to reconstruct those flattened mappings that works on all of PropertySet / PropertyList / TaskMetadata? If not, I think we should add one.

As an aside, while testing this I also discovered that TaskMetadata(b=3) just silently swallows the b=3 and does nothing with it, probably because of something Pydantic-related.
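If no such method exists yet, a plain-Python sketch of what reconstructing the flattened keys could look like (a hypothetical helper, not an existing API on any of the three classes):

def unflatten(flat):
    """Rebuild nested dicts from dotted keys, e.g. {"a.b": 3} -> {"a": {"b": 3}}."""
    nested = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


assert unflatten({"a.b": 3, "a.c": 4, "d": 5}) == {"a": {"b": 3, "c": 4}, "d": 5}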

Member:

PropertyList is not hierarchical but PropertySet and TaskMetadata are. PropertyList probably should not inherit from PropertySet. TaskMetadata is trying to do it all in Python but was never designed for PropertyList emulation, partly because Task metadata was historically a PropertySet. PropertyList in afw.Exposure has always been an interesting situation since it's sort of trying to emulate a FITS header without saying it emulates a FITS header (and the array values are problematic in FITS and best avoided in general). Ideally we would read back HIERARCH FITS headers into PropertyList and get a real hierarchy as in PropertySet.

Re the question on flattened mappings. You mean you want PropertyList to work like PropertySet and TaskMetadata so that p["a"] will return a PropertyList?

Yes, the TaskMetadata constructor doesn't really want you to pass any arguments into it because the pydantic model gets in the way. It's why the from_dict class method exists. from_metadata is trying to be the way you convert from a PropertySet to a TaskMetadata.
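For reference, a small illustration of that construction path, based on the description above (the silently-swallowed keyword and from_dict behavior are as described, not independently verified here):

from lsst.pipe.base import TaskMetadata

silently_empty = TaskMetadata(b=3)        # the b=3 is swallowed by the pydantic model
populated = TaskMetadata.from_dict({"b": 3})
assert populated["b"] == 3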

Member:

PropertyList probably should not inherit from PropertySet.

💯 and I once tried to fix this but it was a nightmare.

You mean you want PropertyList to work like PropertySet and TaskMetadata so that p["a"] will return a PropertyList?

I want a method that does that on all three classes, but I don't need it to be __getitem__, since I'm sure making __getitem__ do that on PropertyList would break lots of stuff. In the other two classes it could just delegate to __getitem__.

Contributor Author:

I'd suggest moving this conversation over to DM-42928, once that's been pushed? I think it's more appropriate for that one (and then this ticket can just use whatever is implemented there).

@parejkoj parejkoj force-pushed the tickets/DM-39842 branch 4 times, most recently from ada5ace to 3801e5b Compare February 14, 2024 23:48
# Some outputs may not exist, so we cannot set metadata on them.
if item is None:
    continue
if (metadata := item.getMetadata()) is None:
Member:

I'd prefer to have the calling code responsible for setting .metadata to not-None before this code is invoked. Is there a reason we can't assume/require that?

Contributor Author (@parejkoj, Feb 23, 2024):

I'm not so keen on having to set an empty PropertyList on a SourceCatalog before calling annotate, during error handling. Could we make the SourceCatalog or SimpleCatalog constructor (maybe not BaseTable, but SourceCatalog is the one most commonly used) set _metadata to an empty PropertyList, so we can assume it will exist? It looks like ExposureInfo defaults to an empty PropertySet, so there is precedent.

There are other things I'd like to annotate with this new-found power: ArrowAstropy outputs, for example, or BackgroundList! Sadly, astropy.table.Table named its equivalent (an "OrderedDict of whatever") meta instead.

Contributor Author (@parejkoj, Feb 23, 2024):

My dream would be to just pass the result Struct (or maybe result.getDict()?) into this annotate and let it annotate all the non-None items. That's probably wishful thinking though.

Part of me is tempted to put this

if isinstance(item, astropy.table.Table):
    item.meta.update(metadata)

in the loop, to get closer to that...

I wonder if it wouldn't benefit us to add a self.meta = self.metadata alias to Task, Exposure, Catalog, etc., so they look more like astropy tables, and then make that the interface here instead?

Member (@TallJimbo, Feb 23, 2024):

I'm fine with having the SourceTable constructor initialize a PropertyList (it'll be SourceTable rather than SourceCatalog you want to modify, because it's BaseTable that holds the actual shared_ptr<PropertyList>).

Having a meta alias on our objects doesn't actually bring us close enough to astropy.table.Table, because I'm sure their meta object does not have the set_dict and get_dict methods I added to PropertyList, PropertySet, and TaskMetadata on the other ticket.

Maybe an adapter class and context manager would work:

import contextlib
import dataclasses

import astropy.table
from lsst.pipe.base import TaskMetadata

@dataclasses.dataclass
class GetSetDictMetadataAdapter:
    metadata: TaskMetadata

@contextlib.contextmanager
def adapted_astropy_metadata(table: astropy.table.Table):
    metadata = TaskMetadata()
    yield GetSetDictMetadataAdapter(metadata)
    table.meta.update(metadata.to_dict())

That doesn't necessarily get you closer to being able to pass in just the struct to annotate, but I don't think we want annotate to have to do runtime checking to see if each item in the struct actually can hold metadata anyway. I like making what's annotated explicit in the task implementation, too.
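A quick usage sketch of that adapter idea; set_dict comes from #403 and to_dict is assumed to flatten back to plain nested dicts, so treat the details as assumptions:

import astropy.table

table = astropy.table.Table({"flux": [1.0, 2.0, 3.0]})
with adapted_astropy_metadata(table) as holder:
    # The annotation machinery only sees something with a .metadata that
    # supports set_dict; the context manager copies it into table.meta on exit.
    holder.metadata.set_dict("failure", {"message": "fit failed", "type": "ExampleError"})
# table.meta should now contain {"failure": {"message": ..., "type": ...}}.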



codecov bot commented Feb 23, 2024

Codecov Report

Attention: Patch coverage is 92.00000%, with 4 lines in your changes missing coverage. Please review.

Project coverage is 82.36%. Comparing base (02714a3) to head (eb16be8).

Files                                      Patch %   Lines
python/lsst/pipe/base/_quantumContext.py     0.00%   1 Missing and 1 partial ⚠️
python/lsst/pipe/base/_status.py            90.90%   1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #400      +/-   ##
==========================================
- Coverage   82.37%   82.36%   -0.01%     
==========================================
  Files          93       93              
  Lines       10614    10645      +31     
  Branches     2010     2021      +11     
==========================================
+ Hits         8743     8768      +25     
- Misses       1518     1521       +3     
- Partials      353      356       +3     


@parejkoj parejkoj force-pushed the tickets/DM-39842 branch 8 times, most recently from c885f56 to 56002e5 Compare February 28, 2024 01:31
@parejkoj parejkoj force-pushed the tickets/DM-39842 branch 3 times, most recently from 8758563 to 780427c Compare March 19, 2024 18:25
@mrawls left a comment:

I'm OK if the linting is also OK

@@ -371,7 +372,9 @@ def put(
                    " attributes must be passed as the values to put"
                )
            for name, refs in dataset:
-               valuesAttribute = getattr(values, name)
+               valuesAttribute = getattr(values, name, None)
+               if valuesAttribute is None:
Comment:

Is the auto-warning complaining because the syntax here is if thing is None instead of if not thing?

Contributor Author:

What "auto-warning" are you talking about? I don't see one in Github CI below.

Contributor Author:

It's now a walrus, too.
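For reference, the walrus form looks roughly like this; the skip-on-None behavior shown is an assumption, not the exact body of the real method:

for name, refs in dataset:
    # Missing attributes on the values struct are skipped rather than raising.
    if (valuesAttribute := getattr(values, name, None)) is None:
        continue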

Member:

I think the code coverage warning may be the cause of confusion.

Comment:

AHA, yes, that was confusing me; I guess it's not a linter warning but a test coverage warning. Weird that it always seemed to coincide with your usage of if thing is None 😆

# dependency; test_calibrateImage.py in pipe_tasks gives more coverage.
for item in args:
    # Some outputs may not exist, so we cannot set metadata on them.
    if item is None:
Comment:

Same Q as above: can this still have the desired behavior with the syntax if not item?
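A plain-Python illustration of the difference this question is about (empty catalogs are falsy but not None, so the two checks are not equivalent):

empty_catalog = []                  # stands in for an empty output catalog
assert not empty_catalog            # falsy, so `if not item` would skip it
assert empty_catalog is not None    # but it exists and can still be annotated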

@parejkoj parejkoj merged commit 7c47f2a into main Mar 22, 2024
13 checks passed
@parejkoj parejkoj deleted the tickets/DM-39842 branch March 22, 2024 08:11