Render a root-cause exception for dependency and join errors #3717

Open
wants to merge 33 commits into
base: master

benclifford
Collaborator

@benclifford benclifford commented Dec 4, 2024

Description

This PR reworks two exception types, DependencyError and JoinError. Both of these exceptions report that a task failed because some other task/future failed - in the dependency case, because a task dependency failed, and in the join case because one of the tasks/futures being joined failed.

This PR introduces a common superclass, PropagatedException, to acknowledge that the meaning and behaviour of these two exceptions are very similar.

PropagatedException has a new implementation for reporting the failures that are being propagated. Parsl has tried a couple of ways to do this in the past:

  • The implementation immediately before this PR reports only the immediate task IDs (or future reprs, for non-tasks) in the exception message. For details of the chain of exceptions and original/non-propagated exception, the user can examine the exception object via the dependent_exceptions_tids attribute.

  • Prior to PR #1802 (Make dependency exceptions only report task ID, not full exception tree), the repr/str (and so the printed form) of dependency exceptions rendered the entire exception. In the case of deep dependency chains, or where a dependency graph has many paths to a root cause, this resulted in extremely voluminous output with a lot of boilerplate dependency exception text.

The approach introduced by this current PR attempts a fusion of these two approaches:

  • The user will often be waiting only on the final task of a dependency chain (because the DFK will be managing everything in between) - so they will often get a dependency exception.
  • When they get a dependency exception, they are likely to actually be interested in the root cause at the earliest part of the chain. So this PR makes dependency exceptions traverse the chain and discover a root cause.
  • When there are multiple root causes, or multiple paths to the same root cause, the user should not be overwhelmed with output. So this PR picks a single root cause exception to report fully, and when there are other causes/paths adds a small annotation (+ others)
  • The user is sometimes interested in the path from that root cause exception to the current failure, but often not. That path is rendered roughly the same as immediately before this PR as a sequence of task IDs (or Future reprs for non-tasks)
  • Python has a native mechanism for indicating that an exception is caused by another exception, the __cause__ magic attribute which is usually populated by raise e1 from e2. This PR populates that magic attribute at construction so that displaying the exception will show the cause using Python's native format.
  • The user may want to ask other Parsl-relevant questions about the exception chain, so this PR keeps the dependent_exceptions_tids attribute for such introspection.
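
The points above can be sketched in a few lines of Python. This is an illustration only, not the code in this PR: the class name PropagatedSketch and the helper _representative_cause are hypothetical, and the sketch assumes that dependent_exceptions_tids holds (exception, label) pairs. It follows the first dependency at each level to reach one representative root cause, records the path of labels, notes whether other causes were skipped, and sets __cause__ at construction time.

```python
from typing import List, Optional, Sequence, Tuple


class PropagatedSketch(Exception):
    """Hypothetical stand-in for a propagated (dependency/join) error."""

    def __init__(self,
                 dependent_exceptions_tids: Sequence[Tuple[BaseException, str]],
                 task_id: int) -> None:
        if not dependent_exceptions_tids:
            raise ValueError("at least one failed dependency is required")
        # Assumed shape: (exception, human-readable label) pairs.
        self.dependent_exceptions_tids = list(dependent_exceptions_tids)
        self.task_id = task_id

        root, path, others = self._representative_cause()
        # Set the cause now, so a later plain `raise` still renders it.
        self.__cause__ = root

        suffix = " (+ others)" if others else ""
        super().__init__(
            f"Dependency failure for task {task_id}. "
            f"The representative cause is via {' <- '.join(path)}{suffix}")

    def _representative_cause(self) -> Tuple[Optional[BaseException], List[str], bool]:
        """Follow the first dependency at each level down to a root cause."""
        path: List[str] = []
        others = False
        current: BaseException = self
        while isinstance(current, PropagatedSketch):
            deps = current.dependent_exceptions_tids
            # More than one failed dependency means other causes/paths exist.
            others = others or len(deps) > 1
            current, label = deps[0]
            path.append(label)
        return current, path, others


root = RuntimeError("example root failure")
e1 = PropagatedSketch([(root, "task 0")], task_id=1)
e2 = PropagatedSketch([(e1, "task 1")], task_id=2)
e3 = PropagatedSketch([(e2, "task 2"), (ValueError("second cause"), "task 9")],
                      task_id=3)
print(e3)
```

Choosing the first dependency at each level is an arbitrary rule made for this sketch; any deterministic choice of a single representative would keep the output equally short.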

A dependency or join error is now rendered by Python as exactly two exceptions next to each other:


Traceback (most recent call last):
  File "/home/benc/parsl/src/parsl/parsl/dataflow/dflow.py", line 922, in _unwrap_futures
    new_args.extend([self.dependency_resolver.traverse_to_unwrap(dep)])
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/functools.py", line 907, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/functools.py", line 907, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/benc/parsl/src/parsl/parsl/dataflow/dependency_resolvers.py", line 48, in _
    return fut.result()
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/home/benc/parsl/src/parsl/parsl/dataflow/dflow.py", line 339, in handle_exec_update
    res = self._unwrap_remote_exception_wrapper(future)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/benc/parsl/src/parsl/parsl/dataflow/dflow.py", line 603, in _unwrap_remote_exception_wrapper
    result.reraise()
  File "/home/benc/parsl/src/parsl/parsl/app/errors.py", line 114, in reraise
    raise v
  File "/home/benc/parsl/src/parsl/parsl/app/errors.py", line 138, in wrapper
    return func(*args, **kwargs)
^^^^^^^^^^^^^^^
  File "/home/benc/parsl/src/parsl/taskchain.py", line 13, in failer
    raise RuntimeError("example root failure")
        ^^^^^^^^^^^^^^^^^
RuntimeError: example root failure

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/benc/parsl/src/parsl/taskchain.py", line 16, in <module>
    inter(inter(inter(inter(inter(failer()))))).result()
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/home/benc/parsl/src/parsl/parsl/dataflow/dflow.py", line 339, in handle_exec_update
    res = self._unwrap_remote_exception_wrapper(future)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/benc/parsl/src/parsl/parsl/dataflow/dflow.py", line 601, in _unwrap_remote_exception_wrapper
    result = future.result()
             ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
parsl.dataflow.errors.DependencyError: Dependency failure for task 5. The representative cause is via task 4 <- task 3 <- task 2 <- task 1 <- task 0
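
The two-part rendering above comes from Python's standard handling of __cause__, which can be reproduced without Parsl. The sketch below uses plain Exception objects as stand-ins (the names outer and render are illustrative) and formats the chain with the standard traceback module. Because these exceptions are never raised, no stack frames appear, only the two exception lines and the separator between them.

```python
import traceback


def render(exc: BaseException) -> str:
    """Format an exception and its __cause__ chain the way Python prints it."""
    return "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))


root = RuntimeError("example root failure")
# Stand-in for a dependency error; a plain Exception is used for illustration.
outer = Exception("Dependency failure for task 5")
outer.__cause__ = root  # same effect as `raise outer from root`

text = render(outer)
print(text)

# A caller that catches the outer error can reach the root cause directly.
try:
    raise outer
except Exception as e:
    caught_cause = e.__cause__
```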

Changed Behaviour

DependencyErrors and JoinErrors will render differently

Type of change

  • Update to human readable text: Documentation/error messages/comments

it was removed from the codebase in #1945 when the task
status record was an untyped dictionary

but then introduced to TaskRecord in #2392 as a bad merge
from benc-mypy
was removed in #1773 then bad reintroduced into taskrecord
as part of mypy work
…or-rendering

 Conflicts:
parsl/app/futures.py
parsl/dataflow/errors.py
…or-rendering' into benc-dependency-error-rendering
@benclifford benclifford changed the title from "Print a root-cause exception for DependencyError" to "Render a root-cause exception for dependency and join errors" Jan 15, 2025
@benclifford benclifford marked this pull request as ready for review January 15, 2025 12:09