Error handling during ingest #104

Open
ppanopticon opened this issue Aug 27, 2024 · 0 comments
Labels
enhancement New feature or request

Task Description

Error handling in vitrivr-engine currently has two major shortcomings.

  • An Operator's implementer decides whether an error is handled gracefully (i.e., logged, after which processing continues) or not (i.e., an exception is thrown). This leads to inconsistent behaviour across a pipeline.
  • If an error is handled gracefully, the caller of a pipeline has no way to access the error information, because in most cases errors are only logged. This is not ideal when vitrivr-engine is used as a library rather than as a local service.

I therefore propose three major changes to how errors should be handled:

  • When an error occurs, operators throw an ExtractionException. This exception reports the error condition (the affected retrievable, the name of the operator, and the cause) and optionally wraps downstream exceptions. Throwing any other exception from within an Operator is considered a programmer's error, so operators need proper exception handling of their own.
  • When configuring a pipeline, one can choose which error handling mode is employed. Currently I see two modes, CONTINUE and ABORT (we can of course discuss other modes). This will lead to the introduction of transparent error handling stages in the flow.
  • Regardless of which mode is employed, a per-item summary with information about what went wrong should be provided in some Context object. This Context can be accessed by the caller of a pipeline.
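To make the three changes more concrete, here is a minimal sketch of how the exception, the mode, and the context could fit together. Everything in it is an assumption for the sake of discussion: `IngestContext`, `handle`, `summary`, and the use of a plain `retrievableId` string are hypothetical names and simplifications, not existing vitrivr-engine API.

```kotlin
/** Carries the three pieces of information named in the proposal. */
class ExtractionException(
    val retrievableId: String,   // hypothetical: a real version would reference the Retrievable
    val operatorName: String,
    cause: Throwable? = null     // optional wrapped exception
) : Exception("Operator '$operatorName' failed for retrievable '$retrievableId'", cause)

/** The two modes suggested above. */
enum class ErrorMode { CONTINUE, ABORT }

/** Hypothetical context object that the caller of a pipeline can inspect after a run. */
class IngestContext(val mode: ErrorMode) {
    private val failures = LinkedHashMap<String, MutableList<ExtractionException>>()

    /** Records the error in either mode, then rethrows it if the mode is ABORT. */
    fun handle(e: ExtractionException) {
        failures.getOrPut(e.retrievableId) { mutableListOf() }.add(e)
        if (mode == ErrorMode.ABORT) throw e
    }

    /** Per-item summary: retrievable id mapped to the errors recorded for it. */
    fun summary(): Map<String, List<ExtractionException>> = failures
}
```

With `ErrorMode.CONTINUE`, a caller could run the pipeline and afterwards iterate over `summary()` to see which items failed in which operator. With `ErrorMode.ABORT`, the first error is rethrown but is still present in the summary.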

In addition, we should discuss how handled errors affect Retrievables. It might make sense to include error information at the Retrievable level as well.

Currently, this is a discussion issue. I'm open to ideas and input.
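As one possible starting point for that discussion, the sketch below attaches error records directly to an item. The `Retrievable` class here is a deliberately simplified, hypothetical stand-in, and `ErrorInfo`, `withError`, and `failed` are made-up names; how this would map onto the actual Retrievable model is open.

```kotlin
/** Hypothetical error record attached to a retrievable. */
data class ErrorInfo(val operatorName: String, val message: String)

/** Simplified, hypothetical retrievable. It is immutable, so attaching an error yields a copy. */
data class Retrievable(val id: String, val errors: List<ErrorInfo> = emptyList()) {

    /** Returns a copy of this retrievable with the given error appended. */
    fun withError(error: ErrorInfo): Retrievable = copy(errors = errors + error)

    /** True if at least one operator reported an error for this retrievable. */
    val failed: Boolean get() = errors.isNotEmpty()
}
```

Downstream operators or the persistence layer could then check `failed` to decide whether to skip an item, without consulting a separate context object.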

Dependencies

None

Boundary Conditions

This should be implemented in a way such that the error handling logic is injected transparently when pipelines are constructed, rather than requiring the operators to manipulate the flow.
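A rough sketch of what transparent injection could look like: the pipeline wraps every operator in an error handling stage while it is being assembled, so operators only throw and never inspect the mode. This is an illustration under simplifying assumptions. The `Operator` interface below is a hypothetical one-item-in, one-item-out stand-in, `Sequence` is used instead of a coroutine flow to keep the example free of dependencies, and `Pipeline`, `guarded`, and `errors` are made-up names.

```kotlin
/** Simplified, hypothetical operator: maps one item to one item. */
fun interface Operator { fun apply(item: String): String }

class ExtractionException(val operatorName: String, val item: String, cause: Throwable? = null) :
    Exception("Operator '$operatorName' failed for '$item'", cause)

enum class ErrorMode { CONTINUE, ABORT }

/** Chains the operators and inserts an error handling stage around each of them. */
class Pipeline(private val operators: List<Operator>, private val mode: ErrorMode) {

    /** Errors collected during a run, available to the caller in both modes. */
    val errors = mutableListOf<ExtractionException>()

    fun run(items: Sequence<String>): Sequence<String> =
        operators.fold(items) { upstream, op -> guarded(upstream, op) }

    /** The injected stage: operators are unaware of it and of the configured mode. */
    private fun guarded(upstream: Sequence<String>, op: Operator): Sequence<String> =
        upstream.mapNotNull { item ->
            try {
                op.apply(item)
            } catch (e: ExtractionException) {
                errors.add(e)
                if (mode == ErrorMode.ABORT) throw e
                null // CONTINUE: drop the failed item and keep processing the rest
            }
        }
}
```

Only `ExtractionException` is caught on purpose. Any other exception propagates unchanged, which matches the idea that it indicates a programmer's error.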
