Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Let taps continue to next partitions and/or streams upon certain errors #280

Open
MeltyBot opened this issue Nov 29, 2021 · 5 comments
Open

Comments

@MeltyBot
Copy link
Contributor

MeltyBot commented Nov 29, 2021

Migrated from GitLab: https://gitlab.com/meltano/sdk/-/issues/282

Originally created by @laurentS on 2021-11-29 22:40:06


Summary

Follow up to !195 and probably related to #137 (and maybe #134)

In release 0.3.14, there is now support for RetriableAPIError and FatalAPIError, which I think should be extended to allow for the tap to continue after such an error (whether that's the default or an opt-in behaviour), in particular when dealing with a partitioned stream.

Proposed benefits

Example use case: tap-github can fetch data for a list of repos. Currently, if it fails at some point, the rest of the list is not handled at all. See this issue for more details. In that particular case, not handling the error means we're not able to retrieve data past the error which seems persistent on some repos.

Given that the tap is run as a separate process, bubbling the exception up to the top level doesn't actually help in that the calling process cannot get access to the details of it. Returning state (and maybe an error exit code) seems like a cleaner way to fail.

Proposal details

In Stream._sync_records, the code loops through a list of contexts, but there is no error handling. The loop could be enclosed in a try...except block which could selectively capture the above exceptions and update state (or not) accordingly, before proceeding with the next context.

Best reasons not to build

I can't really think of a reason why this would cause problems, but I only have the tap-github use case in mind. If there are downsides, that behaviour could be opt-in via a config option, with the default behaviour remaining as is.

@MeltyBot
Copy link
Contributor Author

@labelsync-manager labelsync-manager bot added the kind/Feature New feature or request label Jun 23, 2022
@edgarrmondragon edgarrmondragon changed the title Let taps continue to next partitions upon error Let taps continue to next partitions and/or streams upon certain errors Sep 6, 2022
@edgarrmondragon
Copy link
Collaborator

We have RetriableAPIError and FatalAPIError. Perhaps a new EndOfStreamError could signal the tap to continue on with the next partition or the next stream. Some python-ish pseudo code:

class Stream:
  def _sync_records():
    for context in partitions:
      try:
        # process partition
      except EndOfStreamError:
        continue

class Tap:
  def sync():
    for stream in streams:
      try:
        # process stream
      except EndOfStreamError:
        continue

@edgarrmondragon
Copy link
Collaborator

edgarrmondragon commented May 10, 2023

MR started but never merged: #635

Copy link

stale bot commented May 9, 2024

This has been marked as stale because it is unassigned, and has not had recent activity. It will be closed after 21 days if no further activity occurs. If this should never go stale, please add the evergreen label, or request that it be added.

@stale stale bot added the stale label May 9, 2024
@edgarrmondragon
Copy link
Collaborator

Still relevant

@stale stale bot removed the stale label May 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants