
Allow the tap to continue on server side timeout (error 502) #52

Open
laurentS opened this issue Nov 24, 2021 · 3 comments
@laurentS (Contributor)

In their docs, GitHub mention that long-running requests (>10 sec) are interrupted and return nothing. We hit such a case, and it seems GitHub returns a 502 Bad Gateway HTTP status code.

In our case, the issue comments stream gave such errors.

To see the error, run `curl -v -o /dev/null "https://api.github.com/repos/bitcoin/bitcoin/issues/comments?per_page=100&sort=updated&direction=asc&since=2019-11-25T15:20:23"` (this discards the output, which is irrelevant, but shows the headers and the 502). curl helpfully shows a timer which stops at 10 seconds, confirming this is a server-side timeout.

When the tap is running with multiple repos listed in its config, it chokes on such errors and exits prematurely, never finishing the list of tasks it's supposed to do. In my testing, retrying the same query consistently led to the same result. Lowering the `per_page` param let the data through (in the case above, I had to go down to 10).

The tap should be able to get around such errors in a cleaner way:

  • ideally, it would retry the same endpoint with a smaller `per_page` value for a while (say, until it completes the current stream/repo)
  • the current `MAX_PER_PAGE` is set at 1000, but should be at most 100, according to the docs
  • as a temporary workaround, the tap could simply skip the current repo and move on to the next one. This leaves some gaps in the data, however 🕳️
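The first bullet could look something like this. This is a minimal sketch, not the tap's actual request code: `fetch_page` and `ServerTimeout` are hypothetical stand-ins for the HTTP layer and the 502-after-retries failure.

```python
class ServerTimeout(Exception):
    """Stand-in for a 502 raised after GitHub's ~10 s server-side cutoff."""


def fetch_with_shrinking_page(fetch_page, per_page=100, floor=1):
    """Retry a page request, halving per_page on each server-side timeout
    until the request succeeds or per_page reaches the floor.

    fetch_page: any callable taking a per_page value and returning the
    records for that page (hypothetical; the real tap would issue an
    HTTP request here).
    """
    while True:
        try:
            return per_page, fetch_page(per_page)
        except ServerTimeout:
            if per_page <= floor:
                raise  # even the smallest page times out; give up
            per_page = max(floor, per_page // 2)
```

With the bitcoin/bitcoin example above, this would walk 100 → 50 → 25 → 12 → 6 until the response fits inside the 10 second window.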
@ericboucher (Contributor)

ericboucher commented Nov 29, 2021

  • MAX_PER_PAGE was updated in Adjust max-per-page constant to match api docs #53

  • Implementation idea:
    Similar to our list of tolerated_error values, we could have a list of continue_on_errors. When a partition exits on such an error, it would add a flag to its context, error_occured: true. If that flag is raised, we could override [Stream]._increment_stream_state to check it and NOT update the state for that partition.
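Roughly, that flag mechanism could work like this. A toy sketch, not singer-sdk code: the class, the per-partition state dict, and the record shape are all simplified stand-ins; only the idea of skipping the state bump when `error_occured` is set comes from the comment above.

```python
CONTINUE_ON_ERRORS = {502}  # statuses the tap tolerates mid-partition


class HTTPError(Exception):
    def __init__(self, status):
        self.status = status


class SimplifiedStream:
    """Toy stand-in for a Stream, keeping one bookmark per partition."""

    def __init__(self):
        self.state = {}

    def sync_partition(self, partition, fetch_records):
        context = {"partition": partition, "error_occured": False}
        latest = None
        try:
            for record in fetch_records():
                latest = record["updated_at"]
        except HTTPError as exc:
            if exc.status not in CONTINUE_ON_ERRORS:
                raise
            context["error_occured"] = True  # flag raised; bookmark stays put
        self._increment_stream_state(context, latest)

    def _increment_stream_state(self, context, latest):
        if context["error_occured"]:
            return  # do NOT advance: this partition is retried on the next run
        if latest is not None:
            self.state[context["partition"]] = latest
```

The point is that a 502 mid-partition leaves that partition's bookmark untouched while other partitions still advance normally.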

@laurentS (Contributor, Author)

I opened https://gitlab.com/meltano/sdk/-/issues/282 to suggest a solution at the SDK level.

@laurentS (Contributor, Author)

laurentS commented Feb 7, 2022

A couple of ideas to let the tap continue after errors (assuming backoff has already retried N times and failed, as we've seen a number of times in our logs):

Tolerated errors

We could add all sorts of error codes to the tolerated_http_errors list for a stream, but this has the downside of losing data, as the tap would advance the bookmark past the erroneous data.

Override stream request decorator

The RestStream class has a request_decorator which we could override in the tap when a specific config option, say continue_on_error, is passed. The decorator could then build a mock Response so that the rest of the stream moves on to the next partition. The state bookmark should be set to just before the error, so it can be retried on the next run.
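A sketch of that decorator idea. The names here (`RetriesExhausted`, `EmptyResponse`) are hypothetical; the real request_decorator in the SDK wraps the request call with backoff, and a real implementation would mock an actual requests.Response.

```python
import functools


class RetriesExhausted(Exception):
    """Stand-in for the error raised once backoff gives up."""


class EmptyResponse:
    """Minimal mock of a Response: a 200 with no records, so the
    stream treats the page as exhausted and moves on."""

    status_code = 200

    def json(self):
        return []


def continue_on_error(request_func):
    """Hypothetical request decorator: once retries are exhausted, hand
    back an empty response instead of crashing the whole sync."""

    @functools.wraps(request_func)
    def wrapper(*args, **kwargs):
        try:
            return request_func(*args, **kwargs)
        except RetriesExhausted:
            # The caller must also avoid advancing the bookmark here,
            # so the failed partition is retried on the next run.
            return EmptyResponse()

    return wrapper
```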

The difficulty would be in figuring out which errors are temporary, and which are not:

  • the example URL above is still returning the same 502 (after 10 s) more than 3 months later. It's unlikely it will ever work. At what point do we consider an error "permanent"?
  • on the other hand, we had a 500 on /repos/opensourcedesign/opensourcedesign.github.io/commits yesterday which worked fine about 24 hours later.

A possible option would be to add a retried_count in the state bookmark which keeps track of how many times (or over how long) a specific partition+stream has been retried, and gives up after N attempts over multiple runs.
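That cross-run counter could be as simple as the sketch below. The key name `retried_count` comes from the comment above; the threshold and helper names are made up for illustration.

```python
MAX_RETRIES_ACROSS_RUNS = 5  # give up after N attempts over multiple tap runs


def should_retry(bookmark):
    """True while this partition+stream has not exhausted its retry budget."""
    return bookmark.get("retried_count", 0) < MAX_RETRIES_ACROSS_RUNS


def record_failure(bookmark):
    """Bump the counter after another failed run; once should_retry()
    turns False, the tap can mark the partition as permanently broken."""
    bookmark["retried_count"] = bookmark.get("retried_count", 0) + 1
    return bookmark
```

A successful sync would simply drop `retried_count` from the bookmark, resetting the budget.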
