-
Notifications
You must be signed in to change notification settings - Fork 149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pass client error policy down to snapshot and output plugin #115
Conversation
PQExpBuffer query = createPQExpBuffer(); | ||
appendPQExpBuffer(query, "START_REPLICATION SLOT \"%s\" LOGICAL %X/%X", | ||
appendPQExpBuffer(query, "START_REPLICATION SLOT \"%s\" LOGICAL %X/%X (error_policy '%s')", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the query should look like this:
appendPQExpBuffer(query, "START_REPLICATION SLOT \"%s\" LOGICAL %X/%X (\"error_policy\" \'%s\')",
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the double quotes around the param name are optional (this version worked when I tried it), but they're probably good practice, and it'll be more consistent with #101, so I've added them: 5d87428.
The single quotes don't need escaping though - a string literal can contain unescaped single quotes. Unless I misunderstood?
I think this patch is very cool. But there's a edge case like #113 , in this case I think we should have recreate a replication stream if |
Thanks for the review @badboyd! Re:
Assuming the error is persistent, recreating the replication stream wouldn't help, since the client would have to reconnect, but it would just reprocess the same row that made it fail in the first place (since the replication slot still exists), and get nowhere. The approach taken by this PR is to just skip over the data that caused the error, without terminating the replication stream, if The behaviour of There is a If you can reproduce the "missing chunk for toast" error, then I'd be interested to know if |
You are right about this. I've just commented in #113, please check it. |
@badboyd I agree it would be useful to have Bottled Water reconnect after Postgres is restarted, although I think that's unrelated to this pull request. Although if you're running Bottled Water in some kind of process supervisor like runit or supervisord - which is a good idea anyway - you can get the same effect by configuring the supervisor to restart BW if it exits with a nonzero error code. |
This adds an additional
error_policy
parameter to the snapshot functionbottledwater_export
and the logical decoding output plugin (specified viaSTART_REPLICATION
), with similar semantics to the--on-error
flag added to the client in PR #85. The default policy is "exit", which preserves the existing behaviour that any error encoding a row for output to the client will terminate the snapshot or replication stream. Under the "log" policy, instead the error will just be logged and the offending row skipped.This also ensures that the Bottled Water client passes the policy specified by
--on-error
down to the snapshot and output plugin. Previously this could lead to a confusing situation where you specified--on-error=log
but some errors would still terminate (since the error policy did not apply to the extension): e.g. #36 and #113.It also adds tests for handling of large values that exceed the extension's internal maximum buffer size, to illustrate and test for the difference between error policies.
This does not actually guarantee that all errors will be skipped in "log" mode - currently it skips only Avro encoding errors and errors writing the buffer. (e.g. it will probably not currently catch the "missing chunk for toast value" error described in #113.) This is because the extension was originally coded with the assumption that any error would exit early, and so it may not be safe to continue on after all possible errors - an unanticipated error might leave the extension in a bad state. We should be able to add handling for additional errors on a case by case basis.