barman streaming replication stops working after switchover - normal wal archival continues to work #1035
Comments
We need more information here, for example a step-by-step list of the actions taken and the result after each one. Based on what I see, I believe this would be solved by using models and switching models on switchover/failover (this is what we recommend when using Barman with Patroni).
Also, check the postgres logs. Barman is only starting |
@martinmarques The main issue seems to be that the replication slot doesn't continue from a continuous WAL archive. If you check my output, the slot has a restart_lsn of C/1F000058, but barman tries to start at A/98000000 on timeline 12. I haven't fixed that replication in some time, and the server is currently on timeline 22. I'll try to reproduce this so you can have some more info on it.
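To make the mismatch concrete, here is a minimal sketch that converts the two positions mentioned above into byte offsets and WAL segment file names. It assumes the PostgreSQL default `wal_segment_size` of 16 MiB and assumes the slot's restart_lsn lies on the current timeline 22; the helper names `parse_lsn` and `wal_file_name` are made up for illustration and are not part of barman.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: PostgreSQL default of 16 MiB


def parse_lsn(lsn: str) -> int:
    """Convert an LSN such as 'C/1F000058' into a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_file_name(timeline: int, lsn: str, segment_size: int = WAL_SEGMENT_SIZE) -> str:
    """Name of the WAL segment file that contains the given LSN."""
    segments_per_xlog_id = 0x100000000 // segment_size
    segment_number = parse_lsn(lsn) // segment_size
    return "%08X%08X%08X" % (
        timeline,
        segment_number // segments_per_xlog_id,
        segment_number % segments_per_xlog_id,
    )


slot_restart = "C/1F000058"  # restart_lsn reported for the replication slot
barman_start = "A/98000000"  # position barman tries to stream from

gap_bytes = parse_lsn(slot_restart) - parse_lsn(barman_start)
# Timeline 22 for the slot is an assumption; timeline 12 is what barman reports.
print("slot restart segment:", wal_file_name(22, slot_restart))
print("barman start segment:", wal_file_name(12, barman_start))
print("distance between the two positions: %.1f GiB" % (gap_bytes / 2**30))
```

If the segment barman asks for is older than the slot's restart segment, the server has most likely already recycled it, which would be consistent with streaming not resuming on its own.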
I don't think it's necessary to run with models, since none of my configs change (i.e. all URLs stay exactly the same). Does barman do specific tasks (e.g. reset the streaming WAL position) on calling
When testing the failover manually everything looked very good:
So I'm not sure how I keep getting into that error state.
No, it just switches from one configuration to another. Of course, it does some checks and tasks on the barman side so that the change applies cleanly. I hope that answers your question. I'll wait for more info.
Hi,
I'm using barman in combination with zalando/postgres-operator on kubernetes.
When the postgres instance does a failover, the barman streaming replication connection somehow can't recover. Sadly, I don't have an explicit log of that occurring right now.
I also use normal WAL archival, and I think streaming replication should try to restart based on the previously archived WALs. Currently it's just stuck at some old timeline position, which causes WALs to be retained on the postgres server.
Configuration:
Barman status:
Status of replication slot after failovers:
Errors produced by barman receive-wal:
Commands necessary to restart wal archiving: