Last Vacuum Always Errors on Hot Standby Cluster #359
Comments
I have not given much thought to this problem (yet), but I would have put the pg_is_in_recovery() call in the main query and added a test. There is a note in the doc:
It does not seem to cover the standbys, or the test is broken. I put this on our TODO list this week.
The service definitely assumes it will be executed on primary servers only. We usually do not deploy the service on standbys. Can't you avoid it? Can you add a dependency on the Anyway, a CRITICAL is a bit strong. After discussion with @ioguix, maybe we could fetch the role status while fetching the backend version. Then we can decide, for each service, what to do. On standbys:
What do you think?
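To make the proposal above concrete, here is a minimal sketch of the "fetch the role once, then let each service decide" idea. It is written in Python purely for illustration and is not check_pgactivity code: `STANDBY_POLICY`, `run_service`, and the policy entries are hypothetical names, and the `in_recovery` flag is assumed to come from `pg_is_in_recovery()`, fetched at the same time as the backend version.

```python
from enum import Enum


class Status(Enum):
    """Standard Nagios plugin return codes."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


# Hypothetical per-service policy: the status to force when the server
# is a standby. None means "run the check as usual".
STANDBY_POLICY = {
    "last_vacuum": Status.UNKNOWN,   # illustrative entry
    "connection": None,              # illustrative entry
}


def run_service(name, in_recovery, check):
    """Return (status, message), short-circuiting on standbys.

    in_recovery: result of pg_is_in_recovery(), assumed to be fetched
                 once, alongside the backend version.
    check:       callable that runs the real check and returns
                 (Status, message).
    """
    forced = STANDBY_POLICY.get(name)
    if in_recovery and forced is not None:
        return forced, f"{name}: server is a standby, check skipped"
    return check()
```

With a table like this, a service that makes no sense on a standby reports its forced status instead of CRITICAL, while services that are valid on any role run unchanged.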
Hi @Krysztophe, thanks for the feedback. It would be difficult not to deploy it on standbys in our environment: we have many of them and use an automated process to fail over to them whenever something happens to the primary, and we typically won't fail back automatically unless there is a reason to. So a standby can be the primary for several months, and with the size of our organization it isn't feasible to deploy the service in Nagios when a failover happens and remove it when the roles change back. I can understand not wanting to return OK by default, as you are correct that it could give a false sense of security if you are not expecting that behavior. So I think UNKNOWN would be a better default value. If adding an Thanks for all your help!
Personally, I would prefer UNKNOWN and a A few services already test
When using the last_vacuum check on a hot standby cluster, a critical alert is always raised. This appears to be because a hot standby does not maintain metadata recording when a vacuum job is replicated from the primary to the standby. Running the same check on the primary cluster generates no alert.
In attempting to fix this internally I have modified sub check_last_vacuum to borrow the code from sub check_is_hot_standby and only run the check if the cluster isn't a hot standby cluster. What would be the best way to address this issue, and would it be possible to get a more appropriate fix added directly to check_pgactivity?
Below is what I did to sub check_last_vacuum, which appears to work for my needs, but I am not a programmer so forgive any stupidity that you find.
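The reporter's actual change is not reproduced here. As a rough illustration of the approach described (run the last_vacuum check only when the server is not a hot standby), the guard might look like the sketch below. It is Python rather than the plugin's own code, `run_query` and `real_check` are hypothetical stand-ins for the plugin's query layer and the existing check logic, and returning UNKNOWN on a standby is an assumption taken from the preference voiced in the comments, not from the reporter's patch.

```python
UNKNOWN = 3  # Nagios plugin exit code for "unknown"


def check_last_vacuum(run_query, real_check):
    """Run real_check only on a primary; report UNKNOWN on a hot standby.

    run_query:  callable taking SQL text and returning a list of row
                tuples (hypothetical helper, not a real plugin API).
    real_check: callable performing the actual last-vacuum evaluation
                and returning (exit_code, message).
    """
    # pg_is_in_recovery() returns true while the server is a standby.
    in_recovery = run_query("SELECT pg_is_in_recovery()")[0][0]
    if in_recovery:
        return UNKNOWN, "last_vacuum: server is in recovery, check not run"
    return real_check()
```

Because the role is queried on every run, the same service definition keeps working after a failover: the former standby starts running the real check as soon as it is promoted.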