-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MISC] Check for peer relation when getting the primary endpoint #366
[MISC] Check for peer relation when getting the primary endpoint #366
Conversation
This reverts commit 8184dfa.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## dpe-3591-fix-shared-buffers-validation #366 +/- ##
==========================================================================
- Coverage 81.43% 81.25% -0.19%
==========================================================================
Files 10 10
Lines 2214 2235 +21
Branches 356 360 +4
==========================================================================
+ Hits 1803 1816 +13
- Misses 343 347 +4
- Partials 68 72 +4 ☔ View full report in Codecov by Sentry. |
…peer-rel-gh-runner
…or-peer-rel-gh-runner
if not self._peers: | ||
logger.debug("primary endpoint early exit: Peer relation not joined yet.") | ||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the peer relation is not yet joined, don't try to find the primary.
except PostgreSQLListUsersError: | ||
logger.warning("Deferriing on_start: Unable to list users") | ||
event.defer() | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Skipping the wait in primary_endpoint could lead o exceptions here.
# TODO figure out why peer data is not available | ||
if ( | ||
primary_endpoint | ||
and len(self._units_ips) == 1 | ||
and len(self._peers.units) > 1 | ||
): | ||
logger.warning( | ||
"Possibly incoplete peer data: Will not map primary IP to unit IP" | ||
) | ||
return primary_endpoint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On juju 3 we don't seem to be getting the peer data used to get the IP addr when a new unit is joining (e.g. when scaling in the test_db
suite).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It smells dangerous...
logger.debug( | ||
"primary endpoint early exit: Primary IP not in cached peer list." | ||
) | ||
primary_endpoint = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_units_ips
is a property and will not update on retry. The Patroni response might change while retrying, but IMHO it's better to leave that to the topology observer script.
# TODO figure out why peer data is not available | ||
if ( | ||
primary_endpoint | ||
and len(self._units_ips) == 1 | ||
and len(self._peers.units) > 1 | ||
): | ||
logger.warning( | ||
"Possibly incoplete peer data: Will not map primary IP to unit IP" | ||
) | ||
return primary_endpoint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It smells dangerous...
@@ -34,7 +34,6 @@ | |||
|
|||
|
|||
@pytest.mark.group(1) | |||
@pytest.mark.skip(reason="DB Admin tests are currently broken") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to have it re-enable. I suspect it is not related to this PR but enabled as coincident. Right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reenabled it here to reduce the amount CI runs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed with Dragomir, merging this to have progress with tests stabilisation otherwise shared_buffers test cannot be merged.
a33b39d
into
dpe-3591-fix-shared-buffers-validation
* Fix shared buffers validation Signed-off-by: Marcelo Henrique Neppel <[email protected]> * Consider shared buffers config option value when calculating effective cache size value Signed-off-by: Marcelo Henrique Neppel <[email protected]> * Fix PostgreSQL restart not being called Signed-off-by: Marcelo Henrique Neppel <[email protected]> * Add and update unit test Signed-off-by: Marcelo Henrique Neppel <[email protected]> * Update unit test Signed-off-by: Marcelo Henrique Neppel <[email protected]> * Increase test timeout Signed-off-by: Marcelo Henrique Neppel <[email protected]> * Increase test timeout Signed-off-by: Marcelo Henrique Neppel <[email protected]> * Switch to GH runners * Increase stanza timeout * [MISC] Check for peer relation when getting the primary endpoint (#366) * Reenable landscape tests * Try GH runners * Back on self-hosted runner * Don't check for primary if peer didn't join yet * Don't recheck missing primary ip when looking for primary endpoint * Catch list user exception * Catch Psycopg2 error when trying to set config * Try to preflight connection on config validation * Don't retry db connection check * Set the primary endpoint before starting the primary * Revert "Set the primary endpoint before starting the primary" This reverts commit 8184dfa. * Wait for peer before starting primary * Switch to GH runners * Recheck DB connection * Don't check peer on primary startup * Try to skip primary IP validation if not all cluster IPs seem to be available * Comment * Review comments * Unit tests --------- Signed-off-by: Marcelo Henrique Neppel <[email protected]> Co-authored-by: Dragomir Penev <[email protected]> Co-authored-by: Dragomir Penev <[email protected]>
* Fix shared buffers validation Signed-off-by: Marcelo Henrique Neppel <[email protected]> * Consider shared buffers config option value when calculating effective cache size value Signed-off-by: Marcelo Henrique Neppel <[email protected]> * Fix PostgreSQL restart not being called Signed-off-by: Marcelo Henrique Neppel <[email protected]> * Add and update unit test Signed-off-by: Marcelo Henrique Neppel <[email protected]> * Update unit test Signed-off-by: Marcelo Henrique Neppel <[email protected]> * Increase test timeout Signed-off-by: Marcelo Henrique Neppel <[email protected]> * Increase test timeout Signed-off-by: Marcelo Henrique Neppel <[email protected]> * Switch to GH runners * Increase stanza timeout * [MISC] Check for peer relation when getting the primary endpoint (canonical#366) * Reenable landscape tests * Try GH runners * Back on self-hosted runner * Don't check for primary if peer didn't join yet * Don't recheck missing primary ip when looking for primary endpoint * Catch list user exception * Catch Psycopg2 error when trying to set config * Try to preflight connection on config validation * Don't retry db connection check * Set the primary endpoint before starting the primary * Revert "Set the primary endpoint before starting the primary" This reverts commit 8184dfa. * Wait for peer before starting primary * Switch to GH runners * Recheck DB connection * Don't check peer on primary startup * Try to skip primary IP validation if not all cluster IPs seem to be available * Comment * Review comments * Unit tests --------- Signed-off-by: Marcelo Henrique Neppel <[email protected]> Co-authored-by: Dragomir Penev <[email protected]> Co-authored-by: Dragomir Penev <[email protected]>
Try to speed up and stabilise the tests. Based of the shared buffers PR.