Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MISC] Check for peer relation when getting the primary endpoint #366

Merged

Conversation

dragomirp
Copy link
Contributor

@dragomirp dragomirp commented Feb 22, 2024

Try to speed up and stabilise the tests. Based of the shared buffers PR.

@dragomirp dragomirp changed the title Check for peer rel gh runner [TEST] Check for peer rel gh runner Feb 22, 2024
Copy link

codecov bot commented Feb 22, 2024

Codecov Report

Attention: Patch coverage is 40.90909% with 13 lines in your changes are missing coverage. Please review.

Project coverage is 81.25%. Comparing base (bf39c67) to head (f3e3d33).
Report is 1 commits behind head on dpe-3591-fix-shared-buffers-validation.

Files Patch % Lines
src/charm.py 40.90% 12 Missing and 1 partial ⚠️
Additional details and impacted files
@@                            Coverage Diff                             @@
##           dpe-3591-fix-shared-buffers-validation     #366      +/-   ##
==========================================================================
- Coverage                                   81.43%   81.25%   -0.19%     
==========================================================================
  Files                                          10       10              
  Lines                                        2214     2235      +21     
  Branches                                      356      360       +4     
==========================================================================
+ Hits                                         1803     1816      +13     
- Misses                                        343      347       +4     
- Partials                                       68       72       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@dragomirp dragomirp marked this pull request as ready for review February 24, 2024 12:54
Comment on lines +303 to +305
if not self._peers:
logger.debug("primary endpoint early exit: Peer relation not joined yet.")
return None
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the peer relation is not yet joined, don't try to find the primary.

Comment on lines +1056 to +1059
except PostgreSQLListUsersError:
logger.warning("Deferriing on_start: Unable to list users")
event.defer()
return
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skipping the wait in primary_endpoint could lead o exceptions here.

src/charm.py Outdated Show resolved Hide resolved
Comment on lines +315 to +324
# TODO figure out why peer data is not available
if (
primary_endpoint
and len(self._units_ips) == 1
and len(self._peers.units) > 1
):
logger.warning(
"Possibly incoplete peer data: Will not map primary IP to unit IP"
)
return primary_endpoint
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On juju 3 we don't seem to be getting the peer data used to get the IP addr when a new unit is joining (e.g. when scaling in the test_db suite).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It smells dangerous...

logger.debug(
"primary endpoint early exit: Primary IP not in cached peer list."
)
primary_endpoint = None
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_units_ips is a property and will not update on retry. The Patroni response might change while retrying, but IMHO it's better to leave that to the topology observer script.

@dragomirp dragomirp changed the title [TEST] Check for peer rel gh runner [MISC] Check for peer relation when getting the primary endpoint Feb 24, 2024
Comment on lines +315 to +324
# TODO figure out why peer data is not available
if (
primary_endpoint
and len(self._units_ips) == 1
and len(self._peers.units) > 1
):
logger.warning(
"Possibly incoplete peer data: Will not map primary IP to unit IP"
)
return primary_endpoint
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It smells dangerous...

src/charm.py Outdated Show resolved Hide resolved
@@ -34,7 +34,6 @@


@pytest.mark.group(1)
@pytest.mark.skip(reason="DB Admin tests are currently broken")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to have it re-enable. I suspect it is not related to this PR but enabled as coincident. Right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reenabled it here to reduce the amount CI runs.

Copy link
Contributor

@taurus-forever taurus-forever left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed with Dragomir, merging this to have progress with tests stabilisation otherwise shared_buffers test cannot be merged.

@dragomirp dragomirp merged commit a33b39d into dpe-3591-fix-shared-buffers-validation Feb 27, 2024
41 checks passed
@dragomirp dragomirp deleted the check-for-peer-rel-gh-runner branch February 27, 2024 09:19
dragomirp added a commit that referenced this pull request Feb 27, 2024
* Fix shared buffers validation

Signed-off-by: Marcelo Henrique Neppel <[email protected]>

* Consider shared buffers config option value when calculating effective cache size value

Signed-off-by: Marcelo Henrique Neppel <[email protected]>

* Fix PostgreSQL restart not being called

Signed-off-by: Marcelo Henrique Neppel <[email protected]>

* Add and update unit test

Signed-off-by: Marcelo Henrique Neppel <[email protected]>

* Update unit test

Signed-off-by: Marcelo Henrique Neppel <[email protected]>

* Increase test timeout

Signed-off-by: Marcelo Henrique Neppel <[email protected]>

* Increase test timeout

Signed-off-by: Marcelo Henrique Neppel <[email protected]>

* Switch to GH runners

* Increase stanza timeout

* [MISC] Check for peer relation when getting the primary endpoint (#366)

* Reenable landscape tests

* Try GH runners

* Back on self-hosted runner

* Don't check for primary if peer didn't join yet

* Don't recheck missing primary ip when looking for primary endpoint

* Catch list user exception

* Catch Psycopg2 error when trying to set config

* Try to preflight connection on config validation

* Don't retry db connection check

* Set the primary endpoint before starting the primary

* Revert "Set the primary endpoint before starting the primary"

This reverts commit 8184dfa.

* Wait for peer before starting primary

* Switch to GH runners

* Recheck DB connection

* Don't check peer on primary startup

* Try to skip primary IP validation if not all cluster IPs seem to be available

* Comment

* Review comments

* Unit tests

---------

Signed-off-by: Marcelo Henrique Neppel <[email protected]>
Co-authored-by: Dragomir Penev <[email protected]>
Co-authored-by: Dragomir Penev <[email protected]>
BON4 pushed a commit to BON4/postgresql-operator that referenced this pull request Apr 23, 2024
* Fix shared buffers validation

Signed-off-by: Marcelo Henrique Neppel <[email protected]>

* Consider shared buffers config option value when calculating effective cache size value

Signed-off-by: Marcelo Henrique Neppel <[email protected]>

* Fix PostgreSQL restart not being called

Signed-off-by: Marcelo Henrique Neppel <[email protected]>

* Add and update unit test

Signed-off-by: Marcelo Henrique Neppel <[email protected]>

* Update unit test

Signed-off-by: Marcelo Henrique Neppel <[email protected]>

* Increase test timeout

Signed-off-by: Marcelo Henrique Neppel <[email protected]>

* Increase test timeout

Signed-off-by: Marcelo Henrique Neppel <[email protected]>

* Switch to GH runners

* Increase stanza timeout

* [MISC] Check for peer relation when getting the primary endpoint (canonical#366)

* Reenable landscape tests

* Try GH runners

* Back on self-hosted runner

* Don't check for primary if peer didn't join yet

* Don't recheck missing primary ip when looking for primary endpoint

* Catch list user exception

* Catch Psycopg2 error when trying to set config

* Try to preflight connection on config validation

* Don't retry db connection check

* Set the primary endpoint before starting the primary

* Revert "Set the primary endpoint before starting the primary"

This reverts commit 8184dfa.

* Wait for peer before starting primary

* Switch to GH runners

* Recheck DB connection

* Don't check peer on primary startup

* Try to skip primary IP validation if not all cluster IPs seem to be available

* Comment

* Review comments

* Unit tests

---------

Signed-off-by: Marcelo Henrique Neppel <[email protected]>
Co-authored-by: Dragomir Penev <[email protected]>
Co-authored-by: Dragomir Penev <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants