Failed to retrieve the PostgreSQL version to initialise/update db-admin relation during failover
#566
Comments
Hi, @nobuto-m! Thanks for the bug report. How did you take down the primary unit? Was it through …
It was a force power-off to simulate a hardware failure, not a graceful shutdown or anything like that.
Hi Nobuto, the blocked-state issue should be addressed by #578 and is available in 14/stable nowadays. Can you please re-check it and confirm the fix from your side? Thank you!
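Picking up the fix should amount to refreshing the charm from that channel; a minimal sketch, assuming a Juju 3.x client:

$ juju refresh postgresql --channel 14/stable
$ juju status postgresql    # confirm the new revision and that units settle back to active/idle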
Hi @nobuto-m, I would separate this bug report into two parts:
As discussed today, I have invested a lot of time trying to reproduce the issue. In the meantime, I have reported a follow-up ticket with a UX improvement (based on rev 468): #618. In revision 429 I saw some instabilities on the initial deployment, which I cannot reproduce in rev 468 at all (tried 3 times). At this point I believe we can resolve this issue and focus on the Raft/failover-related tickets you have reported separately.
For the record, it's straightforward to reproduce:
And by using the unpinned version of the bundle, there is no traceback or the … Aside from the fact that two-node clusters are neither maintainable nor sustainable, and that the active/idle status is wrong since the cluster failed to pick a new primary, the "issue" is fixed.
$ git diff --no-index landscape-scalable_r33/bundle.yaml bundle.yaml
diff --git a/landscape-scalable_r33/bundle.yaml b/bundle.yaml
index 68ff865..715dece 100644
--- a/landscape-scalable_r33/bundle.yaml
+++ b/bundle.yaml
@@ -6,7 +6,6 @@ applications:
haproxy:
charm: ch:haproxy
channel: stable
- revision: 75
num_units: 1
expose: true
options:
@@ -17,7 +16,6 @@ applications:
landscape-server:
charm: ch:landscape-server
channel: stable
- revision: 111
num_units: 1
constraints: mem=4096
options:
@@ -25,7 +23,6 @@ applications:
postgresql:
charm: ch:postgresql
channel: 14/stable
- revision: 429
num_units: 1
options:
plugin_plpython3u_enable: true
@@ -38,7 +35,6 @@ applications:
rabbitmq-server:
charm: ch:rabbitmq-server
channel: 3.9/stable
- revision: 188
num_units: 1
options:
consumer-timeout: 259200000
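For anyone re-testing with the unpinned bundle, the modified file can be deployed directly; a minimal sketch, assuming the edited bundle.yaml sits in the current directory and a fresh model is acceptable:

$ juju add-model landscape-test
$ juju deploy ./bundle.yaml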
@nobuto-m just for the record, what was your LXD version? I will update my old LXD version, but the one used in my tests was: …
Tnx!
Steps to reproduce
1. Deploy the landscape stable bundle.
2. Scale postgresql to 2 units (primary + replica):
   $ juju add-unit postgresql -n 1
3. Take down the primary unit and trigger the failover (as sketched below).
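Per the comment thread above, "take down" here means a force power-off to simulate a hardware failure. On the LXD provider this can be approximated by hard-stopping the container backing the primary unit; a minimal sketch, where the container name juju-a1b2c3-0 is hypothetical (the real name shows up in lxc list and juju status):

$ juju status postgresql            # identify which unit is the primary
$ lxc list                          # find the container backing that unit
$ lxc stop --force juju-a1b2c3-0    # hypothetical name; hard stop, no graceful shutdown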
Expected behavior
The failover succeeds, so the replica node is promoted to primary. The consumers of postgresql are then notified to write a new configuration file pointing at the new primary node.
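One way to confirm the promotion would be the charm's get-primary action; a minimal sketch, assuming the 14/stable charm exposes that action and a Juju 3.x client (run it against a surviving unit, since the old primary is down):

$ juju run postgresql/1 get-primary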
Actual behavior
The unit gets stuck in blocked status with the message:
Failed to retrieve the PostgreSQL version to initialise/update db-admin relation
The Landscape app still holds the previous primary PostgreSQL endpoint (192.168.151.108), and the charm itself still reports the dead node as the primary.
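For context, the blocked message suggests the charm could not run its version query against the database. That check can be approximated by hand; a minimal sketch, where the host is the dead primary from this report and the credential handling is illustrative (the operator password would come from the charm's get-password action, assuming the 14/stable charm exposes it):

$ juju run postgresql/leader get-password                                  # fetch the operator password
$ psql -h 192.168.151.108 -U operator -d postgres -c 'SELECT version();'   # fails while the old primary is powered off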
Versions
Operating system: jammy
Juju CLI: 3.5.3-genericlinux-amd64
Juju agent: 3.5.3
Charm revision: 14/stable rev 429
LXD: N/A
Log output
Juju debug log: landscape_model.log
Additional context