Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DPE-4598] Handle upgrade of top of the stack Juju leader #492

Merged

Conversation

marceloneppel
Copy link
Member

@marceloneppel marceloneppel commented Jun 11, 2024

Issue:

The PostgreSQL VM charm rollback test started to fail by the combination of adding TimescaleDB to the Charmed PostgreSQL Snap + updating the upgrade library to LIBPATCH 16 (which removed a defer call that was hiding the issue fixed by this PR).

Explanation of the issue (considering the tests from the PostgreSQL VM charm):

  • In the test_upgrade_from_edge test:

    • TimescaleDB in the snap is leading to a longer snap refresh (only some more seconds).
    • With that, Patroni is performing a failover, so a new unit, different from the Juju leader unit (the former primary), becomes the primary.
    • Also with that, the former primary unit, the Juju leader unit, becomes a replica (not a synchronous standby).
  • In the test_fail_and_rollback test:

    • After the failed upgrade, the pre-upgrade-check is run, which sets the recovery state for the Juju leader unit (which is a replica).
    • When the refresh is started, the upgrade charm event is fired in each of the three units, setting the state to ready in all of them.
    • The upgrade changed event is fired only in the non-primary units because only the relation data from the Juju leader unit changed (the state changed from recovery to ready, while in the other units, it was already set to ready, so no change, no events firing for the Juju leader unit).
    • As the Juju leader unit is the replica, it’s the top of the stack unit (the one that should be upgraded first). However, as the upgrade relation changed, the event hasn’t fired for it; it cannot start the upgrade (in this case, it is a rollback).

For Kafka and MySQL operators on VM, it hasn’t happened because both always set the upgrade stack with the Juju leader unit as the last unit to be upgraded. For the following codes, remember that the build_upgrade_stack is always executed in the Juju leader unit.

Solution

If the Juju leader unit is the top unit of upgrading the stack, execute the on_upgrade_changed handler logic during a rollback. This way, it can run the logic that starts the upgrade process, in the end of the on_upgrade_changed handler.

Depends on canonical/data-platform-libs#176.

Signed-off-by: Marcelo Henrique Neppel <[email protected]>
@marceloneppel marceloneppel marked this pull request as ready for review June 11, 2024 20:49
Copy link

codecov bot commented Jun 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 71.06%. Comparing base (0f2c8c2) to head (195b89c).
Report is 22 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #492      +/-   ##
==========================================
- Coverage   80.31%   71.06%   -9.25%     
==========================================
  Files          10       11       +1     
  Lines        2301     2772     +471     
  Branches      376      483     +107     
==========================================
+ Hits         1848     1970     +122     
- Misses        369      704     +335     
- Partials       84       98      +14     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@marceloneppel marceloneppel merged commit 4fb533b into main Jun 13, 2024
52 checks passed
@marceloneppel marceloneppel deleted the dpe-4598-handle-upgrade-of-top-of-the-stack-juju-leader branch June 13, 2024 12:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants