-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MISC] Revert upgrade lib #489
Closed
+7
−20
Closed
Changes from 1 commit
Commits
Show all changes
2 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the cause why the rollback gets stuck.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CC: @paulomach re: canonical/data-platform-libs#146
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @dragomirp , this was causing concurrent upgrades on units, so I don't think we should revert it on the lib.
Does manual upgrades run also stuck?
Is the test timming out? I've also observed juju takes humongous amount of time to trigger
upgrade-charm
after a refresh being run, eating a lot of the wait period, forcing us to increase timeout times for upgrade tests.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The real issue is the following:
test_upgrade_from_edge
test:test_fail_and_rollback
test:pre-upgrade-check
is run, which sets therecovery
state for the Juju leader unit (which is a replica).recovery
toready
, while in the other units, it was already set toready
, so no change, no events firing for the Juju leader unit).The new version of the upgrade library (which is being reverted through this PR) is also causing this issue to start happening because of the missing defer call that Dragomir commented about.
However, the root cause is the failover happening, which has shown a specific path of the execution that wasn’t covered by the lib. Before adding TimescaleDB, there was no failover, so the Juju leader unit was the primary during the entire test, and what happened in the
test_fail_and_rollback
test was the following:* After the failed upgrade, the
pre-upgrade-check
is run, which sets therecovery
state for the Juju leader unit (which is the primary).* When the refresh is started, the upgrade charm event is fired in each of the three units, setting the state to ready in all of them.
* As the Juju leader unit was not the top of the stack unit, it was ok for it not to receive the upgrade relation event in the first moment.
For Kafka and MySQL operators on VM, it hasn’t happened because both always set the upgrade stack with the Juju leader unit as the last unit to be upgraded. For the following codes, remember that the
build_upgrade_stack
is always executed in the Juju leader unit.It may be fixed by canonical/data-platform-libs#176.