Avoid upgrade being killed by failed liveness probes #344
Conversation
Thanks for submitting a PR! :) Will this always run an upgrade anytime the pod restarts or additional pods are spun up? Is there a way to disable upgrades until a user is ready? |
It will update the volume to the version of the image, so it's usually triggered by the user doing |
This doesn't solve the situation where you need to run the web installer. Probes should probably be changed not to fail if the web installer needs to be run (otherwise the pod never becomes ready because it's not installed, and you can't reach the web installer to install). |
Should you even be using the web based upgrader at all when using container images? The image has the updated files. If you use the web-based upgrader but don't update your container image version, you'll break stuff for sure. |
You can disable the web-based updater to prevent people from borking their installation:
https://docs.nextcloud.com/server/stable/admin_manual/configuration_server/config_sample_php_parameters.html |
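For anyone looking for a concrete way to do that through this chart, a minimal sketch — it assumes the chart's nextcloud.configs value for extra config files and the upgrade.disable-web parameter from the linked documentation:

nextcloud:
  configs:
    disable-web-updater.config.php: |-
      <?php
      $CONFIG = array(
        // disable the web-based updater; upgrades come from the image instead
        'upgrade.disable-web' => true,
      );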
Yeah I also think it should auto-install but that is not the default behavior of running |
Hey, what would be needed to get this merged? |
@4censord I haven't looked at this PR closely, but the conflicts need to be solved and the changes need to be reviewed. With this chart there is no backporting, you just override |
I'll take a look at resolving the conflicts.
Ok |
The conflicts are solvable by simply rebasing onto the main branch. |
If I rebase, will somebody review? I don't like to put in work for it to be ignored |
Force-pushed from 11f5cf8 to c65efeb
@jessebot can you please take a look at this? This is a longstanding issue with this chart, and currently the only way to prevent the upgrade from failing is to disable all probes, which is not really a solution. I'm using
And I will be left with the pod crashlooping forever until I manually fix it using |
How exactly are you fixing this? |
@4censord I use |
@budimanjojo Then I seem to have a different issue; that does not work for me. I have to roll back to a backup after attempting an upgrade. |
@provokateurin Would you mind taking a look at this once it's convenient? |
Sorry for the delay. Slowly making my rounds 🙏 @remram44 I did a quick look over and it seems ok, but don't we still want to be able to set upgrade in the values.yaml? Also, there still needs to be a rebase, as there's conflicts. |
Also, tagged @provokateurin to get an additional review to ask about keeping the |
I'm not 100% sure if I read https://github.com/nextcloud/docker/blob/master/README.md correctly, but it sounds like the "normal" container will still try to do the upgrade as the default command is used there. Then we have a race condition between the two containers and depending on which one gets to work first the upgrade is killed by the probes or not. |
IIRC the upgrade env var is used for triggering the Nextcloud image's upgrade mechanism. But because this PR completely supersedes the internal
Yeah, back in 2023 when we said it was ready there weren't any. |
No, the init container runs first, and the normal container only gets started once the upgrade completes. |
You're right, I overlooked that it is an initContainer and not another sidecar. Then this makes sense to me, I'll have to give it some testing though to confirm it works as intended. |
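For readers following along, a minimal sketch of what the PR's approach boils down to — container and volume names here are illustrative, not the chart's exact template:

initContainers:
  - name: nextcloud-upgrade       # hypothetical name
    image: nextcloud:apache       # same image the main container uses
    args: ["true"]                # entrypoint performs the upgrade, then "true" exits immediately
    env:
      - name: NEXTCLOUD_UPDATE
        value: "1"                # with custom args, the entrypoint skips the upgrade unless this is set
    volumeMounts:
      - name: nextcloud-main      # hypothetical; must match the main container's data volume
        mountPath: /var/www/html

Because init containers are never probed and must finish before the main container starts, the upgrade can take as long as it needs.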
{{- if .Values.nextcloud.securityContext}}
securityContext:
  {{- with .Values.nextcloud.securityContext }}
Could just be a single with instead of if+with
The initContainer is copied from the container with only required changes, I am not cleaning up the existing container at this time. It would only make this patch harder to review.
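For reference, the reviewer's suggestion would collapse the snippet above into a single block, something like this (a sketch; the nindent depth depends on where the block sits in the template):

{{- with .Values.nextcloud.securityContext }}
securityContext:
  {{- toYaml . | nindent 10 }}
{{- end }}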
If you could do a rebase onto the latest main state then I'll give it a test. |
In my opinion, compared with the hooks, moving it to the initContainer is the better solution (that should work). |
I'd say we should have multiple init-containers run in order.
While that adds complexity in the form of more containers, and will increase startup time per pod, IMHO it's then easier than writing complex scripts for doing everything within one container. |
I'm not entirely sure what the advantage is. Multiple init containers that run the exact same image... that also means they will have to release/acquire a lock multiple times. I don't see what you mean by "decrease startup time per pod". At least as much work has to happen, so that doesn't seem true? |
This was meant to say "increase", fixed now. Every container that starts takes a few seconds until it's ready to run its scripts. Personally, I just would have done multiple containers, because that feels easier to do, rather than having to chain things in the same container. |
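To illustrate the "run in order" idea being discussed (container names and the database check are hypothetical; Kubernetes runs init containers sequentially, each to completion before the next starts):

initContainers:
  - name: wait-for-db             # hypothetical: block until the database answers
    image: busybox
    command: ["sh", "-c", "until nc -z db 5432; do sleep 2; done"]
  - name: run-upgrade             # hypothetical: then perform the upgrade as in this PR
    image: nextcloud:apache
    args: ["true"]
    env:
      - name: NEXTCLOUD_UPDATE
        value: "1"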
Bump. Will this get merged any time soon? Will you ask me to rebase and ignore it again? @jessebot @provokateurin please let me know if there is anything else you expect before merging, thanks |
It's unreal how long this PR has been ignored, and this is a real issue with the chart. |
Please keep in mind this chart is only maintained by volunteers and nobody is paid for this. |
@provokateurin I understand. But so is the PR author, who happens to be very cooperative with the rebase requests and then keeps getting ignored afterwards (multiple times). This change is very small and you should just take a quick look, decide whether this is a good addition or not, and proceed to test it if you accept the change, or reject and close the PR otherwise. This should take like one or two hours, not years. |
This changes the way updates work. It's unfortunately not as small as you may think, and because this is maintained by volunteers, it gets looked at when we have time.
@remram44 Please also bump the helm chart version a major version, as this removes an option and defaults to allowing updates. In the future, please check out the checks at the bottom of the PR so you can see if there are any default checks that we'll come back and ask you to change; if the checks of a PR are not passing, we cannot merge it, according to the greater nextcloud org rules. You can also find the contributing guidelines here. EDIT: I just tried to check this PR in an incognito window while logged out of GitHub and it didn't show the bottom checks section, but it does still show each commit, and if you see a little ❌ beside a commit, you should be able to click it and it will show you which check has failed. Sorry about that. @provokateurin should we also add a note in the README that with this change, this chart will now auto-update anytime the tag is updated, so if you don't want that, you should manually specify the |
Requesting changes comment: Needs Version bumped in Chart.yaml.
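A sketch of the requested change (fields abbreviated; 6.0.0 matches the bump mentioned later in the thread):

# Chart.yaml
apiVersion: v2
name: nextcloud
version: 6.0.0   # major bump per semver: the nextcloud.update value is removed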
I didn't dive deep into this PR, but to my understanding it doesn't change this behavior? It only changes how the update is performed, or am I missing something? |
This PR only moves the update process to an initContainer instead of the main container that has probes. Those probes kill the container whenever the upgrade process takes too long, which causes GitOps tools like FluxCD and ArgoCD to roll back the chart because of the failed probes. And Nextcloud will refuse to start even after rolling back, because of the version mismatch left behind when the container is killed mid-upgrade. |
No, the change doesn't alter the way updates work, it just moves that job to another container instead. And I apologize for being such a jerk in the complaint, but I have been dealing with this problem for too long. |
Force-pushed from 3acbf1e to c1b9fe0
Bumped to 6.0.0 |
@jessebot I can see the checks at the bottom, however they do not run until the workflow is approved:
I am not going to check this page everyday to see if maybe they got approved and there are results to see... |
It removes the option to set the update flag and always sets it going forward, but perhaps I've misunderstood. @provokateurin could you please keep helping here? |
This should not have been exposed. There is no way to use it, since you can't pass a custom command to the container. Signed-off-by: Remi Rampin <[email protected]>
Signed-off-by: Remi Rampin <[email protected]>
Signed-off-by: Remi Rampin <[email protected]>
Force-pushed from 23b861c to 3948370
The update always happens automatically, unless you run the container with a custom command (not the case for this chart), in which case it won't update unless |
Is there anything that one can do to progress this PR? At the moment I have to resort to setting an absurdly long timeframe for a startup probe so it doesn't kill an upgrade. This, however, delays starting Nextcloud by multiple minutes every time. I would really love for the init-container mechanism for NC upgrades to go forward. |
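For anyone stuck on the workaround described above, a hedged values.yaml sketch (the field names assume the chart's standard probe block; the numbers are deliberately oversized):

startupProbe:
  enabled: true
  periodSeconds: 30
  failureThreshold: 60   # up to ~30 minutes before the kubelet gives up and kills the pod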
Also interested in this moving forward. In the meantime I've merged this branch with the current 6.3.0 and published it for my own use; if anyone else wants, you can use it. The source of the packaged chart is here: https://git.djf.lol/davidfrickert/helm-charts/src/branch/main/charts/nextcloud (I still need to encounter an upgrade scenario as I've just started using it, but I hope this will improve the currently terrible upgrade process) |
Maybe forking is the way forward. This is a critical issue where installation and upgrades get killed halfway through and obviously no maintainer has any interest in it. |
Pull Request
Description of the change
This runs the upgrade in an init container, before the main container starts. It sets the command arguments to "true", so the container exits immediately after the upgrade, and sets the variable NEXTCLOUD_UPDATE to 1, without which the upgrade step is skipped because there are command arguments (see entrypoint).
Benefits
Avoid the container being stopped during the upgrade because of failed liveness probes, since init containers don't get probed.
Possible drawbacks
Runs an additional container, so it's a bit slower I guess.
Applicable issues
Additional information
This also removes the nextcloud.update value. I don't see a way people could possibly have used it though, since it only does something when you pass different arguments to the image, and there is no value in this chart that will allow you to do that.
Checklist
Chart.yaml version bumped according to semver.
(optional) Variables are documented in the README.md