Liveness probe kills pod while upgrading #333
Comments
I haven't tried a manual upgrade yet, but I'm interested in your findings, because we could then create a PR to guide others as well. If not, I'll update this issue when I do get around to testing an upgrade and share my findings. In case it's helpful: you can disable the probes, but I'm not sure that's what you want. It took me a few tries to find settings that worked for my infra, at least for the initial deployment, but I currently use these values in my values file:

```yaml
## Liveness and readiness probe values
## Ref: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#container-probes
##
livenessProbe:
  enabled: true
  initialDelaySeconds: 45
  periodSeconds: 15
  timeoutSeconds: 5
  failureThreshold: 3
  successThreshold: 1
readinessProbe:
  enabled: true
  initialDelaySeconds: 45
  periodSeconds: 15
  timeoutSeconds: 5
  failureThreshold: 3
  successThreshold: 1
startupProbe:
  enabled: false
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 30
  successThreshold: 1
```

And if you're deploying directly to your cluster, you can also edit the probe values on the running Deployment; however, the time an upgrade takes depends on your infrastructure, so I'm not sure what values would be best for you.
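If you do want to adjust a Deployment that is already running rather than going through chart values, one option is a strategic-merge patch applied with `kubectl patch`. This is only a minimal sketch: it assumes the Deployment and container are both named `nextcloud`, and the timing values are illustrative, not recommendations.

```yaml
# probe-patch.yaml -- apply with: kubectl patch deployment nextcloud --patch-file probe-patch.yaml
spec:
  template:
    spec:
      containers:
        - name: nextcloud              # assumption; check the real container name in your Deployment first
          livenessProbe:
            initialDelaySeconds: 120   # illustrative; give the upgrade enough time on your infra
            periodSeconds: 15
            timeoutSeconds: 5
            failureThreshold: 10
```

Keep in mind that a direct edit like this gets overwritten on the next `helm upgrade`, so putting the settings in your values file is the more durable route.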
Replicas can't race against each other because there is a locking mechanism implemented in the nextcloud docker image (see https://github.com/nextcloud/docker/blob/master/25/apache/entrypoint.sh#L127-L240).
I can change it. It would probably be better to either:
About the number of replicas: there is a locking mechanism in the image (primitive -- not sure why it doesn't use
@provokateurin
@remram44 would you like to create a PR for this change? Env vars are created here: helm/charts/nextcloud/templates/_helpers.tpl, lines 65 to 72 (at a491977).
And then you'd need to update the config parameter table in our README.md and also values.yaml, likely around here, with the env var and a comment about it: helm/charts/nextcloud/values.yaml, lines 72 to 85 (at a491977).
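For anyone following along, a change like that usually touches both files. The sketch below is only illustrative and based on assumptions from this thread: the `nextcloud.update` parameter name and the `NEXTCLOUD_UPDATE` variable are placeholders, not necessarily what the eventual PR uses.

```yaml
# values.yaml -- illustrative new parameter with a comment, as suggested above
nextcloud:
  update: 0          # set to 1 to allow the image's entrypoint to run upgrades

# templates/_helpers.tpl -- illustrative entry in the generated env list
- name: NEXTCLOUD_UPDATE
  value: {{ .Values.nextcloud.update | quote }}
```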
Sure, opened #343. Note that this doesn't fix this issue; we should still do something about the probe.
That makes sense, but that should probably be a second PR. I haven't worked with Helm hooks before, so I'd need to do some research, but if you or anyone else knows the fix off the top of your head, you can submit a PR and we will review it. Note: I work on this in my spare time, and I've been on vacation, so I've been pretty active in this repo, but I'm going back to work next week, so my response time may vary 🙏 I really appreciate your patience, though. It feels great to see this repo improving :)
I think the easiest way to do this is to use an init container. The Docker image looks like it supports doing the upgrade and then exiting; you just need to pass "true" as the command. I can work on a pull request.
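For illustration only (this is not the chart's actual template; the image tag, env var, and volume name below are placeholders), the idea would look roughly like this in the Deployment spec:

```yaml
initContainers:
  - name: nextcloud-upgrade
    image: nextcloud:25-apache       # placeholder; should match the main container's image
    args: ["true"]                   # keep the image entrypoint, pass "true" as the command:
                                     # the entrypoint performs the install/upgrade, then `true` exits
    env:
      - name: NEXTCLOUD_UPDATE       # assumption based on the image's entrypoint: lets it run the
        value: "1"                   # upgrade even though the command isn't apache/php-fpm
    volumeMounts:
      - name: nextcloud-data         # placeholder; must be the same data volume as the main container
        mountPath: /var/www/html
```

Since init containers must finish before the app container starts, the liveness and readiness probes never see the upgrade window; combined with the locking mechanism mentioned above, concurrent replicas should still be safe.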
Awesome, thank you for working on this :) Please just link back to the relevant Docker lines you're referencing in the PR, so others can quickly follow along and get up to speed.
Any update on this work? I also run into this issue on every Docker image upgrade; I guess I need to increase the probe timeouts.
For now, disable the probes or increase their timeouts. In the meantime, I've added follow-up comments to #344 to potentially solve this in the future.
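As a concrete sketch of that workaround, using the probe toggles shown earlier in this thread, a small values override could disable both probes until a proper fix lands:

```yaml
# probes-off.yaml -- pass to the release with: helm upgrade <release> <chart> -f probes-off.yaml
livenessProbe:
  enabled: false
readinessProbe:
  enabled: false
```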
I think a startup probe works well here; at least it seems to have worked on the latest upgrade.

```yaml
startupProbe:
  enabled: true
  periodSeconds: 5
  timeoutSeconds: 5
  failureThreshold: 100
```

Edit: never mind -- I assumed the latest Helm chart changes included a Nextcloud version change, but they didn't. I'll wait for the next upgrade to see whether it actually works.
It is really unclear how you intend for Nextcloud to ever get upgraded. There are no instructions (#221), and automatic upgrades probably can't work, since all replicas would race against each other and pods get killed mid-upgrade because of failing liveness checks.
How can this work?