Healthz bad gateway when pod/service ip take time to propagate #23067
Comments
@batleforc Exposure of the route should be relatively fast. Could you please clarify when exactly you are facing this issue (5/10 min for the route to be accessible)? The default hard startup timeout is 5 minutes, and at this point we do not plan to change it.
Due to this case, we raised the timeout to 900s. We found that propagating the service's IP to the targeted pod takes some time: it sometimes completes shortly after the pods are up, but not soon enough for the backend. That's why I added a small retry in eclipse-che/che-operator#1874, which should cover the propagation time, but I would love to make it a parameter that the end user could tune in case of a particularly slow CNI.
To debug that, we used the different pods to trace the full chain of acknowledgements that the deployment is ready for the next startup step. We saw that we either need to add a short delay between the two health calls on the backend side, or add a retry directly in the gateway (this still needs testing by replacing the different elements in the cluster).
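For illustration only, the retry idea described above could look roughly like the following. This is a Python sketch under stated assumptions, not the code from eclipse-che/che-operator#1874: the function name `wait_for_healthz` and the parameters `retries` and `delay_seconds` are hypothetical, and the probe is injected as a callable so the logic can be tried without a cluster. It assumes that a 502 means "IP not propagated yet" and is worth a short retry, while any other non-200 status is left to the normal startup timeout.

```python
import time
from typing import Callable

BAD_GATEWAY = 502  # status returned while the route/IP is not yet propagated (assumption)


def wait_for_healthz(
    probe: Callable[[], int],
    retries: int = 5,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once probe() reports HTTP 200.

    A 502 is retried up to `retries` times with `delay_seconds` between
    attempts; any other non-200 status returns False immediately.
    Both knobs are hypothetical tunables for slow CNIs.
    """
    for attempt in range(retries + 1):
        status = probe()
        if status == 200:
            return True
        if status != BAD_GATEWAY:
            return False
        if attempt < retries:
            sleep(delay_seconds)
    return False


# Simulated endpoint: two bad gateways, then healthy.
responses = iter([502, 502, 200])
waits = []
ok = wait_for_healthz(lambda: next(responses), retries=3,
                      delay_seconds=0.5, sleep=waits.append)
print(ok, waits)
```

With a short delay and a small retry count, a propagation lag of a few seconds costs only a few seconds instead of a full startup timeout.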
Would it be possible to get help checking whether the change added in eclipse-che/che-operator#1874 fixes the problem we encounter? (Mostly with building the image, and possibly with how to make the healthz retry tunable: https://github.com/eclipse-che/che-operator/pull/1874/files#diff-ebca2eefe12f7ba4a722c53d574ba1b2adee412909da8cdbc974c8f7fcbfb02fR655 ?)
Hello
Hello @batleforc
Hello @tolusha,
Is the provided image (quay.io/abazko/operator:23067) automatically updated?
Unfortunately no.
So, the build seems okay, but I encounter a |
@tolusha So I managed to connect the dots.
Hello @batleforc
Hello @tolusha
devfile/devworkspace-operator#1321 has now been merged, which seems to resolve this issue. This change will appear when DevWorkspace Operator 0.32.0 is released.
Describe the bug
During the startup of a workspace pod, the IP sometimes takes time to propagate, and the two consecutive calls to the healthz endpoint immediately come back with a bad gateway.
Che version
7.88
Steps to reproduce
Expected behavior
Don't wait 5 more minutes when the cluster takes only a short time to propagate the corresponding IP (as in most of our cases); wait the extra 5 minutes only when side resources take time to load.
Runtime
Kubernetes (vanilla), OpenShift
Screenshots
No response
Installation method
chectl/latest, chectl/next, OperatorHub
Environment
Linux, Amazon
Eclipse Che Logs
No response
Additional context
No response