-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle/mitigate crash and context issues #11
Conversation
context canceled after 10 sec:
|
/publish |
oci-image-publish-on-comment: succeeded ✅ |
So, 10 sec after the context is created and between https://github.com/commercetools/telefonistka/blob/main/internal/pkg/argocd/argocd.go#L263 I increased the server http timeout because the number matched (10 sec) but it didn't help
|
/publish |
oci-image-publish-on-comment: failed ❌ |
/publish |
oci-image-publish-on-comment: succeeded ✅ |
/publish |
oci-image-publish-on-comment: succeeded ✅ |
Add conext to error messages
Fix some function return issues
This PR is missing a Jira ticket reference in the title or description. |
@@ -51,25 +50,29 @@ func groupObjsByKey(localObs []*unstructured.Unstructured, liveObjs []*unstructu | |||
} | |||
} | |||
localObs, _, err := controller.DeduplicateTargetObjects(appNamespace, localObs, &resourceInfoProvider{namespacedByGk: namespacedByGk}) | |||
errors.CheckError(err) | |||
if err != nil { | |||
return nil, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrap it with more context maybe?
resourceTracking := argo.NewResourceTracking() | ||
for _, res := range resources.Items { | ||
live := &unstructured.Unstructured{} | ||
err := json.Unmarshal([]byte(res.NormalizedLiveState), &live) | ||
errors.CheckError(err) | ||
if err != nil { | ||
return nil, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, maybe more context is useful.
@@ -80,7 +83,9 @@ func groupObjsForDiff(resources *application.ManagedResourcesResponse, objs map[ | |||
if local, ok := objs[key]; ok || live != nil { | |||
if local != nil && !kube.IsCRD(local) { | |||
err = resourceTracking.SetAppInstance(local, argoSettings.AppLabelKey, appName, namespace, argoappv1.TrackingMethod(argoSettings.GetTrackingMethod())) | |||
errors.CheckError(err) | |||
if err != nil { | |||
return nil, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And same.
cmd/telefonistka/server.go
Outdated
ctx, cancel := context.WithTimeout(r.Context(), 30*time.Second) | ||
// We don't use the request context as it might have a short deadline and we don't want to stop event handling based on that | ||
// But we do want to stop the event handling after a certain point, so: | ||
ctx, cancel := context.WithTimeout(context.Background(), 120*time.Second) | ||
defer cancel() | ||
githubapi.HandleEvent(r, ctx, mainGhClientCache, prApproverGhClientCache, githubWebhookSecret) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here be dragons.
As discussed, this is probably fine, but it's not entirely right since the request only has 10 seconds, and we're now doing blocking background work that will take longer. Also, that work is accessing the request so it should be done with that after 10s.
Say HandleEvent takes 20s. When it returns we should've already been done 10 seconds ago. While I'm not entirely clear on what the impact of this is, it's clearly an issue.
As mentioned, instead, we should pull out the request processing part from the background processing part and return on the http request as soon as we can, just after kicking of another go routine to do the background work.
There are some considerations that's I'm not clear on about the background work e.g. in case of failures, what's the right way to handle that?
I think let's do a short pairing session to see if we can answer some questions and come up with a slightly better solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actual Event proccessing is move to backround thread. This require some refactoring, created ReciveWebhook and ReciveEventFile functions to represent the different behavior in Web Server VS CLI triggering while keeping to the GH stuff in the GH package
Minor comment change
/publish |
oci-image-publish-on-comment: succeeded ✅ |
Current version of code does fail after 120 sec when
|
/publish |
oci-image-publish-on-comment: succeeded ✅ |
After 4d55de3 the process no longer crash:
|
Handle cases where GetContents returns nil HTTP respons (like in Context cancellation)
🥷 Code experts: no user but you matched threshold 10 Oded-B has most 👩💻 activity in the files. See details
Activity based on git-commit:
Knowledge based on git-blame:
Activity based on git-commit:
Knowledge based on git-blame:
Activity based on git-commit:
Knowledge based on git-blame:
Activity based on git-commit:
Knowledge based on git-blame:
Activity based on git-commit:
Knowledge based on git-blame:
Activity based on git-commit:
Knowledge based on git-blame:
Activity based on git-commit:
Knowledge based on git-blame:
Activity based on git-commit:
Knowledge based on git-blame:
Activity based on git-commit:
Knowledge based on git-blame: To learn more about /:\ gitStream - Visit our Docs |
@hnnsgstfssn please check 16ea028 |
Description
errors.CheckError(err)
function calls to preven process exiting on failure.cancellation)
Type of Change
Checklist