Fix: Use heartbeats to correct for request token drift #777

Merged
luke-lombardi merged 2 commits into main from ll/fix-gateway-crash-tokens
Dec 11, 2024

Conversation

luke-lombardi
Contributor

  • Uses key events and request heartbeats to correct for token drift after a gateway crashes unexpectedly

The consequence is that if a gateway that was actively handling a request crashes or is forcibly terminated, there will be a 30-second delay before the token count is incremented for any containers that were handling its requests. This should fix "bricked" containers that had an inaccurately low token count.
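To make the drift scenario concrete, here is a minimal in-memory sketch of the idea, not the code in this PR. Everything in it is an assumption for illustration: `tokenPool`, `acquire`, `beat`, `release`, `reapExpired`, and the 30-second TTL are hypothetical names and values, and `reapExpired` stands in for whatever handles the key-expiry event in the real system.

```go
package main

import (
	"fmt"
	"time"
)

// tokenPool is a hypothetical stand-in for per-container request tokens
// plus one heartbeat entry (with an expiry) per in-flight request.
type tokenPool struct {
	ttl        time.Duration
	tokens     map[string]int                  // containerID -> free tokens
	heartbeats map[string]map[string]time.Time // containerID -> requestID -> expiry
}

func newTokenPool(ttl time.Duration) *tokenPool {
	return &tokenPool{
		ttl:        ttl,
		tokens:     map[string]int{},
		heartbeats: map[string]map[string]time.Time{},
	}
}

func (p *tokenPool) addContainer(id string, capacity int) {
	p.tokens[id] = capacity
	p.heartbeats[id] = map[string]time.Time{}
}

// acquire takes one token and records a heartbeat for the request.
func (p *tokenPool) acquire(containerID, requestID string, now time.Time) bool {
	if p.tokens[containerID] <= 0 {
		return false
	}
	p.tokens[containerID]--
	p.heartbeats[containerID][requestID] = now.Add(p.ttl)
	return true
}

// beat extends the heartbeat while the gateway is still alive.
func (p *tokenPool) beat(containerID, requestID string, now time.Time) {
	if _, ok := p.heartbeats[containerID][requestID]; ok {
		p.heartbeats[containerID][requestID] = now.Add(p.ttl)
	}
}

// release is the normal path: the request finished, so return the token.
func (p *tokenPool) release(containerID, requestID string) {
	if _, ok := p.heartbeats[containerID][requestID]; ok {
		delete(p.heartbeats[containerID], requestID)
		p.tokens[containerID]++
	}
}

// reapExpired returns tokens for requests whose heartbeat lapsed,
// which is what happens when the owning gateway died mid-request.
func (p *tokenPool) reapExpired(now time.Time) int {
	recovered := 0
	for containerID, reqs := range p.heartbeats {
		for requestID, expiry := range reqs {
			if !now.Before(expiry) {
				delete(reqs, requestID)
				p.tokens[containerID]++
				recovered++
			}
		}
	}
	return recovered
}

// simulateGatewayCrash walks through the "bricked" container case.
func simulateGatewayCrash() (blocked bool, recovered int, accepted bool) {
	start := time.Unix(0, 0)
	p := newTokenPool(30 * time.Second) // assumed TTL
	p.addContainer("c1", 1)

	p.acquire("c1", "req-1", start) // gateway crashes here: no beat, no release
	blocked = !p.acquire("c1", "req-2", start.Add(10*time.Second))
	recovered = p.reapExpired(start.Add(30 * time.Second))
	accepted = p.acquire("c1", "req-3", start.Add(31*time.Second))
	return
}

func main() {
	blocked, recovered, accepted := simulateGatewayCrash()
	fmt.Println(blocked, recovered, accepted)
}
```

Without the reaping step, the container in this sketch would stay at zero tokens forever after the crash, which is the "bricked" state described above.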

@@ -74,7 +74,12 @@ func (t *EndpointTask) Cancel(ctx context.Context, reason types.TaskCancellation
}

func (t *EndpointTask) HeartBeat(ctx context.Context) (bool, error) {
	heartbeatKey := Keys.endpointRequestHeartbeat(t.msg.WorkspaceName, t.msg.StubId, t.msg.TaskId)
	task, err := t.es.backendRepo.GetTask(ctx, t.msg.TaskId)
Contributor Author

The downside is that we'll have to retrieve the task, which is not great. But the heartbeat check only happens every 5 seconds right now, so requests that finish faster than that will never hit this check.
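A rough way to see why short requests skip the lookup is a ticker loop like the one below. This is only a sketch under assumptions: `runWithHeartbeat` and `compareFastAndSlow` are hypothetical helpers, the `check` callback stands in for the task retrieval, and the interval is shortened from 5 seconds to 100 ms so the example finishes quickly.

```go
package main

import (
	"fmt"
	"time"
)

// runWithHeartbeat runs work in the background and calls check once per
// interval for as long as work is still in flight. It returns the number
// of times check ran.
func runWithHeartbeat(interval time.Duration, work func(), check func()) int {
	done := make(chan struct{})
	go func() {
		defer close(done)
		work()
	}()

	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	checks := 0
	for {
		select {
		case <-done:
			return checks
		case <-ticker.C:
			check()
			checks++
		}
	}
}

// compareFastAndSlow contrasts a request that finishes well inside one
// interval with one that spans several intervals.
func compareFastAndSlow() (fast, slow int) {
	interval := 100 * time.Millisecond // stands in for the 5-second heartbeat
	check := func() { /* a real check would retrieve the task here */ }

	fast = runWithHeartbeat(interval, func() {}, check)
	slow = runWithHeartbeat(interval, func() { time.Sleep(250 * time.Millisecond) }, check)
	return
}

func main() {
	fast, slow := compareFastAndSlow()
	fmt.Printf("fast request: %d checks, slow request: %d checks\n", fast, slow)
}
```

The cost of the extra lookup is therefore paid only by requests that outlive at least one heartbeat interval.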

@luke-lombardi luke-lombardi merged commit ee7a259 into main Dec 11, 2024
3 checks passed
@luke-lombardi luke-lombardi deleted the ll/fix-gateway-crash-tokens branch December 11, 2024 22:10