-
Notifications
You must be signed in to change notification settings - Fork 158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Limit minimal time poll.Wait* functions can wait between attempts #2543
Conversation
Seems like this one needs an update linter from #2539 |
@@ -91,7 +91,11 @@ func WaitWithBackoffWithRetries(ctx context.Context, b backoff.Backoff, numRetri | |||
sleep := b.Duration() | |||
if deadline, ok := ctx.Deadline(); ok { | |||
ctxSleep := time.Until(deadline) | |||
sleep = minDuration(sleep, ctxSleep) | |||
// We want to wait for smaller of backoff sleep and context sleep |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: IIUC from the PR description the <-t.C
option is (often? always?) selected in the select
block below when sleep
is negative, is that the case?
If the deadline has passed, it seems that a sensible behavior would be to respect the context deadline by returning an error right away and thus not executing the operation.
It seems that with this change, ctx.Done():
is going to be selected below when the context has expired, independent of how large sleep
is, so in that case, why not make sleep
positive (strictly greater than 0)? Would that be sufficient?
Or even simpler, why not return with an error when ctxSleep <= 0
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Essentially this is a race condition. If ctx
already expired but did not perfom the cancel
yet, so there's nothing in the ctx.Done()
channel. At the same time if we have negative timer, t.C
will always have a value in the channel.
Just means that positive sleep (in nanoseconds) doesn't really lets the context to finish in my tests.
We could just return a context.DeadlineExceeded
error when ctxSleep <= 0
, but then execution will continue in race with termination of the context. While technically correct (deadline did exceed), I don't know which consequences this might have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the other hand additional 10ms of wait doesn't really add that much considering default backoff starts with 100ms and goes up from there (so context deadline in 10s of ms doesn't make much sense)
f741e41
to
8e961ed
Compare
8e961ed
to
f20ae0f
Compare
pkg/poll/poll.go
Outdated
// but it has to be over os scheduling interval | ||
// otherwise context doesn't have time to finish |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// but it has to be over os scheduling interval | |
// otherwise context doesn't have time to finish | |
// but it has to be > 0 to give ctx.Done() a chance | |
// to return below |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update the comment. See inline suggestion.
Currently calculation of sleep time can make it negative, which makes `t.C` channel return right away and does not give ctx time to finish on deadline. This means that even if context is expired, retries still happen until there are no more retries. The fix is to cap minimal sleep time at 5ms to make sure that context can perform expiration. NOTE: we could make the minimal sleep smaller, but it's wouldn't make sense getting less than OS scheduling frequency.
f20ae0f
to
4e86e12
Compare
Change Overview
Currently calculation of sleep time can make it negative, which makes
t.C
channel return right away and does not give ctx time to finish on deadline.This means that even if context is expired, retries still happen until there are no more retries.
Currently it's causing test failures in
snapshot_test.go
The fix is to cap minimal sleep time at 5ms to make sure that context can perform expiration.
NOTE: we could make the minimal sleep smaller, but it's wouldn't make sense getting less than OS scheduling frequency.
Pull request type
Please check the type of change your PR introduces:
Test Plan
Added new test to
poll_test.go
, but it's not always catching the race condition with context.