uploader: update exponential backoff timeouts

    In the current architecture, catalyst-uploader instances are launched to
    upload each segment. At any given time, multiple pids can be running,
    each attempting to write to S3 storage. If there is an outage at the
    storage provider, the exponential backoff retry logic kicks in and
    retries the uploads.

    When multiple instances of catalyst-uploader are running, the retries tend
    to happen at roughly the same time in short bursts, quickly hitting the
    kernel's pthread_create limits. When this happens, the pods eventually
    become CPU/memory bound and may stop responding. To reduce the impact
    of this, the following changes are being made:
    * reduce # of retries from 7 to 4
    * set initial interval to 30s to space out the retry attempts
    * set max interval to 2min to space out even further

    Note that this reduces the probability of running into the same issue
    and is not a true fix. A proper fix would require a rearchitecture of
    how catalyst-uploader works in conjunction with Mist.
emranemran committed Jun 10, 2024
1 parent 226ba72 commit 2d1f9f6
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions core/uploader.go
@@ -29,15 +29,15 @@ func (bc *ByteCounter) Write(p []byte) (n int, err error) {

 func newExponentialBackOffExecutor() *backoff.ExponentialBackOff {
 	backOff := backoff.NewExponentialBackOff()
-	backOff.InitialInterval = 10 * time.Second
-	backOff.MaxInterval = 1 * time.Minute
+	backOff.InitialInterval = 30 * time.Second
+	backOff.MaxInterval = 2 * time.Minute
 	backOff.MaxElapsedTime = 0 // don't impose a timeout as part of the retries

 	return backOff
 }

 func UploadRetryBackoff() backoff.BackOff {
-	return backoff.WithMaxRetries(newExponentialBackOffExecutor(), 7)
+	return backoff.WithMaxRetries(newExponentialBackOffExecutor(), 4)
 }

 const segmentWriteTimeout = 5 * time.Minute
