clients: Implement storage fallback for recordings #1303
Conversation
This is incomplete as we still need to fix the probing made on the manifest itself instead of just on each segment. Rabbit hole.
@victorges I've done a refactor of this now so that we only check the backup location at the beginning of the job, working out which manifest to use and writing out a new manifest with absolute URLs if necessary. Then probing, clipping etc. works as before (in theory). Does it look ok to you so far?
LGTM! (can't approve it myself)
@victorges please could you give this one more sign off?
Added some comments, other than that LGTM
if errPrimary != nil && !primaryNotFound {
	return nil, 0, errPrimary
}
if errBackup != nil && !backupNotFound {
This is weird: if the primary works fine but the backup is broken, we should not return an error, I guess. We should return the playlist from the primary bucket, shouldn't we?
If it's not found in the backup then we return the primary, but unfortunately we need to successfully check for the backup's existence, because we need to pick the larger of the two manifests lower down. If we failed to check whether the backup exists, it could be that it does exist and is the more complete, larger of the two, but we've missed that and gone ahead with the primary.
So, we want to fail completely if there is an error from the backup?
Yes, the backup should always be working, but just giving us a 404 most of the time.
clients/manifest.go
Outdated
err = backoff.Retry(func() error {
	rc, err := GetFile(context.Background(), requestID, sourceManifestOSURL, dStorage)
	if err != nil {
		if time.Since(start) > 10*time.Second && errors.IsObjectNotFound(err) {
What is this one? We already have the deadline in DownloadRetryBackOff() set to 10 * 5s = 50s, why do we also need this check?
I'll add a comment for this. We annoyingly had to include it because there's a chance we'll get a not found due to eventual consistency, so we still want to retry not-found errors. However, we don't want to wait quite as long as for normal errors, because the backup not existing will be a common case, and we don't want to add a whole 50s of delay to essentially every recording job.
Might be worth a lil constant for the 10s too. Maybe const MANIFEST_NOT_FOUND_INCONSISTENCY_TOLERANCE = 10s?
Yeah, I'm fine with extracting a constant plus adding a comment.
LGTM!
import "testing"

func TestGetStorageBackupURL(t *testing.T) {
	StorageFallbackURLs = map[string]string{"https://storj.livepeer.com/catalyst-recordings-com/hls": "https://google.livepeer.com/catalyst-recordings-com/hls"}
do we need to restore the default value after tests?
Suggested change:
StorageFallbackURLs = map[string]string{"https://storj.livepeer.com/catalyst-recordings-com/hls": "https://google.livepeer.com/catalyst-recordings-com/hls"}
defer func() { StorageFallbackURLs = nil }()
This implements the storage fallback logic following the design doc.
This builds on top of livepeer/go-tools#121 and #1302
The trickiest bits were the error handling for both manifest and segment file reading. We need to handle not-found errors explicitly from both primary and backup so we can return appropriate errors to the caller.
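The URL-rewriting half of the fallback, which the test above exercises, can be sketched as a prefix swap. This is a minimal reconstruction based only on the test snippet in this conversation (the real implementation and its wiring may differ): `StorageFallbackURLs` maps a primary storage URL prefix to its backup prefix, and `GetStorageBackupURL` rewrites any URL under a known primary prefix.

```go
package main

import (
	"fmt"
	"strings"
)

// StorageFallbackURLs maps primary storage URL prefixes to backup
// prefixes, as configured in the test shown in the review.
var StorageFallbackURLs = map[string]string{
	"https://storj.livepeer.com/catalyst-recordings-com/hls": "https://google.livepeer.com/catalyst-recordings-com/hls",
}

// GetStorageBackupURL returns the backup URL for urlStr by swapping a
// matching primary prefix for its backup prefix, or "" if no fallback
// is configured for that URL.
func GetStorageBackupURL(urlStr string) string {
	for primary, backup := range StorageFallbackURLs {
		if strings.HasPrefix(urlStr, primary) {
			return strings.Replace(urlStr, primary, backup, 1)
		}
	}
	return ""
}

func main() {
	fmt.Println(GetStorageBackupURL("https://storj.livepeer.com/catalyst-recordings-com/hls/123/index.m3u8"))
}
```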