clients: Implement storage fallback for recordings #1303

Merged (19 commits) on Jul 25, 2024

victorges (Member) commented Jun 20, 2024

This implements the storage fallback logic following the design doc.

This builds on top of livepeer/go-tools#121 and #1302

The trickiest bits were the error handling for both manifest and segment file reading. We need to handle not-found
errors from the primary and the backup explicitly so that the appropriate error is returned to the caller.

@victorges victorges force-pushed the vg/feat/storage-fallback branch 5 times, most recently from c90b2dc to 54d5168 Compare June 21, 2024 12:53
@victorges victorges force-pushed the vg/fix/not-found-err-handling branch from 3bbdb4a to bbf1581 Compare June 21, 2024 13:07
@victorges victorges force-pushed the vg/feat/storage-fallback branch from 54d5168 to de8eaad Compare June 21, 2024 13:08
@victorges victorges force-pushed the vg/fix/not-found-err-handling branch from bbf1581 to 63c4373 Compare June 21, 2024 14:50
@victorges victorges force-pushed the vg/feat/storage-fallback branch from de8eaad to 91d01c4 Compare June 21, 2024 14:51
@victorges victorges changed the title from "Vg/feat/storage fallback" to "clients: Implement storage fallback for recordings" Jun 21, 2024
@victorges victorges marked this pull request as ready for review June 21, 2024 22:38
@victorges victorges force-pushed the vg/fix/not-found-err-handling branch from 63c4373 to e386c88 Compare June 24, 2024 17:34
@victorges victorges force-pushed the vg/feat/storage-fallback branch from ce59e24 to 75789c3 Compare June 24, 2024 17:35
@mjh1 mjh1 force-pushed the vg/feat/storage-fallback branch 2 times, most recently from 068ed63 to 101c949 Compare July 3, 2024 16:53
Base automatically changed from vg/fix/not-found-err-handling to main July 3, 2024 17:33
@mjh1 mjh1 force-pushed the vg/feat/storage-fallback branch from 101c949 to 3328231 Compare July 3, 2024 17:41
mjh1 (Contributor) commented Jul 4, 2024

@victorges I've now refactored this so that we only check the backup location at the beginning of the job, working out which manifest to use and writing out a new manifest with absolute URLs if necessary. Probing, clipping, etc. then work as before (in theory). Does it look OK to you so far?
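
For readers following along, the "new manifest with absolute URLs" step could look roughly like the sketch below. It is not the code from this PR: `absolutizeManifest` is a hypothetical helper, and it assumes a simple HLS media playlist in which every line that is neither blank nor a `#` tag is a segment URI. URIs embedded in tag attributes are not handled here.

```go
package main

import (
	"bufio"
	"fmt"
	"net/url"
	"strings"
)

// absolutizeManifest is a hypothetical helper (not from the PR). It resolves
// every segment URI in a simple HLS media playlist against baseURL, which is
// assumed to be the URL the playlist was read from, and returns the result.
func absolutizeManifest(manifest, baseURL string) (string, error) {
	base, err := url.Parse(baseURL)
	if err != nil {
		return "", fmt.Errorf("parsing base URL: %w", err)
	}

	var out strings.Builder
	scanner := bufio.NewScanner(strings.NewReader(manifest))
	for scanner.Scan() {
		line := scanner.Text()
		trimmed := strings.TrimSpace(line)

		// Tags and comments start with '#'; blank lines carry no URI.
		if trimmed == "" || strings.HasPrefix(trimmed, "#") {
			out.WriteString(line + "\n")
			continue
		}

		ref, err := url.Parse(trimmed)
		if err != nil {
			return "", fmt.Errorf("parsing segment URI %q: %w", trimmed, err)
		}
		// Relative references are resolved against the base URL;
		// URIs that are already absolute come back unchanged.
		out.WriteString(base.ResolveReference(ref).String() + "\n")
	}
	if err := scanner.Err(); err != nil {
		return "", fmt.Errorf("reading manifest: %w", err)
	}
	return out.String(), nil
}
```

With absolute URLs in the rewritten manifest, later steps can read each segment from wherever it lives without knowing which location the manifest came from.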

victorges (Member, Author) left a comment

LGTM! (can't approve it myself)

clients/manifest.go (outdated, resolved)
clients/manifest.go (resolved)
clients/manifest.go (outdated, resolved)
clients/manifest.go (outdated, resolved)
clients/manifest.go (outdated, resolved)
clients/manifest.go (outdated, resolved)
errors/errors.go (outdated, resolved)
main.go (outdated, resolved)
pipeline/ffmpeg.go (outdated, resolved)
mjh1 (Contributor) commented Jul 11, 2024

@victorges please could you give this one more sign off?

@mjh1 mjh1 requested review from thomshutt and leszko and removed request for thomshutt July 11, 2024 09:50
leszko (Contributor) left a comment

Added some comments, other than that LGTM

config/storage_backup_url.go (resolved)
main.go (outdated, resolved)
test/steps/ffmpeg.go (outdated, resolved)
if errPrimary != nil && !primaryNotFound {
	return nil, 0, errPrimary
}
if errBackup != nil && !backupNotFound {
Contributor

This is weird: if the primary works fine but the backup is broken, we should not return an error, I guess. We should return the playlist from the primary bucket, isn't that the case?

Contributor

If it's not found in the backup then we return the primary, but unfortunately we need to successfully check for the backup's existence because we need to pick the largest of the two manifests lower down. If we failed to check whether the backup exists, it could be that it does exist and is the larger, more complete of the two, but we've missed that and gone ahead with the primary.

Contributor

So, we want to fail completely if there is an error from the backup?

Contributor

Yes, the backup should always be working but just giving us a 404 most of the time
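
The rule discussed in this thread can be summarised in a small sketch. This is an illustration, not the PR's implementation: `fetchResult`, `chooseManifest`, `errNotFound` and `errUnavailable` are hypothetical names, and "largest" is assumed here to mean the manifest with more segments, with ties going to the primary.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinels: errNotFound stands in for the storage client's
// "object not found" error, errUnavailable for any other failure (e.g. a timeout).
var (
	errNotFound    = errors.New("object not found")
	errUnavailable = errors.New("storage unavailable")
)

// fetchResult holds the outcome of reading a manifest from one location.
type fetchResult struct {
	segments int   // number of segments in the manifest, if it was read
	err      error // nil, errNotFound, or any other failure
}

// chooseManifest returns "primary" or "backup" depending on which manifest
// should be used, following the rule described in the thread above.
func chooseManifest(primary, backup fetchResult) (string, error) {
	primaryNotFound := errors.Is(primary.err, errNotFound)
	backupNotFound := errors.Is(backup.err, errNotFound)

	// Any failure other than not-found is fatal for either location: without
	// a definite answer we cannot know which manifest is more complete.
	if primary.err != nil && !primaryNotFound {
		return "", fmt.Errorf("primary: %w", primary.err)
	}
	if backup.err != nil && !backupNotFound {
		return "", fmt.Errorf("backup: %w", backup.err)
	}

	switch {
	case primaryNotFound && backupNotFound:
		return "", errNotFound
	case backupNotFound:
		return "primary", nil
	case primaryNotFound:
		return "backup", nil
	case backup.segments > primary.segments:
		return "backup", nil
	default:
		return "primary", nil // assumption: ties go to the primary
	}
}
```

In this sketch, a primary with 3 segments and a backup that returns `errNotFound` selects the primary, while the same primary with a backup that returns `errUnavailable` fails the whole lookup, which is the behaviour confirmed in the reply above.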

err = backoff.Retry(func() error {
	rc, err := GetFile(context.Background(), requestID, sourceManifestOSURL, dStorage)
	if err != nil {
		if time.Since(start) > 10*time.Second && errors.IsObjectNotFound(err) {
Contributor

What is this one? We already have the deadline in DownloadRetryBackOff() set to 10 * 5s = 50s anyway, so why do we also need this check?

mjh1 (Contributor) commented Jul 16, 2024

I'll add a comment for this. We annoyingly had to include this because there's a chance that we will get a not found due to eventual consistency, so we still want to retry not-found errors. However, we don't want to wait quite as long as for normal errors, because it'll be quite a common case that the backup doesn't exist. So we don't want to add a whole 50s of delay to every recording job, essentially.

Member Author

Might be worth a lil constant for the 10s too. maybe const MANIFEST_NOT_FOUND_INCONSISTENCY_TOLERANCE = 10s?

leszko (Contributor) commented Jul 17, 2024

Yeah, I'm fine with extracting a constant plus adding a comment.
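
To make the two deadlines in this thread concrete, here is a minimal sketch of the idea with the tolerance pulled out into a named field. It uses a plain standard-library loop rather than the backoff library used in the PR, and `retryPolicy`, `fetchWithTolerance` and `errNotFound` are hypothetical names introduced only for illustration.

```go
package main

import (
	"errors"
	"time"
)

// errNotFound is a hypothetical sentinel standing in for the storage
// client's "object not found" error.
var errNotFound = errors.New("object not found")

// retryPolicy groups the timings discussed above. The field names are
// illustrative; the thread mentions a 50s overall deadline and a 10s
// tolerance for not-found errors.
type retryPolicy struct {
	interval          time.Duration // wait between attempts
	maxElapsed        time.Duration // overall deadline for any error
	notFoundTolerance time.Duration // shorter deadline for not-found errors
}

// fetchWithTolerance calls fetch until it succeeds, the overall deadline
// passes, or a not-found error persists beyond the not-found tolerance.
func fetchWithTolerance(p retryPolicy, fetch func() error) error {
	start := time.Now()
	for {
		err := fetch()
		if err == nil {
			return nil
		}
		elapsed := time.Since(start)

		// A not-found may be caused by eventual consistency, so it is retried
		// for a short while and then accepted as final. This avoids paying
		// the full deadline in the common case where no backup exists.
		if errors.Is(err, errNotFound) && elapsed > p.notFoundTolerance {
			return err
		}
		// Every other error is retried until the overall deadline.
		if elapsed > p.maxElapsed {
			return err
		}
		time.Sleep(p.interval)
	}
}
```

A caller mirroring the numbers from the thread might use `retryPolicy{interval: 5 * time.Second, maxElapsed: 50 * time.Second, notFoundTolerance: 10 * time.Second}`; the 5s interval is an assumption inferred from the "10 * 5s" remark above.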

clients/manifest_test.go (resolved)
victorges (Member, Author) left a comment

LGTM!

import "testing"

func TestGetStorageBackupURL(t *testing.T) {
	StorageFallbackURLs = map[string]string{"https://storj.livepeer.com/catalyst-recordings-com/hls": "https://google.livepeer.com/catalyst-recordings-com/hls"}
Member Author

Do we need to restore the default value after tests?

Suggested change
- StorageFallbackURLs = map[string]string{"https://storj.livepeer.com/catalyst-recordings-com/hls": "https://google.livepeer.com/catalyst-recordings-com/hls"}
+ StorageFallbackURLs = map[string]string{"https://storj.livepeer.com/catalyst-recordings-com/hls": "https://google.livepeer.com/catalyst-recordings-com/hls"}
+ defer func() { StorageFallbackURLs = nil }()
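
The suggestion above resets the map to nil, which matches the default only if the default is nil. A variant that restores whatever value was there before might look like the sketch below. It is an illustration only: `storageFallbackURLs` stands in for the package-level map from the PR, and `overrideFallbackURLs` is a hypothetical helper.

```go
package main

// storageFallbackURLs stands in for the package-level map touched by the
// test in this PR; it is redeclared here only to keep the sketch self-contained.
var storageFallbackURLs map[string]string

// overrideFallbackURLs is a hypothetical helper. It replaces the map and
// returns a function that puts the previous value back, whatever it was.
func overrideFallbackURLs(urls map[string]string) (restore func()) {
	previous := storageFallbackURLs
	storageFallbackURLs = urls
	return func() { storageFallbackURLs = previous }
}
```

In a test this could be used as `defer overrideFallbackURLs(urls)()` or `t.Cleanup(overrideFallbackURLs(urls))`, so that later tests see the original value regardless of what it was.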

@mjh1 mjh1 merged commit ac85dfd into main Jul 25, 2024
11 checks passed
@mjh1 mjh1 deleted the vg/feat/storage-fallback branch July 25, 2024 10:01