ZipDeliveryService: complain loudly if the part already exists #2041

Closed
jmartin-sul opened this issue Nov 14, 2022 · 0 comments · Fixed by #2092
Labels: async jobs, enhancement, replication

jmartin-sul commented Nov 14, 2022

TODO

Currently, we just return silently if the part already exists in the cloud:

def deliver
  return if s3_part.exists?
  # ... (rest of the upload logic elided)
end

However, we may want to update this behavior a bit: as far as I know, it'd be pretty unexpected to actually re-attempt delivery of something that'd been successfully uploaded, and I'm unaware of a situation where we queue multiple jobs to deliver the same part to the same endpoint. Additionally, we want to enable automated backfilling for missing ZippedMoabVersions (#2036). More on the relevance of that in "Context".

I'd suggest a helpful alert message like "WARNING: attempting to push a druid version zip part to an S3 location that already has content. Perhaps a replication failure was pruned from the database, but still needs to be cleaned up from the cloud. Prune the failure again and ask ops to delete the bad replicated content." I'd also include the druid/version/endpoint in the HB alert context. However, I wouldn't raise or cause the job to fail, and I'd still be sure to return a falsey value in this case: there's no use in retrying the job, and we also don't want the calling AbstractDeliveryJob to proceed to calling ResultsRecorderJob.perform_later, since nothing was delivered.
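A minimal sketch of what that could look like in ZipDeliveryService#deliver, assuming Honeybadger.notify is used for the HB alert; the druid, version, and endpoint_name accessors shown here are hypothetical stand-ins for however the service actually exposes those values:

def deliver
  if s3_part.exists?
    # Alert loudly, but don't raise and don't fail the job.
    Honeybadger.notify(
      'WARNING: attempting to push a druid version zip part to an S3 location that already has content. ' \
      'Perhaps a replication failure was pruned from the database, but still needs to be cleaned up from the cloud. ' \
      'Prune the failure again and ask ops to delete the bad replicated content.',
      context: { druid: druid, version: version, endpoint: endpoint_name }
    )
    # Return falsey so the calling AbstractDeliveryJob doesn't enqueue ResultsRecorderJob.
    return false
  end

  # ... (existing upload logic, unchanged)
end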

Context

We'll also be trying to clean up known partial replications (see #1733), and we've just enabled a new audit error about sanity checks on size (see #1993) that will likely alert us to a few more replications that need re-doing.

Since we want to restrict deletion and overwrite of cloud archive content as much as possible, we have to request that ops handle deletion of partial replications.

So, we could occasionally run into a corner case where:

  • we've cleaned up the database records for a mis-replicated or partially replicated druid version on a given endpoint, using CatalogRemediator.prune_replication_failures or its rake task (prune_failed_replication)
  • we've given ops the druid/version/endpoint combos to delete, taken from the DB records that prune_replication_failures determined needed a re-push (it'll be fed from CatalogToArchive audit results).
  • ops hasn't had a chance to actually delete the bad cloud content
  • CatalogToArchive runs for a druid in this situation, tries to backfill the missing content, and can't push over the yet-to-be-removed bad replicated content.

Since replication audit only runs every 3 months on a given druid, it seems unlikely that we'll run into this situation much, if at all. So starting out with a simple HB alert, letting us know this happened and that we need to re-run prune_replication_failures on the druid version, seems like a better approach to me than trying to do something more automated.
