
Use randomized content ID for Azure multipart uploads #6869

Merged Dec 12, 2024 (10 commits)

Conversation

@avarnon (Contributor) commented Dec 11, 2024

Which issue does this PR close?

Closes #6868.

Rationale for this change

Explained in the issue

What changes are included in this PR?

In object_store's AzureClient::put_block(), use a random u128 for the part index instead of the counter-based part_idx parameter.
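The idea can be sketched as follows. This is a simplified, std-only illustration, not the actual object_store code: the `random_part_id` and `block_id` names are hypothetical, `RandomState` stands in for a proper RNG, and the fixed-width hex rendering stands in for the base64 encoding Azure block IDs actually use.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

/// Derive a random u128 part id instead of reusing a caller-supplied counter.
/// (Sketch only: `RandomState` is abused here as a std-only entropy source.)
fn random_part_id() -> u128 {
    let s = RandomState::new();
    let mut hi = s.build_hasher();
    hi.write_u64(0);
    let mut lo = s.build_hasher();
    lo.write_u64(1);
    ((hi.finish() as u128) << 64) | (lo.finish() as u128)
}

/// Azure requires every block id within a blob to have the same length, so the
/// id is rendered at a fixed width (hex here; the real code base64-encodes it).
fn block_id(part: u128) -> String {
    format!("{part:032x}")
}

fn main() {
    let a = block_id(random_part_id());
    let b = block_id(random_part_id());
    assert_eq!(a.len(), 32); // fixed width, as Azure requires
    assert_ne!(a, b); // 128 random bits make collisions negligible
    println!("{a}\n{b}");
}
```

With randomized ids, two concurrent uploads to the same path no longer stage blocks under the same identifiers, which is the crux of the fix discussed below.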

Are there any user-facing changes?

No

@github-actions github-actions bot added the object-store Object Store Interface label Dec 11, 2024

@tustvold tustvold left a comment


Could we get a test showing concurrent uploads not interfering with each other?

object_store/src/azure/client.rs (outdated review thread, resolved)
@avarnon (Author) commented Dec 11, 2024

Could we get a test showing concurrent uploads not interfering with each other?

@tustvold, I have added an integration test (azure_parallel_put_multipart_test) to src/azure/mod.rs which tries to interleave blocks from 2 separate streams. The test fails without the changes to object_store/src/azure/client.rs because the last block written for each block ID is the one committed.

.unwrap();

multipart_upload_1.complete().await.unwrap();
let err = multipart_upload_2.complete().await.unwrap_err();
@tustvold (Contributor):

Perhaps I have misunderstood something about this, but I would have expected the last writer to win? This would be consistent with other stores, e.g. S3 / GCS?

@avarnon (Author):

Without the changes to object_store/src/azure/client.rs, the final file shows it has parts from both streams. With the changes, the file is completely from the first stream to complete.
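A std-only toy model (not the real Azure service or the object_store test) helps show why the interleaving happens: Azure keys uncommitted blocks by block id, so a later Put Block with the same id replaces the earlier bytes, and Put Block List then assembles the blob from whatever each id currently holds.

```rust
use std::collections::HashMap;

/// Toy model of Azure Block Blob staging. Uncommitted blocks live in a map
/// keyed by block id, so a repeated Put Block with the same id overwrites.
struct Blob {
    uncommitted: HashMap<String, Vec<u8>>,
}

impl Blob {
    fn new() -> Self {
        Blob { uncommitted: HashMap::new() }
    }

    fn put_block(&mut self, id: &str, data: &[u8]) {
        self.uncommitted.insert(id.to_string(), data.to_vec());
    }

    /// Put Block List: concatenate whatever bytes each listed id holds now.
    fn put_block_list(&self, ids: &[&str]) -> Vec<u8> {
        ids.iter().flat_map(|id| self.uncommitted[*id].clone()).collect()
    }
}

fn main() {
    // Counter-based ids: writers 1 and 2 both stage blocks "0" and "1",
    // clobbering each other in the uncommitted map.
    let mut blob = Blob::new();
    blob.put_block("0", b"A0"); // writer 1
    blob.put_block("0", b"B0"); // writer 2 overwrites block "0"
    blob.put_block("1", b"B1"); // writer 2
    blob.put_block("1", b"A1"); // writer 1 overwrites block "1"
    // Writer 1 commits "its" blocks but receives writer 2's bytes for "0":
    assert_eq!(blob.put_block_list(&["0", "1"]), b"B0A1".to_vec());

    // Randomized ids: each writer's blocks are private, so writer 1's
    // commit yields exactly writer 1's data, never an interleaving.
    let mut blob = Blob::new();
    blob.put_block("w1-0", b"A0");
    blob.put_block("w2-0", b"B0");
    blob.put_block("w2-1", b"B1");
    blob.put_block("w1-1", b"A1");
    assert_eq!(blob.put_block_list(&["w1-0", "w1-1"]), b"A0A1".to_vec());
}
```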

@avarnon (Author):

If you look at the test setup, you can see that I am alternating which stream writes a block first/second.

@tustvold (Contributor) Dec 12, 2024:

Right, I get that; my point is that racing multipart uploads shouldn't fail. They should both succeed, with the last writer winning. That is the expected semantic, which makes me think this is an incomplete fix.

It would be ideal if this integration test could be lifted into the generic test suite, to ensure all the various platforms behave consistently in this regard.

@avarnon (Author):

The problem is that Azure Storage returns InvalidBlockList on the second complete no matter what. I don't think Azure allows the second complete to succeed, since it has already shipped off the final file.

@tustvold (Contributor):

This seems to be at odds with the documentation? https://learn.microsoft.com/en-us/rest/api/storageservices/put-block-list?tabs=microsoft-entra-id

In particular

Any uncommitted blocks are garbage collected if there are no successful calls to Put Block or Put Block List on the blob within a week following the last successful Put Block operation. If Put Blob is called on the blob, any uncommitted blocks are garbage collected.

Have you tried this against Azure proper, or just azurite?

@avarnon (Author):

I tried this against a real Azure storage account

@tustvold (Contributor):

Whelp, this wouldn't be the first time Azure copied AWS's homework badly... I guess an error is better than silently interleaving the bytes

@avarnon (Author):

I wonder if calling Put Block List is equivalent to Put Blob and is triggering garbage collection? I agree that consistency with an error is better than interleaving.

@@ -392,4 +396,99 @@ mod tests {
azure_storage_token
);
}

#[tokio::test]
@tustvold (Contributor):

This will need to be folded into azure_blob_test above, as otherwise it will race.

@avarnon (Author):

Moved!

@tustvold (Contributor):

Looks to still be here?

@avarnon (Author):

Apologies. I have pushed an update

@@ -1109,3 +1111,67 @@ async fn delete_fixtures(storage: &DynObjectStore) {
.await
.unwrap();
}

/// Tests a race condition where 2 threads are performing multipart writes to the same path
pub async fn multipart_race_condition(storage: &dyn ObjectStore) {
@tustvold (Contributor):

If we're going to make this a top-level integration test can we add an argument last_writer_wins: bool, hook this up to allow a non-error return, and then add this to the AWS and GCS test suites?

@avarnon (Author):

Sure! I don't have access to an AWS or GCS account to test those live

@tustvold (Contributor):

Feel free to leave that to the CI

@avarnon (Author):

Pushed

@avarnon force-pushed the avarnon-random-multipart-part-idx branch from f4139db to 5edc040 (December 12, 2024 21:21)

assert!(matches!(err, crate::Error::Generic { .. }), "{err}");
if last_writer_wins {
@tustvold (Contributor):

I suspect this should also yield a different final result

@avarnon (Author):

You are correct. I missed that. Pushed

@avarnon force-pushed the avarnon-random-multipart-part-idx branch from 3ebef59 to fae706b (December 12, 2024 21:34)
@avarnon (Author) commented Dec 12, 2024

Pushed a fixup to satisfy AWS S3's minimum allowed part size of 5,242,880 bytes (5 MiB).

@tustvold (Contributor):

The limit is the other way round: all bar the last part must be 5 MiB or more. Feel free to leave it for now if you prefer.
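The rule above can be sketched as a small validity check. This is a hypothetical helper for illustration, not part of object_store; it only assumes the constraint as stated, namely that every part except the last must be at least 5 MiB (5,242,880 bytes), while the final part may be any size.

```rust
/// S3's multipart size rule: all parts except the last must be >= 5 MiB.
const MIN_PART_SIZE: usize = 5 * 1024 * 1024; // 5,242,880 bytes

fn parts_are_valid(part_sizes: &[usize]) -> bool {
    part_sizes
        .iter()
        .take(part_sizes.len().saturating_sub(1)) // skip the final part
        .all(|&s| s >= MIN_PART_SIZE)
}

fn main() {
    assert!(parts_are_valid(&[5_242_880, 5_242_880, 100])); // small last part is fine
    assert!(!parts_are_valid(&[100, 5_242_880])); // small non-final part is rejected
    assert!(parts_are_valid(&[100])); // a single part may be any size
}
```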

@avarnon (Author) commented Dec 12, 2024

I have pushed a change which should make each block big enough for S3.

@tustvold tustvold merged commit 84dba34 into apache:main Dec 12, 2024
14 checks passed
@tustvold (Contributor):

Thanks for this
