
S3 EntityTooSmall error in flush/compaction #2662

Closed
evenyag opened this issue Oct 27, 2023 · 5 comments · Fixed by #2712
Labels
C-bug Category Bugs

Comments

@evenyag
Contributor

evenyag commented Oct 27, 2023

What type of bug is this?

Unexpected error

What subsystems are affected?

Datanode

What happened?

S3 returns EntityTooSmall while closing the writer in flush/compaction job.

What operating system did you use?

Unrelated

Relevant log output and stack trace

2023-10-26T00:20:40.133578Z ERROR mito2::compaction: Region 5738076307456(1336, 0) failed to flush, cancel all pending tasks err=0: Failed to write to buffer, at greptimedb/src/mito2/src/sst/stream_writer.rs:103:14
2023-10-26T00:20:40.133445Z ERROR mito2::worker::handle_compaction: Failed to compact region: 5738076307456(1336, 0) err=0: Failed to write to buffer, at greptimedb/src/mito2/src/sst/stream_writer.rs:103:14
2023-10-26T00:20:40.133288Z ERROR mito2::compaction::twcs: Failed to compact region, region id: 5738076307456(1336, 0) err=0: Failed to write to buffer, at greptimedb/src/mito2/src/sst/stream_writer.rs:103:14
2023-10-26T00:20:40.133225Z ERROR mito2::compaction::twcs: Failed to compact region: 5738076307456(1336, 0) err=0: Failed to write to buffer, at greptimedb/src/mito2/src/sst/stream_writer.rs:103:14
2023-10-26T00:20:40.132355Z ERROR opendal::services: service=s3 operation=Writer::close path=***/public/1336/1336_0000000000/6a9426b7-08b3-4eda-b72c-3a11d35fe463.parquet written=10280584B -> data close failed: Unexpected (permanent) at Writer::close, context: { uri: https://***/***/public/1336/1336_0000000000/6a9426b7-08b3-4eda-b72c-3a11d35fe463.parquet?uploadId=VWvpO9WmNxMKKWft6_UCW84IkPgrhSESGXowCl2datfUa4YUUIXXZRdGqQjg3iwzsMcbUX3.WPmMXZ7P0mfvZytxYpmz.81WVMTzm6Qy1SNW8pnkJYBnOJBrWhj8jvcWLM8vGMXytD.tV4aCB..ELw--, response: Parts { status: 400, version: HTTP/1.1, headers: {"x-amz-request-id": "FYKYJ00MAA99869M", "x-amz-id-2": "nKpJck7LUmTybFzE7tj5UTuO5cQnJkRW47EjrG7jVVDAVHFIyvDotxmhqibFFsARs97HhScwH7A=", "content-type": "application/xml", "transfer-encoding": "chunked", "date": "Thu, 26 Oct 2023 00:20:39 GMT", "server": "AmazonS3", "connection": "close"} }, service: s3, path: ***/public/1336/1336_0000000000/6a9426b7-08b3-4eda-b72c-3a11d35fe463.parquet } => S3Error { code: "EntityTooSmall", message: "Your proposed upload is smaller than the minimum allowed size", resource: "", request_id: "FYKYJ00MAA99869M" }    
2023-10-26T00:20:36.375293Z ERROR mito2::compaction: Region 5007931867136(1166, 0) failed to flush, cancel all pending tasks err=0: Failed to write to buffer, at greptimedb/src/mito2/src/sst/stream_writer.rs:103:14
2023-10-26T00:20:36.375094Z ERROR mito2::worker::handle_compaction: Failed to compact region: 5007931867136(1166, 0) err=0: Failed to write to buffer, at greptimedb/src/mito2/src/sst/stream_writer.rs:103:14
2023-10-26T00:20:36.374955Z ERROR mito2::compaction::twcs: Failed to compact region, region id: 5007931867136(1166, 0) err=0: Failed to write to buffer, at greptimedb/src/mito2/src/sst/stream_writer.rs:103:14
2023-10-26T00:20:36.374894Z ERROR mito2::compaction::twcs: Failed to compact region: 5007931867136(1166, 0) err=0: Failed to write to buffer, at greptimedb/src/mito2/src/sst/stream_writer.rs:103:14
2023-10-26T00:20:36.374422Z ERROR opendal::services: service=s3 operation=Writer::close path=***/public/1166/1166_0000000000/1603f27a-7186-4e79-8da9-6d0b8f0f4c54.parquet written=4956038B -> data close failed: Unexpected (permanent) at Writer::close, context: { uri: https://***/***/public/1166/1166_0000000000/1603f27a-7186-4e79-8da9-6d0b8f0f4c54.parquet?uploadId=7Lg39Wl1OIdI0v5vPwUycYVftLk0qreungM8LtAiU.sdC_2IrDI0oxYRcpLsr6Z.Szl0yVKvui0rjyteaGlh3_WDo2N3Lk7ri8cas_UGWZ1g7q7.y0cH.G4frUYIrEZGy.sxjnqGLbiVFDuB007GZw--, response: Parts { status: 400, version: HTTP/1.1, headers: {"x-amz-request-id": "PYWVYKXMRMSJ0D40", "x-amz-id-2": "DMRs1qZMqGpRkMXM2x66LyHr/TM+pF3mQ56Lrkm3Afx9RiSQ4nnM15T/zB4NyRVsK3Tt7apRjIYQZ8MYa2LvMQ==", "content-type": "application/xml", "transfer-encoding": "chunked", "date": "Thu, 26 Oct 2023 00:20:36 GMT", "server": "AmazonS3", "connection": "close"} }, service: s3, path: ***/public/1166/1166_0000000000/1603f27a-7186-4e79-8da9-6d0b8f0f4c54.parquet } => S3Error { code: "EntityTooSmall", message: "Your proposed upload is smaller than the minimum allowed size", resource: "", request_id: "PYWVYKXMRMSJ0D40" }

How can we reproduce the bug?

Occasionally, not sure how to reproduce it exactly.

Error code: EntityTooSmall

Description: Your proposed upload is smaller than the minimum allowed object size. Each part must be at least 5 MB in size, except the last part.

Maybe related to OpenDAL.


@evenyag evenyag added the C-bug Category Bugs label Oct 27, 2023
@WenyXu
Member

WenyXu commented Oct 27, 2023

By default, if we only invoke write once and then drop the writer, OpenDAL won't use multipart uploads. We seem to invoke write multiple times somewhere, with a buffer size smaller than the minimum part limit.
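For illustration, a minimal sketch (hypothetical, not the actual GreptimeDB code) of how this can fail, assuming OpenDAL's writer API of that era: each unbuffered write becomes one UploadPart request, and S3 rejects any non-final part under 5 MB when the multipart upload is completed.

use opendal::Operator;

// Hypothetical sketch: each unbuffered `write` maps to one S3 UploadPart
// call. S3 requires every part except the last to be at least 5 MB, so
// completing this upload fails with EntityTooSmall.
async fn unbuffered_small_parts(op: &Operator, path: &str) -> opendal::Result<()> {
    let mut w = op.writer(path).await?;
    w.write(vec![0u8; 4 * 1024 * 1024]).await?; // part 1: 4 MB, below the minimum
    w.write(vec![0u8; 4 * 1024 * 1024]).await?; // part 2: 4 MB, fine as the last part
    w.close().await?; // CompleteMultipartUpload -> 400 EntityTooSmall
    Ok(())
}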

@killme2008
Contributor

This issue must be looked into ASAP. cc @evenyag @v0y4g3r

@evenyag
Contributor Author

evenyag commented Nov 7, 2023

However, this error is returned by close... so it comes from close_with_arrow_writer(), which should be uploading the last part, and the last part is allowed to be small. Since we write to S3 every 8 MB except for the last buffer, this shouldn't happen. We might need to add some logs.

There is an issue in OpenDAL (apache/opendal#3262), but I guess it is not related to this one.

Updated:

I found that the twcs compaction task sets the write buffer size to 4 MB, so it is possible to upload a non-final part smaller than the 5 MB minimum.

let task = TwcsCompactionTask {
    region_id,
    schema: region_metadata,
    sst_layer: access_layer,
    outputs,
    expired_ssts,
    sst_write_buffer_size: ReadableSize::mb(4),
    // ...
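A minimal sketch of the fix direction (hypothetical; the actual change landed in #2712 and may differ): keep the compaction write buffer at or above S3's 5 MB multipart minimum, e.g. the 8 MB the flush path already uses.

let task = TwcsCompactionTask {
    region_id,
    schema: region_metadata,
    sst_layer: access_layer,
    outputs,
    expired_ssts,
    // Hypothetical value: anything >= 5 MB keeps non-final parts valid.
    sst_write_buffer_size: ReadableSize::mb(8),
    // ...
};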

@evenyag
Contributor Author

evenyag commented Nov 8, 2023

I reproduced the case in this branch. I created some SSTs larger than the buffer size and then triggered compaction.

@Xuanwo
Contributor

Xuanwo commented Nov 14, 2023

"We seem to invoke the write multiple times somewhere, and the buffer size is smaller than the minimum limit."

OpenDAL provides native buffer support to prevent this issue:

let w = op.writer_with(path).buffer(8 * 1024 * 1024).await;

With buffer enabled, the writer stores input chunks in memory and flushes them as needed. If buffer is not enabled, every write call to the writer becomes a direct API call to S3. So it also works if the upper application controls the write size.
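A minimal usage sketch of that buffer option (path and sizes are illustrative), assuming the writer_with builder shown above: small writes accumulate in memory and are uploaded as parts that satisfy S3's 5 MB minimum.

use opendal::Operator;

// Sketch assuming the `writer_with(..).buffer(..)` API shown above: both
// 4 MB writes are coalesced in the 8 MB in-memory buffer instead of each
// becoming its own undersized UploadPart request.
async fn buffered_write(op: &Operator, path: &str) -> opendal::Result<()> {
    let mut w = op.writer_with(path).buffer(8 * 1024 * 1024).await?;
    w.write(vec![0u8; 4 * 1024 * 1024]).await?; // buffered in memory
    w.write(vec![0u8; 4 * 1024 * 1024]).await?; // still buffered, or flushed as one 8 MB part
    w.close().await?; // flushes remaining data and completes the upload
    Ok(())
}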
