Very large files require very large buffers #2185

JustinKyleJames · 2024-04-17T20:06:27Z

So that we can support retries on part uploads after failures, an entire part must be stored in the circular buffer.

We had a user who attempted to upload a 1.7 PB file. With the limit of 10,000 parts per upload, each buffer would have to be 170 GB. That is an unreasonable amount of memory, especially considering that each streaming thread for each upload has its own buffer.

We should do the following:

When calculating the number of parts from the file size and circular buffer size, check whether we would need more than 10,000 parts. (This check should be done no matter what, even if all we do is throw an error.)
If we calculate that we would need more than 10,000 parts:
- Update the part size so that there are 10,000 parts or fewer.
- Drop the requirement that the full part must be in memory.
- Drain the circular buffer as we are streaming the bytes from S3.
- Do not support retries if the part fails.
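
For illustration only, here is a minimal Python sketch of the part-count check and part-size adjustment described above. It is not the plugin's actual code: `plan_upload`, `UploadPlan`, and `retries_supported` are hypothetical names, and the sketch assumes the part size normally equals the circular buffer size.

```python
from dataclasses import dataclass

MAX_PARTS = 10_000  # per-upload part limit cited in this issue


@dataclass
class UploadPlan:
    part_size: int           # bytes per part (the last part may be smaller)
    part_count: int          # total number of parts
    retries_supported: bool  # True only if a whole part fits in the buffer


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division; avoids float rounding on very large sizes."""
    return (a + b - 1) // b


def plan_upload(file_size: int, buffer_size: int) -> UploadPlan:
    """Choose a part size so that the upload never needs more than MAX_PARTS."""
    if file_size <= 0 or buffer_size <= 0:
        raise ValueError("file_size and buffer_size must be positive")

    part_count = ceil_div(file_size, buffer_size)
    if part_count <= MAX_PARTS:
        # Normal case: each part fits in the buffer, so a failed part can be resent.
        return UploadPlan(buffer_size, part_count, retries_supported=True)

    # Too many parts: grow the part size past the buffer size. The buffer must
    # then be drained while streaming, so a failed part cannot be retried.
    part_size = ceil_div(file_size, MAX_PARTS)
    return UploadPlan(part_size, ceil_div(file_size, part_size), retries_supported=False)


if __name__ == "__main__":
    # 1.7 PB (decimal) with a 64 MiB buffer needs 170 GB parts to stay at 10,000.
    print(plan_upload(1_700_000_000_000_000, 64 * 1024**2))
```

With these assumptions, the 1.7 PB case from the description yields 10,000 parts of 170 GB each with retries disabled, while a file that fits in 10,000 buffer-sized parts keeps the current behavior.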

korydraughn · 2024-04-18T20:29:56Z

It turns out the user was attempting to upload a 1.7 TB file, but that doesn't change the fact that we need to add logic to make sure the total number of parts does not exceed 10,000.

Also, AWS S3 doesn't support objects exceeding 5 TB.
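
As a rough sketch of how both limits could be checked up front, assuming the figures quoted in this thread (10,000 parts and a 5 TB maximum object size): `check_upload_limits` is a hypothetical helper, and the limits are parameters because other S3-compatible providers may differ and the values may change over time.

```python
MAX_PARTS = 10_000                 # part limit quoted in this issue
MAX_OBJECT_SIZE = 5 * 1000**4      # 5 TB as stated above; treat as an assumption


def check_upload_limits(
    file_size: int,
    part_size: int,
    max_parts: int = MAX_PARTS,
    max_object_size: int = MAX_OBJECT_SIZE,
) -> int:
    """Return the part count, or raise ValueError if a limit would be exceeded."""
    if file_size <= 0 or part_size <= 0:
        raise ValueError("file_size and part_size must be positive")
    if file_size > max_object_size:
        raise ValueError(
            f"object of {file_size} bytes exceeds the {max_object_size}-byte limit"
        )
    part_count = (file_size + part_size - 1) // part_size  # ceiling division
    if part_count > max_parts:
        raise ValueError(
            f"{part_count} parts of {part_size} bytes exceed the {max_parts}-part limit"
        )
    return part_count


if __name__ == "__main__":
    # The 1.7 TB upload fits in 10,000 parts of 170 MB each.
    print(check_upload_limits(1_700_000_000_000, 170_000_000))
```

Under these assumed limits, the 1.7 TB upload passes with a 170 MB part size, whereas a 1.7 PB object would be rejected before any part is sent.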

trel · 2024-04-18T20:39:44Z

https://aws.amazon.com/s3/faqs/#:~:text=The%20total%20volume%20of%20data,single%20PUT%20is%205%20GB.

alanking · 2024-04-18T20:40:57Z

Is that baked into the protocol, or is that just AWS? Just wondering...

trel · 2024-04-18T20:41:49Z

The protocol is what AWS says it is. There is no spec.

alanking · 2024-04-18T20:44:25Z

Oh, interesting. Good to know.

alanking added the consortium-member label Apr 17, 2024

trel added the enhancement label Apr 18, 2024
