Stream blob data in chunks to files to not occupy too much memory #551

Merged: 8 commits merged into development from stream-files on Sep 7, 2023

@adzialocha adzialocha commented Sep 6, 2023

Establishes a stream that reads binary blob data from the database (via pagination) and writes it to a file on the file system. This allows very large blobs to be handled, also in parallel, without occupying too much memory.
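
For readers unfamiliar with the approach, here is a minimal, synchronous sketch of the idea: fetch one page of blob pieces at a time and write it straight to the file, so only a single page is held in memory. It is not the code from this PR. `PieceStore`, `page` and `stream_blob_to_file` are hypothetical names, and the in-memory vector only stands in for the paginated database query (the real store is presumably asynchronous).

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Hypothetical stand-in for the database table holding blob pieces.
/// Assumption: a real store would run a paginated query instead.
struct PieceStore {
    pieces: Vec<Vec<u8>>,
}

impl PieceStore {
    /// Returns up to `page_size` pieces, starting at piece index `offset`.
    fn page(&self, offset: usize, page_size: usize) -> &[Vec<u8>] {
        let start = offset.min(self.pieces.len());
        let end = start.saturating_add(page_size).min(self.pieces.len());
        &self.pieces[start..end]
    }
}

/// Writes the blob to `path` page by page and returns the number of bytes
/// written. Only one page of pieces is borrowed at any time; the complete
/// blob is never assembled in a single buffer.
fn stream_blob_to_file(store: &PieceStore, path: &Path, page_size: usize) -> io::Result<u64> {
    assert!(page_size > 0, "page_size must be greater than zero");

    let mut writer = BufWriter::new(File::create(path)?);
    let mut offset = 0;
    let mut written = 0u64;

    loop {
        let page = store.page(offset, page_size);
        if page.is_empty() {
            // No more pieces: the whole blob has been written.
            break;
        }
        for piece in page {
            writer.write_all(piece)?;
            written += piece.len() as u64;
        }
        offset += page.len();
    }

    // Make sure buffered bytes reach the file before reporting success.
    writer.flush()?;
    Ok(written)
}
```

In this sketch, peak memory is bounded by `page_size` times the maximum piece length rather than by the blob size, which is also what makes handling several large blobs at once feasible.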

Closes: #548

📋 Checklist

  • Add tests that cover your changes
  • Add this PR to the Unreleased section in CHANGELOG.md
  • Link this PR to any issues it closes
  • New files contain a SPDX license header

@adzialocha adzialocha changed the base branch from main to development September 6, 2023 19:43

codecov bot commented Sep 6, 2023

Codecov Report

Patch coverage: 91.28% and project coverage change: +0.04% 🎉

Comparison is base (a404d0c) 92.50% compared to head (77df667) 92.54%.
Report is 3 commits behind head on development.

Additional details and impacted files
@@               Coverage Diff               @@
##           development     #551      +/-   ##
===============================================
+ Coverage        92.50%   92.54%   +0.04%     
===============================================
  Files              105      105              
  Lines            17932    17988      +56     
===============================================
+ Hits             16588    16647      +59     
+ Misses            1344     1341       -3     
Files Changed Coverage Δ
aquadoggo/src/db/stores/entry.rs 99.40% <ø> (ø)
aquadoggo/src/db/stores/operation.rs 89.63% <ø> (ø)
aquadoggo/src/db/stores/schema.rs 98.37% <ø> (ø)
aquadoggo/src/db/types/entry.rs 90.00% <ø> (ø)
aquadoggo/src/db/types/operation.rs 88.00% <ø> (ø)
aquadoggo/src/graphql/queries/collection.rs 99.60% <ø> (ø)
...ggo/src/graphql/scalars/document_view_id_scalar.rs 78.57% <ø> (ø)
aquadoggo/src/materializer/tasks/dependency.rs 97.56% <ø> (ø)
...doggo/src/materializer/tasks/garbage_collection.rs 98.59% <ø> (ø)
aquadoggo/src/network/service.rs 30.03% <ø> (ø)
... and 18 more


@adzialocha adzialocha changed the title Stream blob data into files Stream blob data in chunks into files to not occupy too much memory Sep 6, 2023
@adzialocha adzialocha changed the title Stream blob data in chunks into files to not occupy too much memory Stream blob data in chunks to files to not occupy too much memory Sep 6, 2023
@adzialocha adzialocha marked this pull request as ready for review September 6, 2023 20:26
@adzialocha adzialocha requested a review from sandreae September 6, 2023 20:26

@sandreae sandreae left a comment

Really good, looking forward to exploring optimal configurations for managing memory use with this in place.

@sandreae sandreae merged commit 3760e20 into development Sep 7, 2023
10 checks passed
adzialocha added a commit that referenced this pull request Sep 8, 2023
* development:
  Make sure `/tmp` directory does not run out of scope before application ends (#557)
  Integrate `Bytes` value (#554)
  Stream blob data in chunks to files to not occupy too much memory (#551)
  Blobs directory configuration (#549)
  Use correct MAX_BLOB_PIECE_LENGTH from p2panda_rs
  Build a byte buffer over paginated pieces when assembling blobs (#547)
  HTTP routes to serve files with correct content type and etag headers (#544)
  Task for automatic garbage collection of unused documents and views (#500)
  Refactor tmp blob dir creation after rebase
  Fix after rebase
  "blob" materializer task (#493)
  Add static file server to `http` service (#483)
  Enable deletion of dangling `document_views` and related `document_view_fields` from db  (#491)
  BlobStore for retrieving raw blob data from the db (#484)
@adzialocha adzialocha deleted the stream-files branch September 8, 2023 09:45
Development

Successfully merging this pull request may close these issues.

Do not assemble all blob pieces in memory but stream it directly into a file
2 participants