Stream blob data in chunks to files to not occupy too much memory #551

adzialocha · 2023-09-06T19:43:30Z

Streams binary blob data from the database (read via pagination) directly into a file on the file system. This makes it possible to handle very large blobs, including several in parallel, without occupying too much memory.
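For illustration, the idea can be sketched roughly as follows. This is a minimal, synchronous, std-only sketch and not the code from this PR: the `PieceSource` trait, the `InMemoryPieces` type, the `stream_blob_to_file` function and the `page_size` parameter are all hypothetical names made up for the example, and the actual implementation presumably uses the async database store instead. Only the use of `OpenOptions` to create the file is taken from the commit list.

```rust
use std::fs::OpenOptions;
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Hypothetical stand-in for the paginated database query: returns up to
/// `limit` blob pieces, starting at piece index `offset`.
pub trait PieceSource {
    fn fetch_pieces(&self, offset: usize, limit: usize) -> io::Result<Vec<Vec<u8>>>;
}

/// In-memory source, used here only to make the sketch self-contained.
pub struct InMemoryPieces(pub Vec<Vec<u8>>);

impl PieceSource for InMemoryPieces {
    fn fetch_pieces(&self, offset: usize, limit: usize) -> io::Result<Vec<Vec<u8>>> {
        Ok(self.0.iter().skip(offset).take(limit).cloned().collect())
    }
}

/// Writes a blob to `path` one page of pieces at a time and returns the
/// number of bytes written. At most one page is held in memory at once.
pub fn stream_blob_to_file<S: PieceSource>(
    source: &S,
    path: &Path,
    page_size: usize,
) -> io::Result<u64> {
    if page_size == 0 {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "page_size must be greater than zero",
        ));
    }

    // Create the target file, or truncate it if it already exists.
    let file = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(true)
        .open(path)?;
    let mut writer = BufWriter::new(file);

    let mut offset = 0;
    let mut written = 0u64;

    loop {
        // Fetch the next page; an empty page means all pieces were read.
        let page = source.fetch_pieces(offset, page_size)?;
        if page.is_empty() {
            break;
        }

        for piece in &page {
            writer.write_all(piece)?;
            written += piece.len() as u64;
        }

        offset += page.len();
    }

    // Make sure buffered bytes reach the file before reporting success.
    writer.flush()?;
    Ok(written)
}
```

With this shape, peak memory is bounded by roughly `page_size` pieces per blob rather than by the blob's total size, which is what makes handling several large blobs at once feasible.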

Closes: #548

📋 Checklist

Add tests that cover your changes
Add this PR to the Unreleased section in CHANGELOG.md
Link this PR to any issues it closes
New files contain a SPDX license header


codecov · 2023-09-06T20:09:15Z

Codecov Report

Patch coverage: 91.28% and project coverage change: +0.04% 🎉

Comparison is base (a404d0c) 92.50% compared to head (77df667) 92.54%.
Report is 3 commits behind head on development.

Additional details and impacted files

@@               Coverage Diff               @@
##           development     #551      +/-   ##
===============================================
+ Coverage        92.50%   92.54%   +0.04%     
===============================================
  Files              105      105              
  Lines            17932    17988      +56     
===============================================
+ Hits             16588    16647      +59     
+ Misses            1344     1341       -3

Files Changed	Coverage Δ
aquadoggo/src/db/stores/entry.rs	`99.40% <ø> (ø)`
aquadoggo/src/db/stores/operation.rs	`89.63% <ø> (ø)`
aquadoggo/src/db/stores/schema.rs	`98.37% <ø> (ø)`
aquadoggo/src/db/types/entry.rs	`90.00% <ø> (ø)`
aquadoggo/src/db/types/operation.rs	`88.00% <ø> (ø)`
aquadoggo/src/graphql/queries/collection.rs	`99.60% <ø> (ø)`
...ggo/src/graphql/scalars/document_view_id_scalar.rs	`78.57% <ø> (ø)`
aquadoggo/src/materializer/tasks/dependency.rs	`97.56% <ø> (ø)`
...doggo/src/materializer/tasks/garbage_collection.rs	`98.59% <ø> (ø)`
aquadoggo/src/network/service.rs	`30.03% <ø> (ø)`
... and 18 more


sandreae

Really good, looking forward to exploring optimal configurations for managing memory use with this in place.

* development:
  Make sure `/tmp` directory does not run out of scope before application ends (#557)
  Integrate `Bytes` value (#554)
  Stream blob data in chunks to files to not occupy too much memory (#551)
  Blobs directory configuration (#549)
  Use correct MAX_BLOB_PIECE_LENGTH from p2panda_rs
  Build a byte buffer over paginated pieces when assembling blobs (#547)
  HTTP routes to serve files with correct content type and etag headers (#544)
  Task for automatic garbage collection of unused documents and views (#500)
  Refactor tmp blob dir creation after rebase
  Fix after rebase
  "blob" materializer task (#493)
  Add static file server to `http` service (#483)
  Enable deletion of dangling `document_views` and related `document_view_fields` from db (#491)
  BlobStore for retrieving raw blob data from the db (#484)

Stream data from database into file

0d1807a

adzialocha changed the base branch from main to development September 6, 2023 19:43

adzialocha added 4 commits September 6, 2023 22:01

Fix tests

2094558

Merge branch 'development' into stream-files

21e8c83

* development: Blobs directory configuration (#549)

Run cargo fmt

e82cf24

Use OpenOptions to create file

5590dc2

Add comments and doc strings

c21bb5b

adzialocha changed the title ~~Stream blob data into files~~ Stream blob data in chunks into files to not occupy too much memory Sep 6, 2023

adzialocha changed the title ~~Stream blob data in chunks into files to not occupy too much memory~~ Stream blob data in chunks to files to not occupy too much memory Sep 6, 2023

adzialocha added 2 commits September 6, 2023 22:23

Add entry to CHANGELOG.md

a1e3f90

Update doc-string

77df667

adzialocha marked this pull request as ready for review September 6, 2023 20:26

adzialocha requested a review from sandreae September 6, 2023 20:26

adzialocha linked an issue Sep 6, 2023 that may be closed by this pull request

Do not assemble all blob pieces in memory but stream it directly into a file #548

Closed

sandreae approved these changes Sep 7, 2023


sandreae merged commit 3760e20 into development Sep 7, 2023
10 checks passed

adzialocha deleted the stream-files branch September 8, 2023 09:45
