Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Remove original file stem from filenames in the blobstorage (#4309) #4583

Conversation

iequidoo
Copy link
Collaborator

@iequidoo iequidoo commented Jul 27, 2023

Based on #5495

Fix #5338

@iequidoo iequidoo force-pushed the iequidoo/blob-random-filename branch from 05a5904 to f8133a1 Compare July 28, 2023 03:16
@iequidoo iequidoo marked this pull request as ready for review July 28, 2023 04:08
@iequidoo iequidoo requested review from link2xt and r10s July 28, 2023 04:09
@iequidoo iequidoo force-pushed the iequidoo/blob-random-filename branch 3 times, most recently from d14434c to 1c8fa0d Compare July 30, 2023 05:46
@iequidoo iequidoo changed the title feat: Use random filename suffixes for blobstorage (#4309) feat: Use random filenames for blobstorage (#4309) Jul 30, 2023
@iequidoo iequidoo force-pushed the iequidoo/blob-random-filename branch from 1c8fa0d to 48edc0c Compare August 3, 2023 19:02
@r10s
Copy link
Member

r10s commented Aug 4, 2023

thinking it over again, forcing random file name in blob-directory will worsen UX at least on deltachat-desktop, at least currently:

on desktop, currently, eg. a pdf-reader is opened without the file being copied, the pdf-reader will show the filename then - and there, sth. as one-sheet.pdf or one-sheet-12345678.pdf is better than 1234567812345678.pdf. cmp eg the following pdf viewer on macos - the title would become sth. totally random with this pr:

Screenshot 2023-08-04 at 20 19 27

i known, the idea is that all files should be copied before being accessed from external apps, however, we're not there yet on any UI; most visible, however it seems to be on desktop, cc @Simon-Laux

so, as the filename in the blob-directory does not have a meaning anyways and we're anyways have no consistency there, it would remove pressure from UIs and simplify getting this pr to stable if we keep filename generation as is (or maybe always add the random number so that it is more visible where the "wrong" filenames are used).

this pr could add only add the save-function, use correct names on forwarding, and what else is done here (@iequidoo i find pr descriptions as "some discussions here and there" without some additional words hard for reviewers, it would be good to always have at least a handful sentences in your own words about reasonings, effects, successor of etc)

as a side-effect, this would keep debugging blob-dir simple :)

when UI are adapted to the new API, we can think over to use completely random file names.

@link2xt
Copy link
Collaborator

link2xt commented Aug 4, 2023

The main tracking issue for this PR is #4309

This PR is not meant to be a next step at this point, see my comment at #4309 (comment)
We indeed first need to adapt the UIs to avoid opening the files directly from the blobstore. Desktop is currently not even using the filename for the Save As dialog: deltachat/deltachat-desktop#3330

@iequidoo
Copy link
Collaborator Author

iequidoo commented Aug 5, 2023

@iequidoo i find pr descriptions as "some discussions here and there" without some additional words hard for reviewers, it would be good to always have at least a handful sentences in your own words about reasonings, effects, successor of etc

Ok, i will better improve commit messages, i prefer putting everything there and use PR descriptions just for additional info.

As for dc_msg_get_file(), i can implement what @link2xt suggested:

dc_msg_get_file API should create a temporary directory inside the blobdir, save a copy with unmangled filename, and return its path. In the housekeeping we regularly clear these directories. This API is kept for compatibility, but deprecated.

but then we need to add some new call like dc_msg_get_filedata_path() which will return the path of a file with mangled name (but with an original extension) to avoid unnecessary copying.

@iequidoo
Copy link
Collaborator Author

iequidoo commented Aug 5, 2023

Also i can just remove the last commit so that there will be random suffixes, but not whole filenames. Anyway now we can get files with random suffixes from dc_msg_get_file() in case of filename duplication

@iequidoo iequidoo force-pushed the iequidoo/blob-random-filename branch from 48edc0c to b79b2e1 Compare August 28, 2023 22:30
@link2xt link2xt deleted the branch iequidoo/blob-random-file-suffixes October 25, 2023 21:22
@link2xt link2xt closed this Oct 25, 2023
@link2xt link2xt reopened this Oct 25, 2023
@iequidoo iequidoo changed the base branch from master to main October 26, 2023 01:13
@iequidoo iequidoo force-pushed the iequidoo/blob-random-filename branch from b79b2e1 to 0776811 Compare October 28, 2023 03:13
@iequidoo iequidoo marked this pull request as draft October 28, 2023 03:24
@iequidoo iequidoo force-pushed the iequidoo/blob-random-filename branch from 0776811 to eadd138 Compare October 28, 2023 08:03
@iequidoo iequidoo marked this pull request as ready for review October 28, 2023 08:06
@iequidoo iequidoo force-pushed the iequidoo/blob-random-filename branch from eadd138 to bb48814 Compare November 14, 2023 07:32
@iequidoo iequidoo force-pushed the iequidoo/blob-random-filename branch from bb48814 to 53f01ca Compare December 2, 2023 20:04
@iequidoo iequidoo force-pushed the iequidoo/blob-random-filename branch from 53f01ca to f2bae94 Compare January 7, 2024 01:46
@iequidoo iequidoo force-pushed the iequidoo/blob-random-filename branch from f2bae94 to 7358c33 Compare January 24, 2024 02:47
@iequidoo iequidoo changed the title feat: Use random filenames for blobstorage (#4309) feat: Use random filename suffixes for blobstorage (#4309) Jan 24, 2024
@iequidoo
Copy link
Collaborator Author

Removed the last commit to unblock merging this

@iequidoo iequidoo changed the base branch from main to iequidoo/dc_msg_save_file March 13, 2024 04:42
@iequidoo iequidoo force-pushed the iequidoo/blob-random-filename branch from 49b0e82 to aaffb9d Compare March 18, 2024 05:32
@iequidoo iequidoo force-pushed the iequidoo/dc_msg_save_file branch from a1bf4a9 to e635108 Compare March 18, 2024 05:33
@iequidoo iequidoo force-pushed the iequidoo/dc_msg_save_file branch 2 times, most recently from fbceb01 to 435ff59 Compare April 24, 2024 02:32
@iequidoo iequidoo force-pushed the iequidoo/blob-random-filename branch from aaffb9d to 27dc07e Compare April 24, 2024 02:51
Base automatically changed from iequidoo/dc_msg_save_file to main April 24, 2024 19:38
@iequidoo iequidoo force-pushed the iequidoo/blob-random-filename branch from 27dc07e to 25f1eec Compare April 25, 2024 02:12
@iequidoo iequidoo changed the base branch from main to iequidoo/blob-random-file-suffixes April 25, 2024 02:20
Copy link
Member

@r10s r10s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a high-level comment:

making dc_msg_get_file() return unmangled name will improve some things in UI. at the same time, as this requires copying the file, converts the function from a cheap to an expensive one, potentially worsening things - UI have used dc_msg_get_file() more as get_state(), it is always fast anyways, no need to think over it can be slow or one needs to save results.

to target this, UI has to check all usages of dc_msg_get_file() - to avoid side effects, but also as dc_msg_get_file is anyways deprecated

but if UI anyways needs to check this, then it seems better to leave the functionality of dc_msg_get_file() as is - so it will become the dc_msg_get_filedata_path() (which it is already mostly, function description is mostly the same), UI wise, things may get slightly worse, as the filename is always “bad” - but UI would need to do the checks anyways and by this approach:

  • no unexpected slowness or worsening can happen - there are lots of calls to dc_msg_get_file() - and that in already complex areas as "share", i expect things to be overseen
  • no new deprecated API
  • things are working as before (apart from more often mangled names)
  • it is very visible to devs/users what is missing: if eg. external apps show the mangled names, one needs to transfer things via a BLOB+dc_msg_get_filename() or use dc_msg_save_file()
  • finally, but only as a sidenode on an implementation detail: not creating additional subdirectories in blobdir may have maintenance benefits in the future

might be i voted against that approach before, idk, but it looks better to me, thinking it over, also with the otherwise behaviour change of dc_msg_get_file()

Recently there was an accident with a chatbot that replaced its avatar set from the command line
with an unrelated avatar of a contact. Both the `selfavatar` setting and the contact avatar `i`
param pointed to `$BLOBDIR/avatar.png` at the time it was detected. How this happened is unclear,
but it is possible that `avatar.png` was removed, unmounted or otherwise not detected by the core,
and the core stored avatar received from the contact as `avatar.png`, while `selfavatar` config
still pointed to `$BLOBDIR/avatar.png`.

Such bugs are unavoidable even if the core itself has no bugs as we cannot rely on blobdir not
reside on the faulty network filesystem, being incorrectly backed up and restored etc., so we should
assume that files may be randomly removed. Then there may be dangling `$BLOBDIR/...` references in
the database which may accidentally point to unrelated files, could even be an `avatar.png` file
sent to the bot in private.

To prevent such bugs, we add random filename suffixes for the blobdir objects. Thanks to the added
Param::Filename these random suffixes aren't sent over the network.
@iequidoo iequidoo force-pushed the iequidoo/blob-random-file-suffixes branch from b248bbf to ed33f30 Compare April 26, 2024 00:19
)

This way filenames in the blobstorage are just random hex numbers. This also allows us to get rid of
the `sanitize-filename` dependency.

This also requires `Param::Filename` to be set to "debug_logging*.xdc" for messages containing
logging webxdc-s, otherwise they are not detected properly. This is done in "fix:
Message::set_file_from_bytes(): Set Param::Filename", so don't forget to update senders as well.
…pace (#5338)

Before file extensions were also limited to 32 chars, but extra chars in the beginning were just cut
off, e.g. "file.with_lots_of_characters_behind_point_and_double_ending.tar.gz" was considered to
have an extension "d_point_and_double_ending.tar.gz". Better to take only "tar.gz" then.

Also don't include whitespace-containing parts in extensions. File extensions generally don't
contain whitespaces.
@iequidoo iequidoo force-pushed the iequidoo/blob-random-filename branch from 25f1eec to 562c4f8 Compare April 27, 2024 04:17
@iequidoo iequidoo requested a review from r10s April 27, 2024 04:19
@iequidoo iequidoo force-pushed the iequidoo/blob-random-file-suffixes branch 3 times, most recently from deca65d to 0610b7d Compare May 23, 2024 21:51
@iequidoo iequidoo force-pushed the iequidoo/blob-random-file-suffixes branch 4 times, most recently from 8a315e8 to d2e299d Compare October 5, 2024 18:47
@iequidoo iequidoo marked this pull request as draft October 6, 2024 21:23
@iequidoo iequidoo changed the title feat: Use random filenames for blobstorage (#4309) fix: Assume file extensions are 32 chars max and don't contain whitespace (#5338) Oct 6, 2024
@iequidoo iequidoo closed this Oct 8, 2024
@iequidoo
Copy link
Collaborator Author

iequidoo commented Oct 8, 2024

Opened #6033 instead

@iequidoo iequidoo changed the title fix: Assume file extensions are 32 chars max and don't contain whitespace (#5338) feat: Remove original file stem from filenames in the blobstorage (#4309) Oct 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Delta Chat treats a part of file name as an extension Use random filenames or hashes for blobstorage
3 participants