severe file corruption - 2 different bugs - not sure if in sshfs or fuse3, reporting in both projects #302
Comments
I did further investigation and was able to triage the bug better. Here is a more easily reproducible and more informative test. Let's increase dcache_stat_timeout, because it masks the bug. Corruption happens when changed files are copied before dcache_stat_timeout expires. Let's mount sshfs with the following command line: Here is the test:
The result is:
This means that the first read of the file after the change returns corrupted content: the file is cut to the old size, but has the new content. In my use case, this caused a lot of PDF files that were changed by a script and then copied over sshfs to be severely corrupted and unopenable. There is also a second bug, related to the first. If you repeat the test, but mount sshfs with a slightly different command line: Then the result is:
This seems related to cache handling, so the bug is probably in fuse3, most likely due to improper expiration of the data cache and/or the stat attributes cache in the auto_cache open path.
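The first symptom described above (file cut to the old size, but with the new content) can be told apart from a plain stale read with a small helper. This is only a sketch for triaging test output, not part of the original test; the function name `classify_read` and its labels are made up for illustration.

```python
def classify_read(old: bytes, new: bytes, observed: bytes) -> str:
    """Classify what a reader saw after the backing file changed from old to new.

    Hypothetical helper: the labels are illustrative, not sshfs/libfuse terms.
    """
    if observed == new:
        return "ok"
    if observed == old:
        # Cache returned the previous content unchanged.
        return "stale"
    if len(observed) == len(old) and observed == new[:len(old)]:
        # New data, but the length still comes from the old (cached) size.
        return "truncated-to-old-size"
    if len(observed) > len(new) and observed[:len(new)] == new:
        # New data followed by extra bytes (file shrank, cached size did not).
        return "garbage-after-new-content"
    return "other"


if __name__ == "__main__":
    print(classify_read(b"aaaa", b"bbbbbbbb", b"bbbb"))
```

Feeding it the old content, the new content, and the bytes actually read through the mount shows which of the two reported bugs a given run hit.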
Thanks for the report! That's some solid digging. Can you test with a different FUSE program that also uses ssh, such as rclone, to check whether it's a problem with the FUSE library?
I was preparing to test with rclone, but then stumbled upon this text in their docs:

Attribute caching

The default is 1s which caches files just long enough to avoid too many callbacks to rclone from the kernel. In theory 0s should be the correct value for filesystems which can change outside the control of the kernel. However this causes quite a few problems such as rclone using too much memory, rclone not serving files to samba and excessive time listing directories.

The kernel can cache the info about a file for the time given by --attr-timeout. You may see corruption if the remote file changes length during this window. It will show up as either a truncated file or a file with garbage on the end. With --attr-timeout 1s this is very unlikely but not impossible. The higher you set --attr-timeout the more likely it is. The default setting of "1s" is the lowest setting which mitigates the problems above.

If you set it higher (10s or 1m say) then the kernel will call back to rclone less often making it more efficient, however there is more chance of the corruption issue above. If files don't change on the remote outside of the control of rclone then there is no chance of corruption. This is the same as setting the attr_timeout option in mount.fuse.

This all sounds reasonable and was valid before libfuse3 tried implementing auto_cache and ac_attr_timeout. Auto_cache, at least in theory, was supposed to prevent those issues, by allowing:
The only problem is that ac_attr_timeout expires the file size/attributes cache AFTER the read, not before. So it seems this is a libfuse bug. Rclone seems to use the same libfuse3 as sshfs, though?
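To make the "expires AFTER the read, not before" hypothesis concrete, here is a toy model of a cached file size with a timeout. It is an assumption-laden sketch of the suspected ordering problem only; the class `ToyAttrCache` and its fields are invented for illustration and do not mirror libfuse's actual code.

```python
class ToyAttrCache:
    """Toy model of a cached file size; NOT libfuse's implementation."""

    def __init__(self, backing: bytearray, timeout: float,
                 revalidate_before_read: bool):
        self.backing = backing                  # stands in for the remote file
        self.timeout = timeout                  # stands in for ac_attr_timeout
        self.revalidate_before_read = revalidate_before_read
        self.cached_size = len(backing)         # size remembered at "open"
        self.cached_at = 0.0

    def _refresh(self, now: float) -> None:
        self.cached_size = len(self.backing)
        self.cached_at = now

    def read(self, now: float) -> bytes:
        expired = now - self.cached_at >= self.timeout
        if expired and self.revalidate_before_read:
            self._refresh(now)                  # expected order: refresh, then read
        data = bytes(self.backing[:self.cached_size])
        if expired and not self.revalidate_before_read:
            self._refresh(now)                  # suspected order: read, then refresh
        return data


if __name__ == "__main__":
    backing = bytearray(b"old!")
    late = ToyAttrCache(backing, timeout=0.0, revalidate_before_read=False)
    backing[:] = b"new content"                 # file changes behind the cache
    print(late.read(1.0))                       # first read uses the old size
    print(late.read(2.0))                       # later read sees the full file
```

In this model only the first read after the change is cut to the old size, which matches the reported symptom; whether libfuse really behaves this way is exactly what the issue is asking.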
It appears sshfs uses libfuse3-dev (sshfs/.github/workflows/build-ubuntu.yml, line 25 in eadf7f1), whereas rclone uses libfuse-dev, which is actually libfuse 2.
libfuse issue: libfuse/libfuse#945
Arch Linux
SSHFS version 3.7.3
FUSE library version 3.16.2
using FUSE kernel interface version 7.38
fusermount3 version: 3.16.2
Steps to reproduce:
The second and third md5sums of the file are different, although the file has not been changed between them.
There is also another bug: sometimes the second md5sum is THE SAME as the first, even though the file was edited between taking the md5sums. This happens even though "-o auto_cache,ac_attr_timeout=0" is given, i.e. even with a 0 timeout for detecting changes. It happens less frequently, but if you run ~10 consecutive tests (possibly re-mounting between them), this bug will also occur.
I suspect this is due to ac_attr_timeout=0 not properly invalidating the cache in libfuse3, so I will file the same report in the libfuse project.
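The md5sum comparison described above can be scripted roughly as follows. This is a sketch, not the exact steps from this report: it assumes the same file is reachable both through the sshfs mount (`mount_path`) and directly on the backing filesystem (`backing_path`), for example when mounting localhost. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of(path) -> str:
    """Return the hex md5 of the file's full content."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def check_mount(mount_path, backing_path, new_content: bytes) -> dict:
    """Hash via the mount, change the file behind it, then hash twice more."""
    first = md5_of(mount_path)
    # Modify the file directly, bypassing the mount (assumed to be possible).
    Path(backing_path).write_bytes(new_content)
    expected = hashlib.md5(new_content).hexdigest()
    second = md5_of(mount_path)
    third = md5_of(mount_path)
    return {
        # Second bug: the edit is not visible on the next read.
        "stale_second_read": second == first and first != expected,
        # First bug: two reads with no edit in between disagree.
        "unstable_reads": second != third,
        "second_read_correct": second == expected,
    }
```

Since the second bug is reported as intermittent, running `check_mount` in a loop (optionally re-mounting between runs) and counting the flags would be the natural way to use it.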