Replace datalad-fuse with davfs2 #77

jwodder · 2024-06-12T13:45:50Z

Closes #76.

After this is merged, ~dandi/cronlib/dandisets-healthstatus on drogon will have to be updated to the new code (manually?).

codecov · 2024-06-12T13:48:20Z

Codecov Report

Attention: Patch coverage is 33.33333% with 4 lines in your changes missing coverage. Please review.

Project coverage is 60.62%. Comparing base (c51d691) to head (2edf8e0).
Report is 2 commits behind head on main.

Files	Patch %	Lines
code/src/healthstatus/mounts.py	33.33%	4 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #77      +/-   ##
==========================================
- Coverage   60.71%   60.62%   -0.10%     
==========================================
  Files          10       10              
  Lines         840      838       -2     
  Branches      193      192       -1     
==========================================
- Hits          510      508       -2     
  Misses        310      310              
  Partials       20       20

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

yarikoptic

One minor comment but otherwise -- let's do it! Please merge/deploy on drogon.

I have interrupted, saved, pulled, pushed results in a running session under screen on drogon, so it is all in your hands now.

code/setup.cfg

Co-authored-by: Yaroslav Halchenko <[email protected]>

jwodder · 2024-06-12T14:27:10Z

@yarikoptic

ask_auth needs to be set to 0 in /etc/davfs2/davfs2.conf
It appears that dandi@drogon does not have permission to run the appropriate mount & umount commands via passwordless sudo. At the very least, sudo -ln complains that a password is required.
- Based on this comment, you set up sudo for jwodder, but that's not the user we're running dandisets-healthstatus under.

yarikoptic · 2024-06-12T15:25:53Z

done
done

dandi ALL=(ALL:ALL) NOPASSWD: /usr/bin/umount /tmp/dandisets-fuse
dandi ALL=(ALL:ALL) NOPASSWD: /usr/bin/mount -t davfs https\://webdav.dandiarchive.org /tmp/dandisets-fuse
dandi ALL=(ALL:ALL) NOPASSWD: /usr/bin/umount /mnt/backup/dandi/dandisets-healthstatus/dandisets-fuse
dandi ALL=(ALL:ALL) NOPASSWD: /usr/bin/mount -t davfs https\://webdav.dandiarchive.org /mnt/backup/dandi/dandisets-healthstatus/dandisets-fuse

jwodder · 2024-06-12T15:27:53Z

@yarikoptic The setup should be complete now. Should I just run it in one of my screen sessions or leave that up to you?

yarikoptic · 2024-06-12T15:42:31Z

please run under your screen so to check if all is good etc.

jwodder · 2024-06-12T16:06:44Z

@yarikoptic Problem: The code previously recorded the hash of the current commit of each Dandiset repository, but with davfs2, we don't have access to that information.

yarikoptic · 2024-06-12T17:24:00Z

that's interesting! we need some kind of a notion of a version, e.g. last-modified-date. Could you query from dandi-archive API per each dandiset?

we are going through drafts only avoiding releases, right?

yarikoptic · 2024-06-12T17:25:09Z

hold on -- you expose that in webdav:

does davfs2 expose it as mtime for the draft/ folder ? if so -- take that and be done?

jwodder · 2024-06-12T19:54:03Z

@yarikoptic The script seems to be running fine now, but now just traversing Dandiset file hierarchies seems slower than before.

yarikoptic · 2024-06-13T02:09:24Z

how much slower? my "diagnostic" is the fact that there is barely any python or MATLAB cpu busy tasks :-/

yarikoptic · 2024-06-13T13:30:13Z

it might relate to

web dyno getting OOMKilled dandidav#118
as when I tail -f the current log file on drogon:/mnt/backup/dandi/heroku-logs/dandidav it "pauses" at heroku[web.1]: Error R14 (Memory quota exceeded) although it could be red-herring.

jwodder · 2024-06-13T13:53:36Z

@yarikoptic I had to restart the script in order to apply a fix. Looking at the script's output now, just the listing of the dandisets/ directory took 30 seconds to get from 000003 to 000026, at which point it got stuck for three minutes, then four more entries were read in 6 seconds, and it now seems to be alternating between stalling for several minutes and retrieving small numbers of entries. I do not have any timings for datalad-fuse to compare against.

Before you propose any explanations, let me point out that requests to dandidav to list the entries of dandisets/ are responded to with the complete directory listing, so this is presumably davfs2's fault for not caching appropriately. It seems v1.7.0 of davfs2 added caching, but it ended up making things slower.

jwodder · 2024-07-08T13:42:31Z

@yarikoptic I'm almost done rewriting the code to traverse Dandisets via the Archive API instead of mounted-WebDAV (as discussed in the meeting last week). However, there are issues with updating the test-files subcommand (for running a test on a local file and optionally saving the test results to the Dandiset's status.yaml). Do we want to keep this subcommand? What about keeping its ability to save to status.yaml? If yes to both, should the command assume that the given file paths are always in a WebDAV mount? If not, how should the Dandiset ID and asset path be determined?

yarikoptic · 2024-07-08T17:21:31Z

Do we have dandiset.yaml in the top of the dandiset? If yes -- just traversal up until hitting it and that would give you ID and top path to deduce asset path. Cache results of such helper function per each folder value so we do not re-traverse without need.

Can't check now since I believe davfs2 is simply hanging and thus two hanging jobs I have mentioned last week:

dandi    4133314  0.0  0.0      0     0 ?        D    Jun29   0:02 [python]
dandi    4188696  0.0  0.0 422664  4408 ?        D    Jun29   0:10 /home/dandi/cronlib/dandisets-healthstatus/venv/bin/python /home/dandi/cronlib/dandisets-healthstatus/venv/lib/python3.8/site-packages/healthstatus/pynwb_open_load_ns.py /mnt/backup/dandi/dandisets-healthstatus/dandisets-fuse/dandisets/000045/draft/sub-01be78e7-8741-4b40-bd64-79ed745431b5/sub-01be78e7-8741-4b40-bd64-79ed745431b5_ses-f6f55a42-6105-4cc9-9c8c-3d3264828b8b_behavior+image.nwb
dandi    4188715  0.0  0.0 414872  4420 ?        D    Jun29   0:02 /home/dandi/cronlib/dandisets-healthstatus/venv/bin/python /home/dandi/cronlib/dandisets-healthstatus/venv/lib/python3.8/site-packages/healthstatus/pynwb_open_load_ns.py /mnt/backup/dandi/dandisets-healthstatus/dandisets-fuse/dandisets/000059/draft/sub-MS12/sub-MS12_ses-Peter-MS12-170719-095305-concat_desc-raw_ecephys.nwb
dandi    4189021  0.0  0.0      0     0 ?        Zl   Jun29   0:02 [python] <defunct>
dandi    4189095  0.0  0.0      0     0 ?        Zl   Jun29   0:02 [python] <defunct>

those should be killed, davfs2 restarted.

jwodder · 2024-07-08T17:42:13Z

@yarikoptic

Do we have dandiset.yaml in the top of the dandiset?

If test-files is run on a clone, copy, or WebDAV-mount of a Dandiset, yes. However, if it's run on a davfs2 mount, experience suggests that traversing upwards looking for a dandiset.yaml will be slow.

Moreover, if updating status.yaml, currently test-files needs to traverse the whole of the Dandiset it's operating on in order to get all asset paths in order to prune assets from status.yaml that have since been deleted. The options for handling this are:

Traverse the filesystem normally — will be slow if using a davfs2 mount
Get the list of assets from the Archive API — If the local Dandiset is not up to date with the archive (which can happen when not using a WebDAV mount), this can lead to problems, such as the very file that was tested being deleted from status.yaml.
Add a CLI option for selecting one of the above two behaviors
Remove test-files's behavior of pruning deleted assets

Which one should I go with?

those should be killed, davfs2 restarted.

I ran kill -9 on the pynwb_open_load_ns.py last week, but they're still stuck. I don't know what those zombie processes are. I sent you a message in Slack asking you to use sudo to kill the davsf2 process. I tried unmounting the davfs2 mount when I started writing this comment, but the process has just been hanging since.

yarikoptic · 2024-07-08T19:03:13Z

However, if it's run on a davfs2 mount, experience suggests that traversing upwards looking for a dandiset.yaml will be slow.

we do not even need to list the directory, just rely on os.path.lexists or alike while lru caching the call so that for a folder we never do it twice through runtime of that process. Moreover could be optimized that if any of the parents already known to have dandiset.yaml - no need to ask for any child folder. This way it would literally be once per dandiset traversal up.

jwodder · 2024-07-08T19:09:28Z

@yarikoptic Please address the rest of my comment as well.

yarikoptic · 2024-07-08T19:16:26Z

Re test-files -- couldn't we just completely disable dealing with the files which are not given to the test-files invocation, i.e. pruning them etc? Pruning deleted paths in general should probably be the job for "dandiset-wide" operation (even if we randomly select only some to go through).

jwodder · 2024-07-08T19:18:05Z

@yarikoptic Yes, that is option number 4.

yarikoptic · 2024-07-08T19:26:30Z

Cool, let's go with it

jwodder · 2024-07-09T12:35:47Z

@yarikoptic Another problem: When running test-files on a Dandiset, what should be done about updating the draft_modified timestamp in status.yaml? Should it be updated and, if so, to what value/how should that value be obtained? What if status.yaml does not exist yet or has not yet been updated to use draft_modified?

yarikoptic · 2024-07-09T16:47:48Z

For test-files -- i.e. running test on specific files only -- I do not think we should update draft_modified . It is mostly an interface to troubleshoot errors and timeouts while also "not wasting" doing so and possibly updating those files records. As such, can avoid concerning itself with any dandiset wide (removal of gone entries, draft_modifed, etc) operations.

Replace datalad-fuse with davfs2

fc4eeae

jwodder requested a review from yarikoptic June 12, 2024 13:45

yarikoptic approved these changes Jun 12, 2024

View reviewed changes

code/setup.cfg Outdated Show resolved Hide resolved

Update code/setup.cfg

2edf8e0

Co-authored-by: Yaroslav Halchenko <[email protected]>

jwodder merged commit d9f2748 into main Jun 12, 2024
14 of 16 checks passed

jwodder deleted the gh-76 branch June 12, 2024 14:22

jwodder mentioned this pull request Jul 9, 2024

Traverse assets via Archive API #81

Merged

3 tasks

jwodder added the datalad-fuse Relating to accessing assets over a datalad-fuse mount label Jul 12, 2024

jwodder added WebDAV Relating to accessing assets over a WebDAV mount performance More efficient use of time and space asset access How the program gets the assets and lists thereof labels Jul 12, 2024

This was referenced Jul 15, 2024

datalad-fuse via fsspec caching is too slow - needs a solution #53

Closed

Remove parallel processing? #6

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Replace datalad-fuse with davfs2 #77

Replace datalad-fuse with davfs2 #77

jwodder commented Jun 12, 2024

codecov bot commented Jun 12, 2024 •

edited

Loading

yarikoptic left a comment

jwodder commented Jun 12, 2024

yarikoptic commented Jun 12, 2024

jwodder commented Jun 12, 2024

yarikoptic commented Jun 12, 2024

jwodder commented Jun 12, 2024

yarikoptic commented Jun 12, 2024

yarikoptic commented Jun 12, 2024

jwodder commented Jun 12, 2024

yarikoptic commented Jun 13, 2024

yarikoptic commented Jun 13, 2024

jwodder commented Jun 13, 2024

jwodder commented Jul 8, 2024

yarikoptic commented Jul 8, 2024

jwodder commented Jul 8, 2024

yarikoptic commented Jul 8, 2024

jwodder commented Jul 8, 2024

yarikoptic commented Jul 8, 2024

jwodder commented Jul 8, 2024

yarikoptic commented Jul 8, 2024

jwodder commented Jul 9, 2024

yarikoptic commented Jul 9, 2024

Replace datalad-fuse with davfs2 #77

Replace datalad-fuse with davfs2 #77

Conversation

jwodder commented Jun 12, 2024

codecov bot commented Jun 12, 2024 • edited Loading

Codecov Report

yarikoptic left a comment

Choose a reason for hiding this comment

jwodder commented Jun 12, 2024

yarikoptic commented Jun 12, 2024

jwodder commented Jun 12, 2024

yarikoptic commented Jun 12, 2024

jwodder commented Jun 12, 2024

yarikoptic commented Jun 12, 2024

yarikoptic commented Jun 12, 2024

jwodder commented Jun 12, 2024

yarikoptic commented Jun 13, 2024

yarikoptic commented Jun 13, 2024

jwodder commented Jun 13, 2024

jwodder commented Jul 8, 2024

yarikoptic commented Jul 8, 2024

jwodder commented Jul 8, 2024

yarikoptic commented Jul 8, 2024

jwodder commented Jul 8, 2024

yarikoptic commented Jul 8, 2024

jwodder commented Jul 8, 2024

yarikoptic commented Jul 8, 2024

jwodder commented Jul 9, 2024

yarikoptic commented Jul 9, 2024

codecov bot commented Jun 12, 2024 •

edited

Loading