Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace datalad-fuse with davfs2 #77

Merged
merged 2 commits into from
Jun 12, 2024
Merged

Replace datalad-fuse with davfs2 #77

merged 2 commits into from
Jun 12, 2024

Conversation

jwodder
Copy link
Member

@jwodder jwodder commented Jun 12, 2024

Closes #76.

After this is merged, ~dandi/cronlib/dandisets-healthstatus on drogon will have to be updated to the new code (manually?).

@jwodder jwodder requested a review from yarikoptic June 12, 2024 13:45
Copy link

codecov bot commented Jun 12, 2024

Codecov Report

Attention: Patch coverage is 33.33333% with 4 lines in your changes missing coverage. Please review.

Project coverage is 60.62%. Comparing base (c51d691) to head (2edf8e0).
Report is 2 commits behind head on main.

Files Patch % Lines
code/src/healthstatus/mounts.py 33.33% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #77      +/-   ##
==========================================
- Coverage   60.71%   60.62%   -0.10%     
==========================================
  Files          10       10              
  Lines         840      838       -2     
  Branches      193      192       -1     
==========================================
- Hits          510      508       -2     
  Misses        310      310              
  Partials       20       20              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@yarikoptic yarikoptic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor comment but otherwise -- let's do it! Please merge/deploy on drogon.

I have interrupted, saved, pulled, pushed results in a running session under screen on drogon, so it is all in your hands now.

code/setup.cfg Outdated Show resolved Hide resolved
Co-authored-by: Yaroslav Halchenko <[email protected]>
@jwodder jwodder merged commit d9f2748 into main Jun 12, 2024
14 of 16 checks passed
@jwodder jwodder deleted the gh-76 branch June 12, 2024 14:22
@jwodder
Copy link
Member Author

jwodder commented Jun 12, 2024

@yarikoptic

  • ask_auth needs to be set to 0 in /etc/davfs2/davfs2.conf
  • It appears that dandi@drogon does not have permission to run the appropriate mount & umount commands via passwordless sudo. At the very least, sudo -ln complains that a password is required.
    • Based on this comment, you set up sudo for jwodder, but that's not the user we're running dandisets-healthstatus under.

@yarikoptic
Copy link
Member

  • done
  • done
dandi ALL=(ALL:ALL) NOPASSWD: /usr/bin/umount /tmp/dandisets-fuse
dandi ALL=(ALL:ALL) NOPASSWD: /usr/bin/mount -t davfs https\://webdav.dandiarchive.org /tmp/dandisets-fuse
dandi ALL=(ALL:ALL) NOPASSWD: /usr/bin/umount /mnt/backup/dandi/dandisets-healthstatus/dandisets-fuse
dandi ALL=(ALL:ALL) NOPASSWD: /usr/bin/mount -t davfs https\://webdav.dandiarchive.org /mnt/backup/dandi/dandisets-healthstatus/dandisets-fuse

@jwodder
Copy link
Member Author

jwodder commented Jun 12, 2024

@yarikoptic The setup should be complete now. Should I just run it in one of my screen sessions or leave that up to you?

@yarikoptic
Copy link
Member

please run under your screen so to check if all is good etc.

@jwodder
Copy link
Member Author

jwodder commented Jun 12, 2024

@yarikoptic Problem: The code previously recorded the hash of the current commit of each Dandiset repository, but with davfs2, we don't have access to that information.

@yarikoptic
Copy link
Member

that's interesting! we need some kind of a notion of a version, e.g. last-modified-date. Could you query from dandi-archive API per each dandiset?

we are going through drafts only avoiding releases, right?

@yarikoptic
Copy link
Member

hold on -- you expose that in webdav:

image

does davfs2 expose it as mtime for the draft/ folder ? if so -- take that and be done?

@jwodder
Copy link
Member Author

jwodder commented Jun 12, 2024

@yarikoptic The script seems to be running fine now, but now just traversing Dandiset file hierarchies seems slower than before.

@yarikoptic
Copy link
Member

how much slower? my "diagnostic" is the fact that there is barely any python or MATLAB cpu busy tasks :-/

@yarikoptic
Copy link
Member

it might relate to

  • web dyno getting OOMKilled dandidav#118
    as when I tail -f the current log file on drogon:/mnt/backup/dandi/heroku-logs/dandidav it "pauses" at heroku[web.1]: Error R14 (Memory quota exceeded) although it could be red-herring.

@jwodder
Copy link
Member Author

jwodder commented Jun 13, 2024

@yarikoptic I had to restart the script in order to apply a fix. Looking at the script's output now, just the listing of the dandisets/ directory took 30 seconds to get from 000003 to 000026, at which point it got stuck for three minutes, then four more entries were read in 6 seconds, and it now seems to be alternating between stalling for several minutes and retrieving small numbers of entries. I do not have any timings for datalad-fuse to compare against.

Before you propose any explanations, let me point out that requests to dandidav to list the entries of dandisets/ are responded to with the complete directory listing, so this is presumably davfs2's fault for not caching appropriately. It seems v1.7.0 of davfs2 added caching, but it ended up making things slower.

@jwodder
Copy link
Member Author

jwodder commented Jul 8, 2024

@yarikoptic I'm almost done rewriting the code to traverse Dandisets via the Archive API instead of mounted-WebDAV (as discussed in the meeting last week). However, there are issues with updating the test-files subcommand (for running a test on a local file and optionally saving the test results to the Dandiset's status.yaml). Do we want to keep this subcommand? What about keeping its ability to save to status.yaml? If yes to both, should the command assume that the given file paths are always in a WebDAV mount? If not, how should the Dandiset ID and asset path be determined?

@yarikoptic
Copy link
Member

Do we have dandiset.yaml in the top of the dandiset? If yes -- just traversal up until hitting it and that would give you ID and top path to deduce asset path. Cache results of such helper function per each folder value so we do not re-traverse without need.

Can't check now since I believe davfs2 is simply hanging and thus two hanging jobs I have mentioned last week:

dandi    4133314  0.0  0.0      0     0 ?        D    Jun29   0:02 [python]
dandi    4188696  0.0  0.0 422664  4408 ?        D    Jun29   0:10 /home/dandi/cronlib/dandisets-healthstatus/venv/bin/python /home/dandi/cronlib/dandisets-healthstatus/venv/lib/python3.8/site-packages/healthstatus/pynwb_open_load_ns.py /mnt/backup/dandi/dandisets-healthstatus/dandisets-fuse/dandisets/000045/draft/sub-01be78e7-8741-4b40-bd64-79ed745431b5/sub-01be78e7-8741-4b40-bd64-79ed745431b5_ses-f6f55a42-6105-4cc9-9c8c-3d3264828b8b_behavior+image.nwb
dandi    4188715  0.0  0.0 414872  4420 ?        D    Jun29   0:02 /home/dandi/cronlib/dandisets-healthstatus/venv/bin/python /home/dandi/cronlib/dandisets-healthstatus/venv/lib/python3.8/site-packages/healthstatus/pynwb_open_load_ns.py /mnt/backup/dandi/dandisets-healthstatus/dandisets-fuse/dandisets/000059/draft/sub-MS12/sub-MS12_ses-Peter-MS12-170719-095305-concat_desc-raw_ecephys.nwb
dandi    4189021  0.0  0.0      0     0 ?        Zl   Jun29   0:02 [python] <defunct>
dandi    4189095  0.0  0.0      0     0 ?        Zl   Jun29   0:02 [python] <defunct>

those should be killed, davfs2 restarted.

@jwodder
Copy link
Member Author

jwodder commented Jul 8, 2024

@yarikoptic

Do we have dandiset.yaml in the top of the dandiset?

If test-files is run on a clone, copy, or WebDAV-mount of a Dandiset, yes. However, if it's run on a davfs2 mount, experience suggests that traversing upwards looking for a dandiset.yaml will be slow.

Moreover, if updating status.yaml, currently test-files needs to traverse the whole of the Dandiset it's operating on in order to get all asset paths in order to prune assets from status.yaml that have since been deleted. The options for handling this are:

  • Traverse the filesystem normally — will be slow if using a davfs2 mount
  • Get the list of assets from the Archive API — If the local Dandiset is not up to date with the archive (which can happen when not using a WebDAV mount), this can lead to problems, such as the very file that was tested being deleted from status.yaml.
  • Add a CLI option for selecting one of the above two behaviors
  • Remove test-files's behavior of pruning deleted assets

Which one should I go with?

those should be killed, davfs2 restarted.

I ran kill -9 on the pynwb_open_load_ns.py last week, but they're still stuck. I don't know what those zombie processes are. I sent you a message in Slack asking you to use sudo to kill the davsf2 process. I tried unmounting the davfs2 mount when I started writing this comment, but the process has just been hanging since.

@yarikoptic
Copy link
Member

However, if it's run on a davfs2 mount, experience suggests that traversing upwards looking for a dandiset.yaml will be slow.

we do not even need to list the directory, just rely on os.path.lexists or alike while lru caching the call so that for a folder we never do it twice through runtime of that process. Moreover could be optimized that if any of the parents already known to have dandiset.yaml - no need to ask for any child folder. This way it would literally be once per dandiset traversal up.

@jwodder
Copy link
Member Author

jwodder commented Jul 8, 2024

@yarikoptic Please address the rest of my comment as well.

@yarikoptic
Copy link
Member

Re test-files -- couldn't we just completely disable dealing with the files which are not given to the test-files invocation, i.e. pruning them etc? Pruning deleted paths in general should probably be the job for "dandiset-wide" operation (even if we randomly select only some to go through).

@jwodder
Copy link
Member Author

jwodder commented Jul 8, 2024

@yarikoptic Yes, that is option number 4.

@yarikoptic
Copy link
Member

Cool, let's go with it

@jwodder jwodder mentioned this pull request Jul 9, 2024
3 tasks
@jwodder
Copy link
Member Author

jwodder commented Jul 9, 2024

@yarikoptic Another problem: When running test-files on a Dandiset, what should be done about updating the draft_modified timestamp in status.yaml? Should it be updated and, if so, to what value/how should that value be obtained? What if status.yaml does not exist yet or has not yet been updated to use draft_modified?

@yarikoptic
Copy link
Member

For test-files -- i.e. running test on specific files only -- I do not think we should update draft_modified . It is mostly an interface to troubleshoot errors and timeouts while also "not wasting" doing so and possibly updating those files records. As such, can avoid concerning itself with any dandiset wide (removal of gone entries, draft_modifed, etc) operations.

@jwodder jwodder added the datalad-fuse Relating to accessing assets over a datalad-fuse mount label Jul 12, 2024
@jwodder jwodder added WebDAV Relating to accessing assets over a WebDAV mount performance More efficient use of time and space asset access How the program gets the assets and lists thereof labels Jul 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
asset access How the program gets the assets and lists thereof datalad-fuse Relating to accessing assets over a datalad-fuse mount performance More efficient use of time and space WebDAV Relating to accessing assets over a WebDAV mount
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Switch deployment on drogon to use dandidav + davfs2 instead of datalad-fuse
2 participants