
Mounting remote IPFS nodes has poor file system performance #19

Closed
djdv opened this issue Nov 17, 2022 · 2 comments

djdv commented Nov 17, 2022

This is (likely) due to the port from our custom file-system interface (which was mostly channel oriented and had optimizations and caching) to the Go standard fs.FS, which required a redesign.

The port was done somewhat hastily; it generally works, but it's suboptimal.
Local connections are acceptable, but connections to remote systems are slow enough that some operating systems will not tolerate it (operation timeouts can occur in bad cases).

This was not the case in our old prototypes, where remotely mounted nodes were entirely practical to use; in the current implementation they are not.


I'm working on this now in the cgofuse package, which consumes a Go fs.FS, as well as in the various fs.FS implementations for the IPFS APIs.
So far I've found a lot of obvious problems and some mistakes.

  1. The file table we use to track open handles in FUSE was a bare-minimum map that just grew forever.
    It has been replaced with a dynamic array (a Go slice), which performs better: a smaller table is faster to index into and iterate over than a constantly growing map, which likely also cuts down on lock contention.
    The implementation is basic and will later need some amenities, like exposing the ability to change the maximum number of open handles (similar to ulimit, but within our API); a minimal sketch of the idea follows this list.
    There's also a very minor consideration about shrinking the table when handles are freed, but since those slots get reused during operation, re-allocations would probably not be worth the few bytes they could save.
  2. When we moved from our custom API to the (then new) fs.FS API, we lost a lot of optimizations and real-time features.
    The standard is more slice oriented than channel oriented, so results have to be buffered in batches in places where they were previously emitted as needed.
    The ported code is also newer, and the API is different enough that our old caching methods no longer apply.
    Both of these are non-issues beyond putting in the time to do them properly again against the standard Go FS interface rather than what we had previously.
  3. Some of the interface implementations are currently doing unnecessary work. I made a mistake regarding directory entries when reading the documentation: two of the three relevant interfaces state that entries must be ordered by name, while the one we're actually implementing lets us return them in arbitrary/"directory" order.
    Some of the IPFS APIs return a stream whose order is not guaranteed. Because of my misreading of the Go documentation, I was preloading the entire stream, sorting it, and only then feeding entries to the OS. That's unnecessary; we can return entries to the OS as we receive them, since only the higher-level extensions of fs.FS specify an order, not the ReadDirFile interface methods. A sketch of the streaming approach also follows this list.
    We also have the liberty to extend fs.FS, and we do. However, to be a proper general wrapper for any fs.FS implementation, we have to implement functions that adapt the standard interfaces to our custom extensions (following the style of fs.ReadFile and the other extension patterns encouraged by the Go standard FS interface proposal).
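For the first point, here's a minimal sketch of the idea, not the actual implementation (all names here are hypothetical): a slice-backed handle table that reuses freed slots before growing and enforces a configurable cap on open handles.

```go
package filetable

import (
	"errors"
	"io/fs"
	"sync"
)

// handleTable is a sketch of a slice-backed file table.
// Freed slots are recycled before the slice is grown, so the table
// stays small and looking up a handle is a plain slice index.
type handleTable struct {
	mu      sync.Mutex
	files   []fs.File // nil entries are free slots
	free    []int     // indices of freed slots, reused first
	maxOpen int       // ulimit-like cap on open handles; 0 means unlimited
}

var errTooManyHandles = errors.New("too many open handles")

// add stores the file and returns its handle (the slice index).
func (t *handleTable) add(f fs.File) (int, error) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if n := len(t.free); n > 0 { // reuse a freed slot first
		idx := t.free[n-1]
		t.free = t.free[:n-1]
		t.files[idx] = f
		return idx, nil
	}
	if t.maxOpen > 0 && len(t.files) >= t.maxOpen {
		return -1, errTooManyHandles
	}
	t.files = append(t.files, f)
	return len(t.files) - 1, nil
}

// get returns the file for a handle, or false if the handle is stale.
func (t *handleTable) get(h int) (fs.File, bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if h < 0 || h >= len(t.files) || t.files[h] == nil {
		return nil, false
	}
	return t.files[h], true
}

// release frees a handle's slot so add can reuse it later.
func (t *handleTable) release(h int) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if h < 0 || h >= len(t.files) || t.files[h] == nil {
		return
	}
	t.files[h] = nil
	t.free = append(t.free, h)
}
```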
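And for the third point, a sketch of the streaming approach, assuming the unordered source is modeled as a channel (the actual IPFS API types differ): ReadDirFile's ReadDir doesn't require sorted output, so entries can be forwarded as they arrive instead of being buffered and sorted first.

```go
package streamdir

import (
	"io"
	"io/fs"
)

// streamingDir adapts an unordered stream of entries (e.g. results
// arriving from a remote IPFS API call) to the ReadDir method of
// fs.ReadDirFile. Per the fs docs, this method may return entries in
// "directory order"; only the higher-level fs.ReadDirFS / fs.ReadDir
// promise name-sorted output. The fs.File methods (Stat, Read, Close)
// are omitted here for brevity.
type streamingDir struct {
	entries <-chan fs.DirEntry // hypothetical unordered source
}

// ReadDir forwards up to count entries as they arrive, without
// preloading or sorting the whole stream.
func (d *streamingDir) ReadDir(count int) ([]fs.DirEntry, error) {
	var out []fs.DirEntry
	for count <= 0 || len(out) < count {
		ent, ok := <-d.entries
		if !ok { // stream exhausted
			if count > 0 && len(out) == 0 {
				return nil, io.EOF
			}
			return out, nil
		}
		out = append(out, ent)
	}
	return out, nil
}
```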

There are probably other things too. Right now two of the three items above are done, with the third in progress.
I'll make a PR for these when complete, but it will probably be sweeping and annoying to review (for someone external) because I'm just trying to get it done rather than make the commit history pretty.

djdv self-assigned this Nov 17, 2022

djdv commented Mar 23, 2023

I added a number of caching mechanisms to IPFS, PinFS, and IPNS to reduce the number of HTTP API calls we were making.
These are included in the latest pre-release.
This issue can be closed after those changes get properly reviewed and merged.

At least in my anecdotal tests, the performance difference is dramatic.
Mounting a remote IPFS node went from "impractical/painful" to "not bad".
Local nodes see benefits as well.
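As a rough illustration of the kind of change involved (not the code that was merged; the names and TTL approach here are assumptions), a wrapper around an fs.FS can memoize Stat results for a short time, so repeated FUSE lookups of the same path don't each cost a remote API round trip:

```go
package fscache

import (
	"io/fs"
	"sync"
	"time"
)

// statCache is an illustrative wrapper that memoizes Stat results
// for a short TTL, collapsing bursts of identical lookups into a
// single call to the underlying (possibly remote) file system.
type statCache struct {
	fs.FS
	ttl time.Duration

	mu      sync.Mutex
	entries map[string]cachedStat
}

type cachedStat struct {
	info    fs.FileInfo
	err     error
	expires time.Time
}

func newStatCache(fsys fs.FS, ttl time.Duration) *statCache {
	return &statCache{FS: fsys, ttl: ttl, entries: make(map[string]cachedStat)}
}

// Stat serves recent results from the cache; otherwise it queries the
// wrapped FS via fs.Stat (which falls back to Open+Stat if the wrapped
// FS doesn't implement fs.StatFS) and records the result.
func (c *statCache) Stat(name string) (fs.FileInfo, error) {
	c.mu.Lock()
	if e, ok := c.entries[name]; ok && time.Now().Before(e.expires) {
		c.mu.Unlock()
		return e.info, e.err // served from cache
	}
	c.mu.Unlock()

	info, err := fs.Stat(c.FS, name)
	c.mu.Lock()
	c.entries[name] = cachedStat{info: info, err: err, expires: time.Now().Add(c.ttl)}
	c.mu.Unlock()
	return info, err
}
```

A short TTL keeps stale metadata bounded while still absorbing the bursts of repeated lookups that FUSE tends to generate for a single user-level operation.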


djdv commented Jul 11, 2023

There's still room for improvement in the caching implementation (#30) but the initial version of that has now been merged into master.
I maintain that while it's not "fast", it's practically viable compared to how things were before the cache was implemented.
Luckily, most users will likely be mounting IPFS nodes that are on the same machine, or at least on the same LAN, where this is basically a non-issue.

djdv closed this as completed Jul 11, 2023