VFS - do not scan whole list of files, like android client #3120

Open
tomdereub opened this issue Apr 14, 2021 · 21 comments
Assignees
Labels
confirmed bug approved by the team enhancement enhancement of a already implemented feature/code feature: 💽 virtual filesystem Performance 🚀

Comments

@tomdereub
Contributor

How to use GitHub

  • Please use the 👍 reaction to show that you are affected by the same issue.
  • Please don't comment if you have no relevant information to add. It's just extra noise for everyone subscribed to this issue.
  • Subscribe to receive notifications on status change and new comments.

Expected behaviour

I would like the Windows desktop client to behave like the Android client: the tree view is downloaded only when it is accessed for the first time, one folder at a time, and is then kept in cache for quick access, so the client never needs to scan all files.
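
The behaviour requested above can be sketched as follows. This is only a minimal illustration, not code from either Nextcloud client: `LazyTree`, `list_remote` and `FAKE_SERVER` are hypothetical names, and the in-memory dictionary stands in for whatever request the client would send to list a single folder.

```python
from typing import Callable, Dict, List

# Hypothetical callback type: takes a folder path and returns the names of
# its direct children. In a real client this would be one network request.
ListRemote = Callable[[str], List[str]]


class LazyTree:
    """Lists one folder at a time and caches each listing after first access."""

    def __init__(self, list_remote: ListRemote) -> None:
        self._list_remote = list_remote
        self._cache: Dict[str, List[str]] = {}
        self.requests = 0  # number of remote listings actually performed

    def children(self, folder: str) -> List[str]:
        # Only contact the server the first time a folder is opened.
        if folder not in self._cache:
            self.requests += 1
            self._cache[folder] = self._list_remote(folder)
        return self._cache[folder]


# Fake in-memory "server", used only for illustration.
FAKE_SERVER = {
    "/": ["docs", "photos"],
    "/docs": ["a.txt", "b.txt"],
    "/photos": ["2020", "2021"],
    "/photos/2020": ["x.jpg"],
    "/photos/2021": ["y.jpg"],
}

tree = LazyTree(lambda folder: FAKE_SERVER[folder])
tree.children("/")      # first access: one remote listing
tree.children("/docs")  # first access: one remote listing
tree.children("/docs")  # served from the cache, no new request
print(tree.requests)    # listings performed so far
```

With this approach the number of listings depends on how many folders the user actually opens, not on the total size of the share.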

Actual behaviour

I’m trying the virtual file system feature with Nextcloud desktop 3.2 on Windows. I mounted quite a big folder (about 1.4 TB of data, about 2 million files), and it takes a very long time first to scan all files, and then to create the tree view in Explorer.
Scanning all files on the server took a few hours, but creating the tree view is taking days (there is no internet traffic during this operation, so it seems to be purely local work). It was quite quick at the beginning (hundreds of files per second), and it is now very slow (a few files per second).

image

Steps to reproduce

  1. Install windows desktop client 3.2
  2. Add a synchronization using virtual file system on a big folder

Client configuration

Client version: 3.2

Operating system: Windows

OS language: French

Server configuration

Nextcloud version: 20.0.7

Storage backend (external storage): the synchronized folder is mounted as an external storage on Nextcloud.

Logs

Tell me if you want some logs.

Linked discussion on the forum : https://help.nextcloud.com/t/vfs-do-not-scan-whole-list-of-files-like-android-client/113623

@FlexW FlexW added enhancement enhancement of a already implemented feature/code feature: 💽 virtual filesystem labels Apr 29, 2021
@Zegorax

Zegorax commented Aug 26, 2021

I have the exact same problem. We are hosting around 3TB of data, and if a new computer is added, it will need to stay turned on for weeks (18 days) to be able to scan the directories and build the tree view. I was only able to make it work once. Every other attempt failed because of some sort of error in Nextcloud desktop, which made it scan from the beginning again.

I abandoned the idea. OneDrive is the perfect example here: it does this very well, implements the native CloudFiles API, and only requests the directory the user is browsing.

Any idea when this feature could be implemented?

@mgallien
Collaborator

> I have the exact same problem. We are hosting around 3TB of data, and if a new computer is added, it will need to stay turned on for weeks (18 days) to be able to scan the directories and build the tree view. I was only able to make it work once. Every other attempt failed because of some sort of error in Nextcloud desktop, which made it scan from the beginning again.
>
> I abandoned the idea. OneDrive is the perfect example for this, it does it very well and is implementing the native CloudFiles API and only asks the directory the user is requesting to browse.
>
> Any idea when this feature could be implemented ?

I do not understand what you are trying to say.
With OneDrive, the full hierarchy will be synced also. This is a requirement of the CFApi from Microsoft.

@Zegorax

Zegorax commented Aug 26, 2021

@mgallien Yes, the hierarchy is downloaded dynamically from the server, and it is really a breeze to use in our company. For example, when adding a new shared library, the virtual files are accessible instantly.

Nextcloud, by contrast, needs to download the whole hierarchy beforehand. If there are millions of files, the initial sync never ends.

@mgallien mgallien reopened this Aug 27, 2021
@mgallien
Collaborator

@tomdereub I guess OneDrive, for example, has an optimization that is used when you set up the account for the first time, which makes it faster than Nextcloud.
I will see if we can find an equivalent solution to help.
Let's stop debating and agree on this enhancement. I will track progress and update this issue once a solution is done.

@mgallien mgallien added the confirmed bug approved by the team label Aug 27, 2021
@Zegorax

Zegorax commented Aug 27, 2021

@mgallien If you need another example, SeaFile with its SeaDrive virtual file system is also working very well.

@tomdereub
Contributor Author

Any progress on this issue?

@tomdereub
Contributor Author

Bump: any progress on this issue? I'm trying again with Nextcloud 25 and desktop client 3.8.2, and the initial scan still doesn't finish (even after days), so it remains unusable with lots of data.

@tomdereub
Contributor Author

Is it relevant to keep two issues open for this problem? This one and #4424.
I think there is a choice to make:

  • the current issue proposes that Nextcloud desktop does not scan the whole list of files, as the Android client does and as mountainduck can do. This option seems to me the most efficient and scalable.
  • improve the speed of the initial sync enough that indexing all files is no longer a problem.

Actually, mountainduck offers both possibilities:
image

To the dev team: have you already made a choice? One, the other, or both? Just tell us if we can help in some way.

@wmeneses

wmeneses commented Mar 3, 2024

I tried webdrive, and it is much better than other WebDAV-type solutions, but it is too expensive to roll out to many users. Unfortunately, Nextcloud and S3 also have their problems, but at least I can keep the users happy for now. If there is no quick solution, I will keep Nextcloud for web access only and go back to Google Drive.

@wmeneses

wmeneses commented Mar 6, 2024

> @mgallien If you need another example, SeaFile with its SeaDrive virtual file system is also working very well.

Hi, I did not know Seafile. If I cannot find a workable solution with Nextcloud, could it be a good alternative?

@github-project-automation github-project-automation bot moved this to 🧭 Planning evaluation (don't pick) in 🤖 🍏 Clients team Aug 7, 2024
@mgallien
Collaborator

mgallien commented Aug 7, 2024

This would need an implementation of the CF_CALLBACK_TYPE_FETCH_PLACEHOLDERS callback type (see https://learn.microsoft.com/en-us/windows/win32/api/cfapi/ne-cfapi-cf_callback_type).
Apparently this is something we need to implement to get feature parity with other cloud file providers.

@mgallien
Collaborator

mgallien commented Aug 7, 2024

That would be the ultimate solution to speed up the initial sync, especially for users with a deep hierarchy.
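
To give a rough feel for why a deep hierarchy matters, here is a back-of-the-envelope sketch. It assumes one listing request per folder and a perfectly regular tree, and the numbers are purely illustrative, not measurements from any Nextcloud instance; the function names are hypothetical.

```python
def folders_in_full_tree(branching: int, depth: int) -> int:
    """Number of folders in a tree where every folder down to `depth`
    levels below the root has `branching` subfolders (root is level 0)."""
    return sum(branching ** level for level in range(depth + 1))


def listings_to_open_one_path(depth: int) -> int:
    """Listings needed to browse from the root down to one folder at `depth`
    (the root and each folder on the path are listed once)."""
    return depth + 1


branching, depth = 10, 5  # illustrative numbers only
full_scan = folders_in_full_tree(branching, depth)
on_demand = listings_to_open_one_path(depth)
print(full_scan, on_demand)
```

Under these assumptions a full scan lists every folder in the tree before the user can browse anything, while on-demand population only lists the folders on the path the user opens.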

@tobiasKaminsky tobiasKaminsky moved this from 🧭 Planning evaluation (don't pick) to 🏗️ In progress in 🤖 🍏 Clients team Aug 9, 2024
@tobiasKaminsky tobiasKaminsky moved this from 🏗️ In progress to 📄 To do (max 2 entries / member) in 🤖 🍏 Clients team Aug 9, 2024
@Gwindalmir

Gwindalmir commented Sep 27, 2024

A feature/improvement that would help in my case is if the client obeyed the ignore lists before descending further into the tree.

I have several external storage mounts for ease of access through the web interface when I'm outside the local network (or for sharing externally), but I don't need them in sync in my client. So I have them listed in the ignore list (.sync-exclude.lst).
However, the VFS doesn't parse those files before descending into child folders.

What it should do is when it requests a folder listing and encounters an ignore list, it should keep track of those entries, then only continue into non-ignored child folders.
Instead, it spends hours scanning tens of thousands of files it won't actually sync.
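
A minimal sketch of the pruning described above, under simplifying assumptions: the ignore patterns are passed in as a plain list and matched against folder names with Python's `fnmatch`, which is not necessarily the syntax or lookup logic of `.sync-exclude.lst`. `TREE` and `scan` are hypothetical names, and the dictionary stands in for remote folder listings.

```python
from fnmatch import fnmatch
from typing import Dict, List

# Fake folder tree for illustration: folder path -> names of child folders.
TREE: Dict[str, List[str]] = {
    "/": ["work", "external-nas", "external-media"],
    "/work": ["reports"],
    "/work/reports": [],
    "/external-nas": ["backups"],
    "/external-nas/backups": [],
    "/external-media": ["films"],
    "/external-media/films": [],
}


def scan(root: str, ignore_patterns: List[str]) -> List[str]:
    """Return the folders that get listed, skipping ignored subtrees entirely."""
    visited: List[str] = []
    stack = [root]
    while stack:
        folder = stack.pop()
        visited.append(folder)  # stands in for one remote listing
        for name in TREE[folder]:
            # Check the ignore list BEFORE descending, so an ignored
            # folder and everything below it is never requested.
            if any(fnmatch(name, pattern) for pattern in ignore_patterns):
                continue
            stack.append(folder.rstrip("/") + "/" + name)
    return visited


listed = scan("/", ["external-*"])
print(sorted(listed))
```

Because the check happens before a child folder is pushed onto the stack, the ignored mounts cost no listings at all, whatever their size.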

@PhilippSchlesinger

@Gwindalmir
While the situation in your use case would definitely improve with the more general solution proposed in this issue, there are separate issues for VFS and traditional sync:

@github-project-automation github-project-automation bot moved this to 🧭 Planning evaluation (don't pick) in 💻 Desktop Clients team Oct 14, 2024
@Rello Rello moved this from 🧭 Planning evaluation (don't pick) to 📄 To do (max 2 entries / member) in 💻 Desktop Clients team Oct 14, 2024
@ne0YT

ne0YT commented Nov 10, 2024

@mgallien hey there :) I just wanted to check in: do you have an idea for a fix? I'd like to somehow figure out when it will be done.

@ne0YT

ne0YT commented Nov 25, 2024

@Rello you removed this from the to-do list? So this will not happen any time soon?

@MaxXor

MaxXor commented Nov 26, 2024

If true, it's an unfortunate decision. The Nextcloud desktop Windows client is basically unusable for very large instances (> 1TB, >10m files). The initial sync takes several hours.

@Rello
Contributor

Rello commented Nov 26, 2024

Hello,
we are reorganizing our efforts and are in the process of evaluating all existing requests. As the topic is not closed, it's not off the list...

@ne0YT

ne0YT commented Dec 20, 2024

@Rello so now it is off the list? so it will not be done any time soon?

@Rello
Contributor

Rello commented Dec 21, 2024

@ne0YT the topic is not closed

@ne0YT

ne0YT commented Dec 23, 2024

#4424 (comment)
