Integration | Collaboration with https://github.com/abraunegg/onedrive #105

abraunegg · 2021-01-27T19:44:40Z

@jstaf
Great tool you have created.

Would love to collaborate here & potentially work out a way to utilise your tool to solve this open issue: abraunegg/onedrive#757 - or get your help in building / creating a similar overlay for the tool which I now maintain.

If you could let me know what your thoughts are that would be greatly appreciated.

jstaf · 2021-01-28T03:33:00Z

@abraunegg I admittedly am not familiar enough with your project's codebase to suggest an immediate solution - however, I can probably help by explaining how onedriver implemented the on-demand downloads.

The important thing to note is that onedriver doesn't actually sync files. It's a filesystem that acts as a "middleman", handling I/O syscalls on behalf of the kernel. When a program tries to read a directory or open a file, it makes a syscall to the kernel, and the kernel then asks onedriver what's in the directory or what's in the file. This gives onedriver the ability to download files and fetch metadata on demand: every I/O operation on OneDrive files goes through onedriver. So to implement similar on-demand functionality, you would need to intercept syscalls in a similar manner.

Here's a sample code path (this is for listing the files of a directory, but reading/writing files is similar):

A program (say "ls") tries to read a directory (specifically, it makes the readdir syscall to the kernel)
The kernel calls the Readdir entrypoint here: https://github.com/jstaf/onedriver/blob/master/fs/inode.go#L217
The Readdir() function fetches all of the files from a thread-safe map and downloads missing entries if we don't have them yet: https://github.com/jstaf/onedriver/blob/master/fs/cache.go#L260 (there's similar handling for files, but they get fetched from a k/v database on disk).
Information returned by onedriver's Readdir is forwarded by the kernel to whatever program originally tried to read the directory.
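
The lookup-or-fetch step in that code path can be sketched roughly as follows. This is a minimal illustration and not onedriver's actual code: `DirCache`, `fetchChildren`, and the `Fetches` counter are hypothetical names, and the remote listing is faked with an in-memory map instead of a real OneDrive API call.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// fetchChildren is a hypothetical stand-in for a remote call that lists a
// directory's children. It is not part of onedriver.
type fetchChildren func(dirID string) ([]string, error)

// DirCache lazily populates directory listings the first time they are read.
type DirCache struct {
	mu       sync.Mutex
	children map[string][]string // directory ID -> child names
	fetch    fetchChildren
	Fetches  int // number of successful remote calls, for illustration only
}

func NewDirCache(fetch fetchChildren) *DirCache {
	return &DirCache{children: make(map[string][]string), fetch: fetch}
}

// Readdir returns the cached children of dirID, fetching them only if they
// are not cached yet. Holding the lock during the fetch keeps the sketch
// simple; a real filesystem would likely avoid blocking other readers.
func (c *DirCache) Readdir(dirID string) ([]string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if names, ok := c.children[dirID]; ok {
		return names, nil
	}
	names, err := c.fetch(dirID)
	if err != nil {
		return nil, err
	}
	c.Fetches++
	sort.Strings(names)
	c.children[dirID] = names
	return names, nil
}

func main() {
	// Fake "server side" listing used in place of a real API.
	remote := map[string][]string{"root": {"notes.txt", "Documents"}}
	cache := NewDirCache(func(id string) ([]string, error) {
		names, ok := remote[id]
		if !ok {
			return nil, fmt.Errorf("no such directory: %s", id)
		}
		return append([]string(nil), names...), nil
	})
	// The second call is served from the cache without a remote call.
	for i := 0; i < 2; i++ {
		names, err := cache.Readdir("root")
		fmt.Println(names, err, "remote calls:", cache.Fetches)
	}
}
```

In a real FUSE filesystem, `Readdir` would be invoked by the kernel via the FUSE library and not called directly from `main`.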

abraunegg · 2021-01-29T18:42:21Z

@jstaf
Thanks - will have to look into this a bit deeper. In terms of caching all the online items, using a /delta call and storing the tree of items / objects is what 'onedrive' does at the moment. This is refreshed every 5 mins, however you can call /delta on a specific path - so ls could make a query to /delta + path & provide the details as well.

Handling the download is going to be a tricky one. In my initial attempts, the issue was pausing the application's request long enough for a file to download. OneDrive itself does not really support HTTP/2, so all calls need to use HTTP/1.1, and 4xx / 5xx error handling and retry need to be really tight, along with using the QuickXOR hash of the file and its size to determine whether it has been downloaded correctly.
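
A retry loop that only accepts a download once both size and hash match could look roughly like this. It is a sketch under stated assumptions: `downloadVerified` and `hashOf` are hypothetical helpers, SHA-256 is swapped in for QuickXOR because QuickXOR is not in the Go standard library, and the flaky server is simulated with a closure. Real code would also add backoff between attempts and treat 4xx and 5xx responses differently.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// hashOf returns the hex SHA-256 of b. SHA-256 is a stand-in here; OneDrive
// reports QuickXOR hashes, which this sketch does not implement.
func hashOf(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// downloadVerified calls fetch up to attempts times and returns the first
// result whose size and hash both match the expected values.
func downloadVerified(fetch func() ([]byte, error), wantSize int, wantHash string, attempts int) ([]byte, error) {
	lastErr := errors.New("no attempts made")
	for i := 0; i < attempts; i++ {
		data, err := fetch()
		if err != nil {
			lastErr = err
			continue
		}
		if len(data) != wantSize {
			lastErr = fmt.Errorf("size mismatch: got %d, want %d", len(data), wantSize)
			continue
		}
		if hashOf(data) != wantHash {
			lastErr = errors.New("hash mismatch")
			continue
		}
		return data, nil
	}
	return nil, fmt.Errorf("download failed after %d attempts: %w", attempts, lastErr)
}

func main() {
	want := []byte("hello onedrive")
	calls := 0
	// Simulated server: one error, one truncated body, then a good response.
	flaky := func() ([]byte, error) {
		calls++
		switch calls {
		case 1:
			return nil, errors.New("503 service unavailable")
		case 2:
			return want[:5], nil
		default:
			return want, nil
		}
	}
	data, err := downloadVerified(flaky, len(want), hashOf(want), 5)
	fmt.Printf("%q %v after %d calls\n", data, err, calls)
}
```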

How are uploads to OneDrive being handled? For Business / SharePoint accounts there are many quirks with certain file types, where OneDrive modifies the file post upload by removing it (if you query the original endpoint you get a 404) and adding metadata, so the file is now out of sync with the file on the local disk.

jstaf · 2021-02-03T00:42:00Z

File downloads work the same way as readdir(). In this case, onedriver intercepts the open() syscall and performs the download if the checksum of the local copy does not match what's on the server: https://github.com/jstaf/onedriver/blob/master/fs/inode.go#L786
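
The checksum comparison on open can be sketched as below. This is not onedriver's code: `Remote`, `memRemote`, and `Open` are hypothetical names, the local cache is a plain map where onedriver uses an on-disk store, and SHA-256 stands in for whatever hash the server reports.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

func checksum(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// Remote is a hypothetical view of the server: it can report an item's hash
// cheaply and download the full content when needed.
type Remote interface {
	Hash(id string) string
	Download(id string) []byte
}

// memRemote fakes the server with an in-memory map of item ID -> content.
type memRemote map[string][]byte

func (m memRemote) Hash(id string) string     { return checksum(m[id]) }
func (m memRemote) Download(id string) []byte { return append([]byte(nil), m[id]...) }

// Open returns the content for id and reports whether a download happened.
// The local copy is reused only when its checksum matches the server's.
func Open(local map[string][]byte, r Remote, id string) ([]byte, bool) {
	if data, ok := local[id]; ok && checksum(data) == r.Hash(id) {
		return data, false
	}
	data := r.Download(id)
	local[id] = data
	return data, true
}

func main() {
	remote := memRemote{"ITEM1": []byte("v1")}
	local := map[string][]byte{}

	_, downloaded := Open(local, remote, "ITEM1")
	fmt.Println("first open downloaded:", downloaded)

	_, downloaded = Open(local, remote, "ITEM1")
	fmt.Println("second open downloaded:", downloaded)

	remote["ITEM1"] = []byte("v2") // the file changes on the server
	data, downloaded := Open(local, remote, "ITEM1")
	fmt.Println("after remote change downloaded:", downloaded, string(data))
}
```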

Uploads are where things get kinda weird (uploads are independent of fs ops to keep things fast so no one has to wait for things to upload every time they save):

When a user creates a file, onedriver uploads an empty file to obtain a OneDrive ID
Read/writes happen against the local file
When flush() or fsync() is called, onedriver creates an "upload session" with the contents to be uploaded and inserts this into a queue
An "upload manager" periodically reads from the queue and uploads any pending upload sessions and retries these if they fail: https://github.com/jstaf/onedriver/blob/master/fs/upload_manager.go. Uploads use the OneDrive ID created earlier so we don't lose the file if the user renames the file right after they save it. (All items are getting tracked by ID instead of path for this reason.)
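
The queue-and-retry steps above can be sketched as follows. This is an illustrative sketch only, not the contents of upload_manager.go: `UploadManager`, `UploadSession`, `Queue`, `ProcessOnce`, and `errTransient` are hypothetical names, and letting a newer session for the same ID replace an older one is an assumption of this sketch. A real manager would call something like `ProcessOnce` from a background goroutine on a timer.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errTransient simulates a retryable server failure.
var errTransient = errors.New("simulated transient failure")

// UploadSession is a snapshot of one item's content, keyed by its ID so that
// a rename of the file does not affect the pending upload.
type UploadSession struct {
	ID   string
	Data []byte
	seq  int // distinguishes a newer snapshot of the same item
}

type UploadManager struct {
	mu      sync.Mutex
	pending map[string]UploadSession
	nextSeq int
	upload  func(UploadSession) error
}

func NewUploadManager(upload func(UploadSession) error) *UploadManager {
	return &UploadManager{pending: make(map[string]UploadSession), upload: upload}
}

// Queue snapshots data and returns immediately, which is what a flush() or
// fsync() handler would call so the caller never waits on the network.
func (m *UploadManager) Queue(id string, data []byte) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.nextSeq++
	m.pending[id] = UploadSession{ID: id, Data: append([]byte(nil), data...), seq: m.nextSeq}
}

// ProcessOnce tries every pending session once. Failed sessions stay queued
// for the next pass. It returns the number of sessions still pending.
func (m *UploadManager) ProcessOnce() int {
	m.mu.Lock()
	batch := make([]UploadSession, 0, len(m.pending))
	for _, s := range m.pending {
		batch = append(batch, s)
	}
	m.mu.Unlock()

	for _, s := range batch {
		if err := m.upload(s); err != nil {
			continue // retried on the next pass
		}
		m.mu.Lock()
		// Remove only if no newer snapshot was queued during the upload.
		if cur, ok := m.pending[s.ID]; ok && cur.seq == s.seq {
			delete(m.pending, s.ID)
		}
		m.mu.Unlock()
	}

	m.mu.Lock()
	defer m.mu.Unlock()
	return len(m.pending)
}

func main() {
	attempts := 0
	m := NewUploadManager(func(s UploadSession) error {
		attempts++
		if attempts == 1 {
			return errTransient
		}
		fmt.Printf("uploaded %s (%d bytes)\n", s.ID, len(s.Data))
		return nil
	})
	m.Queue("ITEM123", []byte("draft"))
	fmt.Println("pending after pass 1:", m.ProcessOnce())
	fmt.Println("pending after pass 2:", m.ProcessOnce())
}
```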

jstaf · 2021-05-16T05:15:29Z

Going to close this one because there are no specific tasks that need to be completed in this issue.

jstaf closed this as completed May 16, 2021
