Skip to content

Multi Project Syncing Technical Design

Chris Dopuch edited this page Jun 29, 2015 · 15 revisions

Sicksync 2.0 Multi-Project Syncing

Goal

The end goal of this feature request for Sicksync is to be able to sync multiple different projects to multiple destination locations. Users will be able to have a per-project .sicksync-config.json which dictates the config settings for that project. Sicksync will also support a global .sicksync-config.json that exists in the user's home directory which contains the master list of all sicksync projects for that user. sicksync --all or sicksync -a will kick off multiple sicksync processes, one for each active project. Sicksync will also support aliases for projects. Running just sicksync will look for a .sicksync-config.json in the current directory, and move up the directory structure until it finds one. Alternatively, you can use an alias to run a specific project. The mapping of alias to project location will be maintained in the global config.

Projects

Each individual sicksync process corresponds to one project on the user's client machine. Each process will watch for changes in one directory subtree. On detecting a change, sicksync will sync that file to each of the destination locations for that project. Projects will support a one-to-many mapping from the client machine to any number of remote machines.

Implementation

The changes to the functionality of sicksync will also entail changes to the way that it's configured. We intend to support two methods of configuration for sicksync 2.0. The first method is done in the style of jshint, where the program first looks in the working directory for a config file, and recurses up the file tree until it finds one. If no config file is found, or the user passes the -g (or maybe -a; global vs. all) flag, sicksync will use the global config file stored in the user's home directory.

A config file of the first style will continue to look like existing .sicksync-config.json files. This method is intended to give users a per-project configuration for sicksync. A per-project config file should live in its project's top level directory. It contains the same information that .sicksync-config.json currently does, but the format will be slightly different. Instead of storing a just destinationLocation and hostname, the file will allow for a list of destination objects. Each destination object contains a hostname and destinationLocation so that you can sync your project to multiple remote machines and directories.

The second type of config file will contain more information. This config file represents a global store of all of the given sicksync projects on the user's system. It will maintain a list of all the sicksync project locations. When the user starts sicksync in a mode that uses this config, the sicksync process will spawn a number of child processes which each use one of the config files stored for each project.

Protocol

Previously, all communication was done between one client and one remote sicksync process. This new design will necessitate the sicksync client being able to send data to multiple remote processes. The remote processes will also need to be able to listen for multiple connections from different running sicksync processes. This constitutes a many-to-many network connection.

The existing sicksync protocol involves starting the remote sicksync web socket server through an ssh connection. A child process creates an ssh connection to the remote machine and then starts sicksync-remote. The stdout of the child process is redirected to the stdout of the local sicksync process. This setup inherently enforces a one-to-one relationship of parent to child between the sicksync client and remote. This configuration will not be able to support a many-to-many network model.

Sicksync 2.0 will daemonize the remote sicksync process. The daemon will act as a web socket server that can receive connections from multiple client sicksync processes. When a user starts the sicksync client, it will ping each of the remote locations of the project and start the daemon if it isn't already running. Next, it will establish a connection to the web socket server daemon and begin sending and receiving messages. On disconnect, the server daemon will check if there are no other connected clients, and if so, will terminate itself (we don't want it to just run forever).

Implementation

The new design for communication between a remote server and multiple clients will necessitate large changes to the way that sicksync functions. Using an SSH connection to start the remote server will continue to be a viable option, but continuing to use it for communication with the server is not. In order to facilitate multiple open lines of communication with clients, the server will need to do all of its communication via the web sockets module.

When a client sicksync process starts up, it will attempt to connect to the remote server that it's pointed to. If it can connect, then it establishes two way communication with the server through web sockets and begins the normal process of watching and syncing files. If the server cannot be connected with web sockets, then the client will start an SSH connection with the server. The SSH connection will execute the remote start-daemon script, which in turn spawns the remote-daemon process (the server proper). Once the server is initialized and listening for connections, it will send a message to its parent process. The parent process on receipt of this message will propagate that message through the ssh connection back to the client indicating that the server is ready. Then the start-daemon process will terminate itself, orphaning the remote-daemon process (thus daemonizing it). This ends the ssh connection, and now the client is ready to connect to the server and begin the normal file watching and syncing procedure.

Connection Diagram

Open Questions

Q: How do I get a node process not to exit when it thinks that no further i/o can happen? I'm having problems overall with getting the whole ssh multi-process scheme of getting the remote server started.

Q: Should we keep using WS as the communication form? Are there other preferable options?

Future Outlook

Once sicksync 2.0 is released, there are additional features we would like to implement:

  • Allow for optional daemonized sicksync client
    • sicksync --daemon or a flag in the config
    • Outputs stdout/stderr to log files instead of the terminal
    • Frees up a terminal window for the user
    • Potentially lower bandwidth usage due to fewer overall connections (just one per remote machine, instead of one per remote machine per project)
    • (Stretch goal) OS level notifications
  • Turn off file watching on a big-sync
    • Will lower the computational intensity of file watching for large changes
    • Need to check if changes occur between start and finish of big-sync (use chokidar with polling)