Unisync

How Unisync Works

Connection over SSH

When you run unisync locally, it reads the config file and starts an SSH connection to the remote side, either by running the ssh command or by using its built-in SSH client (depending on the method setting). Once connected, it looks for the remote instance of unisync in the places specified by the remote_unisync_path setting, and then runs it. It runs unisync over SSH with the -stdserver command line flag, which starts it in server mode, ready to receive instructions from the client.

Connection over DirectTLS

Alternatively, unisync can listen directly on a port if you run it like this:

unisync -server :12345

A client with the proper config settings can then connect to this server directly. The connection is TLS encrypted, and the client and server will both authenticate each other using the TLS certificate. See the Unisync Without SSH page for more details on how to configure this.

Sync on Startup, then Watch for Changes

Once the connection is established, unisync runs a sync to make sure nothing has changed since the last time it was run. Then it starts watching for changes on both sides. When it detects any change, it runs the full sync operation again. By default, it uses the operating system's directory-watching API to detect changes. This works well on Windows, Mac, and Linux, and potentially less well on the BSDs (which rely on the older kqueue API). Also, some Linux distros have kernel settings misconfigured in a way that may break directory-watching on directories with many thousands of files. For situations like that, the watch_local and watch_remote settings can be used to switch to polling (that is, monitoring by repeatedly listing the directory contents).
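To make the polling fallback concrete, here is a minimal sketch in Python of monitoring by repeatedly listing a directory. It is not unisync's actual code: the function names (snapshot, changed_paths, poll) and the choice to ignore directory timestamps are assumptions made for illustration.

```python
import os
import stat
import time


def snapshot(root):
    """Map each relative path under root to (is_dir, size, mtime_ns)."""
    entries = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.lstat(full)
            except FileNotFoundError:
                continue  # entry vanished between listing and stat
            rel = os.path.relpath(full, root)
            if stat.S_ISDIR(st.st_mode):
                # Assumption: only a directory's existence matters here,
                # because its mtime changes whenever a child is added.
                entries[rel] = (True, 0, 0)
            else:
                entries[rel] = (False, st.st_size, st.st_mtime_ns)
    return entries


def changed_paths(before, after):
    """Return the sorted paths that were added, removed, or modified."""
    paths = before.keys() | after.keys()
    return sorted(p for p in paths if before.get(p) != after.get(p))


def poll(root, interval_seconds, on_change, rounds):
    """List root every interval_seconds; call on_change when a listing differs."""
    previous = snapshot(root)
    for _ in range(rounds):
        time.sleep(interval_seconds)
        current = snapshot(root)
        changes = changed_paths(previous, current)
        if changes:
            on_change(changes)  # a real tool would trigger a full sync here
        previous = current
```

Polling trades latency and repeated directory listings for independence from the operating system's watch API, which is why it is useful as a fallback rather than as the default.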

The Sync Operation

Roughly speaking, a sync operation goes like this:

  1. Generate a list of files and directories on both sides. The list will include each name, size, and last-modified date.

  2. Compare the lists to identify differences. If a file/directory exists on one side and not the other, that's obviously a difference. If a file exists on both sides, but the size or last-modified are different, then the files are considered different.

    Side note: If a file/directory is otherwise identical on both sides, but its mode differs in the bits that the chmod_mask and chmod_mask_dir settings are tracking, the mode change is synced over as well.

  3. For each difference, we need to know which side "wins." If the cache file exists (and it will on every sync except the first), read it. Whichever side departs from the cache is the winner.

  4. If the cache file doesn't exist (usually because this is the first time we've synced these particular directories), or both sides depart from the cache, then fall back to the simple syncing logic. In this logic, an existing file/directory always wins over a non-existing one (so no file/directory will ever be deleted), and by default the newest file (by date-modified) wins; the prefer setting can change this.

  5. Now that we've determined which side wins for a given file/directory, copy the winner to overwrite the loser. This includes copying over its file mode, to the extent that the chmod_mask and chmod_mask_dir settings allow.

  6. Repeat steps 3-5 for each difference that we've identified.

  7. Run through the whole process again, just to make sure no more changes happened while we were busy syncing.

  8. After we've run through a syncing process without finding any differences, write the cache file. The cache file is a local file list, the one we generated in step 1, and we will write it to ~/.unisync/[name].cache.
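Steps 1 and 2, together with the side note about file modes, can be sketched in Python as follows. This is an illustration under stated assumptions, not unisync's implementation: the Entry type, the compare and merge_mode names, the rule that directories differ only by existence, and the exact way the mask is applied are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Entry:
    is_dir: bool
    size: int
    mtime: int
    mode: int  # permission bits, e.g. 0o644


def compare(local: Dict[str, Entry], remote: Dict[str, Entry],
            chmod_mask: int, chmod_mask_dir: int) -> Tuple[List[str], List[str]]:
    """Return (paths that differ, paths where only the tracked mode bits differ)."""
    differences, mode_only = [], []
    for path in sorted(local.keys() | remote.keys()):
        a, b = local.get(path), remote.get(path)
        if a is None or b is None:
            differences.append(path)      # exists on one side only
        elif a.is_dir != b.is_dir:
            differences.append(path)      # assumption: a type change is a difference
        elif not a.is_dir and (a.size, a.mtime) != (b.size, b.mtime):
            differences.append(path)      # same file, different size or mtime
        else:
            mask = chmod_mask_dir if a.is_dir else chmod_mask
            if (a.mode ^ b.mode) & mask:  # only the tracked bits are compared
                mode_only.append(path)
    return differences, mode_only


def merge_mode(dest_mode: int, src_mode: int, mask: int) -> int:
    """Copy only the masked bits of src_mode onto dest_mode (assumed semantics)."""
    return (dest_mode & ~mask) | (src_mode & mask)
```

With a mask of 0o100 (the owner-execute bit), a file that is 0o755 on one side and 0o644 on the other is reported as a mode-only difference, and merge_mode(0o644, 0o755, 0o100) yields 0o744: the tracked bit is copied and the untracked bits are left alone.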

The Cache File

The first time unisync syncs a particular pair of local and remote directories, it has no history about the state of things on either end. All it has is the local and remote file lists, with file sizes and date-modified times. Therefore, the syncing logic is simple: if a file exists on only one side, transfer it to the other. If a file exists on both sides, the two copies are considered "identical" if they have the same size and date-modified time. Otherwise, one of them will replace the other. We have four options for which to prefer: the local side, the remote side, the newer file (by date-modified), or the older one. The prefer setting determines which of these is used, and it prefers the newer file by default.
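The initial sync rule can be written down as a small decision function. The Python sketch below is illustrative only: the function name initial_winner and the option strings "local", "remote", "newest", and "oldest" are assumptions standing in for whatever values the prefer setting actually accepts, and the tie-break for equal timestamps is arbitrary.

```python
def initial_winner(local, remote, prefer="newest"):
    """Decide which side wins for one path when there is no sync history.

    local and remote are (size, mtime) tuples, or None if the file is
    missing on that side. Returns "local", "remote", or None when the
    two sides already match.
    """
    if local is None and remote is None:
        return None
    if local is None:
        return "remote"  # exists only remotely: copy it to local
    if remote is None:
        return "local"   # exists only locally: copy it to remote
    if local == remote:
        return None      # same size and mtime: treated as identical
    if prefer in ("local", "remote"):
        return prefer
    if prefer not in ("newest", "oldest"):
        raise ValueError(f"unknown prefer value: {prefer!r}")
    if local[1] == remote[1]:
        return "local"   # assumption: arbitrary tie-break when mtimes are equal
    newer = "local" if local[1] > remote[1] else "remote"
    older = "remote" if newer == "local" else "local"
    return newer if prefer == "newest" else older
```

Note that no branch ever returns a missing side as the winner, which is why this logic on its own can never delete anything.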

The initial sync logic alone is not enough for proper syncing. On its own, it could break in several ways:

  1. What if a file is deleted on the local side? Presumably we want the deletion to propagate to the remote side, but the above logic would simply put the file back.
  2. What if one or both clocks are wrong? Let's say you're editing files locally, and your clock is set one hour back (or your time zone is not set correctly). Every time you edit a file, unisync would determine that the remote side is newer and undo your changes.

The naive solution to this problem would be to just continuously watch the filesystems on both the local and remote side, so we see any changes as they happen and therefore know which events need to be propagated. However, this is not reliable: for one thing, someone might make changes while unisync is not running. We would need to correctly sync those changes after it's restarted. Also, the operating system APIs we have for monitoring directories can be unreliable on some operating systems. They do mostly work, but we need a better strategy than that for reliable syncing.

To prevent problems like these and make syncing "just work" from the user's perspective, we keep a sync cache on the local side. It is generated on the initial sync and updated after every successful sync. By default it is stored in a file called ~/.unisync/[name].cache, where [name] is the name of your sync config file.

The cache file remembers the name, size, and date-modified of every file and directory as of the last successful sync. When changes have happened since the last sync (even if unisync was not running when they occurred), the cache is used to determine which side is newer and to propagate that change to the older side. For example, let's say we delete a file locally. Unisync would see that the remote side agrees with the cache and the local side does not; therefore local is newer, and the deletion should propagate to remote.

Of course, it's possible to change a file both locally and remotely while unisync is not running, and then start it. It would realize that both sides have departed from what the cache says, and so the initial sync algorithm would decide which side wins.
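The role of the cache can be illustrated with a short Python sketch. It is a simplified model, not unisync's code: the names plan and newest_wins are hypothetical, each file list is a dictionary from path to a (size, mtime) tuple, and newest_wins is a stand-in for the initial sync algorithm.

```python
def newest_wins(local, remote):
    """Simplified fallback: existing beats missing, otherwise newest mtime wins."""
    if local is None:
        return "remote"
    if remote is None:
        return "local"
    return "local" if local[1] >= remote[1] else "remote"


def plan(local_list, remote_list, cache, fallback=newest_wins):
    """Return {path: winning side} for every path that differs.

    cache is the file list saved after the last successful sync,
    or None if no cache file exists yet.
    """
    winners = {}
    for path in sorted(local_list.keys() | remote_list.keys()):
        local, remote = local_list.get(path), remote_list.get(path)
        if local == remote:
            continue  # nothing to do for this path
        if cache is None:
            winners[path] = fallback(local, remote)
            continue
        cached = cache.get(path)
        local_departed = local != cached
        remote_departed = remote != cached
        if local_departed and not remote_departed:
            winners[path] = "local"
        elif remote_departed and not local_departed:
            winners[path] = "remote"
        else:
            winners[path] = fallback(local, remote)  # both sides changed
    return winners


cache = {"notes.txt": (10, 100), "old.txt": (3, 50), "both.txt": (1, 10)}
# Locally: notes.txt was edited with a clock set back (mtime 90 < 100),
# old.txt was deleted, and both.txt was edited.
local = {"notes.txt": (14, 90), "both.txt": (2, 20)}
# Remotely: only both.txt was edited.
remote = {"notes.txt": (10, 100), "old.txt": (3, 50), "both.txt": (5, 30)}

with_cache = plan(local, remote, cache)
without_cache = plan(local, remote, None)
```

With the cache, notes.txt and old.txt both resolve to "local", so the edit survives the skewed clock and the deletion propagates to remote; only both.txt falls through to the fallback. Without the cache, all three paths resolve to "remote", which would undo the local edit and restore the deleted file.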