2.1. Operating system
The Sync Appliance runs on x86-64 Linux systems. Version 1.19.1 has been tested on the following environments:
- Debian 8.3, 9
- Ubuntu 12.04 LTS, 14.04 LTS, 16.04 LTS, 18.04 LTS
- CentOS 7
There are two basic considerations as far as hardware needs are concerned:
- sync throughput (sustained number of operations expected in the system)
- storage needs:
  - data storage
  - database storage
Only a modest amount of memory (around 200 KB) is required per connection, so a commodity server with a few GB of RAM can provide service to tens of thousands of users as far as memory is concerned.
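As a rough illustration of the memory figure above, the following sketch estimates how many connections fit in a given amount of RAM. The function name and the decimal unit constants are assumptions made for this example, and the estimate ignores memory used by the operating system and the database.

```python
KB = 1000          # decimal units, assumed for this estimate
GB = 1000 ** 3

def max_connections(ram_bytes: int, per_connection_bytes: int = 200 * KB) -> int:
    """Upper bound on simultaneous connections, considering memory only."""
    return ram_bytes // per_connection_bytes

# A server that can dedicate 4 GB of RAM to connections:
print(max_connections(4 * GB))
```

With 4 GB available, the bound is 20,000 connections, which is consistent with the "tens of thousands of users" figure.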
The Sync Appliance can use external storage backends backed by Amazon S3 (or custom servers using the S3 protocol) with built-in deduplication, automatic compression and encryption.
As of 1.19.1, only single-node deployments are supported, so scalability is limited by the resources available on a single host. Future versions will include distribution capabilities for further scalability.
Even on a single node, the Sync Appliance can easily serve hundreds of active users.
The scalability of single-node deployments is ultimately limited by the sustainable throughput and by the storage available for the database and the data storage backend. The following subsections expand on each of these points.
2.2.1. Processing speed and database
A modest server with 2 GB of RAM and a consumer-grade 2 TB HDD can serve hundreds of users using remote storage mounted locally.
The Sync Appliance can sustain a large number of operations even on modest hardware; e.g., a $20/month DigitalOcean VM with 2 GB of RAM and 2 cores can sync several million files a day. This easily covers the steady-state load imposed by thousands of active users.
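To put the daily figure in perspective, this small sketch converts a daily operation count into an average rate per second. The function name is hypothetical, and 3 million is only an example value for "several million".

```python
SECONDS_PER_DAY = 24 * 60 * 60

def ops_per_second(ops_per_day: float) -> float:
    """Average operation rate implied by a daily total."""
    return ops_per_day / SECONDS_PER_DAY

# Example value only: 3 million synced files per day.
print(round(ops_per_second(3_000_000), 1))
```

Three million operations a day averages out to about 35 operations per second; real load is rarely uniform, so peaks will be higher than this average.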
Like any database-backed service, the Sync Appliance benefits from fast storage (such as SSDs). The internal database, however, has been designed to yield adequate performance (several million operations/day) even on consumer-grade hard disk drives. It does so by aggregating operations, trading latency (up to a few hundred milliseconds) for throughput, by avoiding fragmentation in the working set, and by exploiting the fact that most sync operations are correlated (same project/directory, etc.). Thanks to these techniques, the database can yield adequate throughput even when its size is much larger than the available RAM.
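The general idea of aggregating operations to trade latency for throughput can be sketched as follows. This is not the Sync Appliance's internal implementation: the class, its parameters, and the `write_batch` callback are hypothetical, and timestamps are passed in explicitly to keep the example deterministic.

```python
from typing import Callable, List

class OperationBatcher:
    """Buffer operations and hand them to storage in groups (illustrative only)."""

    def __init__(self, write_batch: Callable[[List[str]], None],
                 max_batch: int = 100, max_delay: float = 0.2) -> None:
        self.write_batch = write_batch   # performs one (expensive) write per group
        self.max_batch = max_batch       # flush when this many operations are pending
        self.max_delay = max_delay       # or when the oldest one has waited this long (seconds)
        self.pending: List[str] = []
        self.oldest = 0.0

    def add(self, op: str, now: float) -> None:
        if not self.pending:
            self.oldest = now            # remember when the current group started
        self.pending.append(op)
        if len(self.pending) >= self.max_batch or now - self.oldest >= self.max_delay:
            self.flush()

    def flush(self) -> None:
        if self.pending:
            self.write_batch(self.pending)
            self.pending = []            # start a fresh group

batches: List[List[str]] = []
batcher = OperationBatcher(batches.append, max_batch=3)
for i, op in enumerate(["a", "b", "c", "d"]):
    batcher.add(op, now=i * 0.01)
batcher.flush()
print(batches)
```

Each operation waits at most `max_delay` before being written once another operation arrives; a real implementation would also flush from a timer so that an idle buffer does not wait indefinitely.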
Each sync operation takes up space in the database to:
- represent the latest file/project state
- record the details of the operation, project history, etc.
- represent global per-project snapshots allowing point-in-time recovery
In deployments with tens of millions of files, the aggregate cost of each operation has been found to lie between 1 and 3 KB overall. On a host with a consumer-grade 2 TB HDD, this translates into a limit of roughly 670 million operations (at 3 KB each) to 2 billion operations (at 1 KB each).
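The capacity estimate above can be reproduced with a few lines of arithmetic. The function name is hypothetical, decimal units are assumed, and the whole disk is assumed to be available to the database.

```python
KB = 1000
TB = 1000 ** 4

def max_operations(disk_bytes: int, bytes_per_operation: int) -> int:
    """Number of operations whose records fit in the given database space."""
    return disk_bytes // bytes_per_operation

print(max_operations(2 * TB, 1 * KB))  # best case, 1 KB per operation
print(max_operations(2 * TB, 3 * KB))  # worst case, 3 KB per operation
```

The two results, 2,000,000,000 and 666,666,666, correspond to the upper and lower ends of the range.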
When a project is purged by a user, the associated space is freed from the database.
A large fraction of the space used in the DB is needed to represent project snapshots. Future versions will feature data expiration policies to release the resources held by snapshots no longer needed.
2.2.2. Storage backends
By default, the Sync Appliance uses a local storage backend, limited by the capacity of the local disk or of mounted remote storage (NFS, distributed system, etc.).
The Sync Appliance can use external storage backends backed by Amazon S3 (or custom servers using the S3 protocol) with built-in deduplication, automatic compression and encryption. Additional backends can be configured by the admin user via the webapp.
The Sync Appliance makes the following assumptions about the storage backends:
- read-your-writes consistency
- atomic renames (standard POSIX semantics)
Conventional remote storage, such as NFS mounts over mirrored disks, satisfies these easily. Note in particular that the storage backend need not support file locks or other problematic features.
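To illustrate why atomic renames matter, here is the common write-then-rename pattern that depends on them. It is a generic sketch, not the Sync Appliance's own code, and the `atomic_write` helper is a name chosen for this example.

```python
import os
import tempfile

def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` so readers see the old or the new content, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live on the same filesystem as the target,
    # otherwise the rename is not atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())      # push the data to disk before publishing it
        os.replace(tmp_path, path)      # atomic rename under POSIX semantics
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)         # do not leave partial files behind
        raise
```

No file lock is needed with this pattern: a concurrent reader opening `path` gets either the previous complete file or the new complete file.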