Distributed file system. Meant for educational & training purposes.
Download the source
git clone [email protected]:RobinUS2/xyzfs.git
cd xyzfs
Build in the correct folder
./build.sh
In order to run the tests (with race build on)
./test.sh
In order to run xyzFS in Docker, use the following
cd docker
./build_container.sh
./run_container.sh
- no master
- no single point of failure (SPOF)
- shared nothing
- built-in replication
- Reed-Solomon error correction (erasure coding)
- able to deal with lots of small files
- able to deal with temporary / short-lived files
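
To illustrate the idea behind erasure coding, here is a minimal sketch in Go. It deliberately uses a single XOR parity shard rather than Reed-Solomon, so it can only recover one lost shard. Reed-Solomon generalizes this to several parity shards. The function names `splitWithParity` and `reconstruct` are hypothetical and are not taken from the xyzFS source.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// splitWithParity cuts data into n equally sized data shards (zero-padded)
// and appends one XOR parity shard. Hypothetical helper, not xyzFS code.
func splitWithParity(data []byte, n int) ([][]byte, error) {
	if n < 1 {
		return nil, errors.New("need at least one data shard")
	}
	size := (len(data) + n - 1) / n // shard size, rounded up
	shards := make([][]byte, n+1)
	parity := make([]byte, size)
	for i := 0; i < n; i++ {
		shard := make([]byte, size)
		if start := i * size; start < len(data) {
			end := start + size
			if end > len(data) {
				end = len(data)
			}
			copy(shard, data[start:end])
		}
		for j, b := range shard {
			parity[j] ^= b // parity is the XOR of all data shards
		}
		shards[i] = shard
	}
	shards[n] = parity
	return shards, nil
}

// reconstruct rebuilds at most one missing shard (marked as nil) in place
// by XOR-ing all remaining shards together.
func reconstruct(shards [][]byte) error {
	missing, size := -1, 0
	for i, s := range shards {
		if s == nil {
			if missing != -1 {
				return errors.New("XOR parity can recover only one lost shard")
			}
			missing = i
		} else {
			size = len(s)
		}
	}
	if missing == -1 {
		return nil // nothing to do
	}
	rebuilt := make([]byte, size)
	for i, s := range shards {
		if i == missing {
			continue
		}
		for j, b := range s {
			rebuilt[j] ^= b
		}
	}
	shards[missing] = rebuilt
	return nil
}

func main() {
	shards, err := splitWithParity([]byte("hello xyzFS erasure coding"), 4)
	if err != nil {
		panic(err)
	}
	lost := append([]byte(nil), shards[2]...) // keep a copy for comparison
	shards[2] = nil                           // simulate a lost shard
	if err := reconstruct(shards); err != nil {
		panic(err)
	}
	fmt.Println("recovered:", bytes.Equal(lost, shards[2]))
}
```

Losing two shards at once makes `reconstruct` return an error, which is the limitation that Reed-Solomon with multiple parity shards removes.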
- cluster: set of nodes that run xyzFS and behave as one virtual large distributed filesystem
- node: (virtual) machine that runs one instance of the xyzFS binary
- volume: location on disk of a node where data is stored
- block: chunk of data that is stored in at least one volume and is replicated
- shard: part of a block which can be either data or parity for erasure coding
- file: representation of a file like in a typical file system
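
The terms above nest inside each other, which can be sketched as Go types. This is only an illustration of the relationships: all type and field names (`Payload`, `Addr`, `Path`, and so on) are assumptions and do not reflect the actual xyzFS data structures.

```go
package main

import "fmt"

// ShardKind distinguishes data shards from parity shards.
type ShardKind int

const (
	DataShard ShardKind = iota
	ParityShard
)

// Shard is one part of a block: either data or erasure-coding parity.
type Shard struct {
	Kind    ShardKind
	Payload []byte // hypothetical field
}

// Block is a chunk of data made up of shards.
type Block struct {
	ID     string // hypothetical identifier
	Shards []Shard
}

// Volume is a location on a node's disk where blocks are stored.
type Volume struct {
	Path   string
	Blocks []Block
}

// Node is a (virtual) machine running one instance of the binary.
type Node struct {
	Addr    string // hypothetical field
	Volumes []Volume
}

// Cluster is the set of nodes that behave as one file system.
type Cluster struct {
	Nodes []Node
}

// File is the user-facing entry; how it maps onto shards is not shown here.
type File struct {
	Name string
	Size int64
}

// CountShards walks cluster -> node -> volume -> block -> shard
// and tallies data and parity shards.
func (c Cluster) CountShards() (data, parity int) {
	for _, n := range c.Nodes {
		for _, v := range n.Volumes {
			for _, b := range v.Blocks {
				for _, s := range b.Shards {
					if s.Kind == ParityShard {
						parity++
					} else {
						data++
					}
				}
			}
		}
	}
	return data, parity
}

func main() {
	block := Block{ID: "block-1", Shards: []Shard{
		{Kind: DataShard}, {Kind: DataShard}, {Kind: ParityShard},
	}}
	vol := Volume{Path: "/data/vol1", Blocks: []Block{block}}
	c := Cluster{Nodes: []Node{{Addr: "10.0.0.1", Volumes: []Volume{vol}}}}
	d, p := c.CountShards()
	fmt.Printf("data shards: %d, parity shards: %d\n", d, p)
}
```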
- HTTP 8080: REST API for CRUD operations on the file system
- TCP 3322: Binary gossip between nodes
- TCP 3323: Binary transport between nodes (reliable)
- UDP 3324: Binary transport between nodes
- implement replication (index over TCP to primary replicas, index over UDP to all nodes, contents over TCP to replicas)
- implement REST PUT
- implement REST GET
- implement tombstones to support deletes
- implement REST DELETE
- recover lost shards from parity partitions
- compression (disk, transport, in-memory)
- temporary shards (not persisted to disk, very fast writes/reads)
- writable shards (1-n), where new data is written to
- implement fastest node detection (Spotify's Expected Latency Selector (ELS))
- versioning of the ring (file name hash => node) translation layer information, so that nodes can be added without increasing latency
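
The ring versioning item can be illustrated with a generic consistent-hashing sketch. It assumes that each ring version is immutable and that adding a node produces a new version, so lookups against the old version keep working while the new one is distributed. The `Ring` type, its methods, the FNV hash, and the virtual-position count are all assumptions made for this example, not the xyzFS design.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// Ring is one immutable version of the file-name-hash => node mapping.
type Ring struct {
	Version  int
	nodes    []string
	replicas int               // virtual positions per node
	hashes   []uint32          // sorted positions on the ring
	owners   map[uint32]string // position => node name
}

// hashKey maps a string to a position on the ring using FNV-1a.
func hashKey(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// NewRing places every node at `replicas` virtual positions.
func NewRing(version int, nodes []string, replicas int) *Ring {
	r := &Ring{
		Version:  version,
		nodes:    append([]string(nil), nodes...),
		replicas: replicas,
		owners:   make(map[uint32]string),
	}
	for _, n := range r.nodes {
		for i := 0; i < replicas; i++ {
			p := hashKey(fmt.Sprintf("%s#%d", n, i))
			if _, taken := r.owners[p]; taken {
				continue // position collision: first node keeps it
			}
			r.owners[p] = n
			r.hashes = append(r.hashes, p)
		}
	}
	sort.Slice(r.hashes, func(i, j int) bool { return r.hashes[i] < r.hashes[j] })
	return r
}

// Lookup returns the node owning the first position at or after the
// hash of name, wrapping around at the end of the ring.
func (r *Ring) Lookup(name string) string {
	if len(r.hashes) == 0 {
		return ""
	}
	h := hashKey(name)
	i := sort.Search(len(r.hashes), func(i int) bool { return r.hashes[i] >= h })
	if i == len(r.hashes) {
		i = 0
	}
	return r.owners[r.hashes[i]]
}

// AddNode returns a new ring version that includes node.
// The receiver is left untouched.
func (r *Ring) AddNode(node string) *Ring {
	nodes := append(append([]string(nil), r.nodes...), node)
	return NewRing(r.Version+1, nodes, r.replicas)
}

func main() {
	v1 := NewRing(1, []string{"node-a", "node-b", "node-c"}, 32)
	v2 := v1.AddNode("node-d")
	for _, name := range []string{"a.txt", "b.txt", "c.txt"} {
		fmt.Printf("%s: v%d => %s, v%d => %s\n",
			name, v1.Version, v1.Lookup(name), v2.Version, v2.Lookup(name))
	}
}
```

With this layout a file either keeps its owner or moves to the newly added node. It never moves between two existing nodes, which limits how much data has to be relocated when the cluster grows.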