docs: add Scaling the cluster to clustering article (#170)
* docs: add Scaling the cluster to clustering article

Signed-off-by: mbshields <[email protected]>

* docs: add Scaling the cluster to clustering article - spellcheck

Signed-off-by: mbshields <[email protected]>

* docs: add scale-out info to clustering article

Signed-off-by: mbshields <[email protected]>

* docs: add words to wordlist

Signed-off-by: mbshields <[email protected]>

* docs: add info to scaleout article

Signed-off-by: mbshields <[email protected]>

* docs: add scaleout article to index

Signed-off-by: mbshields <[email protected]>

* docs: revised for comments

Signed-off-by: mbshields <[email protected]>

* docs: spellcheck

Signed-off-by: mbshields <[email protected]>

* docs: spellcheck again

Signed-off-by: mbshields <[email protected]>

* docs: describe DNS-based routing for load balancing

Signed-off-by: mbshields <[email protected]>

* docs: address latest review comments

Signed-off-by: mbshields <[email protected]>

* docs: add Registry sync

Signed-off-by: mbshields <[email protected]>

---------

Signed-off-by: mbshields <[email protected]>
mbshields authored May 31, 2024
1 parent f175692 commit 242d0f4
Showing 4 changed files with 217 additions and 20 deletions.
9 changes: 9 additions & 0 deletions .wordlist.txt
@@ -9,6 +9,7 @@ api
apikey
APIKeyPayload
APIs
architected
artifacthub
artifactType
ASLR
@@ -27,6 +28,7 @@ BoltDB
boolean
Bugfixes
busybox
caching
CD
certDir
checksum
@@ -71,6 +73,7 @@ dex
discoverable
DistContentDigestKey
DN
DNS
Dockerfile
dropdown
dryRun
@@ -104,7 +107,9 @@ graphQL
graphql
gui
haproxy
HAProxy
hostname
hostnames
href
html
htpasswd
@@ -199,6 +204,7 @@ podman
pollInterval
pprof
PR
preimage
prometheus
PRs
pulledWithin
@@ -227,11 +233,14 @@ Satisfiable
satisfiable
SBOM
SBOMs
scalable
SDK
secretkey
semver
serviceAccount
SHA
sharding
SipHash
skipverify
skopeo
SLI
55 changes: 35 additions & 20 deletions docs/articles/clustering.md
@@ -3,17 +3,20 @@
> :point_right: High availability of the zot registry is supported by the following features:
>
> - Stateless zot instances to simplify scale out
> - Shared remote storage
> - Bare-metal and Kubernetes deployments

To ensure high-availability of the registry, zot supports a clustering
scheme with stateless zot instances/replicas fronted by a loadbalancer
To ensure high availability of the registry, zot supports a clustering
scheme with stateless zot instances/replicas fronted by a load balancer
and a shared remote backend storage. This scheme allows the registry
service to remain available even if a few replicas fail or become
unavailable. Loadbalancing across many zot replicas can also increase
unavailable. Load balancing across many zot replicas can also increase
aggregate network throughput.

![504569](../assets/images/504569.jpg){width="400"}
![504569](../assets/images/504569.jpg){width="500"}

> :pencil2: Beginning with zot release v2.1.0, you can design a highly scalable cluster that does not require configuring the load balancer to direct repository queries to specific zot instances within the cluster. See [Scale-out clustering](scaleout.md). Scale-out clustering is the preferred method if you are running v2.1.0 or later.
Clustering is supported in both bare-metal and Kubernetes environments.
> :pencil2:
@@ -24,28 +24,28 @@ Clustering is supported in both bare-metal and Kubernetes environments.

### Prerequisites

- A highly-available loadbalancer such as `HAProxy` configured to direct traffic to zot replicas.
- A highly-available load balancer such as HAProxy configured to direct traffic to zot replicas

- Multiple zot replicas as `systemd` services hosted on multiple hosts or VMs.
- Multiple zot replicas as `systemd` services hosted on multiple hosts or VMs

- AWS S3 API-compatible remote backend storage.
- AWS S3 API-compatible remote backend storage

## Kubernetes deployment

### Prerequisites

- A zot Kubernetes
[Deployment](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/)
with required number of replicas.
with required number of replicas

- AWS S3 API-compatible remote backend storage.

- A zot Kubernetes
[Service](https://kubernetes.io/docs/concepts/services-networking/service/).
[Service](https://kubernetes.io/docs/concepts/services-networking/service/)

- A zot Kubernetes [Ingress
Gateway](https://kubernetes.io/docs/concepts/services-networking/ingress/)
if the service needs to be exposed outside.
if the service needs to be exposed outside
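
For orientation, a minimal Deployment and Service for such a cluster might look like the following sketch. The image reference, labels, and port are placeholders, and the zot configuration pointing at the shared S3 backend would typically be mounted from a ConfigMap (omitted here).

```yaml
# Illustrative sketch only: three stateless zot replicas behind a ClusterIP
# Service. Image reference, labels, and port are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: zot
spec:
  replicas: 3
  selector:
    matchLabels:
      app: zot
  template:
    metadata:
      labels:
        app: zot
    spec:
      containers:
        - name: zot
          image: ghcr.io/project-zot/zot-linux-amd64:latest   # tag is a placeholder
          ports:
            - containerPort: 5000
---
apiVersion: v1
kind: Service
metadata:
  name: zot
spec:
  selector:
    app: zot
  ports:
    - port: 5000
      targetPort: 5000
```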

## Implementing stateless zot

@@ -56,25 +59,25 @@ zot maintains two types of durable state:
- the image metadata in the registry’s cache

In a stateless clustering scheme, the image data is stored in the remote
storage backend and the registry cache is disabled by turning off both
deduplication and garbage collection.
storage backend and the registry cache is disabled by turning off
deduplication.

## Ecosystem tools

The [OCI Distribution
Specification](https://github.com/opencontainers/distribution-spec)
imposes certain rules about the HTTP URI paths to which various
ecosystem tools must conform. Consider these rules when setting the HTTP
prefixes during loadbalancing and ingress gateway configuration.
prefixes during load balancing and ingress gateway configuration.

## Examples

zot supports clustering by using multiple stateless zot replicas with shared S3 storage and an `HAProxy` (with sticky session) load-balancing traffic to the replicas.
Clustering is supported by using multiple stateless zot replicas with shared S3 storage and an HAProxy (with sticky session) load balancing traffic to the replicas. Each replica is responsible for one or more repositories.

### YAML configuration
### HAProxy configuration

<details>
<summary markdown="span">Click here to view a sample haproxy configuration.</summary>
<summary markdown="span">Click here to view a sample HAProxy configuration.</summary>

```yaml

@@ -117,15 +120,27 @@ defaults
frontend zot
bind *:8080
mode http
use_backend zot-instance1 if { path_beg /v2/repo1/ }
use_backend zot-instance2 if { path_beg /v2/repo2/ }
use_backend zot-instance3 if { path_beg /v2/repo3/ }
default_backend zot-cluster

backend zot-cluster
mode http
balance roundrobin
server zot1 127.0.0.1:8081 check
server zot2 127.0.0.1:8082 check
server zot3 127.0.0.1:8083 check
cookie SERVER insert indirect nocache
server zot-server1 127.0.0.1:9000 check cookie zot-server1
server zot-server2 127.0.0.2:9000 check cookie zot-server2
server zot-server3 127.0.0.3:9000 check cookie zot-server3

backend zot-instance1
server zot-server1 127.0.0.1:9000 check maxconn 30

backend zot-instance2
server zot-server2 127.0.0.2:9000 check maxconn 30

backend zot-instance3
server zot-server3 127.0.0.3:9000 check maxconn 30
```

</details>
@@ -141,7 +156,7 @@ backend zot-cluster
"distSpecVersion": "1.0.1-dev",
"storage": {
"rootDirectory": "/tmp/zot",
"dedupe": true,
"dedupe": false,
"storageDriver": {
"name": "s3",
"rootdirectory": "/zot",
172 changes: 172 additions & 0 deletions docs/articles/scaleout.md
@@ -0,0 +1,172 @@
# Scale-out clustering

> :point_right: A cluster of zot instances can be easily scaled with no repo-specific intelligence in the load balancing scheme, using:
>
> - Stateless zot instances to simplify scale out
> - Shared remote storage
> - zot release v2.1.0 or later
Beginning with zot release v2.1.0, a new "scale-out" architecture greatly reduces the configuration required when deploying large numbers of zot instances. As before, multiple identical zot instances run simultaneously using the same shared reliable storage, but with improved scale and performance in large deployments. A highly scalable cluster can be architected by automatically sharding based on repository name so that each zot instance is responsible for a subset of repositories.

In a cloud deployment, the shared backend storage (such as AWS S3) and metadata storage (such as DynamoDB) can also be easily scaled along with the zot instances.

> :pencil2: For high availability clustering with earlier zot releases, see [zot Clustering](clustering.md).
## Prerequisites

For easy scaling of instances (replicas), the following conditions must be met:

- All zot replicas must be running zot release v2.1.0 (or later) with identical configurations.
- All zot replicas in the cluster use remote storage at a single shared S3 backend. There is no local caching in the zot replicas.
- Each zot replica in the cluster has its own IP address, but all replicas use the same port number.


## How it works

Each repo is served by one zot replica, and that replica is solely responsible for serving all images of that repo. A repo in storage can be written to only by the zot replica responsible for that repo.

When a zot replica in the cluster receives an image push or pull request for a repo, the receiving replica hashes the repo path and consults a hash table to determine which replica is responsible for the repo.

- If the hash indicates that another replica is responsible, the receiving replica forwards the request to the responsible replica and then acts as a proxy, returning the response to the requestor.
- If the hash indicates that the current (receiving) replica is responsible, the request is handled locally.


> :pencil2: For better resistance to collisions and preimage attacks, zot uses SipHash as the hashing algorithm.
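
To make the mapping concrete, the sketch below hashes a repository path and reduces it modulo the member count to pick an owner. It is illustrative only: it assumes the `github.com/dchest/siphash` package and splits the 16-byte hash key into two 64-bit halves; zot's internal key handling and member selection may differ.

```go
package main

import (
	"encoding/binary"
	"fmt"

	"github.com/dchest/siphash"
)

// ownerIndex picks the cluster member responsible for a repository by
// hashing the repo path with SipHash and reducing the sum modulo the
// number of members. The 16-byte key plays the role of the "hashKey"
// value in the sample cluster configuration below.
func ownerIndex(repo string, key []byte, memberCount int) int {
	k0 := binary.LittleEndian.Uint64(key[0:8])
	k1 := binary.LittleEndian.Uint64(key[8:16])
	return int(siphash.Hash(k0, k1, []byte(repo)) % uint64(memberCount))
}

func main() {
	members := []string{"zot-server1:9000", "zot-server2:9000", "zot-server3:9000"}
	key := []byte("loremipsumdolors") // 16 bytes, as in the sample config below

	for _, repo := range []string{"repo1", "repo2", "library/alpine"} {
		fmt.Printf("%s -> %s\n", repo, members[ownerIndex(repo, key, len(members))])
	}
}
```
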
Either of the following two schemes can be used to reach the cluster.

### Using a single entry point load balancer

![504569](../assets/images/504569.jpg){width="500"}

When a single entry point load balancer such as [HAProxy](https://www.haproxy.com/) is deployed, the number of zot replicas can be expanded by simply adding the IP addresses of the new replicas to the load balancer configuration.

When the load balancer receives an image push or pull request for a repo, it forwards the request to any replica in the cluster. No repo-specific programming of the load balancer is needed because the load balancer does not need to know which replica owns which repo. The replicas themselves can determine this.

### Using DNS-based load balancing

Because the scale-out architecture greatly simplifies the role of the load balancer, it may be possible to eliminate the load balancer entirely. A scheme such as [DNS-based routing](https://coredns.io/plugins/loadbalance/) can be implemented, exposing the zot replicas directly to the clients.
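
As an illustration only (the domain name and addresses are placeholders), a CoreDNS configuration could answer a single registry name with all replica addresses and shuffle their order on each response using the `loadbalance` plugin:

```
# Corefile sketch: zot.example.com resolves to every replica, and the
# loadbalance plugin randomizes the order of the returned A records.
zot.example.com {
    hosts {
        10.0.0.1 zot.example.com
        10.0.0.2 zot.example.com
        10.0.0.3 zot.example.com
    }
    loadbalance round_robin
}
```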

## Configuration examples

In these examples, clustering is supported by using multiple stateless zot replicas with shared S3 storage and an HAProxy (with sticky session) load balancer forwarding traffic to the replicas.

### Cluster member configuration

In the replica configuration, each replica must have a list of its peers configured in the "members" section of the JSON structure. This is a list of reachable addresses or hostnames. Each replica owns one of these addresses.

The replica must also have a hash key for hashing the repo path of the image request and a TLS certificate for authenticating with its peers.

<details>
<summary markdown="span">Click here to view a sample cluster configuration for each replica. See the "cluster" section in the JSON structure.</summary>

```json
{
"distSpecVersion": "1.1.0",
"storage": {
"rootDirectory": "/tmp/zot",
"dedupe": false,
"remoteCache": true,
"storageDriver": {
"name": "s3",
"rootdirectory": "/zot",
"region": "us-east-1",
"regionendpoint": "localhost:4566",
"bucket": "zot-storage",
"secure": false,
"skipverify": false
},
"cacheDriver": {
"name": "dynamodb",
"endpoint": "http://localhost:4566",
"region": "us-east-1",
"cacheTablename": "ZotBlobTable",
"repoMetaTablename": "ZotRepoMetadataTable",
"imageMetaTablename": "ZotImageMetaTable",
"repoBlobsInfoTablename": "ZotRepoBlobsInfoTable",
"userDataTablename": "ZotUserDataTable",
"versionTablename": "ZotVersion",
"apiKeyTablename": "ZotApiKeyTable"
}
},
"http": {
"address": "0.0.0.0",
"port": "9000",
"tls": {
"cert": "test/data/server.cert",
"key": "test/data/server.key"
}
},
"log": {
"level": "debug"
},
"cluster": {
"members": [
"zot-server1:9000",
"zot-server2:9000",
"zot-server3:9000"
],
"hashKey": "loremipsumdolors",
"tls": {
"cacert": "test/data/ca.crt"
}
}
}
```

</details>

### HAProxy configuration

The HAProxy load balancer uses a simple round-robin balancing scheme and delivers a cookie to the requestor to maintain a sticky session connection to the assigned replica.

<details>
<summary markdown="span">Click here to view a sample HAProxy configuration.</summary>

```yaml

global
log /dev/log local0
log /dev/log local1 notice
chroot /var/lib/haproxy
maxconn 2000
stats timeout 30s

defaults
log global
mode tcp
option tcplog
option dontlognull
timeout connect 5000
timeout client 50000
timeout server 50000

frontend zot
bind *:8080
default_backend zot-cluster

backend zot-cluster
mode http
balance roundrobin
cookie SERVER insert indirect nocache
server zot-server1 127.0.0.1:9000 check cookie zot-server1
server zot-server2 127.0.0.2:9000 check cookie zot-server2
server zot-server3 127.0.0.3:9000 check cookie zot-server3

```

</details>

## When a replica fails

The scale-out clustering scheme described in this article is not self-healing when a replica fails. In case of a replica failure, only those repositories that are mapped to the failed replica are affected. If the error is not transient, the cluster must be resized and restarted to exclude that replica.

> :bulb: With an HAProxy load balancer, we recommend implementing an [HAProxy circuit breaker](https://www.haproxy.com/blog/circuit-breaking-haproxy) to monitor and protect the cluster.
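
As a rough sketch of such protection (the thresholds below are illustrative assumptions, not tuned recommendations), the backend from the example above could observe live HTTP responses and take a replica out of rotation after repeated errors:

```yaml
backend zot-cluster
    mode http
    balance roundrobin
    cookie SERVER insert indirect nocache
    # Observe layer-7 responses; after 10 consecutive errors, mark the
    # replica down and rely on health checks to bring it back.
    default-server observe layer7 error-limit 10 on-error mark-down inter 5s rise 2 fall 3
    server zot-server1 127.0.0.1:9000 check cookie zot-server1
    server zot-server2 127.0.0.2:9000 check cookie zot-server2
    server zot-server3 127.0.0.3:9000 check cookie zot-server3
```
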
## CVE repository in a zot cluster environment

Because CVE scanning is not supported for cloud deployments, it is disabled in the scale-out clustering scheme described in this article. We therefore recommend implementing a CVE repository with a zot instance outside of the cluster, using a local disk for storage and [Trivy](https://trivy.dev/) as the detection engine.
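
A standalone CVE-scanning instance of that kind might be configured roughly as follows; the paths, port, and update interval are illustrative, and scanning is performed by the search extension (backed by Trivy):

```json
{
  "distSpecVersion": "1.1.0",
  "storage": {
    "rootDirectory": "/var/lib/zot"
  },
  "http": {
    "address": "0.0.0.0",
    "port": "9000"
  },
  "log": {
    "level": "info"
  },
  "extensions": {
    "search": {
      "enable": true,
      "cve": {
        "updateInterval": "2h"
      }
    }
  }
}
```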

## Registry sync

The [sync](../articles/mirroring.md) feature of zot, either on demand or periodic, is compatible with scale-out clustering. In this case, the repo names are hashed to a particular replica and only that replica will perform the sync.
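
For illustration, a sync configuration of roughly the following shape (the upstream URL is a placeholder) could be included in the shared replica configuration; on-demand or periodic mirroring of a repo is then performed only by the replica that owns it:

```json
{
  "extensions": {
    "sync": {
      "enable": true,
      "registries": [
        {
          "urls": ["https://upstream-registry.example.com"],
          "onDemand": true,
          "pollInterval": "6h",
          "tlsVerify": true,
          "content": [
            {
              "prefix": "**"
            }
          ]
        }
      ]
    }
  }
}
```
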
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -126,6 +126,7 @@ nav:
- Retention Policies: articles/retention.md
- Mirroring: articles/mirroring.md
- Clustering: articles/clustering.md
- Scale-out clustering: articles/scaleout.md
- Monitoring: articles/monitoring.md
- Using GraphQL for Enhanced Searches: articles/graphql.md
- Benchmarking with zb: articles/benchmarking-with-zb.md
