From 5388ac76439a995190f172e6e5567e884ecffb5a Mon Sep 17 00:00:00 2001 From: mbshields Date: Mon, 15 Apr 2024 11:31:27 -0700 Subject: [PATCH 01/12] docs: add Scaling the cluster to clustering article Signed-off-by: mbshields --- docs/articles/clustering.md | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/docs/articles/clustering.md b/docs/articles/clustering.md index 55945329..36ece28d 100644 --- a/docs/articles/clustering.md +++ b/docs/articles/clustering.md @@ -13,7 +13,8 @@ service to remain available even if a few replicas fail or become unavailable. Loadbalancing across many zot replicas can also increase aggregate network throughput. -![504569](../assets/images/504569.jpg){width="400"} +![504569](../assets/images/504569.jpg){width="500"}
Figure 1: a zot cluster with load balancing
+ Clustering is supported in both bare-metal and Kubernetes environments. > :pencil2: @@ -168,3 +169,17 @@ backend zot-cluster ``` + +## Scaling the cluster + +An existing zot cluster (see [Figure 1](#figure1)) can easily be expanded with no programming of the load balancer other than adding the IP addresses of the new zot servers. + +For easy scaling, the following conditions must be met: + +- All zot servers in the cluster use remote storage at a single shared S3 backend. There is no local cacheing in the zot servers. +- Each repo is served by one zot server, and that server is solely responsible for serving all images of that repo. +- A repo in storage can be written to only by the zot server associated with that repo. +- The URI format sent to the load balancer must be /v2//: + +The load balancer does not need to know which zot server serves which repo. When the load balancer receives an image request, it sends the request to any zot server in the cluster. The receiving server hashes the repo path and consults a hash table in storage to determine which server is responsible for the repo. The receiving server then proxies for the responsible server, obtaining the requested image and returning it to the requestor. + From 4584b33e643496c94b3f7c3e566f99df6d5ec166 Mon Sep 17 00:00:00 2001 From: mbshields Date: Mon, 15 Apr 2024 11:36:53 -0700 Subject: [PATCH 02/12] docs: add Scaling the cluster to clustering article - spellcheck Signed-off-by: mbshields --- .wordlist.txt | 1 + docs/articles/clustering.md | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/.wordlist.txt b/.wordlist.txt index 8def3581..e8b56ab5 100644 --- a/.wordlist.txt +++ b/.wordlist.txt @@ -27,6 +27,7 @@ BoltDB boolean Bugfixes busybox +caching CD certDir checksum diff --git a/docs/articles/clustering.md b/docs/articles/clustering.md index 36ece28d..73ad728f 100644 --- a/docs/articles/clustering.md +++ b/docs/articles/clustering.md @@ -176,7 +176,7 @@ An existing zot cluster (see [Figure 1](#figure1)) can easily be expanded with n For easy scaling, the following conditions must be met: -- All zot servers in the cluster use remote storage at a single shared S3 backend. There is no local cacheing in the zot servers. +- All zot servers in the cluster use remote storage at a single shared S3 backend. There is no local caching in the zot servers. - Each repo is served by one zot server, and that server is solely responsible for serving all images of that repo. - A repo in storage can be written to only by the zot server associated with that repo. 
- The URI format sent to the load balancer must be /v2//: From 18be84ab1f09709769cdba947fb9f551777802b5 Mon Sep 17 00:00:00 2001 From: mbshields Date: Mon, 6 May 2024 10:56:51 -0700 Subject: [PATCH 03/12] docs: add scale-out info to clustering article Signed-off-by: mbshields --- docs/articles/clustering.md | 66 ++++++++++++++++++++++++++----------- 1 file changed, 47 insertions(+), 19 deletions(-) diff --git a/docs/articles/clustering.md b/docs/articles/clustering.md index 73ad728f..60d33511 100644 --- a/docs/articles/clustering.md +++ b/docs/articles/clustering.md @@ -3,19 +3,19 @@ > :point_right: High availability of the zot registry is supported by the following features: > > - Stateless zot instances to simplify scale out +> - Shared remote storage > - Bare-metal and Kubernetes deployments -To ensure high-availability of the registry, zot supports a clustering -scheme with stateless zot instances/replicas fronted by a loadbalancer +To ensure high availability of the registry, zot supports a clustering +scheme with stateless zot instances/replicas fronted by a load balancer and a shared remote backend storage. This scheme allows the registry service to remain available even if a few replicas fail or become -unavailable. Loadbalancing across many zot replicas can also increase +unavailable. Load balancing across many zot replicas can also increase aggregate network throughput. ![504569](../assets/images/504569.jpg){width="500"}
Figure 1: a zot cluster with load balancing
- Clustering is supported in both bare-metal and Kubernetes environments. > :pencil2: > The remote backend storage must be [S3 API-compatible](https://docs.aws.amazon.com/AmazonS3/latest/API/Welcome.html). @@ -25,7 +25,7 @@ Clustering is supported in both bare-metal and Kubernetes environments. ### Prerequisites -- A highly-available loadbalancer such as `HAProxy` configured to direct traffic to zot replicas. +- A highly-available load balancer such as HAProxy configured to direct traffic to zot replicas. - Multiple zot replicas as `systemd` services hosted on multiple hosts or VMs. @@ -66,16 +66,16 @@ The [OCI Distribution Specification](https://github.com/opencontainers/distribution-spec) imposes certain rules about the HTTP URI paths to which various ecosystem tools must conform. Consider these rules when setting the HTTP -prefixes during loadbalancing and ingress gateway configuration. +prefixes during load balancing and ingress gateway configuration. ## Examples -zot supports clustering by using multiple stateless zot replicas with shared S3 storage and an `HAProxy` (with sticky session) load-balancing traffic to the replicas. +Clustering is supported by using multiple stateless zot replicas with shared S3 storage and an HAProxy (with sticky session) load balancing traffic to the replicas. -### YAML configuration +### HAProxy YAML configuration
- Click here to view a sample haproxy configuration. + Click here to view a sample HAProxy configuration. ```yaml @@ -123,9 +123,10 @@ frontend zot backend zot-cluster mode http balance roundrobin - server zot1 127.0.0.1:8081 check - server zot2 127.0.0.1:8082 check - server zot3 127.0.0.1:8083 check + cookie SERVER insert indirect nocache + server zot0 127.0.0.1:9000 check cookie zot0 + server zot1 127.0.0.2:9000 check cookie zot1 + server zot2 127.0.0.3:9000 check cookie zot2 ``` @@ -170,16 +171,43 @@ backend zot-cluster ```
-## Scaling the cluster -An existing zot cluster (see [Figure 1](#figure1)) can easily be expanded with no programming of the load balancer other than adding the IP addresses of the new zot servers. +## Easy scaling of the cluster + +You can design a cluster (see [Figure 1](#figure1)) in which the number of replicas can easily be expanded (or reduced) with no programming of the load balancer other than adding the IP addresses of the new replicas. The shared storage can also be easily increased or decreased. + +### Prerequisites -For easy scaling, the following conditions must be met: +For easy scaling of replicas, the following conditions must be met: -- All zot servers in the cluster use remote storage at a single shared S3 backend. There is no local caching in the zot servers. -- Each repo is served by one zot server, and that server is solely responsible for serving all images of that repo. -- A repo in storage can be written to only by the zot server associated with that repo. +- All zot replicas must be running zot release v2.0.4 (or later) with identical configurations. +- All zot replicas in the cluster use remote storage at a single shared S3 backend. There is no local caching in the zot replicas. +- Each repo is served by one zot replica, and that replica is solely responsible for serving all images of that repo. +- A repo in storage can be written to only by the zot replica associated with that repo. +- Each zot replica in the cluster has its own IP address, but all replicas use the port number. - The URI format sent to the load balancer must be /v2//: -The load balancer does not need to know which zot server serves which repo. When the load balancer receives an image request, it sends the request to any zot server in the cluster. The receiving server hashes the repo path and consults a hash table in storage to determine which server is responsible for the repo. The receiving server then proxies for the responsible server, obtaining the requested image and returning it to the requestor. +### How it works + +A highly available and scalable cluster can be architected by sharding on the repository name. In the cluster, each replica is the owner of a small subset of the repository. The load balancer does not need to know which replica owns which repo. The replicas themselves can determine this. + +When the load balancer receives an image request, it sends the request to any replica in the cluster. The receiving replica hashes the repo path and consults a hash table in shared storage to determine which replica is responsible for the repo. The receiving replica forwards the request to the responsible replica and then acts as a proxy, returning the requested image to the requestor. + +### Cluster member configuration + +Each replica must have a list of its peers and must have a hash key for hashing the repo path of the image request. The following is an example of the cluster configuration in each replica: + +```json + "cluster": { + "members": [ + "127.0.0.1:9000", + "127.0.0.2:9000", + "127.0.0.3:9000" + ], + "hashKey": "loremipsumdolors" + } + +``` +## CVE repository in a zot cluster environment +In the clustering scheme described in this article, CVE scanning is disabled. In this case, we recommend implementing a CVE repository with a zot instance outside of the cluster using a local disk for storage and Trivy as the detection engine. 
\ No newline at end of file From 816e085da94e6ab2858c95c0fec5a12491c7b118 Mon Sep 17 00:00:00 2001 From: mbshields Date: Mon, 6 May 2024 11:12:57 -0700 Subject: [PATCH 04/12] docs: add words to wordlist Signed-off-by: mbshields --- .wordlist.txt | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/.wordlist.txt b/.wordlist.txt index e8b56ab5..79dc57fd 100644 --- a/.wordlist.txt +++ b/.wordlist.txt @@ -9,6 +9,7 @@ api apikey APIKeyPayload APIs +architected artifacthub artifactType ASLR @@ -105,6 +106,7 @@ graphQL graphql gui haproxy +HAProxy hostname href html @@ -228,11 +230,13 @@ Satisfiable satisfiable SBOM SBOMs +scalable SDK secretkey semver serviceAccount SHA +sharding skipverify skopeo SLI From 77aef929363e5e0d1203a1db41d9ea9817e3133e Mon Sep 17 00:00:00 2001 From: mbshields Date: Mon, 13 May 2024 10:32:31 -0700 Subject: [PATCH 05/12] docs: add info to scaleout article Signed-off-by: mbshields --- docs/articles/clustering.md | 57 +++---------- docs/articles/scaleout.md | 154 ++++++++++++++++++++++++++++++++++++ 2 files changed, 163 insertions(+), 48 deletions(-) create mode 100644 docs/articles/scaleout.md diff --git a/docs/articles/clustering.md b/docs/articles/clustering.md index 60d33511..98631dfb 100644 --- a/docs/articles/clustering.md +++ b/docs/articles/clustering.md @@ -14,7 +14,9 @@ service to remain available even if a few replicas fail or become unavailable. Load balancing across many zot replicas can also increase aggregate network throughput. -![504569](../assets/images/504569.jpg){width="500"}
Figure 1: a zot cluster with load balancing
+![504569](../assets/images/504569.jpg){width="500"} + +> :pencil2: Beginning with zot release v2.1.0, you can design a highly scalable cluster that does not require configuring the load balancer to direct repository queries to specific zot instances within the cluster. See [Easy scaling of a zot cluster](scaleout.md). This is the preferred method if you are running v2.1.0 or later. Clustering is supported in both bare-metal and Kubernetes environments. > :pencil2: @@ -25,11 +27,11 @@ Clustering is supported in both bare-metal and Kubernetes environments. ### Prerequisites -- A highly-available load balancer such as HAProxy configured to direct traffic to zot replicas. +- A highly-available load balancer such as HAProxy configured to direct traffic to zot replicas -- Multiple zot replicas as `systemd` services hosted on multiple hosts or VMs. +- Multiple zot replicas as `systemd` services hosted on multiple hosts or VMs -- AWS S3 API-compatible remote backend storage. +- AWS S3 API-compatible remote backend storage ## Kubernetes deployment @@ -37,16 +39,16 @@ Clustering is supported in both bare-metal and Kubernetes environments. - A zot Kubernetes [Deployment](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/) - with required number of replicas. + with required number of replicas - AWS S3 API-compatible remote backend storage. - A zot Kubernetes - [Service](https://kubernetes.io/docs/concepts/services-networking/service/). + [Service](https://kubernetes.io/docs/concepts/services-networking/service/) - A zot Kubernetes [Ingress Gateway](https://kubernetes.io/docs/concepts/services-networking/ingress/) - if the service needs to be exposed outside. + if the service needs to be exposed outside ## Implementing stateless zot @@ -170,44 +172,3 @@ backend zot-cluster ``` - - -## Easy scaling of the cluster - -You can design a cluster (see [Figure 1](#figure1)) in which the number of replicas can easily be expanded (or reduced) with no programming of the load balancer other than adding the IP addresses of the new replicas. The shared storage can also be easily increased or decreased. - -### Prerequisites - -For easy scaling of replicas, the following conditions must be met: - -- All zot replicas must be running zot release v2.0.4 (or later) with identical configurations. -- All zot replicas in the cluster use remote storage at a single shared S3 backend. There is no local caching in the zot replicas. -- Each repo is served by one zot replica, and that replica is solely responsible for serving all images of that repo. -- A repo in storage can be written to only by the zot replica associated with that repo. -- Each zot replica in the cluster has its own IP address, but all replicas use the port number. -- The URI format sent to the load balancer must be /v2//: - -### How it works - -A highly available and scalable cluster can be architected by sharding on the repository name. In the cluster, each replica is the owner of a small subset of the repository. The load balancer does not need to know which replica owns which repo. The replicas themselves can determine this. - -When the load balancer receives an image request, it sends the request to any replica in the cluster. The receiving replica hashes the repo path and consults a hash table in shared storage to determine which replica is responsible for the repo. The receiving replica forwards the request to the responsible replica and then acts as a proxy, returning the requested image to the requestor. 
- -### Cluster member configuration - -Each replica must have a list of its peers and must have a hash key for hashing the repo path of the image request. The following is an example of the cluster configuration in each replica: - -```json - "cluster": { - "members": [ - "127.0.0.1:9000", - "127.0.0.2:9000", - "127.0.0.3:9000" - ], - "hashKey": "loremipsumdolors" - } - -``` -## CVE repository in a zot cluster environment - -In the clustering scheme described in this article, CVE scanning is disabled. In this case, we recommend implementing a CVE repository with a zot instance outside of the cluster using a local disk for storage and Trivy as the detection engine. \ No newline at end of file diff --git a/docs/articles/scaleout.md b/docs/articles/scaleout.md new file mode 100644 index 00000000..9102ef30 --- /dev/null +++ b/docs/articles/scaleout.md @@ -0,0 +1,154 @@ +# Easy scaling of a zot cluster + +> :point_right: A cluster of zot replicas can easily be scaled with no repo-specific programming of the load balancer using: +> +> - Stateless zot instances to simplify scale out +> - Shared remote storage +> - zot release v2.1.0 or later + +Beginning with zot release v2.1.0, a new "scale-out" architecture greatly reduces the configuration required when deploying large numbers of zot instances. As before, multiple identical zot replicas run simultaneously using the same shared reliable storage, but with improved scale and performance in large deployments. + +![504569](../assets/images/504569.jpg){width="500"} + +The number of replicas can easily be expanded by simply adding the IP addresses of the new replicas in the load balancer configuration. No repo-specific programming of the load balancer is needed. The shared storage can also be easily increased or decreased. + +> :pencil2: For high availability clustering with earlier zot releases, see [zot Clustering](clustering.md). + +## Prerequisites + +For easy scaling of replicas, the following conditions must be met: + +- All zot replicas must be running zot release v2.1.0 (or later) with identical configurations. +- All zot replicas in the cluster use remote storage at a single shared S3 backend. There is no local caching in the zot replicas. +- Each repo is served by one zot replica, and that replica is solely responsible for serving all images of that repo. +- A repo in storage can be written to only by the zot replica associated with that repo. +- Each zot replica in the cluster has its own IP address, but all replicas use the port number. +- The URI format sent to the load balancer must be /v2//: + +Beginning with zot release v2.1.0, garbage collection is allowed in the shared cluster storage. + +## How it works + +A highly scalable cluster can be architected by sharding on the repository name. In the cluster, each replica is the owner of a small subset of the repository. The load balancer does not need to know which replica owns which repo. The replicas themselves can determine this. + +When the load balancer receives an image push or pull request, it forwards the request to any replica in the cluster. The receiving replica hashes the repo path and consults a hash table in shared storage to determine which replica is responsible for the repo. The receiving replica forwards the request to the responsible replica and then acts as a proxy, returning the requested image to the requestor. 
+ +When an image push request is received but no responsible replica is found for the requested repo, the receiving replica becomes the responsible replica and updates the hash table. + +## Configuration examples + +Clustering is supported by using multiple stateless zot replicas with shared S3 storage and an HAProxy (with sticky session) load balancing traffic to the replicas. + +### Cluster member configuration + +In the replica configuration, each replica must have a list of its peers. The replica must also have a hash key for hashing the repo path of the image request and a TLS certificate for authenticating with its peers. + +
+ Click here to view a sample cluster configuration for each replica. See the "cluster" section in the JSON structure. + +```json +{ + "distSpecVersion": "1.1.0", + "storage": { + "rootDirectory": "/tmp/zot", + "dedupe": false, + "remoteCache": true, + "storageDriver": { + "name": "s3", + "rootdirectory": "/zot", + "region": "us-east-1", + "regionendpoint": "localhost:4566", + "bucket": "zot-storage", + "secure": false, + "skipverify": false + }, + "cacheDriver": { + "name": "dynamodb", + "endpoint": "http://localhost:4566", + "region": "us-east-1", + "cacheTablename": "ZotBlobTable", + "repoMetaTablename": "ZotRepoMetadataTable", + "imageMetaTablename": "ZotImageMetaTable", + "repoBlobsInfoTablename": "ZotRepoBlobsInfoTable", + "userDataTablename": "ZotUserDataTable", + "versionTablename": "ZotVersion", + "apiKeyTablename": "ZotApiKeyTable" + } + }, + "http": { + "address": "127.0.0.1", + "port": "9001", + "tls": { + "cert": "test/data/server.cert", + "key": "test/data/server.key" + } + }, + "log": { + "level": "debug" + }, + "cluster": { + "members": [ + "127.0.0.1:9000", + "127.0.0.2:9000", + "127.0.0.3:9000" + ], + "hashKey": "loremipsumdolors", + "tls": { + "cacert": "test/data/ca.crt" + } + } +} +``` + +
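The `members` list and `hashKey` are what make the replicas agree on repo ownership: every replica hashes the repo path of an incoming request with the same key and maps it onto the same member. The following Go sketch illustrates the idea, assuming a simple hash-modulo-N mapping computed with a keyed hash (zot uses SipHash, here via the `github.com/dchest/siphash` package); it is an illustration of the concept, not zot's actual lookup code.

```go
package main

import (
	"encoding/binary"
	"fmt"

	"github.com/dchest/siphash"
)

// pickMember maps a repo path to one cluster member, assuming ownership is
// decided by hashing the repo path with the shared hashKey and taking the
// result modulo the number of members.
func pickMember(repo, hashKey string, members []string) string {
	// SipHash uses a 128-bit key, so the shared hashKey is assumed to be
	// 16 bytes long, as in the sample configuration above.
	key := []byte(hashKey)
	k0 := binary.LittleEndian.Uint64(key[0:8])
	k1 := binary.LittleEndian.Uint64(key[8:16])
	sum := siphash.Hash(k0, k1, []byte(repo))
	return members[sum%uint64(len(members))]
}

func main() {
	members := []string{"127.0.0.1:9000", "127.0.0.2:9000", "127.0.0.3:9000"}
	// Every replica computes the same owner for the same repo path.
	fmt.Println(pickMember("project-a/nginx", "loremipsumdolors", members))
}
```

Because all replicas share the same `members` list and `hashKey`, they all resolve a given repo to the same owner, which is what lets the load balancer stay repo-agnostic.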
+ +### HAProxy YAML configuration + +The HAProxy load balancer uses a simple round-robin balancing scheme and delivers a cookie to the requestor to maintain a sticky session connection to the assigned replica. + +
+ Click here to view a sample HAProxy configuration. + +```yaml + +global + log /dev/log local0 + log /dev/log local1 notice + chroot /var/lib/haproxy + maxconn 2000 + stats timeout 30s + +defaults + log global + mode tcp + option tcplog + option dontlognull + timeout connect 5000 + timeout client 50000 + timeout server 50000 + +frontend zot + bind *:8080 + default_backend zot-cluster + +backend zot-cluster + mode http + balance roundrobin + cookie SERVER insert indirect nocache + server zot0 127.0.0.1:9000 check cookie zot0 + server zot1 127.0.0.2:9000 check cookie zot1 + server zot2 127.0.0.3:9000 check cookie zot2 + +``` + +
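The `check` keyword on each `server` line gives HAProxy basic health probing. If you want a misbehaving replica taken out of rotation after repeated layer-7 errors, in the spirit of the circuit-breaker recommendation below, options along the following lines can be layered onto the server entries. Treat this as a hedged example to adapt; the thresholds and timings are placeholders, not recommended values.

```
backend zot-cluster
    mode http
    balance roundrobin
    cookie SERVER insert indirect nocache
    server zot0 127.0.0.1:9000 check inter 5s fall 3 rise 2 observe layer7 error-limit 10 on-error mark-down cookie zot0
    server zot1 127.0.0.2:9000 check inter 5s fall 3 rise 2 observe layer7 error-limit 10 on-error mark-down cookie zot1
    server zot2 127.0.0.3:9000 check inter 5s fall 3 rise 2 observe layer7 error-limit 10 on-error mark-down cookie zot2
```

With `observe layer7`, error responses from a replica count against `error-limit`, and `on-error mark-down` removes it from rotation until its health checks pass again.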
+ +## When a replica fails + +Unlike the earlier [simple clustering scheme](clustering.md), the scale-out scheme described in this article is not self-healing when a replica fails. In case of a replica failure, you must bring down the cluster, repair the failed replica, and reestablish the cluster. + +With an HAProxy load balancer, we recommend implementing an [HAProxy circuit breaker](https://www.haproxy.com/blog/circuit-breaking-haproxy) to monitor and protect the cluster. + +## CVE repository in a zot cluster environment + +In the scale-out clustering scheme described in this article, CVE scanning is disabled. In this case, we recommend implementing a CVE repository with a zot instance outside of the cluster using a local disk for storage and [Trivy](https://trivy.dev/) as the detection engine. From b488f1fabaf9089345eab7b985bde1b4fd0618e1 Mon Sep 17 00:00:00 2001 From: mbshields Date: Mon, 13 May 2024 10:46:42 -0700 Subject: [PATCH 06/12] docs: add scaleout article to index Signed-off-by: mbshields --- mkdocs.yml | 1 + 1 file changed, 1 insertion(+) diff --git a/mkdocs.yml b/mkdocs.yml index 4ccd5587..6c973fd7 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -126,6 +126,7 @@ nav: - Retention Policies: articles/retention.md - Mirroring: articles/mirroring.md - Clustering: articles/clustering.md + - Easy scaling of a cluster: articles/scaleout.md - Monitoring: articles/monitoring.md - Using GraphQL for Enhanced Searches: articles/graphql.md - Benchmarking with zb: articles/benchmarking-with-zb.md From 7e17ec732b34ca5cc09e201bd3a6a18b58f8c7ba Mon Sep 17 00:00:00 2001 From: mbshields Date: Wed, 15 May 2024 10:46:46 -0700 Subject: [PATCH 07/12] docs: revised for comments Signed-off-by: mbshields --- docs/articles/clustering.md | 27 +++++++++++++++------- docs/articles/scaleout.md | 45 ++++++++++++++++++++----------------- mkdocs.yml | 2 +- 3 files changed, 45 insertions(+), 29 deletions(-) diff --git a/docs/articles/clustering.md b/docs/articles/clustering.md index 98631dfb..9eb90698 100644 --- a/docs/articles/clustering.md +++ b/docs/articles/clustering.md @@ -16,7 +16,7 @@ aggregate network throughput. ![504569](../assets/images/504569.jpg){width="500"} -> :pencil2: Beginning with zot release v2.1.0, you can design a highly scalable cluster that does not require configuring the load balancer to direct repository queries to specific zot instances within the cluster. See [Easy scaling of a zot cluster](scaleout.md). This is the preferred method if you are running v2.1.0 or later. +> :pencil2: Beginning with zot release v2.1.0, you can design a highly scalable cluster that does not require configuring the load balancer to direct repository queries to specific zot instances within the cluster. See [Scale-out clustering](scaleout.md). Scale-out clustering is the preferred method if you are running v2.1.0 or later. Clustering is supported in both bare-metal and Kubernetes environments. > :pencil2: @@ -59,8 +59,8 @@ zot maintains two types of durable state: - the image metadata in the registry’s cache In a stateless clustering scheme, the image data is stored in the remote -storage backend and the registry cache is disabled by turning off both -deduplication and garbage collection. +storage backend and the registry cache is disabled by turning off +deduplication. ## Ecosystem tools @@ -72,7 +72,7 @@ prefixes during load balancing and ingress gateway configuration. 
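For reference, the client-facing paths defined by the OCI Distribution Specification all live under the `/v2/` prefix; any path-based routing or prefix rewriting in front of zot needs to preserve these shapes (non-exhaustive summary):

```
GET  /v2/                               API version check
GET  /v2/<name>/tags/list               list tags in repository <name>
GET  /v2/<name>/manifests/<reference>   pull a manifest by tag or digest
GET  /v2/<name>/blobs/<digest>          pull a blob
POST /v2/<name>/blobs/uploads/          begin a blob upload (push)
```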
## Examples -Clustering is supported by using multiple stateless zot replicas with shared S3 storage and an HAProxy (with sticky session) load balancing traffic to the replicas. +Clustering is supported by using multiple stateless zot replicas with shared S3 storage and an HAProxy (with sticky session) load balancing traffic to the replicas. Each replica is responsible for one or more repositories. ### HAProxy YAML configuration @@ -120,16 +120,27 @@ defaults frontend zot bind *:8080 mode http + use_backend zot-instance1 if { path_beg /v2/repo1/ } + use_backend zot-instance2 if { path_beg /v2/repo2/ } + use_backend zot-instance3 if { path_beg /v2/repo3/ } default_backend zot-cluster backend zot-cluster mode http balance roundrobin cookie SERVER insert indirect nocache - server zot0 127.0.0.1:9000 check cookie zot0 - server zot1 127.0.0.2:9000 check cookie zot1 - server zot2 127.0.0.3:9000 check cookie zot2 + server zot-server1 127.0.0.1:9000 check cookie zot-server1 + server zot-server2 127.0.0.2:9000 check cookie zot-server2 + server zot-server3 127.0.0.3:9000 check cookie zot-server3 +backend zot-instance1 + server zot-server1 127.0.0.1:9000 check maxconn 30 + +backend zot-instance2 + server zot-server2 127.0.0.2:9000 check maxconn 30 + +backend zot-instance3 + server zot-server3 127.0.0.3:9000 check maxconn 30 ``` @@ -145,7 +156,7 @@ backend zot-cluster "distSpecVersion": "1.0.1-dev", "storage": { "rootDirectory": "/tmp/zot", - "dedupe": true, + "dedupe": false, "storageDriver": { "name": "s3", "rootdirectory": "/zot", diff --git a/docs/articles/scaleout.md b/docs/articles/scaleout.md index 9102ef30..732c1791 100644 --- a/docs/articles/scaleout.md +++ b/docs/articles/scaleout.md @@ -1,22 +1,24 @@ -# Easy scaling of a zot cluster +# Scale-out clustering -> :point_right: A cluster of zot replicas can easily be scaled with no repo-specific programming of the load balancer using: +> :point_right: A cluster of zot instances can easily be scaled with no repo-specific programming of the load balancer using: > > - Stateless zot instances to simplify scale out > - Shared remote storage > - zot release v2.1.0 or later -Beginning with zot release v2.1.0, a new "scale-out" architecture greatly reduces the configuration required when deploying large numbers of zot instances. As before, multiple identical zot replicas run simultaneously using the same shared reliable storage, but with improved scale and performance in large deployments. +Beginning with zot release v2.1.0, a new "scale-out" architecture greatly reduces the configuration required when deploying large numbers of zot instances. As before, multiple identical zot instances run simultaneously using the same shared reliable storage, but with improved scale and performance in large deployments. Scale-out is achieved by automatically sharding based on repository name so that each zot instance is responsible for a subset of repositories. ![504569](../assets/images/504569.jpg){width="500"} -The number of replicas can easily be expanded by simply adding the IP addresses of the new replicas in the load balancer configuration. No repo-specific programming of the load balancer is needed. The shared storage can also be easily increased or decreased. +The number of instances can easily be expanded by simply adding the IP addresses of the new instances in the load balancer configuration. No repo-specific programming of the load balancer is needed. 
+ +In a cloud deployment, the shared backend storage (such as AWS S3) and metadata storage (such as DynamoDB) can also be easily scaled along with the zot instances. > :pencil2: For high availability clustering with earlier zot releases, see [zot Clustering](clustering.md). ## Prerequisites -For easy scaling of replicas, the following conditions must be met: +For easy scaling of instances (replicas), the following conditions must be met: - All zot replicas must be running zot release v2.1.0 (or later) with identical configurations. - All zot replicas in the cluster use remote storage at a single shared S3 backend. There is no local caching in the zot replicas. @@ -25,15 +27,16 @@ For easy scaling of replicas, the following conditions must be met: - Each zot replica in the cluster has its own IP address, but all replicas use the port number. - The URI format sent to the load balancer must be /v2//: -Beginning with zot release v2.1.0, garbage collection is allowed in the shared cluster storage. ## How it works A highly scalable cluster can be architected by sharding on the repository name. In the cluster, each replica is the owner of a small subset of the repository. The load balancer does not need to know which replica owns which repo. The replicas themselves can determine this. -When the load balancer receives an image push or pull request, it forwards the request to any replica in the cluster. The receiving replica hashes the repo path and consults a hash table in shared storage to determine which replica is responsible for the repo. The receiving replica forwards the request to the responsible replica and then acts as a proxy, returning the requested image to the requestor. +When the load balancer receives an image push or pull request, it forwards the request to any replica in the cluster. The receiving replica hashes the repo path and consults a hash table to determine whether the request can be handled locally or must be forwarded to another replica that is responsible for the repo. If the latter, the receiving replica forwards the request to the responsible replica and then acts as a proxy, returning the requested image to the requestor. + +> :pencil2: For better resistance to collisions and preimage attacks, zot uses SipHash as the hashing algorithm. -When an image push request is received but no responsible replica is found for the requested repo, the receiving replica becomes the responsible replica and updates the hash table. +> :bulb: Because this scale-out scheme greatly simplifies the role of the load balancer, it may be possible to eliminate the load balancer entirely by using a scheme such as DNS-based routing, exposing the zot replicas directly to the clients. ## Configuration examples @@ -41,7 +44,9 @@ Clustering is supported by using multiple stateless zot replicas with shared S3 ### Cluster member configuration -In the replica configuration, each replica must have a list of its peers. The replica must also have a hash key for hashing the repo path of the image request and a TLS certificate for authenticating with its peers. +In the replica configuration, each replica must have a list of its peers configured in the "members" section of the JSON structure. This is a list of reachable addresses or hostnames. Each replica owns one of these addresses. + +The replica must also have a hash key for hashing the repo path of the image request and a TLS certificate for authenticating with its peers.
Click here to view a sample cluster configuration for each replica. See the "cluster" section in the JSON structure. @@ -76,8 +81,8 @@ In the replica configuration, each replica must have a list of its peers. The re } }, "http": { - "address": "127.0.0.1", - "port": "9001", + "address": "0.0.0.0", + "port": "9000", "tls": { "cert": "test/data/server.cert", "key": "test/data/server.key" @@ -88,9 +93,9 @@ In the replica configuration, each replica must have a list of its peers. The re }, "cluster": { "members": [ - "127.0.0.1:9000", - "127.0.0.2:9000", - "127.0.0.3:9000" + "zot-server1:9000", + "zot-server2:9000", + "zot-server3:9000" ], "hashKey": "loremipsumdolors", "tls": { @@ -135,9 +140,9 @@ backend zot-cluster mode http balance roundrobin cookie SERVER insert indirect nocache - server zot0 127.0.0.1:9000 check cookie zot0 - server zot1 127.0.0.2:9000 check cookie zot1 - server zot2 127.0.0.3:9000 check cookie zot2 + server zot-server1 127.0.0.1:9000 check cookie zot-server1 + server zot-server2 127.0.0.2:9000 check cookie zot-server2 + server zot-server3 127.0.0.3:9000 check cookie zot-server3 ``` @@ -145,10 +150,10 @@ backend zot-cluster ## When a replica fails -Unlike the earlier [simple clustering scheme](clustering.md), the scale-out scheme described in this article is not self-healing when a replica fails. In case of a replica failure, you must bring down the cluster, repair the failed replica, and reestablish the cluster. +The scale-out scheme described in this article is not self-healing when a replica fails. In case of a replica failure, only those repositories that are mapped to the failed replica are affected. If the error is not transient, the cluster must be resized and restarted to exclude that replica. -With an HAProxy load balancer, we recommend implementing an [HAProxy circuit breaker](https://www.haproxy.com/blog/circuit-breaking-haproxy) to monitor and protect the cluster. +> :pencil2: With an HAProxy load balancer, we recommend implementing an [HAProxy circuit breaker](https://www.haproxy.com/blog/circuit-breaking-haproxy) to monitor and protect the cluster. ## CVE repository in a zot cluster environment -In the scale-out clustering scheme described in this article, CVE scanning is disabled. In this case, we recommend implementing a CVE repository with a zot instance outside of the cluster using a local disk for storage and [Trivy](https://trivy.dev/) as the detection engine. +CVE scanning is not supported for cloud deployments. In the scale-out clustering scheme described in this article, CVE scanning is disabled. In this case, we recommend implementing a CVE repository with a zot instance outside of the cluster using a local disk for storage and [Trivy](https://trivy.dev/) as the detection engine. 
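For the out-of-cluster CVE registry suggested above, a single zot instance with local disk storage and the search/CVE extension enabled is sufficient. The sketch below is one possible configuration, assuming a zot build that includes the search extension (which uses Trivy underneath); the paths and update interval are placeholders to adapt.

```json
{
  "distSpecVersion": "1.1.0",
  "storage": {
    "rootDirectory": "/var/lib/zot"
  },
  "http": {
    "address": "0.0.0.0",
    "port": "8080"
  },
  "log": {
    "level": "info"
  },
  "extensions": {
    "search": {
      "enable": true,
      "cve": {
        "updateInterval": "2h"
      }
    }
  }
}
```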
diff --git a/mkdocs.yml b/mkdocs.yml index 6c973fd7..5d3ecdca 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -126,7 +126,7 @@ nav: - Retention Policies: articles/retention.md - Mirroring: articles/mirroring.md - Clustering: articles/clustering.md - - Easy scaling of a cluster: articles/scaleout.md + - Scale-out clustering: articles/scaleout.md - Monitoring: articles/monitoring.md - Using GraphQL for Enhanced Searches: articles/graphql.md - Benchmarking with zb: articles/benchmarking-with-zb.md From 471c96fcd4229248c0ee9937b73a77464ffd5505 Mon Sep 17 00:00:00 2001 From: mbshields Date: Wed, 15 May 2024 10:49:36 -0700 Subject: [PATCH 08/12] docs: spellcheck Signed-off-by: mbshields --- .wordlist.txt | 3 +++ 1 file changed, 3 insertions(+) diff --git a/.wordlist.txt b/.wordlist.txt index 79dc57fd..54044fe6 100644 --- a/.wordlist.txt +++ b/.wordlist.txt @@ -73,6 +73,7 @@ dex discoverable DistContentDigestKey DN +DNS Dockerfile dropdown dryRun @@ -108,6 +109,7 @@ gui haproxy HAProxy hostname +hostnames href html htpasswd @@ -202,6 +204,7 @@ podman pollInterval pprof PR +preimage prometheus PRs pulledWithin From 99ba5e86553a62b83cf97eb61951bdf14de01305 Mon Sep 17 00:00:00 2001 From: mbshields Date: Wed, 15 May 2024 11:08:18 -0700 Subject: [PATCH 09/12] docs: spellcheck again Signed-off-by: mbshields --- .wordlist.txt | 1 + 1 file changed, 1 insertion(+) diff --git a/.wordlist.txt b/.wordlist.txt index 54044fe6..d8244f0f 100644 --- a/.wordlist.txt +++ b/.wordlist.txt @@ -240,6 +240,7 @@ semver serviceAccount SHA sharding +SipHash skipverify skopeo SLI From 7fbed46153ba8ac9ec9c961063e0f1779d24a6cc Mon Sep 17 00:00:00 2001 From: mbshields Date: Wed, 22 May 2024 16:16:54 -0700 Subject: [PATCH 10/12] docs: describe DNS-based routing for load balancing Signed-off-by: mbshields --- docs/articles/scaleout.md | 43 +++++++++++++++++++++++---------------- 1 file changed, 26 insertions(+), 17 deletions(-) diff --git a/docs/articles/scaleout.md b/docs/articles/scaleout.md index 732c1791..7a1dc82c 100644 --- a/docs/articles/scaleout.md +++ b/docs/articles/scaleout.md @@ -1,16 +1,12 @@ # Scale-out clustering -> :point_right: A cluster of zot instances can easily be scaled with no repo-specific programming of the load balancer using: +> :point_right: A cluster of zot instances can be easily scaled with no repo-specific intelligence in the load balancing scheme, using: > > - Stateless zot instances to simplify scale out > - Shared remote storage > - zot release v2.1.0 or later -Beginning with zot release v2.1.0, a new "scale-out" architecture greatly reduces the configuration required when deploying large numbers of zot instances. As before, multiple identical zot instances run simultaneously using the same shared reliable storage, but with improved scale and performance in large deployments. Scale-out is achieved by automatically sharding based on repository name so that each zot instance is responsible for a subset of repositories. - -![504569](../assets/images/504569.jpg){width="500"} - -The number of instances can easily be expanded by simply adding the IP addresses of the new instances in the load balancer configuration. No repo-specific programming of the load balancer is needed. +Beginning with zot release v2.1.0, a new "scale-out" architecture greatly reduces the configuration required when deploying large numbers of zot instances. As before, multiple identical zot instances run simultaneously using the same shared reliable storage, but with improved scale and performance in large deployments. 
A highly scalable cluster can be architected by automatically sharding based on repository name so that each zot instance is responsible for a subset of repositories. In a cloud deployment, the shared backend storage (such as AWS S3) and metadata storage (such as DynamoDB) can also be easily scaled along with the zot instances. @@ -22,25 +18,38 @@ For easy scaling of instances (replicas), the following conditions must be met: - All zot replicas must be running zot release v2.1.0 (or later) with identical configurations. - All zot replicas in the cluster use remote storage at a single shared S3 backend. There is no local caching in the zot replicas. -- Each repo is served by one zot replica, and that replica is solely responsible for serving all images of that repo. -- A repo in storage can be written to only by the zot replica associated with that repo. -- Each zot replica in the cluster has its own IP address, but all replicas use the port number. -- The URI format sent to the load balancer must be /v2//: - +- Each zot replica in the cluster has its own IP address, but all replicas use the same port number. +- The URI format sent to the cluster must be /v2//: ## How it works -A highly scalable cluster can be architected by sharding on the repository name. In the cluster, each replica is the owner of a small subset of the repository. The load balancer does not need to know which replica owns which repo. The replicas themselves can determine this. +Each repo is served by one zot replica, and that replica is solely responsible for serving all images of that repo. A repo in storage can be written to only by the zot replica responsible for that repo. + +When a zot replica in the cluster receives an image push or pull request for a repo, the receiving replica hashes the repo path and consults a hash table to determine which replica is responsible for the repo. -When the load balancer receives an image push or pull request, it forwards the request to any replica in the cluster. The receiving replica hashes the repo path and consults a hash table to determine whether the request can be handled locally or must be forwarded to another replica that is responsible for the repo. If the latter, the receiving replica forwards the request to the responsible replica and then acts as a proxy, returning the requested image to the requestor. +- If the hash indicates that another replica is responsible, the receiving replica forwards the request to the responsible replica and then acts as a proxy, returning the response to the requestor. +- If the hash indicates that the current (receiving) replica is responsible, the request is handled locally. +- If the hash indicates that no replica is responsible, the receiving replica becomes the responsible replica for that repo, and the request is handled locally. > :pencil2: For better resistance to collisions and preimage attacks, zot uses SipHash as the hashing algorithm. -> :bulb: Because this scale-out scheme greatly simplifies the role of the load balancer, it may be possible to eliminate the load balancer entirely by using a scheme such as DNS-based routing, exposing the zot replicas directly to the clients. +Either of the following two schemes can be used to reach the cluster. 
+ +### Using a single entry point load balancer + +![504569](../assets/images/504569.jpg){width="500"} + +When a single entry point load balancer such as [HAProxy](https://www.haproxy.com/) is deployed, the number of zot replicas can easily be expanded by simply adding the IP addresses of the new replicas in the load balancer configuration. + +When the load balancer receives an image push or pull request for a repo, it forwards the request to any replica in the cluster. No repo-specific programming of the load balancer is needed because the load balancer does not need to know which replica owns which repo. The replicas themselves can determine this. + +### Using DNS-based load balancing + +Because the scale-out architecture greatly simplifies the role of the load balancer, it may be possible to eliminate the load balancer entirely. A scheme such as [DNS-based routing](https://coredns.io/plugins/loadbalance/) can be implemented, exposing the zot replicas directly to the clients. ## Configuration examples -Clustering is supported by using multiple stateless zot replicas with shared S3 storage and an HAProxy (with sticky session) load balancing traffic to the replicas. +In these examples, clustering is supported by using multiple stateless zot replicas with shared S3 storage and an HAProxy (with sticky session) load balancer forwarding traffic to the replicas. ### Cluster member configuration @@ -150,9 +159,9 @@ backend zot-cluster ## When a replica fails -The scale-out scheme described in this article is not self-healing when a replica fails. In case of a replica failure, only those repositories that are mapped to the failed replica are affected. If the error is not transient, the cluster must be resized and restarted to exclude that replica. +The scale-out clustering scheme described in this article is not self-healing when a replica fails. In case of a replica failure, only those repositories that are mapped to the failed replica are affected. If the error is not transient, the cluster must be resized and restarted to exclude that replica. -> :pencil2: With an HAProxy load balancer, we recommend implementing an [HAProxy circuit breaker](https://www.haproxy.com/blog/circuit-breaking-haproxy) to monitor and protect the cluster. +> :bulb: With an HAProxy load balancer, we recommend implementing an [HAProxy circuit breaker](https://www.haproxy.com/blog/circuit-breaking-haproxy) to monitor and protect the cluster. ## CVE repository in a zot cluster environment From bd2d78b2bebb3da2779de33cd64ab236872a994e Mon Sep 17 00:00:00 2001 From: mbshields Date: Wed, 29 May 2024 13:43:26 -0700 Subject: [PATCH 11/12] docs: address latest review comments Signed-off-by: mbshields --- docs/articles/clustering.md | 2 +- docs/articles/scaleout.md | 6 +++--- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/articles/clustering.md b/docs/articles/clustering.md index 9eb90698..da0244fa 100644 --- a/docs/articles/clustering.md +++ b/docs/articles/clustering.md @@ -74,7 +74,7 @@ prefixes during load balancing and ingress gateway configuration. Clustering is supported by using multiple stateless zot replicas with shared S3 storage and an HAProxy (with sticky session) load balancing traffic to the replicas. Each replica is responsible for one or more repositories. -### HAProxy YAML configuration +### HAProxy configuration
Click here to view a sample HAProxy configuration. diff --git a/docs/articles/scaleout.md b/docs/articles/scaleout.md index 7a1dc82c..d12b0b3d 100644 --- a/docs/articles/scaleout.md +++ b/docs/articles/scaleout.md @@ -19,7 +19,7 @@ For easy scaling of instances (replicas), the following conditions must be met: - All zot replicas must be running zot release v2.1.0 (or later) with identical configurations. - All zot replicas in the cluster use remote storage at a single shared S3 backend. There is no local caching in the zot replicas. - Each zot replica in the cluster has its own IP address, but all replicas use the same port number. -- The URI format sent to the cluster must be /v2//: + ## How it works @@ -29,7 +29,7 @@ When a zot replica in the cluster receives an image push or pull request for a r - If the hash indicates that another replica is responsible, the receiving replica forwards the request to the responsible replica and then acts as a proxy, returning the response to the requestor. - If the hash indicates that the current (receiving) replica is responsible, the request is handled locally. -- If the hash indicates that no replica is responsible, the receiving replica becomes the responsible replica for that repo, and the request is handled locally. + > :pencil2: For better resistance to collisions and preimage attacks, zot uses SipHash as the hashing algorithm. @@ -116,7 +116,7 @@ The replica must also have a hash key for hashing the repo path of the image req
-### HAProxy YAML configuration +### HAProxy configuration The HAProxy load balancer uses a simple round-robin balancing scheme and delivers a cookie to the requestor to maintain a sticky session connection to the assigned replica. From 99ec9b23812e7ed3ad40c1573a88a029e273c23d Mon Sep 17 00:00:00 2001 From: mbshields Date: Wed, 29 May 2024 15:17:33 -0700 Subject: [PATCH 12/12] docs: add Registry sync Signed-off-by: mbshields --- docs/articles/scaleout.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/docs/articles/scaleout.md b/docs/articles/scaleout.md index d12b0b3d..a1bb57c8 100644 --- a/docs/articles/scaleout.md +++ b/docs/articles/scaleout.md @@ -166,3 +166,7 @@ The scale-out clustering scheme described in this article is not self-healing wh ## CVE repository in a zot cluster environment CVE scanning is not supported for cloud deployments. In the scale-out clustering scheme described in this article, CVE scanning is disabled. In this case, we recommend implementing a CVE repository with a zot instance outside of the cluster using a local disk for storage and [Trivy](https://trivy.dev/) as the detection engine. + +## Registry sync + +The [sync](../articles/mirroring.md) feature of zot, either on demand or periodic, is compatible with scale-out clustering. In this case, the repo names are hashed to a particular replica and only that replica will perform the sync. \ No newline at end of file
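A hedged sketch of what such a sync configuration might look like on a cluster member is shown below; the upstream URL, poll interval, and content prefix are placeholders, and the block would live in the same shared configuration used by every replica. Because repo names are hashed to owners as described above, only the owning replica actually mirrors a given repo.

```json
{
  "extensions": {
    "sync": {
      "enable": true,
      "registries": [
        {
          "urls": ["https://upstream.example.com"],
          "onDemand": false,
          "pollInterval": "6h",
          "tlsVerify": true,
          "content": [
            { "prefix": "project-a/**" }
          ]
        }
      ]
    }
  }
}
```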