
docs: add Scaling the cluster to clustering article #170

Merged — 12 commits merged into project-zot:main from the docs_mishield_scaleout branch on May 31, 2024

Conversation

mbshields (Contributor):

What type of PR is this?

documentation

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@rchincha (Contributor):

cc: @vrajashkr

@mbshields force-pushed the docs_mishield_scaleout branch from 5438975 to 18be84a on May 6, 2024 18:09
@andaaron (Contributor) left a comment:

LGTM

@mbshields force-pushed the docs_mishield_scaleout branch from f316dd3 to 77aef92 on May 13, 2024 17:41
@@ -122,9 +125,10 @@ frontend zot
backend zot-cluster
mode http
balance roundrobin
server zot1 127.0.0.1:8081 check
Contributor:

https://www.haproxy.com/blog/path-based-routing-with-haproxy
^ use this example instead

route to a backend based on path's prefix

use_backend zot-instance1 if { path_beg /v2/repo1/ }
use_backend zot-instance2 if { path_beg /v2/repo2/ }

backend zot-instance1
server zot-server1 127.0.0.1:8080 check maxconn 30

backend zot-instance2
server zot-server2 127.0.0.1:8081 check maxconn 30

Contributor:

zot config dedupe=false
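
For reference, a minimal sketch of the storage fragment this suggestion points at, with deduplication turned off (the rootDirectory path is illustrative, not part of this PR):

```json
"storage": {
    "rootDirectory": "/tmp/zot",
    "dedupe": false
}
```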

Contributor:

It is this manual and dynamic configuration (repos may come and go at any time) that we improve upon in the new scale-out scheme.

Contributor (Author):

In the basic clustering article, the zot and HAProxy configs have been revised. Please verify.

@@ -0,0 +1,154 @@
# Easy scaling of a zot cluster
Contributor:

"Scale-out clustering"

Contributor (Author):

Renamed title

> - zot release v2.1.0 or later

Beginning with zot release v2.1.0, a new "scale-out" architecture greatly reduces the configuration required when deploying large numbers of zot instances. As before, multiple identical zot replicas run simultaneously using the same shared reliable storage, but with improved scale and performance in large deployments.

Contributor:

Scale-out is achieved by automatically sharding based on repository name so that each zot instance is responsible for a subset of repositories.

Contributor:

In the cloud deployment case, the backend (for example S3) and metadata storage (for example dynamodb) can be scaled along with the zot instances.
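
For illustration, a fragment sketching what such a cloud configuration might look like, assuming an S3 bucket and a DynamoDB cache table (the bucket name, table name, and region are illustrative, not part of this PR):

```json
"storage": {
    "rootDirectory": "/tmp/zot",
    "dedupe": false,
    "storageDriver": {
        "name": "s3",
        "region": "us-east-2",
        "bucket": "zot-storage"
    },
    "cacheDriver": {
        "name": "dynamodb",
        "region": "us-east-2",
        "cacheTablename": "ZotBlobTable"
    }
}
```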

Contributor (Author):

Revised

- Each zot replica in the cluster has its own IP address, but all replicas use the same port number.
- The URI format sent to the load balancer must be /v2/<repo\>/<manifest\>:<tag\>

Beginning with zot release v2.1.0, garbage collection is allowed in the shared cluster storage.
Contributor:

Drop this line.

Contributor (Author):

Removed mention of garbage collection


A highly scalable cluster can be architected by sharding on the repository name. In the cluster, each replica is the owner of a small subset of the repositories. The load balancer does not need to know which replica owns which repo; the replicas themselves can determine this.

When the load balancer receives an image push or pull request, it forwards the request to any replica in the cluster. The receiving replica hashes the repo path and consults a hash table in shared storage to determine which replica is responsible for the repo. The receiving replica forwards the request to the responsible replica and then acts as a proxy, returning the requested image to the requestor.
Contributor:

"in shared storage" ... drop this

Contributor:

The hash lookup determines if the request needs to be handled locally or forwarded to the right zot instance.

Contributor:

Note that we use siphash as our hashing algorithm for better collision and pre-image resistance.

Contributor:

Add this somewhere here ... note that the zot instances in the cluster can be exposed directly to clients as well without the need for a load balancer. For example, DNS based routing.

Contributor (Author):

Revised

},
"http": {
"address": "127.0.0.1",
"port": "9001",
Contributor:

9000

Contributor (Author):

Changed port to 9000

}
},
"http": {
"address": "127.0.0.1",
Contributor:

0.0.0.0

Contributor (Author):

Changed address to 0.0.0.0


Contributor:

"members" is a list of addresses at which the cluster members can reach one another; each zot instance owns one of these addresses.

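For illustration, a fragment sketching how this might look in each member's config, assuming two members running locally on ports 9000 and 9001; the addresses and the hashKey (assumed here to be a 16-character SipHash key shared by all members) are illustrative:

```json
"cluster": {
    "members": [
        "127.0.0.1:9000",
        "127.0.0.1:9001"
    ],
    "hashKey": "loremipsumdolors"
}
```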
Contributor (Author):

Added


## When a replica fails

Unlike the earlier [simple clustering scheme](clustering.md), the scale-out scheme described in this article is not self-healing when a replica fails. In case of a replica failure, you must bring down the cluster, repair the failed replica, and reestablish the cluster.
Contributor:

"Unlike the earlier" ... drop that part

Contributor:

Only those repositories that are mapped to a particular zot instance will be affected. If the error is not transient, then the cluster must be resized and restarted to exclude that node.

Contributor (Author):

Revised


## CVE repository in a zot cluster environment

In the scale-out clustering scheme described in this article, CVE scanning is disabled. In this case, we recommend implementing a CVE repository with a zot instance outside of the cluster using a local disk for storage and [Trivy](https://trivy.dev/) as the detection engine.
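
For illustration, a minimal sketch of what such a standalone CVE-scanning instance's config might look like, assuming local disk storage and the search extension with CVE scanning enabled (the path, port, and update interval are illustrative, not part of this PR):

```json
{
    "storage": { "rootDirectory": "/var/lib/zot" },
    "http": { "address": "0.0.0.0", "port": "8080" },
    "extensions": {
        "search": {
            "enable": true,
            "cve": { "updateInterval": "2h" }
        }
    }
}
```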
Contributor:

CVE scanning is not supported for cloud deployments. When local scale-out lands, we should be able to do it.

Contributor (Author):

Revised

mbshields added 3 commits May 15, 2024 10:46
Signed-off-by: mbshields <[email protected]>
Signed-off-by: mbshields <[email protected]>
@rchincha (Contributor):

There are a couple of ways the "zot cluster" can be reached.

  1. A single entry point via haproxy (load-balancer) via some DNS name and the cluster hidden behind it
  2. Expose the members of the cluster via DNS based load-balancing (https://coredns.io/plugins/loadbalance/)

^ we should point these out.

@vrajashkr left a comment:

Thanks for the awesome article! Left a few minor comments.


### YAML configuration
### HAProxy YAML configuration

Reviewer:

I'm not sure if this config is actually YAML.

As far as I am aware, haproxy uses a custom config file format as mentioned here: https://www.haproxy.com/documentation/haproxy-configuration-manual/latest/#2.1

Contributor (Author):

Changed to HAProxy configuration

- All zot replicas must be running zot release v2.1.0 (or later) with identical configurations.
- All zot replicas in the cluster use remote storage at a single shared S3 backend. There is no local caching in the zot replicas.
- Each zot replica in the cluster has its own IP address, but all replicas use the same port number.
- The URI format sent to the cluster must be /v2/<repo\>/<manifest\>:<tag\>

Reviewer:

Not sure if this is a pre-requisite as such.
Only the requests having /v2/ would be proxied, but it's not a pre-requisite to scale instances as such. The other APIs continue to work as usual as the storage is shared.

Contributor (Author):

Removed line 22 ("- The URI format sent to the cluster must be /v2/<repo>/<manifest>:<tag>")


- If the hash indicates that another replica is responsible, the receiving replica forwards the request to the responsible replica and then acts as a proxy, returning the response to the requestor.
- If the hash indicates that the current (receiving) replica is responsible, the request is handled locally.
- If the hash indicates that no replica is responsible, the receiving replica becomes the responsible replica for that repo, and the request is handled locally.

Reviewer:

With our implementation, there will always be a responsible replica as we identify a replica based on the list of available replicas mentioned in the config file.

Contributor (Author):

Removed line 32 ("If the hash indicates that no replica is responsible,....")



### HAProxy YAML configuration

Reviewer:

Similar comment as earlier regarding the fact that the HAProxy config doesn't appear to be a YAML config file.

Contributor (Author):

Changed to HAProxy configuration

@rchincha (Contributor):

@mbshields Also add a note at the end that the "sync" feature is compatible with this change in that whether it is on-demand or periodic, the repo names are hashed to a particular node and only that node will do the sync.
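
For context, a fragment sketching the kind of sync configuration that note would refer to, assuming a single upstream registry (the URL, interval, and content prefix are illustrative, not part of this PR); with scale-out, a repo matched by this config would be synced only by the cluster member its name hashes to:

```json
"extensions": {
    "sync": {
        "enable": true,
        "registries": [
            {
                "urls": ["https://upstream.example.com"],
                "onDemand": true,
                "pollInterval": "6h",
                "content": [{ "prefix": "**" }]
            }
        ]
    }
}
```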

Signed-off-by: mbshields <[email protected]>
@vrajashkr left a comment:

Thanks for addressing the comments. Lgtm!

@vrajashkr:

Just curious - will the commits be squashed into a single one for merge?

@rchincha (Contributor) left a comment:

lgtm

@rchincha merged commit 242d0f4 into project-zot:main on May 31, 2024
4 checks passed