From 137b6c50640688b196106e64452620e28f56c8f2 Mon Sep 17 00:00:00 2001
From: Hank Donnay <hdonnay@redhat.com>
Date: Mon, 25 Mar 2024 17:46:44 -0500
Subject: [PATCH] docs: add mention of disk space path and usage

Also reworks the formatting and some grammar.

Signed-off-by: Hank Donnay <hdonnay@redhat.com>
---
 Documentation/howto/deployment.md | 73 ++++++++++++++++++++++---------
 1 file changed, 53 insertions(+), 20 deletions(-)

diff --git a/Documentation/howto/deployment.md b/Documentation/howto/deployment.md
index 03fa7779ee..0cd7a0f6de 100644
--- a/Documentation/howto/deployment.md
+++ b/Documentation/howto/deployment.md
@@ -1,22 +1,27 @@
 # Deploying Clair
 
-Clair v4 was designed with flexible deployment architectures in mind. The operator is free to choose a deployment model which scales to their use cases.
+Clair v4 was designed with flexible deployment architectures in mind.
+An operator is free to choose a deployment model which scales to their use cases.
 
 ## Configuration
 
-Before jumping directly into the models its important to note that Clair is designed to use a single configuration file across all node types. This design decision makes it very easy to deploy on systems like Kubernetes and OpenShift.
+Before jumping directly into the models, it's important to note that Clair is designed to use a single configuration file across all node types.
+This design decision makes it very easy to deploy on systems like Kubernetes and OpenShift.
 
 See [Config Reference](../reference/config.md)
 
 ## Combined Deployment
 
-In a combined deployment, all the Clair processes run in a single OS process. This is by far the easiest deployment model to configure as it involves the least moving parts. 
+In a combined deployment, all the Clair processes run in a single OS process.
+This is by far the easiest deployment model to configure as it involves the fewest moving parts.
 
-A load balancer is still recommended if you plan on performing TLS termination. Typically this will be a OpenShift route or a Kubernetes ingress.
+A load balancer is still recommended if you plan on performing TLS termination.
+Typically this will be an OpenShift route or a Kubernetes ingress.
 
 ![combo mode single db deployment diagran](./clairv4_combo_single_db.png)
 
-In the above diagram, Clair is running in combo mode and talking to a single database. To configure this model you will provide all node types the same database and start Clair in **combo** mode.
+In the above diagram, Clair is running in combo mode and talking to a single database.
+To configure this mode, you will provide all node types the same database and start Clair in **combo** mode.
 
 ```
 ...
@@ -29,15 +34,17 @@ notifier:
     connstring: "host=clairdb user=pqgotest dbname=pqgotest sslmode=verify-full"
     ...
 ```
-In this mode, any configuration informing Clair how to talk to other nodes is ignored, it is not needed as all intra-process communication is done directly.
+In this mode, any configuration informing Clair how to talk to other nodes is ignored;
+it is not needed as all intra-process communication is done directly.
 
 For added flexibility, it's also supported to split the databases while in combo mode.
 
 ![combo mode multiple db deployment diagran](./clairv4_combo_multi_db.png)
 
-In the above diagram, Clair is running in combo mode but database load is split between multiple databases. Since Clair is conceptually a set of micro-services, its processes do not share database tables even when combined into the same OS process.
+In the above diagram, Clair is running in combo mode but database load is split between multiple databases.
+Since Clair is conceptually a set of micro-services, its processes do not share database tables even when combined into the same OS process.
 
-To configure this model, you would provide each process its own "connstring" in the configuration. 
+To configure this mode, you would provide each process its own "connstring" in the configuration. 
 ```
 ...
 indexer:
@@ -54,18 +61,26 @@ notifier:
 
 If your application needs to asymmetrically scale or you expect high load you may want to consider a distributed deployment.
 
-In a distributed deployment, each Clair process runs in its own OS process. Typically this will be a Kubernetes or OpenShift Pod.
+In a distributed deployment, each Clair process runs in its own OS process.
+Typically this will be a Kubernetes or OpenShift Pod.
+
+A load balancer **must** be set up in this deployment model.
+The load balancer will route traffic between Clair nodes along with routing API requests via [path based routing] to the correct services.
+In a Kubernetes or OpenShift deployment this is usually handled with the `Service` and `Route` abstractions.
+If deploying on bare metal, a load balancer will need to be configured appropriately. 
 
-A load balancer **must** be setup in this deployment model. The load balancer will route traffic between Clair nodes along with routing API requests via [path based routing](https://devcentral.f5.com/s/articles/the-three-http-routing-patterns-you-should-know-30764) to the correct services. In a Kubernetes or OpenShift deployment this is usually handled with the Service and Routes abstractions. If deploying on bare metal, a load balancer will need to be configured appropriately. 
 
 ![distributed mode multiple db deployment diagran](./clairv4_distributed_multi_db.png)
 
-In the above diagram, a load balancer is configured to route traffic coming from the client to the correct service. This routing is path based routing and requires a layer 7 load balancer. Traefik, Nginx, and HAProxy are all capable of this. As mentioned above, this functionality is native to OpenShift and Kubernetes.
+In the above diagram, a load balancer is configured to route traffic coming from the client to the correct service.
+This routing is path based and requires a layer 7 load balancer.
+Traefik, Nginx, and HAProxy are all capable of this.
+As mentioned above, this functionality is native to OpenShift and Kubernetes.
 
-In this configuration, you'd supply each process with database connection strings and addresses for their dependent services. Each OS process will need to have its "mode" CLI flag or environment variable set to the appropriate value. 
+In this configuration, you'd supply each process with database connection strings and addresses for their dependent services.
+Each OS process will need to have its "mode" CLI flag or environment variable set to the appropriate value. 
 See [Config Reference](../reference/config.md)
 
-
 ```
 ...
 indexer:
@@ -81,19 +96,34 @@ notifier:
     ...
 ```
 
-Keep in mind a config file per process is not need. Processes only use the values necessary for their configured mode.
+Keep in mind a config file per process is not needed.
+Processes only use the values necessary for their configured mode.
 
 ## TLS Termination
 
-Currently Clair offloads TLS termination to the load balancing infrastructure. This design choice is due to the ubiquity of Kubernetes and OpenShift infrastructure already providing this facility.
+It's recommended to offload TLS termination to the load balancing infrastructure.
+This design choice is due to the ubiquity of Kubernetes and OpenShift infrastructure already providing this facility.
+
+If this is not possible, processes can terminate TLS themselves by using the `$.tls` configuration key.
+A load balancer is still required.
+
+## Disk Usage Considerations
+
+By default, Clair will store container layers in `/var/tmp` while in use.
+This can be changed by setting the `TMPDIR` environment variable.
+There's currently no way to change this in the configuration file.
+
+The disk space needed depends on the precise layers being indexed at any one time,
+but a good approximation is twice the uncompressed size of the largest layer in the corpus.
 
 ## More On Path Routing
 
-If you are considering a distributed deployment you will need more details on [path based routing](https://devcentral.f5.com/s/articles/the-three-http-routing-patterns-you-should-know-30764). 
+If you are considering a distributed deployment you will need more details on [path based routing]. 
 
-Learn how to grab our OpenAPI spec [here](./api.md) and either start up a local dev instance of the swagger editor or load the spec file into the [online editor](https://petstore.swagger.io/#/)
+Learn how to grab our OpenAPI spec [here](./api.md) and either start up a local dev instance of the swagger editor or load the spec file into the [online editor](https://petstore.swagger.io/#/).
 
-You will notice particular API paths are grouped by the services which implement them. This is your guide to configure your layer 7 load balancer correctly. 
+You will notice particular API paths are grouped by the services which implement them.
+This is your guide to configure your layer 7 load balancer correctly. 
 
 When the load balancer encounters a particular path prefix it must send those request to the correct set of Clair nodes. 
 
@@ -106,6 +136,9 @@ For example, this is how we configure Traefik in our local development environme
 - "traefik.http.services.notifier.loadbalancer.server.port=6000"
 ```
 
-This configuration is saying "take any paths prefixes of /notifier/ and send them to the notifier services on port 6000"
+This configuration is saying "take any paths prefixed with /notifier/ and send them to the notifier services on port 6000".
+
+Every load balancer will have its own way to perform path routing.
+Check the documentation for your infrastructure of choice.
 
-Every load balancer will have their own way to perform path routing. Check the documentation for your infrastructure of choice.
+[path based routing]: https://devcentral.f5.com/s/articles/the-three-http-routing-patterns-you-should-know-30764