OPSEXP-2528: introduce KEDA scaling auto support for ATS and repo #1161

Merged
merged 48 commits into from
Jun 18, 2024
Changes from 47 commits
50bf3ba
Add Keda cart as a dependency
alxgomz May 27, 2024
090e28c
Add Keda auth trigger for activemq scaler
alxgomz May 27, 2024
0708c4e
Add keda autoscaler for imagemagick
alxgomz May 27, 2024
0af43bd
Add keda autoscaler for libreoffice
alxgomz May 27, 2024
f7c72e6
Add keda autoscaler for misc tranformer
alxgomz May 27, 2024
135c4cf
Add keda autoscaler for pdf renderer
alxgomz May 27, 2024
7a4cc82
Add keda autoscaler for tika
alxgomz May 27, 2024
789c5bf
fixup for im scaler
alxgomz May 27, 2024
a963f41
fixup triggerAuth name
alxgomz May 28, 2024
63516ce
fix amq management endpoint to point ot right service
alxgomz May 28, 2024
785e9c4
fix brokerName in keda activemq scalers triggers
alxgomz May 28, 2024
f8a9cf0
add keda crds
alxgomz May 31, 2024
84efc62
add enterprise search auto scalers
alxgomz May 31, 2024
6b2f51d
avoid flapping & ensure smooth startup
alxgomz May 31, 2024
fabf785
reuse hpa definitions and values for ATS scalers
alxgomz May 31, 2024
9e80ce7
move keda scaler trigers to a named template
alxgomz Jun 3, 2024
9c8c5f9
add component based condition
alxgomz Jun 3, 2024
3db50d1
add vars for keda scaler basic params
alxgomz Jun 3, 2024
ab1ec31
support for external activemq when using activemq scaler
alxgomz Jun 3, 2024
70addbe
support for external activemq when using activemq scaler
alxgomz Jun 3, 2024
a1a03d9
add somes tests
alxgomz Jun 3, 2024
3cd68f2
move keda scaler basic options to a named template
alxgomz Jun 3, 2024
946ae09
remove enterprise search scalerobjs (out of scope)
alxgomz Jun 4, 2024
6b68319
remove Keda chart dependency and rename keda ATS scale objects
alxgomz Jun 4, 2024
e575dda
change keda scaled object conditions
alxgomz Jun 4, 2024
783d762
change labels and naming of ATS scaledobj
alxgomz Jun 4, 2024
2e15366
update chart metadata and doc
alxgomz Jun 4, 2024
257e973
add namespace to service address for multi ns deployments
alxgomz Jun 4, 2024
641d6dc
align tests
alxgomz Jun 4, 2024
03cd255
make ActiveMQ KEDA target value a variable (per tengine)
alxgomz Jun 4, 2024
242b4e1
fixup
alxgomz Jun 4, 2024
e41f57d
fix KEDA activemq scaler auth trigger
alxgomz Jun 4, 2024
2b02692
fixup restAPITemplate
alxgomz Jun 4, 2024
af527da
fixup targetQueueSize
alxgomz Jun 4, 2024
7ad509b
add keda scaling doc for ATS
alxgomz Jun 4, 2024
0ce32b5
fix comments
alxgomz Jun 4, 2024
0371d76
add repository prometheus KEDA scaledobject
alxgomz Jun 4, 2024
f60b282
support for external broker name
alxgomz Jun 4, 2024
5fe5519
add more doc and repo KEDA scaler
alxgomz Jun 4, 2024
6d00021
fixup
alxgomz Jun 5, 2024
1f1d47e
prevent scaling to zero for both ATS and repo
alxgomz Jun 5, 2024
c19f87b
document AWS AMQ restrictions
alxgomz Jun 5, 2024
a362994
fix tests
alxgomz Jun 5, 2024
9a6a575
Revert "prevent scaling to zero for both ATS and repo"
alxgomz Jun 5, 2024
3d90552
reintroduce idle scale down to 0 wit doc and tests
alxgomz Jun 5, 2024
4a6cb82
Apply suggestions from code review
alxgomz Jun 6, 2024
a72d918
revierw comments
alxgomz Jun 6, 2024
3dac2fa
typo
alxgomz Jun 6, 2024
200 changes: 195 additions & 5 deletions docs/helm/autoscaling.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,20 +4,210 @@ parent: Guides
grand_parent: Helm
---

# Alfresco components auto-scaling
# Automatically scaling Alfresco Content Services

`alfresco-content-services` can leverage Kubernetes HorizontalPodAutoscaling
provided by individual Alfresco components. This means that you can add more
instances of the same service to handle more load. This is a common pattern in
cloud environments, where you can add more instances of a service to handle
more load, and remove instances when the load decreases.
cloud environments, and also allows you to remove instances when the load decreases.

`alfresco-content-services` can also leverage the [KEDA](https://keda.sh/)
framework to scale based on custom, more application centric metrics. To use
this more advanced scaling mechanism, you of course need to have KEDA installed.

This document aims at providing details on configuring each of the components
which support HPA.
which support HPA, either using plain Kubernetes HPA or a [KEDA
scaler](https://keda.sh/docs/latest/scalers/).

Which type of auto scaling is best?
That depends on your workload. Basic CPU-based HPA is a reactive auto scaling
strategy: the system spins up more pods because existing pods are already
heavily loaded. The threshold which triggers scaling is calculated from the
resource reservation (requests), so it results in different behavior depending
on the resource allocation strategy you have chosen. For example, if you prefer
to rely on the cluster's ability to over-commit resources, you might want to set
the threshold to a higher value, otherwise scaling will be triggered sooner than
expected. This makes CPU-based autoscaling a bit more difficult to tune.
KEDA-based scaling, on the other hand, is more proactive, as it can be
triggered by custom metrics from any other system. For example, in the Alfresco
content platform you could scale specific services based on the number of
messages in the message broker (this is what we document here for ATS pods), or
on the number of active users (or a metric that represents it).

## Prerequisites

All scaling capabilities require Alfresco Enterprise Edition.
In order to use the autoscaling features, you need to have a Kubernetes cluster
with a metrics server installed.
If you're planning on using basic HPA, you need to have the Kubernetes "vanilla"
[metrics-server](https://github.com/kubernetes-sigs/metrics-server).

Check the [official metrics-server
documentation](https://github.com/kubernetes-sigs/metrics-server) for more
information on how to install the metrics server and which version is compatible
with your cluster.

If you prefer to use KEDA, you need to have KEDA installed in your cluster. You
can find the installation instructions in the [KEDA official
documentation](https://keda.sh/docs/latest/deploy/). Make sure to install the
appropriate Custom Resource Definitions (CRDs) for the scalers you want to use.

For example:

```bash
helm install \
--repo https://kedacore.github.io/charts alfresco-keda keda \
--namespace keda \
--version 2.14.2
```

## Alfresco components auto-scaling

### Alfresco Repository

## Alfresco Repository
#### Basic (CPU based) scaling for Alfresco repository

Refer to the
[alfresco-repository auto-scaling
documentation](https://github.com/Alfresco/alfresco-helm-charts/blob/main/charts/alfresco-repository/docs/autoscaling.md)
for a detailed guide on Alfresco repository auto-scaling configuration and
implications.

#### KEDA based scaling for Alfresco repository

To start with, make sure your Kubernetes cluster has KEDA & Prometheus installed.
You must also make sure the Alfresco repository is set up to expose Prometheus
metrics and that Prometheus has the appropriate scrape configuration.

Refer to the [acs-packaging
doc](https://github.com/Alfresco/acs-packaging/tree/master/docs/micrometer)
for details.

The minimum configuration for the Alfresco repository to expose prometheus
metrics should be:

```yaml
alfresco-repository:
  environment:
    CATALINA_OPTS: >-
      -Dmetrics.enabled=true
      -Dmetrics.jvmMetricsReporter.enabled=true
      ...
```

##### Prometheus scaler

The KEDA based auto scaler relies on the number of Tomcat threads used. By
default the Alfresco repository image uses up to 200 threads. When the system
consistently uses more than 170 threads, the KEDA scaler will start to scale up
the number of pods. This can be tuned using the
`alfresco-repository.autoscaling.kedaTargetValue` if your image has a
configuration with more or less `maxThreads`.
In the same manner, the parameters below can be set (defaults shown):

* `behavior.scaleUp.stabilizationWindowSeconds`: The number of threads used must
  remain above target on average for 30 seconds before a scale up can happen.
* `kedaPollingInterval`: Threads are checked every 15 seconds.
* `kedaInitialCooldownPeriod`: KEDA will wait for 5 minutes before activating
  the scaling object (no scaling can happen before that).
* `minReplicas`: The default minimum replica count is 1.
* `maxReplicas`: The default maximum replica count is 3.
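
As an illustration of how these settings fit together, here is a minimal values
sketch for a repository image configured with more Tomcat threads than the
default. It is not a reference configuration: the Prometheus URL and the numbers
are hypothetical, and listing `alfresco-repository` under `keda.components`
assumes the list takes the repository chart name, as the README describes it
("chart names").

```yaml
# Sketch only: URL and numeric values are illustrative assumptions.
keda:
  components:
    - alfresco-repository        # assumed component name (chart name)
prometheus:
  # Hypothetical address; it must be reachable by the KEDA pods.
  url: http://prometheus-server.monitoring.svc
alfresco-repository:
  autoscaling:
    kedaTargetValue: 340         # e.g. for an image configured with maxThreads=400
    kedaPollingInterval: 15
    minReplicas: 1
    maxReplicas: 3
    behavior:
      scaleUp:
        stabilizationWindowSeconds: 30
```

Keeping the target value somewhat below `maxThreads`, as the defaults do (170
out of 200), leaves headroom for new pods to start before the existing ones
saturate.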

### Alfresco Transform Service

#### Basic (CPU based) scaling for ATS

Refer to the
[alfresco-transform-service auto-scaling
documentation](https://alfresco.github.io/alfresco-helm-charts/charts/alfresco-transform-service/docs/autoscaling.html)
for a detailed guide on Alfresco Transform Service auto-scaling configuration
and implications.

#### KEDA based scaling for ATS

To start with, make sure your Kubernetes cluster has KEDA installed.

##### ActiveMQ scaler

Regular ActiveMQ instances expose a REST API which can be used to get the
number of messages in a queue. This can be used to scale individual ATS T-engine
pods. This scaling mechanism is implemented directly in the
`alfresco-content-services` chart. To enable it you need to set the following:

```yaml
keda:
  components:
    - alfresco-transform-service
```

This will install the KEDA activemq scaler and configure it to scale all the
T-engine workloads (`imagemagick`, `libreoffice`, `transformmisc`, `pdfrenderer`
& `tika`) as described below:

* `kedaTargetValue`: New pods will be started when the corresponding message
  queue has more than 10 messages.
* `behavior.scaleUp.stabilizationWindowSeconds`: The number of messages in the
  queue must remain above target on average for 30 seconds before a scale up can
  happen.
* `kedaPollingInterval`: Queues are checked every 15 seconds.
* `kedaInitialCooldownPeriod`: KEDA will wait for 5 minutes before activating
  the scaling object (no scaling can happen before that).
* `kedaCooldownPeriod`: After KEDA has found there is no activity in the
  monitored queue, it will wait for 15 minutes before scaling down the pods to
  0.
* `kedaIdleReplicas`: The default idle replica count is 0 (tears down the
  service).
* `minReplicas`: The default minimum replica count is 1.
* `maxReplicas`: The default maximum replica count is 3.

> Values mentioned above must be set for each T-engine under
> `alfresco-transform-service._TENGINE_NAME_.autoscaling` where `_TENGINE_NAME_`
> is one of the following: `imagemagick`, `libreoffice`, `transformmisc`,
> `pdfrenderer` & `tika`.
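
As an illustration of the per-T-engine settings above, the values below raise
the queue threshold and the replica ceiling for `libreoffice` only. This is a
minimal sketch: the numbers are arbitrary examples rather than recommendations,
and it assumes the keys nest under `autoscaling` exactly as listed above.

```yaml
# Sketch only: numeric values are illustrative, not tuned recommendations.
keda:
  components:
    - alfresco-transform-service
alfresco-transform-service:
  libreoffice:
    autoscaling:
      kedaTargetValue: 20        # start new pods above 20 queued messages
      kedaPollingInterval: 15    # seconds between queue checks
      kedaCooldownPeriod: 900    # seconds of inactivity before scaling to idle
      minReplicas: 1
      maxReplicas: 5
      behavior:
        scaleUp:
          stabilizationWindowSeconds: 30
```

The other T-engines keep their defaults, since each one reads its own
`autoscaling` block.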

Scaling replicas down to zero is great when you have a workload that is
consistent enough, with long periods of inactivity (e.g. during the night). But
it can introduce a delay for the first requests when the workload starts again
(e.g. the morning after). If you want to avoid scaling your ATS deployments down
to zero and always have at least one pod up to deal quickly with "sparse"
requests, just apply the yaml below for the appropriate scaler object (here for
pdf conversion):

```yaml
alfresco-transform-service:
  pdfrenderer:
    autoscaling:
      kedaIdleReplicas: null
```

**Important**: If you're using a version of the ATS T-router prior to 5.1.3, you
need to set the `kedaIdleReplicas` to `0` for all tengines, otherwise the
T-router will eventually crash.

If you want to use an external ActiveMQ broker instead of the embedded one
(recommended), you can set the following values:

```yaml
messageBroker:
  url: failover:(tcp://mybroker.domain.tld:61616)
  webConsole: mybroker.domain.tld:8161
  brokerName: mybroker
  restAPITemplate: https://{{.ManagementEndpoint}}/api/jolokia/read/org.apache.activemq:type=Broker,brokerName={{.BrokerName}},destinationType=Queue,destinationName={{.DestinationName}}/QueueSize
```

To set the authentication you must ensure the broker user has web console access
too.
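
One way to provide those credentials is through an existing secret. The sketch
below is an assumption-laden example: the secret name `my-broker-credentials`
and the placeholder credentials are hypothetical, while the `BROKER_USERNAME`
and `BROKER_PASSWORD` keys are the ones the chart reads from the secret named in
`messageBroker.existingSecretName`.

```yaml
# Sketch only: secret name and credentials are hypothetical placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: my-broker-credentials
type: Opaque
stringData:
  BROKER_USERNAME: alfresco      # must also be allowed to use the web console
  BROKER_PASSWORD: change-me
```

The chart values then only need `messageBroker.existingSecretName` set to
`my-broker-credentials`, so the KEDA trigger authentication picks up the same
credentials as the rest of the deployment.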

#### Using AWS AmazonMQ (ActiveMQ)

If you're running Alfresco on AWS, you may be using AmazonMQ as your message
broker. In that case, the Jolokia REST API which ActiveMQ normally provides is
not available. To scale based on message queue size with KEDA, you need to
disable the KEDA integration for ATS in this chart (which essentially creates
the KEDA custom resources for you) and create your own
[scaledobject](https://keda.sh/docs/latest/concepts/scaling-deployments/#scaledobject-spec)
using the [Cloudwatch scaler](https://keda.sh/docs/latest/scalers/aws-cloudwatch/)
as a `trigger`, leveraging one of the [AWS authentication
providers](https://keda.sh/docs/2.14/authentication-providers/).
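
A hand-written scaled object for AmazonMQ could look roughly like the sketch
below. Everything specific in it is an assumption to adapt: the resource names,
the target Deployment name, the region, the broker name and the thresholds are
hypothetical, and it assumes KEDA is granted CloudWatch read access through AWS
pod identity. The queue name is the one this chart uses for the imagemagick
T-engine. Check the CloudWatch console for the exact `Broker` dimension value of
your broker before relying on it.

```yaml
# Sketch only: names, region and thresholds are hypothetical.
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: amazonmq-cloudwatch-auth
spec:
  podIdentity:
    provider: aws                # assumes an IAM role is bound to KEDA
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: tengine-im-cloudwatch
spec:
  scaleTargetRef:
    name: acs-imagemagick        # hypothetical T-engine Deployment name
  minReplicaCount: 1
  maxReplicaCount: 3
  triggers:
    - type: aws-cloudwatch
      metadata:
        namespace: AWS/AmazonMQ
        metricName: QueueSize
        dimensionName: Broker;Queue
        dimensionValue: mybroker;org.alfresco.transform.engine.imagemagick.acs
        targetMetricValue: "10"
        minMetricValue: "0"
        awsRegion: eu-west-1
      authenticationRef:
        name: amazonmq-cloudwatch-auth
```

One such scaled object is needed per T-engine queue you want to scale on.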
2 changes: 1 addition & 1 deletion helm/alfresco-content-services/Chart.lock
Original file line number Diff line number Diff line change
Expand Up @@ -48,4 +48,4 @@ dependencies:
repository: https://helm.elastic.co
version: 7.17.3
digest: sha256:4eacdf946479b47b7276bfdf86ebb6b513266f0e293ed10eae66c495abbf9b78
generated: "2024-05-29T18:01:29.438117+02:00"
generated: "2024-06-04T12:18:05.816648+02:00"
10 changes: 9 additions & 1 deletion helm/alfresco-content-services/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -92,6 +92,7 @@ Please refer to the [documentation](https://github.com/Alfresco/acs-deployment/b
| alfresco-digital-workspace.ingress.hosts[0].paths[0].pathType | string | `"Prefix"` | |
| alfresco-digital-workspace.ingress.tls | list | `[]` | |
| alfresco-digital-workspace.nameOverride | string | `"alfresco-dw"` | |
| alfresco-repository.autoscaling.kedaDisableIdle | bool | `true` | |
| alfresco-repository.configuration.db.existingConfigMap.name | string | `"alfresco-infrastructure"` | |
| alfresco-repository.configuration.db.existingSecret.name | string | `"alfresco-cs-database"` | |
| alfresco-repository.configuration.messageBroker.existingConfigMap.name | string | `"alfresco-infrastructure"` | |
Expand Down Expand Up @@ -122,6 +123,7 @@ Please refer to the [documentation](https://github.com/Alfresco/acs-deployment/b
| alfresco-search-enterprise.liveIndexing.path.image.tag | string | `"4.0.1"` | |
| alfresco-search-enterprise.messageBroker.existingConfigMap.name | string | `"alfresco-infrastructure"` | |
| alfresco-search-enterprise.messageBroker.existingSecretName | string | `"acs-alfresco-cs-brokersecret"` | |
| alfresco-search-enterprise.nameOverride | string | `"alfresco-search-enterprise"` | |
| alfresco-search-enterprise.reindexing.db.existingConfigMap.name | string | `"alfresco-infrastructure"` | |
| alfresco-search-enterprise.reindexing.db.existingSecret.name | string | `"alfresco-cs-database"` | |
| alfresco-search-enterprise.reindexing.enabled | bool | `true` | |
Expand Down Expand Up @@ -174,6 +176,7 @@ Please refer to the [documentation](https://github.com/Alfresco/acs-deployment/b
| alfresco-transform-service.libreoffice.image.tag | string | `"5.1.2"` | |
| alfresco-transform-service.messageBroker.existingConfigMap.name | string | `"alfresco-infrastructure"` | Name of the configmap which holds the ATS shared filestore URL |
| alfresco-transform-service.messageBroker.existingSecret.name | string | `"acs-alfresco-cs-brokersecret"` | |
| alfresco-transform-service.nameOverride | string | `"alfresco-transform-service"` | |
| alfresco-transform-service.pdfrenderer.enabled | bool | `true` | Declares the alfresco-pdf-renderer service used by the content repository to transform pdf files |
| alfresco-transform-service.pdfrenderer.image.repository | string | `"alfresco/alfresco-pdf-renderer"` | |
| alfresco-transform-service.pdfrenderer.image.tag | string | `"5.1.2"` | |
Expand Down Expand Up @@ -243,11 +246,15 @@ Please refer to the [documentation](https://github.com/Alfresco/acs-deployment/b
| global.strategy.rollingUpdate.maxSurge | int | `1` | |
| global.strategy.rollingUpdate.maxUnavailable | int | `0` | |
| infrastructure.configMapName | string | `"alfresco-infrastructure"` | |
| messageBroker.existingSecretName | string | `nil` | Name of an existing secret that contains BROKER_USERNAME and BROKER_PASSWORD keys. |
| keda.components | list | `[]` | The list of components that will be scaled by KEDA (chart names) |
| messageBroker.brokerName | string | `nil` | name of the message broker as set in the Broker configuration |
| messageBroker.existingSecretName | string | `nil` | Name of an existing secret that contains BROKER_USERNAME and BROKER_PASSWORD keys, and optionally the credentials to the web console (can be the same as broker access). |
| messageBroker.password | string | `nil` | External message broker password |
| messageBroker.restAPITemplate | string | `nil` | The template used internally by the KEDA ActiveMQ scaler to query the broker queue size. The KEDA internal default is: http://{{.ManagementEndpoint}}/api/jolokia/read/org.apache.activemq:type=Broker,brokerName={{.BrokerName}},destinationType=Queue,destinationName={{.DestinationName}}/QueueSize |
| messageBroker.secretName | string | `"acs-alfresco-cs-brokersecret"` | Name of the secret managed by this chart |
| messageBroker.url | string | `nil` | Enable using an external message broker for Alfresco Content Services. Must disable `activemq.enabled`. |
| messageBroker.user | string | `nil` | External message broker user |
| messageBroker.webConsole | string | `nil` | URL of the web console interface for the external message broker. Your broker web console interface should respond to URLs built using the `restAPITemplate` above, where `.ManagementEndpoint` evaluates to the `webConsole` value set here. |
| postgresql-sync.auth.database | string | `"syncservice-postgresql"` | |
| postgresql-sync.auth.enablePostgresUser | bool | `false` | |
| postgresql-sync.auth.password | string | `"admin"` | |
Expand Down Expand Up @@ -277,6 +284,7 @@ Please refer to the [documentation](https://github.com/Alfresco/acs-deployment/b
| postgresql.primary.resources.limits.memory | string | `"8Gi"` | |
| postgresql.primary.resources.requests.cpu | string | `"500m"` | |
| postgresql.primary.resources.requests.memory | string | `"1Gi"` | |
| prometheus.url | string | `nil` | URL of the prometheus server (must be reachable by KEDA pods) |
| share.enabled | bool | `true` | toggle deploying Alfresco Share UI |
| share.image.repository | string | `"quay.io/alfresco/alfresco-share"` | |
| share.image.tag | string | `"23.2.1"` | |
Expand Down
47 changes: 47 additions & 0 deletions helm/alfresco-content-services/templates/keda/_helpers-keda.tpl
Original file line number Diff line number Diff line change
@@ -0,0 +1,47 @@
{{/*
Render KEDA trigger for the ActiveMQ scaler

Usage: include "alfresco-content-services.mq.keda.scaler.trigger" $

*/}}
{{- define "alfresco-content-services.mq.keda.scaler.trigger" -}}
{{ $ctx := dict "Values" .Values.keda "Chart" .Chart "Release" .Release -}}
{{ $mqCtx := dict "Values" .Values.activemq "Chart" .Chart "Release" .Release -}}
{{ $mqAdminPort := default "8161" (.Values.activemq.services.webConsole.ports).external.webConsole -}}
{{ $hasAllBrokerProps := false }}
{{- with .Values.messageBroker }}
{{ $hasAllBrokerProps = and .webConsole .brokerName }}
{{- end }}
{{- if and (not $hasAllBrokerProps) (not .Values.activemq.enabled) }}
{{- fail "Enabling queue based autoscaling requires providing the web console address and the broker name of your external broker, or enabling the embedded ActiveMQ" }}
{{- end }}
- type: activemq
  metadata:
    managementEndpoint: {{ .Values.messageBroker.webConsole | default (printf "%s-web-console.%s.svc:%v" (include "activemq.fullname" $mqCtx) .Release.Namespace $mqAdminPort) }}
    brokerName: {{ .Values.messageBroker.brokerName | default (include "activemq.fullname" $mqCtx) }}
    {{- with .Values.messageBroker }}
    restAPITemplate: {{ .restAPITemplate }}
    {{- end }}
  authenticationRef:
    name: {{ printf "%s-activemq-auth-trigger" (include "alfresco-content-services.fullname" $ctx) | trunc 63 | trimSuffix "-" }}
{{- end -}}

{{/*
Render KEDA scaler options for the ActiveMQ scaler

Usage: include "alfresco-content-services.keda.scaler.options" (index .Values "alfresco-transform-service" "imagemagick")

*/}}
{{- define "alfresco-content-services.keda.scaler.options" -}}
pollingInterval: {{ .autoscaling.kedaPollingInterval | default 15 }}
initialCooldownPeriod: {{ .autoscaling.kedaInitialCooldownPeriod | default 300 }}
{{- if not .autoscaling.kedaIdleDisabled }}
cooldownPeriod: {{ .autoscaling.kedaCooldownPeriod | default 900 }}
idleReplicaCount: 0
{{- end }}
minReplicaCount: {{ .autoscaling.minReplicas }}
maxReplicaCount: {{ .autoscaling.maxReplicas }}
advanced:
  horizontalPodAutoscalerConfig:
    behavior: {{- toYaml .autoscaling.behavior | nindent 6 }}
{{- end -}}
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
{{- if has "alfresco-transform-service" .Values.keda.components -}}
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: {{ printf "%s-activemq-auth-trigger" (include "alfresco-content-services.fullname" .) | trunc 63 | trimSuffix "-" }}
spec:
  secretTargetRef:
    - parameter: username
      name: {{ .Values.messageBroker.existingSecretName | default .Values.messageBroker.secretName }}
      key: BROKER_USERNAME
    - parameter: password
      name: {{ .Values.messageBroker.existingSecretName | default .Values.messageBroker.secretName }}
      key: BROKER_PASSWORD
{{- end -}}
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
{{- if (index .Values "alfresco-transform-service" "enabled") -}}
{{- $atsCtx := (dict "Values" (index .Values "alfresco-transform-service") "Chart" .Chart "Release" .Release) }}
{{- if and $atsCtx.Values.imagemagick.enabled (has (include "alfresco-transform-service.name" $atsCtx) .Values.keda.components) }}
{{- $mqCtx := dict "Values" .Values.activemq "Chart" .Chart "Release" .Release }}
{{- $mqAdminPort := default "8161" (.Values.activemq.services.webConsole.ports).external.webConsole }}
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  labels: {{- include "alfresco-content-services.labels" $atsCtx | nindent 4 }}
  name: {{ printf "%s-tengine-im" (include "alfresco-content-services.fullname" $atsCtx) | trunc 63 | trimSuffix "-" }}
spec:
  scaleTargetRef:
    name: {{ template "alfresco-transform-service.imagemagick.name" $atsCtx }}
  triggers:
  {{- $destQ := "org.alfresco.transform.engine.imagemagick.acs" }}
  {{- $targetQSize := $atsCtx.Values.imagemagick.autoscaling.kedaTargetValue | default 10 | toString }}
  {{- $triggerOpts:= dict "metadata" (dict "targetQueueSize" $targetQSize "destinationName" $destQ ) }}
  {{- range (include "alfresco-content-services.mq.keda.scaler.trigger" . | fromYamlArray) }}
  {{- . | mustMerge $triggerOpts | list | toYaml | nindent 4 }}
  {{- end }}
  {{- include "alfresco-content-services.keda.scaler.options" (index .Values "alfresco-transform-service" "imagemagick") | nindent 2 }}
{{- end }}
{{- end -}}