How Kafka lag threshold is computed #6135
Unanswered
MakeshKathirvel
asked this question in
Q&A / Need Help
Replies: 1 comment
-
Your KEDA version is way too old; please update.
-
Hi Team,
We are using KEDA to scale Kubernetes replicas based on a Kafka lag threshold, configured through a ScaledObject resource.
We use Kafka Lag Exporter to track the lag per consumer group with these queries:
Sum of lag: sum(kafka_consumergroup_lag{cluster="$Cluster",consumergroup="${consumerGroup}",topic="${topic}"})
Average of lag: avg(kafka_consumergroup_lag{cluster="$Cluster",consumergroup="${consumerGroup}",topic="${topic}"})
We have two problems:
1. During one hour of sustained lag, the replicas scaled up to 17 and not 18 (the topic has 18 partitions).
2. As part of our analysis, we queried the endpoint below and saw a different value (neither the sum nor the average of the lag):
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/YOUR_NAMESPACE/YOUR_METRIC_NAME?labelSelector=scaledobject.keda.sh%2Fname%3D{SCALED_OBJECT_NAME}"
{"kind":"ExternalMetricValueList","apiVersion":"external.metrics.k8s.io/v1beta1","metadata":{},"items":[{"metricName":"s1-kafka-TEST","metricLabels":null,"timestamp":"2024-09-04T22:59:16Z","value":"678"}]}
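The `value` field in that response is a Kubernetes quantity string, so it can be read programmatically when comparing it against the exporter's numbers. A minimal sketch, assuming the response body shown above; `parse_quantity` and `metric_values` are hypothetical helper names, and only plain numbers and the milli suffix `m` are handled:

```python
import json

# Response body copied from the kubectl output above.
raw = (
    '{"kind":"ExternalMetricValueList","apiVersion":"external.metrics.k8s.io/v1beta1",'
    '"metadata":{},"items":[{"metricName":"s1-kafka-TEST","metricLabels":null,'
    '"timestamp":"2024-09-04T22:59:16Z","value":"678"}]}'
)


def parse_quantity(quantity: str) -> float:
    """Convert a quantity string to a float (plain numbers and the 'm' suffix only)."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def metric_values(body: str) -> dict:
    """Map each metric name in an ExternalMetricValueList to its numeric value."""
    doc = json.loads(body)
    return {item["metricName"]: parse_quantity(item["value"]) for item in doc["items"]}


values = metric_values(raw)
print(values)  # {'s1-kafka-TEST': 678.0}
```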
We would like to understand how these values are computed so that we can set an appropriate threshold and understand the expected behavior.
Is it the average of all lag, the sum of all lag, or some other computation?
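For reference, KEDA hands the metric to a Horizontal Pod Autoscaler, and the Kubernetes HPA documentation describes an external metric with an AverageValue target as being divided by that target, subject to a default tolerance of 0.1. The sketch below is one possible reading, not a confirmed description of KEDA 2.9.x. It assumes the Kafka trigger reports lag summed over partitions and uses `lagThreshold` as the AverageValue target. `desired_replicas` is a hypothetical helper, and the 3600 input (18 partitions x 200) is an illustrative value, not a measurement.

```python
import math

HPA_TOLERANCE = 0.1  # Kubernetes default; configurable on the controller manager


def desired_replicas(total_metric, target_average, current_replicas,
                     min_replicas, max_replicas):
    """Approximate the HPA decision for an external metric with an AverageValue target."""
    usage_ratio = total_metric / (target_average * current_replicas)
    if abs(1.0 - usage_ratio) <= HPA_TOLERANCE:
        # Within tolerance: the HPA keeps the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(total_metric / target_average)
    # Clamp to the minReplicaCount / maxReplicaCount bounds.
    return max(min_replicas, min(max_replicas, desired))


# Reported value 678 with lagThreshold 200: ceil(3.39) = 4, clamped to the minimum of 6.
print(desired_replicas(678, 200, current_replicas=6, min_replicas=6, max_replicas=18))

# Assumed total of 3600 while 17 replicas run: ratio 3600 / 3400 is within tolerance.
print(desired_replicas(3600, 200, current_replicas=17, min_replicas=6, max_replicas=18))
```

Under these assumptions the second call returns 17 rather than 18, which would be consistent with the replica count stalling one short of the partition count. Whether the scaler caps the reported lag at partitions x `lagThreshold` should be checked against the scaler source for the version in use.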
CHART        APP VERSION
keda-2.9.4   2.9.3
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  annotations:
    meta.helm.sh/release-name: <microservice_name>
    meta.helm.sh/release-namespace:
  finalizers:
  generation: 8
  labels:
    app.kubernetes.io/managed-by: Helm
    scaledobject.keda.sh/name: <microservice_name>
  name: <microservice_name>
  namespace:
spec:
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          policies:
          - periodSeconds: 60
            type: Percent
            value: 10
          stabilizationWindowSeconds: 300
        scaleUp:
          policies:
          - periodSeconds: 60
            type: Percent
            value: 10
          stabilizationWindowSeconds: 0
  cooldownPeriod: 120
  maxReplicaCount: 18
  minReplicaCount: 6
  pollingInterval: 10
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: <microservice_name>
  triggers:
  - metadata:
      metricName: cpu_average
      type: AverageValue
      value: "85"
    type: cpu
  - metadata:
      bootstrapServers:
      consumerGroup:
      lagThreshold: "200"
      topic:
    type: kafka
status:
  conditions:
  - reason: ScaledObjectReady
    status: "True"
    type: Ready
  - reason: ScalerActive
    status: "True"
    type: Active
  - reason: NoFallbackFound
    status: "False"
    type: Fallback
  externalMetricNames:
  health:
    s1-kafka-TEST:
      numberOfFailures: 0
      status: Happy
  hpaName: keda-hpa-<microservice_name>
  lastActiveTime: "2024-09-04T22:48:18Z"
  originalReplicaCount: 3
  resourceMetricNames:
  scaleTargetGVKR:
    group: apps
    kind: Deployment
    resource: deployments
    version: v1
  scaleTargetKind: apps/v1.Deployment
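The scaleUp policy in this ScaledObject (Percent, value 10, periodSeconds 60) also limits how quickly replicas can grow from minReplicaCount to maxReplicaCount. A rough sketch of that ceiling, assuming the HPA rounds the allowed count up to ceil(replicas x 1.10) once per period and that the metric always asks for the maximum; `scale_up_ceiling` is a hypothetical helper and ignores the rolling-window details of the real controller:

```python
def scale_up_ceiling(start_replicas, percent, max_replicas, periods):
    """Return the replica ceiling after each period under one Percent scaleUp policy."""
    ceilings = [start_replicas]
    current = start_replicas
    for _ in range(periods):
        # Integer form of ceil(current * (100 + percent) / 100), avoiding float rounding.
        allowed = -(-current * (100 + percent) // 100)
        current = min(max_replicas, allowed)
        ceilings.append(current)
    return ceilings


# minReplicaCount 6, 10 percent per 60 s period, maxReplicaCount 18, nine periods.
steps = scale_up_ceiling(6, 10, 18, 9)
print(steps)  # [6, 7, 8, 9, 10, 11, 13, 15, 17, 18]
```

Under these assumptions, reaching 18 replicas from 6 takes about nine periods, so roughly nine minutes of sustained demand.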