Jsonnet: fix KEDA autoscaling metric errors during rollouts #10013

Merged (1 commit) on Nov 25, 2024
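This change removes the sample-count guard from the KEDA scaling queries that the Mimir jsonnet generates. As the deleted lines in the diffs below show, the CPU and memory scaling queries for alertmanager, distributor, query-frontend, ruler, ruler-querier, and ruler-query-frontend all previously ended with the guard added for #7691 ("Guard against missing samples in KEDA queries", per the CHANGELOG entry below), and that clause is now gone from the generated output; the existing CHANGELOG entry gains a reference to this PR. The guard looked like this (alertmanager CPU shown as one representative instance, indentation reconstructed from the deleted lines; the other queries differ only in the container label and the metric name):

```promql
and
count (
  count_over_time(
    present_over_time(
      container_cpu_usage_seconds_total{container="alertmanager",namespace="default"}[1m]
    )[15m:1m]
  ) >= 15
)
```

Going by the PR title, the likely failure mode is that PromQL `and` returns nothing when its right-hand side is empty, so whenever no series passed the 15-samples-in-15-minutes check (for example while a rollout is replacing every pod), the whole scaling query came back empty and KEDA surfaced it as a metric error.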
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -545,7 +545,7 @@
* [ENHANCEMENT] Add `_config.autoscaling_querier_predictive_scaling_enabled` to scale querier based on inflight queries 7 days ago. #7775
* [ENHANCEMENT] Add support to autoscale ruler-querier replicas based on in-flight queries too (in addition to CPU and memory based scaling). #8060 #8188
* [ENHANCEMENT] Distributor: improved distributor HPA scaling metric to only take in account ready pods. This requires the metric `kube_pod_status_ready` to be available in the data source used by KEDA to query scaling metrics (configured via `_config.autoscaling_prometheus_url`). #8251
- * [BUGFIX] Guard against missing samples in KEDA queries. #7691
+ * [BUGFIX] Guard against missing samples in KEDA queries. #7691 #10013
* [BUGFIX] Alertmanager: Set -server.http-idle-timeout to avoid EOF errors in ruler. #8192

### Mimirtool
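For orientation, the repeated fragments in the hunks below (the query tail, `serverAddress`, `threshold`, and a trigger `name`) appear to come from the `triggers` section of generated KEDA `ScaledObject` manifests. The following is a minimal sketch of how those fields fit together, assuming KEDA's standard `prometheus` trigger layout; apart from the values visible in the diff (server address, threshold, trigger name), everything here is illustrative rather than taken from this PR:

```yaml
# Hypothetical sketch: only serverAddress, threshold, and the trigger name
# below are copied from the generated test output; the rest is assumed.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: cortex-distributor        # assumed object name
  namespace: default
spec:
  scaleTargetRef:
    name: distributor             # assumed scale target
  triggers:
    - type: prometheus
      name: cortex_distributor_cpu_hpa_default
      metadata:
        serverAddress: http://prometheus.default:9090/prometheus
        threshold: "2000"
        query: |
          <distributor CPU scaling expression; before this PR it ended with
          the "and count(...)" guard shown above>
```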
@@ -1974,14 +1974,6 @@ spec:
max by (pod) (up{container="alertmanager",namespace="default"}) > 0
)[15m:]
) * 1000
- and
- count (
- count_over_time(
- present_over_time(
- container_cpu_usage_seconds_total{container="alertmanager",namespace="default"}[1m]
- )[15m:1m]
- ) >= 15
- )
serverAddress: http://prometheus.default:9090/prometheus
threshold: "1780"
name: cortex_alertmanager_cpu_hpa_default
@@ -2008,14 +2000,6 @@
max by (pod) (kube_pod_container_status_last_terminated_reason{container="alertmanager", namespace="default", reason="OOMKilled"})
or vector(0)
)
- and
- count (
- count_over_time(
- present_over_time(
- container_memory_working_set_bytes{container="alertmanager",namespace="default"}[1m]
- )[15m:1m]
- ) >= 15
- )
serverAddress: http://prometheus.default:9090/prometheus
threshold: "9556302233"
name: cortex_alertmanager_memory_hpa_default
@@ -2062,14 +2046,6 @@ spec:
max by (pod) (min_over_time(kube_pod_status_ready{namespace="default",condition="true"}[1m])) > 0
)[15m:]
) * 1000
- and
- count (
- count_over_time(
- present_over_time(
- container_cpu_usage_seconds_total{container="distributor",namespace="default"}[1m]
- )[15m:1m]
- ) >= 15
- )
serverAddress: http://prometheus.default:9090/prometheus
threshold: "1780"
name: cortex_distributor_cpu_hpa_default
@@ -2096,14 +2072,6 @@
max by (pod) (kube_pod_container_status_last_terminated_reason{container="distributor", namespace="default", reason="OOMKilled"})
or vector(0)
)
- and
- count (
- count_over_time(
- present_over_time(
- container_memory_working_set_bytes{container="distributor",namespace="default"}[1m]
- )[15m:1m]
- ) >= 15
- )
serverAddress: http://prometheus.default:9090/prometheus
threshold: "3058016714"
name: cortex_distributor_memory_hpa_default
@@ -2193,14 +2161,6 @@ spec:
max by (pod) (up{container="query-frontend",namespace="default"}) > 0
)[15m:]
) * 1000
- and
- count (
- count_over_time(
- present_over_time(
- container_cpu_usage_seconds_total{container="query-frontend",namespace="default"}[1m]
- )[15m:1m]
- ) >= 15
- )
serverAddress: http://prometheus.default:9090/prometheus
threshold: "2225"
name: query_frontend_cpu_hpa_default
@@ -2227,14 +2187,6 @@
max by (pod) (kube_pod_container_status_last_terminated_reason{container="query-frontend", namespace="default", reason="OOMKilled"})
or vector(0)
)
- and
- count (
- count_over_time(
- present_over_time(
- container_memory_working_set_bytes{container="query-frontend",namespace="default"}[1m]
- )[15m:1m]
- ) >= 15
- )
serverAddress: http://prometheus.default:9090/prometheus
threshold: "559939584"
name: query_frontend_memory_hpa_default
@@ -2271,14 +2223,6 @@ spec:
max by (pod) (up{container="ruler",namespace="default"}) > 0
)[15m:]
) * 1000
- and
- count (
- count_over_time(
- present_over_time(
- container_cpu_usage_seconds_total{container="ruler",namespace="default"}[1m]
- )[15m:1m]
- ) >= 15
- )
serverAddress: http://prometheus.default:9090/prometheus
threshold: "890"
name: ruler_cpu_hpa_default
@@ -2305,14 +2249,6 @@
max by (pod) (kube_pod_container_status_last_terminated_reason{container="ruler", namespace="default", reason="OOMKilled"})
or vector(0)
)
- and
- count (
- count_over_time(
- present_over_time(
- container_memory_working_set_bytes{container="ruler",namespace="default"}[1m]
- )[15m:1m]
- ) >= 15
- )
serverAddress: http://prometheus.default:9090/prometheus
threshold: "5733781340"
name: ruler_memory_hpa_default
@@ -2349,14 +2285,6 @@ spec:
max by (pod) (up{container="ruler-querier",namespace="default"}) > 0
)[15m:]
) * 1000
- and
- count (
- count_over_time(
- present_over_time(
- container_cpu_usage_seconds_total{container="ruler-querier",namespace="default"}[1m]
- )[15m:1m]
- ) >= 15
- )
serverAddress: http://prometheus.default:9090/prometheus
threshold: "178"
name: ruler_querier_cpu_hpa_default
@@ -2383,14 +2311,6 @@
max by (pod) (kube_pod_container_status_last_terminated_reason{container="ruler-querier", namespace="default", reason="OOMKilled"})
or vector(0)
)
- and
- count (
- count_over_time(
- present_over_time(
- container_memory_working_set_bytes{container="ruler-querier",namespace="default"}[1m]
- )[15m:1m]
- ) >= 15
- )
serverAddress: http://prometheus.default:9090/prometheus
threshold: "955630223"
name: ruler_querier_memory_hpa_default
@@ -2435,14 +2355,6 @@ spec:
max by (pod) (up{container="ruler-query-frontend",namespace="default"}) > 0
)[15m:]
) * 1000
- and
- count (
- count_over_time(
- present_over_time(
- container_cpu_usage_seconds_total{container="ruler-query-frontend",namespace="default"}[1m]
- )[15m:1m]
- ) >= 15
- )
serverAddress: http://prometheus.default:9090/prometheus
threshold: "1780"
name: ruler_query_frontend_cpu_hpa_default
@@ -2469,14 +2381,6 @@
max by (pod) (kube_pod_container_status_last_terminated_reason{container="ruler-query-frontend", namespace="default", reason="OOMKilled"})
or vector(0)
)
- and
- count (
- count_over_time(
- present_over_time(
- container_memory_working_set_bytes{container="ruler-query-frontend",namespace="default"}[1m]
- )[15m:1m]
- ) >= 15
- )
serverAddress: http://prometheus.default:9090/prometheus
threshold: "559939584"
name: ruler_query_frontend_memory_hpa_default
96 changes: 0 additions & 96 deletions operations/mimir-tests/test-autoscaling-generated.yaml
@@ -1974,14 +1974,6 @@ spec:
max by (pod) (up{container="alertmanager",namespace="default"}) > 0
)[15m:]
) * 1000
- and
- count (
- count_over_time(
- present_over_time(
- container_cpu_usage_seconds_total{container="alertmanager",namespace="default"}[1m]
- )[15m:1m]
- ) >= 15
- )
serverAddress: http://prometheus.default:9090/prometheus
threshold: "2000"
name: cortex_alertmanager_cpu_hpa_default
@@ -2008,14 +2000,6 @@
max by (pod) (kube_pod_container_status_last_terminated_reason{container="alertmanager", namespace="default", reason="OOMKilled"})
or vector(0)
)
- and
- count (
- count_over_time(
- present_over_time(
- container_memory_working_set_bytes{container="alertmanager",namespace="default"}[1m]
- )[15m:1m]
- ) >= 15
- )
serverAddress: http://prometheus.default:9090/prometheus
threshold: "10737418240"
name: cortex_alertmanager_memory_hpa_default
@@ -2062,14 +2046,6 @@ spec:
max by (pod) (min_over_time(kube_pod_status_ready{namespace="default",condition="true"}[1m])) > 0
)[15m:]
) * 1000
- and
- count (
- count_over_time(
- present_over_time(
- container_cpu_usage_seconds_total{container="distributor",namespace="default"}[1m]
- )[15m:1m]
- ) >= 15
- )
serverAddress: http://prometheus.default:9090/prometheus
threshold: "2000"
name: cortex_distributor_cpu_hpa_default
@@ -2096,14 +2072,6 @@
max by (pod) (kube_pod_container_status_last_terminated_reason{container="distributor", namespace="default", reason="OOMKilled"})
or vector(0)
)
- and
- count (
- count_over_time(
- present_over_time(
- container_memory_working_set_bytes{container="distributor",namespace="default"}[1m]
- )[15m:1m]
- ) >= 15
- )
serverAddress: http://prometheus.default:9090/prometheus
threshold: "3435973836"
name: cortex_distributor_memory_hpa_default
@@ -2193,14 +2161,6 @@ spec:
max by (pod) (up{container="query-frontend",namespace="default"}) > 0
)[15m:]
) * 1000
- and
- count (
- count_over_time(
- present_over_time(
- container_cpu_usage_seconds_total{container="query-frontend",namespace="default"}[1m]
- )[15m:1m]
- ) >= 15
- )
serverAddress: http://prometheus.default:9090/prometheus
threshold: "1875"
name: query_frontend_cpu_hpa_default
@@ -2227,14 +2187,6 @@
max by (pod) (kube_pod_container_status_last_terminated_reason{container="query-frontend", namespace="default", reason="OOMKilled"})
or vector(0)
)
- and
- count (
- count_over_time(
- present_over_time(
- container_memory_working_set_bytes{container="query-frontend",namespace="default"}[1m]
- )[15m:1m]
- ) >= 15
- )
serverAddress: http://prometheus.default:9090/prometheus
threshold: "629145600"
name: query_frontend_memory_hpa_default
@@ -2271,14 +2223,6 @@ spec:
max by (pod) (up{container="ruler",namespace="default"}) > 0
)[15m:]
) * 1000
- and
- count (
- count_over_time(
- present_over_time(
- container_cpu_usage_seconds_total{container="ruler",namespace="default"}[1m]
- )[15m:1m]
- ) >= 15
- )
serverAddress: http://prometheus.default:9090/prometheus
threshold: "1000"
name: ruler_cpu_hpa_default
@@ -2305,14 +2249,6 @@
max by (pod) (kube_pod_container_status_last_terminated_reason{container="ruler", namespace="default", reason="OOMKilled"})
or vector(0)
)
- and
- count (
- count_over_time(
- present_over_time(
- container_memory_working_set_bytes{container="ruler",namespace="default"}[1m]
- )[15m:1m]
- ) >= 15
- )
serverAddress: http://prometheus.default:9090/prometheus
threshold: "6442450944"
name: ruler_memory_hpa_default
@@ -2349,14 +2285,6 @@ spec:
max by (pod) (up{container="ruler-querier",namespace="default"}) > 0
)[15m:]
) * 1000
- and
- count (
- count_over_time(
- present_over_time(
- container_cpu_usage_seconds_total{container="ruler-querier",namespace="default"}[1m]
- )[15m:1m]
- ) >= 15
- )
serverAddress: http://prometheus.default:9090/prometheus
threshold: "200"
name: ruler_querier_cpu_hpa_default
@@ -2383,14 +2311,6 @@
max by (pod) (kube_pod_container_status_last_terminated_reason{container="ruler-querier", namespace="default", reason="OOMKilled"})
or vector(0)
)
- and
- count (
- count_over_time(
- present_over_time(
- container_memory_working_set_bytes{container="ruler-querier",namespace="default"}[1m]
- )[15m:1m]
- ) >= 15
- )
serverAddress: http://prometheus.default:9090/prometheus
threshold: "1073741824"
name: ruler_querier_memory_hpa_default
@@ -2435,14 +2355,6 @@ spec:
max by (pod) (up{container="ruler-query-frontend",namespace="default"}) > 0
)[15m:]
) * 1000
- and
- count (
- count_over_time(
- present_over_time(
- container_cpu_usage_seconds_total{container="ruler-query-frontend",namespace="default"}[1m]
- )[15m:1m]
- ) >= 15
- )
serverAddress: http://prometheus.default:9090/prometheus
threshold: "2000"
name: ruler_query_frontend_cpu_hpa_default
@@ -2469,14 +2381,6 @@
max by (pod) (kube_pod_container_status_last_terminated_reason{container="ruler-query-frontend", namespace="default", reason="OOMKilled"})
or vector(0)
)
- and
- count (
- count_over_time(
- present_over_time(
- container_memory_working_set_bytes{container="ruler-query-frontend",namespace="default"}[1m]
- )[15m:1m]
- ) >= 15
- )
serverAddress: http://prometheus.default:9090/prometheus
threshold: "629145600"
name: ruler_query_frontend_memory_hpa_default