Admission controller panics on ScaledObject when a k8s custom-metrics exists #6379

Open
TallFurryMan opened this issue Nov 29, 2024 · 1 comment
Labels
bug Something isn't working

Comments

TallFurryMan commented Nov 29, 2024

Report

I am deploying Keda 2.16.0 in a v1.23.17+k3s1 cluster to scale components of a proprietary application.

Although v1.23.17+k3s1 is not in the list of supported Kubernetes versions, I can confirm that Keda 2.16.0 otherwise behaves properly and efficiently on this version (I am working on upgrading to a more recent k3s release in parallel).

However, when an application-proprietary APIService that serves v1beta1.custom.metrics.k8s.io and v1beta2.custom.metrics.k8s.io exists in the cluster, the Keda admission controller panics on the very first ScaledObject it receives during the Helm upgrade to a new version of the application that provides such objects.

I know the custom metrics engine is based on the k8s example.

The workaround I have for now is to remove the application-proprietary APIServices *.custom.metrics.k8s.io in a pre-upgrade Helm hook.

EDIT: as long as the cluster was deployed with those *.custom.metrics.k8s.io APIServices, even if those are deleted before the first ScaledObject appears on the cluster, the Keda admission controller panics and does not work anymore afterwards.

Expected Behavior

I expected that keda-admission-controller would process the ScaledObjects provided by the application Helm chart, or output an error for each of them.
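
As an illustration of the "output an error" half of this expectation, here is a minimal Go sketch of a validator that reports a missing field as an error instead of dereferencing a nil pointer. The types `scaleTarget` and `autoscaler` and the function `validateAutoscaler` are hypothetical stand-ins, not taken from the KEDA source, and this report does not establish which value is actually nil at scaledobject_webhook.go:268.

```go
package main

import (
	"errors"
	"fmt"
)

// scaleTarget and autoscaler are hypothetical stand-ins, not KEDA types.
type scaleTarget struct {
	Kind string
	Name string
}

type autoscaler struct {
	Name   string
	Target *scaleTarget // may be nil if the object is only partially populated
}

// validateAutoscaler returns an error for a nil value instead of panicking.
func validateAutoscaler(a *autoscaler) error {
	if a == nil {
		return errors.New("autoscaler is nil")
	}
	if a.Target == nil {
		return fmt.Errorf("autoscaler %q has no scale target", a.Name)
	}
	return nil
}

func main() {
	items := []*autoscaler{
		{Name: "complete", Target: &scaleTarget{Kind: "Deployment", Name: "app"}},
		{Name: "incomplete"}, // Target is nil
	}
	for _, a := range items {
		if err := validateAutoscaler(a); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", a.Name)
	}
}
```

With this shape, a single incomplete object is rejected with a message while the remaining objects are still processed.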

I had no particular expectations on whether Keda would be able to "replace" the existing scaling objects in the HPA, nor on whether the APIServices would be deleted with the removal of the application-proprietary custom-metrics engine.

Actual Behavior

Pod keda-admission-controller outputs a panic and does not process any ScaledObject:

2024/11/29 15:16:47 maxprocs: Updating GOMAXPROCS=1: determined from CPU quota
2024-11-29T15:16:47+01:00    INFO    setup    Starting admission webhooks
2024-11-29T15:16:47+01:00    INFO    setup    KEDA Version: 2.16.0
2024-11-29T15:16:47+01:00    INFO    setup    Git Commit: 5c52d032931b8ecf855d0c298f8d5e48937aecd7
2024-11-29T15:16:47+01:00    INFO    setup    Go Version: go1.23.3
2024-11-29T15:16:47+01:00    INFO    setup    Go OS/Arch: linux/amd64
2024-11-29T15:16:47+01:00    INFO    setup    Running on Kubernetes 1.23    {"version": "v1.23.17+k3s1"}
2024-11-29T15:16:47+01:00    INFO    setup    WARNING: KEDA 2.16.0 hasn't been tested on Kubernetes v1.23.17+k3s1
2024-11-29T15:16:47+01:00    INFO    setup    You can check recommended versions on https://keda.sh
2024-11-29T15:16:47+01:00    INFO    controller-runtime.builder    skip registering a mutating webhook, object does not implement admission.Defaulter or WithDefaulter wasn't called    {"GVK": "keda.sh/v1alpha1, Kind=ScaledObject"}
2024-11-29T15:16:47+01:00    INFO    controller-runtime.builder    Registering a validating webhook    {"GVK": "keda.sh/v1alpha1, Kind=ScaledObject", "path": "/validate-keda-sh-v1alpha1-scaledobject"}
2024-11-29T15:16:47+01:00    INFO    controller-runtime.webhook    Registering webhook    {"path": "/validate-keda-sh-v1alpha1-scaledobject"}
2024-11-29T15:16:47+01:00    INFO    controller-runtime.builder    skip registering a mutating webhook, object does not implement admission.Defaulter or WithDefaulter wasn't called    {"GVK": "keda.sh/v1alpha1, Kind=ScaledJob"}
2024-11-29T15:16:47+01:00    INFO    controller-runtime.builder    Registering a validating webhook    {"GVK": "keda.sh/v1alpha1, Kind=ScaledJob", "path": "/validate-keda-sh-v1alpha1-scaledjob"}
2024-11-29T15:16:47+01:00    INFO    controller-runtime.webhook    Registering webhook    {"path": "/validate-keda-sh-v1alpha1-scaledjob"}
2024-11-29T15:16:47+01:00    INFO    controller-runtime.builder    skip registering a mutating webhook, object does not implement admission.Defaulter or WithDefaulter wasn't called    {"GVK": "keda.sh/v1alpha1, Kind=TriggerAuthentication"}
2024-11-29T15:16:47+01:00    INFO    controller-runtime.builder    Registering a validating webhook    {"GVK": "keda.sh/v1alpha1, Kind=TriggerAuthentication", "path": "/validate-keda-sh-v1alpha1-triggerauthentication"}
2024-11-29T15:16:47+01:00    INFO    controller-runtime.webhook    Registering webhook    {"path": "/validate-keda-sh-v1alpha1-triggerauthentication"}
2024-11-29T15:16:47+01:00    INFO    controller-runtime.builder    skip registering a mutating webhook, object does not implement admission.Defaulter or WithDefaulter wasn't called    {"GVK": "keda.sh/v1alpha1, Kind=ClusterTriggerAuthentication"}
2024-11-29T15:16:47+01:00    INFO    controller-runtime.builder    Registering a validating webhook    {"GVK": "keda.sh/v1alpha1, Kind=ClusterTriggerAuthentication", "path": "/validate-keda-sh-v1alpha1-clustertriggerauthentication"}
2024-11-29T15:16:47+01:00    INFO    controller-runtime.webhook    Registering webhook    {"path": "/validate-keda-sh-v1alpha1-clustertriggerauthentication"}
2024-11-29T15:16:47+01:00    INFO    controller-runtime.builder    skip registering a mutating webhook, object does not implement admission.Defaulter or WithDefaulter wasn't called    {"GVK": "eventing.keda.sh/v1alpha1, Kind=CloudEventSource"}
2024-11-29T15:16:47+01:00    INFO    controller-runtime.builder    Registering a validating webhook    {"GVK": "eventing.keda.sh/v1alpha1, Kind=CloudEventSource", "path": "/validate-eventing-keda-sh-v1alpha1-cloudeventsource"}
2024-11-29T15:16:47+01:00    INFO    controller-runtime.webhook    Registering webhook    {"path": "/validate-eventing-keda-sh-v1alpha1-cloudeventsource"}
2024-11-29T15:16:47+01:00    INFO    controller-runtime.builder    skip registering a mutating webhook, object does not implement admission.Defaulter or WithDefaulter wasn't called    {"GVK": "eventing.keda.sh/v1alpha1, Kind=ClusterCloudEventSource"}
2024-11-29T15:16:47+01:00    INFO    controller-runtime.builder    Registering a validating webhook    {"GVK": "eventing.keda.sh/v1alpha1, Kind=ClusterCloudEventSource", "path": "/validate-eventing-keda-sh-v1alpha1-clustercloudeventsource"}
2024-11-29T15:16:47+01:00    INFO    controller-runtime.webhook    Registering webhook    {"path": "/validate-eventing-keda-sh-v1alpha1-clustercloudeventsource"}
2024-11-29T15:16:47+01:00    INFO    controller-runtime.metrics    Starting metrics server
2024-11-29T15:16:47+01:00    INFO    controller-runtime.metrics    Serving metrics server    {"bindAddress": ":8080", "secure": false}
2024-11-29T15:16:47+01:00    INFO    starting server    {"name": "health probe", "addr": ":8081"}
2024-11-29T15:16:47+01:00    INFO    controller-runtime.webhook    Starting webhook server
2024-11-29T15:16:47+01:00    INFO    controller-runtime.certwatcher    Updated current TLS certificate
2024-11-29T15:16:47+01:00    INFO    controller-runtime.webhook    Serving webhook server    {"host": "", "port": 9443}
2024-11-29T15:16:47+01:00    INFO    controller-runtime.certwatcher    Starting certificate watcher
2024-11-29T15:44:24+01:00    ERROR    admission    Observed a panic    {"webhookGroup": "keda.sh", "webhookKind": "ScaledObject", "ScaledObject": {"name":"<snipped>"","namespace":"<snipped>"}, "namespace": "<snipped>", "name": "<snipped>"", "resource": {"group":"keda.sh","version":"v1alpha1","resource":"scaledobjects"}, "user": "system:admin", "requestID": "08724bfa-d289-458e-bf45-b5721e24b6ff", "panic": "runtime error: invalid memory address or nil pointer dereference", "panicGoValue": "\"invalid memory address or nil pointer dereference\"", "stacktrace": "goroutine 1505 [running]:\nk8s.io/apimachinery/pkg/util/runtime.logPanic({0x1dd7be8, 0xc000026450}, {0x18e0460, 0x2b4c070})\n\t/workspace/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:107 +0xbc\nsigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).Handle.func1()\n\t/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/webhook/admission/webhook.go:163 +0x103\npanic({0x18e0460?, 0x2b4c070?})\n\t/usr/local/go/src/runtime/panic.go:785 +0x132\ngithub.com/kedacore/keda/v2/apis/keda/v1alpha1.verifyHpas(0xc000274b48, {0x1b17232,0x6}, 0x6?)\n\t/workspace/apis/keda/v1alpha1/scaledobject_webhook.go:268 +0xd17\ngithub.com/kedacore/keda/v2/apis/keda/v1alpha1.validateWorkload(0xc000274b48, {0x1b17232, 0x6}, 0x0)\n\t/workspace/apis/keda/v1alpha1/scaledobject_webhook.go:160 +0xdf\ngithub.com/kedacore/keda/v2/apis/keda/v1alpha1.(*ScaledObject).ValidateCreate(0xc000274b48, 0xc0004ac946)\n\t/workspace/apis/keda/v1alpha1/scaledobject_webhook.go:118 +0x10f\ngithub.com/kedacore/keda/v2/apis/keda/v1alpha1.ScaledObjectCustomValidator.ValidateCreate({}, {0x1dd7be8?, 0xc0000264b0?}, {0x1dc23a0, 0xc000274b48})\n\t/workspace/apis/keda/v1alpha1/scaledobject_webhook.go:90 +0xe5\nsigs.k8s.io/controller-runtime/pkg/webhook/admission.(*validatorForType).Handle(_, {_, _}, {{{0xc0000469f0, 0x24}, {{0xc0004ac8a0, 0x7}, {0xc0004ac8a8, 0x8}, {0xc0004ac8b0, ...}}, 
...}})\n\t/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/webhook/admission/validator_custom.go:91 +0x2b6\nsigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).Handle(_, {_, _}, {{{0xc0000469f0, 0x24}, {{0xc0004ac8a0, 0x7}, {0xc0004ac8a8, 0x8}, {0xc0004ac8b0, ...}}, ...}})\n\t/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/webhook/admission/webhook.go:181 +0x224\nsigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).ServeHTTP(0xc0000ea5a0, {0x7fd1fcc10270, 0xc000523e50}, 0xc000169040)\n\t/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/webhook/admission/http.go:119 +0xaf0\nsigs.k8s.io/controller-runtime/pkg/webhook/internal/metrics.InstrumentedHook.InstrumentHandlerInFlight.func1({0x7fd1fcc10270, 0xc000523e50}, 0xc000169040)\n\t/workspace/vendor/github.com/prometheus/client_golang/prometheus/promhttp/instrument_server.go:60 +0xbc\nnet/http.HandlerFunc.ServeHTTP(0x2b54280?, {0x7fd1fcc10270?, 0xc000523e50?}, 0xc00065d8d8?)\n\t/usr/local/go/src/net/http/server.go:2220 +0x29\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1({0x1dc9fa0?, 0xc0003b09a0?}, 0xc000169040)\n\t/workspace/vendor/github.com/prometheus/client_golang/prometheus/promhttp/instrument_server.go:147 +0xc3\nnet/http.HandlerFunc.ServeHTTP(0x1a14580?, {0x1dc9fa0?, 0xc0003b09a0?}, 0xc00065da20?)\n\t/usr/local/go/src/net/http/server.go:2220 +0x29\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerDuration.func2({0x1dc9fa0, 0xc0003b09a0}, 0xc000169040)\n\t/workspace/vendor/github.com/prometheus/client_golang/prometheus/promhttp/instrument_server.go:109 +0xc2\nnet/http.HandlerFunc.ServeHTTP(0xc0001d9880?, {0x1dc9fa0?, 0xc0003b09a0?}, 0x3?)\n\t/usr/local/go/src/net/http/server.go:2220 +0x29\nnet/http.(*ServeMux).ServeHTTP(0x410b25?, {0x1dc9fa0, 0xc0003b09a0}, 0xc000169040)\n\t/usr/local/go/src/net/http/server.go:2747 +0x1ca\nnet/http.serverHandler.ServeHTTP({0x1dc2db8?}, {0x1dc9fa0?, 0xc0003b09a0?}, 
0x6?)\n\t/usr/local/go/src/net/http/server.go:3210 +0x8e\nnet/http.(*conn).serve(0xc000034900, {0x1dd7be8, 0xc000310bd0})\n\t/usr/local/go/src/net/http/server.go:2092 +0x5d0\ncreated by net/http.(*Server).Serve in goroutine 70\n\t/usr/local/go/src/net/http/server.go:3360 +0x485\n"}
runtime.sigpanic
    /usr/local/go/src/runtime/signal_unix.go:917
github.com/kedacore/keda/v2/apis/keda/v1alpha1.verifyHpas
    /workspace/apis/keda/v1alpha1/scaledobject_webhook.go:268
github.com/kedacore/keda/v2/apis/keda/v1alpha1.validateWorkload
    /workspace/apis/keda/v1alpha1/scaledobject_webhook.go:160
github.com/kedacore/keda/v2/apis/keda/v1alpha1.(*ScaledObject).ValidateCreate
    /workspace/apis/keda/v1alpha1/scaledobject_webhook.go:118
github.com/kedacore/keda/v2/apis/keda/v1alpha1.ScaledObjectCustomValidator.ValidateCreate
    /workspace/apis/keda/v1alpha1/scaledobject_webhook.go:90
sigs.k8s.io/controller-runtime/pkg/webhook/admission.(*validatorForType).Handle
    /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/webhook/admission/validator_custom.go:91
sigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).Handle
    /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/webhook/admission/webhook.go:181
sigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).ServeHTTP
    /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/webhook/admission/http.go:119
sigs.k8s.io/controller-runtime/pkg/webhook/internal/metrics.InstrumentedHook.InstrumentHandlerInFlight.func1
    /workspace/vendor/github.com/prometheus/client_golang/prometheus/promhttp/instrument_server.go:60
net/http.HandlerFunc.ServeHTTP
    /usr/local/go/src/net/http/server.go:2220
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1
    /workspace/vendor/github.com/prometheus/client_golang/prometheus/promhttp/instrument_server.go:147
net/http.HandlerFunc.ServeHTTP
    /usr/local/go/src/net/http/server.go:2220
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerDuration.func2
    /workspace/vendor/github.com/prometheus/client_golang/prometheus/promhttp/instrument_server.go:109
net/http.HandlerFunc.ServeHTTP
    /usr/local/go/src/net/http/server.go:2220
net/http.(*ServeMux).ServeHTTP
    /usr/local/go/src/net/http/server.go:2747
net/http.serverHandler.ServeHTTP
    /usr/local/go/src/net/http/server.go:3210
net/http.(*conn).serve
    /usr/local/go/src/net/http/server.go:2092

The admission controller processes no other ScaledObject after this failure.

When no *.custom.metrics.k8s.io APIService exists, the ScaledObject is parsed properly and Keda runs its scaling procedure without errors.

For reference, the ScaledObjects that are submitted to the admission controller all have the following format:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: <snipped>
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: <snipped>
  pollingInterval: 5
  cooldownPeriod: 30
  minReplicaCount: 1
  maxReplicaCount: 100
  triggers:
  - type: metrics-api
    metadata:
      targetValue: "100"
      activationTargetValue: "0"
      url: "http://<snipped>/api/v1/status"
      valueLocation: "QueueSize"
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleUp:
          stabilizationWindowSeconds: 60

Steps to Reproduce the Problem

I think the following steps should reproduce the problem, although I have not tested this minimal example. Because a proprietary application is involved, the actual conditions could differ:

  1. Deploy a v1.23.17+k3s1 cluster
  2. Deploy a custom metrics engine providing *.custom.metrics.k8s.io, based on the k8s example
  3. Deploy Keda 2.16.0 as a Helm chart
  4. Deploy a ScaledObject resource

Logs from KEDA operator

There is no additional log in the operator when the application is upgraded.

KEDA Version

2.16.0

Kubernetes Version

< 1.28

Platform

Other

Scaler Details

No response

Anything else?

No response

TallFurryMan added the bug (Something isn't working) label Nov 29, 2024
TallFurryMan (Author) commented

Actually, the workaround does not work reliably. Keda still panics even if the APIService is removed beforehand. I'll edit the issue.
