The k8sprocessor uses a cache to store cluster resources and to update its model when they change. In a degraded state, this cache can become disconnected from the apiserver and miss deletion watch events. The cache eventually reconciles this and notifies the processor of the deletion, passing the resource's last known state wrapped in a tombstone (cache.DeletedFinalStateUnknown). The processor mishandles this notification: it asserts the concrete resource type directly, which panics when the tombstone is delivered instead.
FWIW, I can reliably reproduce this error for Pods, but I can't seem to hit the analogous case for the (owning) resources. If it is possible, I'm pretty sure it would cause a panic.
EDIT: confirmed. I just needed to be a bit more patient.
panic: interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *v1.Endpoints
goroutine 75 [running]:
github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube.(*OwnerCache).genericEndpointOp(0x721d664?, {0x669e460?, 0xc003c3a3e0?}, 0xc002da7100?)
github.com/open-telemetry/opentelemetry-collector-contrib/processor/[email protected]/kube/owner.go:417 +0x267
github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube.(*OwnerCache).deleteEndpoint(0xc0029b6c60, {0x669e460, 0xc003c3a3e0})
github.com/open-telemetry/opentelemetry-collector-contrib/processor/[email protected]/kube/owner.go:435 +0x1ca
github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube.(*OwnerCache).addOwnerInformer.func3({0x669e460?, 0xc003c3a3e0?})
github.com/open-telemetry/opentelemetry-collector-contrib/processor/[email protected]/kube/owner.go:320 +0x25
github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube.(*OwnerCache).addOwnerInformer.(*OwnerCache).deferredDelete.func4.1()
github.com/open-telemetry/opentelemetry-collector-contrib/processor/[email protected]/kube/owner.go:298 +0x22
github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube.(*OwnerCache).deleteLoop(0xc0029b6c60, 0x0?, 0x0?)
github.com/open-telemetry/opentelemetry-collector-contrib/processor/[email protected]/kube/owner.go:514 +0x110
created by github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube.newOwnerProvider in goroutine 1
github.com/open-telemetry/opentelemetry-collector-contrib/processor/[email protected]/kube/owner.go:121 +0x1d3
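For context, the usual client-go pattern is to check for the tombstone in the delete handler and unwrap it before asserting the concrete resource type. The sketch below is illustrative only (the handleEndpointsDelete name and the direct use of *v1.Endpoints are my assumptions, not the processor's actual code), but it shows the handling that the deleteEndpoint/genericEndpointOp path in owner.go appears to be missing:

```go
// A minimal sketch of the standard client-go tombstone handling, assuming a
// hypothetical handleEndpointsDelete handler; this is not the processor's
// actual code, only an illustration of the pattern the panic points at.
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/cache"
)

func handleEndpointsDelete(obj interface{}) {
	// When the informer missed the deletion watch event, obj is a
	// cache.DeletedFinalStateUnknown wrapping the last known state.
	if tombstone, ok := obj.(cache.DeletedFinalStateUnknown); ok {
		obj = tombstone.Obj
	}
	ep, ok := obj.(*v1.Endpoints)
	if !ok {
		// Still not the expected type: log and return instead of panicking,
		// which is what the non-comma-ok assertion in the trace does today.
		fmt.Printf("unexpected object type %T in delete handler\n", obj)
		return
	}
	fmt.Printf("endpoints deleted: %s/%s\n", ep.Namespace, ep.Name)
}

func main() {
	// Simulate the degraded-state notification the informer can deliver.
	handleEndpointsDelete(cache.DeletedFinalStateUnknown{
		Key: "default/my-endpoints",
		Obj: &v1.Endpoints{ObjectMeta: metav1.ObjectMeta{Namespace: "default", Name: "my-endpoints"}},
	})
}
```

The same unwrap-then-assert shape applies to the Pod and other owner-resource delete handlers, since any of them can receive a tombstone after a missed watch event.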
Uncovered in #1267.
It is unclear to me whether this particular issue is a primary contributing factor to the memory consumption problems observed in #1267.