Wait for cache service on startup #1349

Open
mariomac opened this issue Nov 11, 2024 · 1 comment

Comments

@mariomac
Contributor

The following messages look pretty common during Beyla startup.

time=2024-11-11T16:47:47.838Z level=INFO msg="waiting for K8s metadata synchronization" component=kube.CacheSvcClient timeout=30s
time=2024-11-11T16:48:07.850Z level=INFO msg="K8s cache service connection lost. Reconnecting..." component=kube.CacheSvcClient error="could not subscribe: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 10.189.58.80:50055: i/o timeout\""
time=2024-11-11T16:48:17.839Z level=WARN msg="timed out while waiting for K8s metadata synchronization. Some metadata might be temporarily missing. If this is expected due to the size of your cluster, you might want to increase the timeout via the BEYLA_KUBE_INFORMERS_SYNC_TIMEOUT configuration option" component=kube.CacheSvcClient
time=2024-11-11T16:48:17.846Z level=INFO msg="Flows agent successfully started" component=agent.Flows

This happens because the cache service and Beyla are deployed at the same time, so Beyla may try to connect before the service is up and running.

There are two possible solutions:

  • In the Beyla code, keep waiting during startup until the Beyla cache instance is reachable.
  • Make sure that Beyla is deployed only when the cache service is healthy. I don't know whether Kubernetes provides a mechanism for this, similar to what docker-compose allows.
@marevers
Contributor

Regarding option 2, I am not aware of a Kubernetes mechanism that lets you delay the deployment of one component based on the health of another, e.g. deploying the Beyla DaemonSet only once the cache service is up and healthy, even if you apply the manifests at the same time. Helm does not provide such a mechanism either.

Therefore I think only option 1 is viable. Maybe the timeout for waiting for the cache to become available could be made configurable?
