Root and query coordinators failed after helm installation #36824
-
Installed Milvus using Helm chart 4.2.8 with some custom configurations (external Pulsar, S3, ALB, etc.) on AWS EKS. The liveness and readiness probes always failed in the root and query coordinators. The root coordinator was unable to establish a connection to something and then got killed:

{"level":"WARN","time":"2024/10/12 16:02:05.337 +00:00","caller":"retry/retry.go:46","message":"retry func failed","retried":0,"error":"connection error"}
{"level":"WARN","time":"2024/10/12 16:03:43.202 +00:00","caller":"roles/roles.go:307","message":"Get signal to exit","signal":"terminated"}

The data coordinator was not able to find the root coordinator:

{"level":"WARN","time":"2024/10/12 16:04:13.012 +00:00","caller":"client/client.go:90","message":"RootCoordClient mess key not exist","key":"rootcoord"}
{"level":"WARN","time":"2024/10/12 16:04:13.012 +00:00","caller":"grpcclient/client.go:249","message":"failed to get client address","error":"find no available rootcoord, check rootcoord state"}
{"level":"WARN","time":"2024/10/12 16:04:13.012 +00:00","caller":"grpcclient/client.go:464","message":"fail to get grpc client in the retry state","client_role":"rootcoord","error":"find no available rootcoord, check rootcoord state"}

The vanilla version of the Milvus Helm chart 4.2.8 works fine on this EKS cluster. Can I get some insight into which configuration might be wrong? Thanks. Logs are attached.
Replies: 2 comments
-
{"level":"WARN","time":"2024/10/12 16:02:05.337 +00:00","caller":"retry/retry.go:46","message":"retry func failed","retried":0,"error":"connection error"} unluckily there is no stack about what connect error is. |
-
The external Pulsar host was wrong.
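For a quick check of this kind of misconfiguration, the configured Pulsar endpoint can be probed from inside the cluster before (re)installing the chart. The manifest below is a hypothetical throwaway pod; substitute the host and port set in externalPulsar.

```yaml
# One-shot pod that checks TCP reachability of the external Pulsar broker from inside the cluster.
apiVersion: v1
kind: Pod
metadata:
  name: pulsar-connectivity-check
spec:
  restartPolicy: Never
  containers:
    - name: check
      image: busybox:1.36
      # nc exits non-zero if the TCP connection to the broker port cannot be established
      command: ["sh", "-c", "nc -zv pulsar.example.com 6650"]
```

If the pod fails, the host is unreachable (or simply wrong) from the Milvus pods' point of view, which matches the connection errors in the coordinator logs.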
{"level":"WARN","time":"2024/10/12 16:02:05.337 +00:00","caller":"retry/retry.go:46","message":"retry func failed","retried":0,"error":"connection error"}
unluckily there is no stack about what connect error is.
but from the log position I guess it's kafka or pulsar