[BUG]ERROR: bad etcdctl args occurred when creating etcd cluster using the latest 1.0 yaml #8382

Closed
tianyue86 opened this issue Nov 1, 2024 · 4 comments · Fixed by apecloud/kubeblocks-addons#1153
Labels: kind/bug (Something isn't working)

@tianyue86

Describe the bug

Kubernetes: v1.30.4-eks-a737599
KubeBlocks: 1.0.0-beta.0
kbcli: 1.0.0-alpha.0

To Reproduce
Steps to reproduce the behavior:

  1. Generate the etcd cluster YAML:

```
helm template etcdclu02 kubeblocks-addons/etcd-cluster --version 1.0.0-alpha.0
```

```yaml
---
# Source: etcd-cluster/templates/cluster.yaml
apiVersion: apps.kubeblocks.io/v1
kind: Cluster
metadata:
  name: etcdclu02
  namespace: default
  labels:
    helm.sh/chart: etcd-cluster-1.0.0-alpha.0
    app.kubernetes.io/version: "3.5.15"
    app.kubernetes.io/instance: etcdclu02
spec:
  terminationPolicy: Delete
  componentSpecs:
    - name: etcd
      componentDef: etcd
      serviceVersion: 3.5.15
      tls: false
      replicas: 3
      volumeClaimTemplates:
        - name: data # ref clusterDefinition components.containers.volumeMounts.name
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 10Gi
      resources:
        limits:
          cpu: "0.5"
          memory: "0.5Gi"
        requests:
          cpu: "0.5"
          memory: "0.5Gi"
      disableExporter: false
```

  2. Apply this YAML to create the etcd cluster.

  3. Check the cluster status: Failed

```
NAMESPACE   NAME        CLUSTER-DEFINITION   TERMINATION-POLICY   STATUS   AGE
default     etcdclu02                        Delete               Failed   5m
```

  4. Check the pod: CrashLoopBackOff

```
tianyue@apeclouds-MacBook-Pro kbcli % k get pod
NAME               READY   STATUS             RESTARTS      AGE
etcdclu02-etcd-0   1/2     CrashLoopBackOff   6 (94s ago)   7m47s
```

  5. Describe the pod:

```
k describe pod etcdclu02-etcd-0
Events:
Type Reason Age From Message
Normal Scheduled 8m11s default-scheduler Successfully assigned default/etcdclu02-etcd-0 to ip-172-31-7-55.ap-northeast-1.compute.internal
Normal SuccessfulAttachVolume 8m9s attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-98c5b9f0-5178-44a7-bf35-c197509fd236"
Normal Pulling 8m3s kubelet Pulling image "docker.io/apecloud/debian:bullseye-20241016"
Normal Pulled 7m58s kubelet Successfully pulled image "docker.io/apecloud/debian:bullseye-20241016" in 5.011s (5.011s including waiting). Image size: 55083586 bytes.
Normal Created 7m58s kubelet Created container inject-bash
Normal Started 7m58s kubelet Started container inject-bash
Normal Pulled 7m57s kubelet Container image "docker.io/apecloud/kubeblocks-tools:1.0.0-beta.0" already present on machine
Normal Created 7m57s kubelet Created container init-kbagent
Normal Started 7m57s kubelet Started container init-kbagent
Normal Pulling 7m56s kubelet Pulling image "apecloud-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/etcd:v3.5.15"
Normal Pulled 7m55s kubelet Successfully pulled image "apecloud-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/etcd:v3.5.15" in 734ms (734ms including waiting). Image size: 21293597 bytes.
Normal Pulling 7m55s kubelet Pulling image "apecloud-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/etcd:v3.5.6"
Normal Started 7m54s kubelet Started container kbagent
Normal Created 7m54s kubelet Created container kbagent
Normal Pulled 7m54s kubelet Successfully pulled image "apecloud-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/etcd:v3.5.6" in 642ms (642ms including waiting). Image size: 74540158 bytes.
Normal Started 7m36s (x3 over 7m55s) kubelet Started container etcd
Normal Created 7m36s (x3 over 7m55s) kubelet Created container etcd
Normal Pulled 7m36s (x2 over 7m54s) kubelet Container image "apecloud-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/etcd:v3.5.15" already present on machine
Normal roleProbe 6m54s kbagent {"probe":"roleProbe","code":-1,"message":"exec exit 1 and stderr: grep: /var/run/etcd/etcd.conf: No such file or directory\nERROR: bad etcdctl args: clientProtocol:, endpoints:127.0.0.1:2379, tlsDir:/etc/pki/tls, please check!\nbad role, please check!\n: failed"}
Normal roleProbe 5m54s kbagent {"probe":"roleProbe","code":-1,"message":"exec exit 1 and stderr: grep: /var/run/etcd/etcd.conf: No such file or directory\nERROR: bad etcdctl args: clientProtocol:, endpoints:127.0.0.1:2379, tlsDir:/etc/pki/tls, please check!\nbad role, please check!\n: failed"}
Normal roleProbe 4m54s kbagent {"probe":"roleProbe","code":-1,"message":"exec exit 1 and stderr: grep: /var/run/etcd/etcd.conf: No such file or directory\nERROR: bad etcdctl args: clientProtocol:, endpoints:127.0.0.1:2379, tlsDir:/etc/pki/tls, please check!\nbad role, please check!\n: failed"}
Normal roleProbe 3m54s kbagent {"probe":"roleProbe","code":-1,"message":"exec exit 1 and stderr: grep: /var/run/etcd/etcd.conf: No such file or directory\nERROR: bad etcdctl args: clientProtocol:, endpoints:127.0.0.1:2379, tlsDir:/etc/pki/tls, please check!\nbad role, please check!\n: failed"}
Warning BackOff 3m2s (x24 over 7m53s) kubelet Back-off restarting failed container etcd in pod etcdclu02-etcd-0_default(ae4861b8-c6fb-44cc-ae8d-3974e3e6a819)
Normal roleProbe 2m54s kbagent {"probe":"roleProbe","code":-1,"message":"exec exit 1 and stderr: grep: /var/run/etcd/etcd.conf: No such file or directory\nERROR: bad etcdctl args: clientProtocol:, endpoints:127.0.0.1:2379, tlsDir:/etc/pki/tls, please check!\nbad role, please check!\n: failed"}
Normal roleProbe 114s kbagent {"probe":"roleProbe","code":-1,"message":"exec exit 1 and stderr: grep: /var/run/etcd/etcd.conf: No such file or directory\nERROR: bad etcdctl args: clientProtocol:, endpoints:127.0.0.1:2379, tlsDir:/etc/pki/tls, please check!\nbad role, please check!\n: failed"}
Normal roleProbe 54s kbagent {"probe":"roleProbe","code":-1,"message":"exec exit 1 and stderr: grep: /var/run/etcd/etcd.conf: No such file or directory\nERROR: bad etcdctl args: clientProtocol:, endpoints:127.0.0.1:2379, tlsDir:/etc/pki/tls, please check!\nbad role, please check!\n: failed"}
```
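Each roleProbe event above packs the probe's stderr into the JSON `message` field. The Python sketch below decodes one such event and pulls the key:value pairs out of the `bad etcdctl args` line; `parse_probe_event` is a hypothetical helper, not part of kbagent. On the event text shown above it reports `clientProtocol` as the only empty argument, which lines up with the preceding `grep` failing to find `/var/run/etcd/etcd.conf`. That the probe script reads `clientProtocol` from that file is an inference from the log, not something this thread confirms.

```python
import json

PREFIX = "ERROR: bad etcdctl args: "


def parse_probe_event(raw: str) -> dict:
    """Decode a kbagent probe event and extract the etcdctl args it reports."""
    event = json.loads(raw)
    # stderr lines are packed into the "message" field, separated by newlines
    lines = [ln for ln in event["message"].split("\n") if ln]
    args = {}
    for ln in lines:
        if ln.startswith(PREFIX):
            # e.g. "clientProtocol:, endpoints:127.0.0.1:2379, tlsDir:/etc/pki/tls, please check!"
            for item in ln[len(PREFIX):].split(", "):
                key, sep, value = item.partition(":")  # split at the first colon only
                if sep:  # skip trailing text without a colon, e.g. "please check!"
                    args[key] = value
    return {"probe": event["probe"], "code": event["code"], "lines": lines, "args": args}


if __name__ == "__main__":
    raw = r'{"probe":"roleProbe","code":-1,"message":"exec exit 1 and stderr: grep: /var/run/etcd/etcd.conf: No such file or directory\nERROR: bad etcdctl args: clientProtocol:, endpoints:127.0.0.1:2379, tlsDir:/etc/pki/tls, please check!\nbad role, please check!\n: failed"}'
    result = parse_probe_event(raw)
    empty = [k for k, v in result["args"].items() if not v]
    print(result["code"], empty)  # -1 ['clientProtocol']
```

Splitting at the first colon keeps values such as `127.0.0.1:2379` intact.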


tianyue86 added the kind/bug (Something isn't working) label on Nov 1, 2024
@loomts (Contributor) commented on Nov 4, 2024:

kb version
Kubernetes: v1.30.0
KubeBlocks: 1.0.0-beta.0
kbcli: 1.0.0-alpha.0

Using the same KubeBlocks version, I deployed etcd-cluster with `helm install etcd-cluster kubeblocks-addons/etcd-cluster --version 1.0.0-alpha.0`, and everything is OK.

@loomts (Contributor) commented on Nov 5, 2024:

My fault: the bash binary is an x86_64 build that is dynamically linked against /lib64/ld-linux-x86-64.so.2. That loader isn't necessarily present on aarch64 nodes, so the etcd cluster cannot start up on EKS.
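One way to confirm this kind of mismatch is to read the `e_machine` field of the binary's ELF header: x86_64 executables carry `EM_X86_64` (0x3E) and aarch64 executables carry `EM_AARCH64` (0xB7). The Python sketch below compares that value with `platform.machine()`; `elf_machine`, `check_binary`, and the script name `check_arch.py` are hypothetical. It assumes a Linux host (on macOS, `platform.machine()` reports `arm64` rather than `aarch64`). The path of the bash binary inside the etcd image is not given in this thread, so pass whichever path you want to inspect.

```python
import platform
import struct
import sys

# e_machine values defined by the ELF specification
EM_NAMES = {0x3E: "x86_64", 0xB7: "aarch64"}


def elf_machine(header: bytes) -> str:
    """Return the target architecture recorded in an ELF header."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[EI_DATA] (byte 5): 1 = little-endian, 2 = big-endian
    endian = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18 in both ELF32 and ELF64 headers
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    return EM_NAMES.get(machine, hex(machine))


def check_binary(path: str) -> bool:
    """Print and compare a binary's architecture with the host's (Linux naming)."""
    with open(path, "rb") as f:
        arch = elf_machine(f.read(64))
    host = platform.machine()
    print(f"{path}: built for {arch}, host is {host}")
    return arch == host


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_arch.py /path/to/bash
    for p in sys.argv[1:]:
        check_binary(p)
```

An x86_64 bash checked on an aarch64 node would print `built for x86_64, host is aarch64` and return `False`.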

@Y-Rookie (Collaborator) commented on Nov 5, 2024:


Consider using a statically linked bash as the shebang interpreter?

@loomts (Contributor) commented on Nov 7, 2024:

I've already built the image with a statically linked bash; please check.
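To check that a rebuilt bash really is static, one can look for a `PT_INTERP` program header: a dynamically linked executable names its loader there (for example `/lib64/ld-linux-x86-64.so.2`), while a statically linked one has none. The sketch below assumes a 64-bit ELF file and uses only the Python standard library; `elf_interpreter` and `is_static` are illustrative names, not an existing KubeBlocks tool.

```python
import struct

PT_INTERP = 3  # program header type that names the dynamic loader


def elf_interpreter(data: bytes):
    """Return the PT_INTERP path of an ELF64 image, or None if it has none."""
    if data[:4] != b"\x7fELF" or data[4] != 2:  # e_ident[EI_CLASS] == 2 means ELF64
        raise ValueError("expected an ELF64 file")
    endian = "<" if data[5] == 1 else ">"
    # e_phoff at offset 32; e_phentsize and e_phnum at 54 and 56 in the ELF64 header
    (phoff,) = struct.unpack_from(endian + "Q", data, 32)
    phentsize, phnum = struct.unpack_from(endian + "HH", data, 54)
    for i in range(phnum):
        base = phoff + i * phentsize
        (p_type,) = struct.unpack_from(endian + "I", data, base)
        if p_type == PT_INTERP:
            # p_offset is at +8 and p_filesz at +32 within an ELF64 program header
            (p_offset,) = struct.unpack_from(endian + "Q", data, base + 8)
            (p_filesz,) = struct.unpack_from(endian + "Q", data, base + 32)
            return data[p_offset:p_offset + p_filesz].rstrip(b"\x00").decode()
    return None


def is_static(path: str) -> bool:
    """True if the ELF64 file at `path` needs no dynamic loader."""
    with open(path, "rb") as f:
        return elf_interpreter(f.read()) is None
```

For example, `is_static("/path/to/bash")` should return `True` for a statically built bash; the path is a placeholder. Static-PIE binaries also lack `PT_INTERP`, so they are reported as static too, which is the property that matters here.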
