This doc describes how to debug a running instance of Bank of Anthos on a GKE cluster.

The container images used in `kubernetes-manifests/` correspond to a tagged, stable release (`v0.x.x`) that is ready for public consumption. Per the README deploy instructions, we highly recommend using these stable image tags, not the `latest` tag. The `latest` tag corresponds to the latest commit to the main branch and may be less stable.
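
If you want to confirm which image tags your manifests reference, a quick grep works. This is a minimal sketch that assumes you run it from the repository root and that the manifests use the standard `image:` field:

```bash
# List the unique container images (and their tags) referenced by the manifests.
# Assumes the working directory is the repository root.
grep -h "image:" kubernetes-manifests/*.yaml | sort -u
```
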
No matter what image tags you're using, you may encounter errors when running the Bank of Anthos app. Use the following steps to debug and fix problems.

- Make sure the pods are running. Make sure you have `kubectl` access to your cluster, then run `kubectl get pods`. When the app is healthy, you should see 9 pods:

  ```
  NAME                                 READY   STATUS    RESTARTS   AGE
  accounts-db-0                        1/1     Running   0          14m
  balancereader-d887fdb78-gdrsd        1/1     Running   1          15m
  contacts-7d559f5444-5j7nj            1/1     Running   0          15m
  frontend-78f948f946-qc6m9            1/1     Running   0          15m
  ledger-db-0                          1/1     Running   0          14m
  ledgerwriter-7d667cf86f-tvssn        1/1     Running   1          15m
  loadgenerator-777bd57f48-6642p       1/1     Running   0          15m
  transactionhistory-dd999969f-dgjth   1/1     Running   0          15m
  userservice-5765f7bf44-7rs2r         1/1     Running   0          15m
  ```

  One or two `RESTARTS` in the pods is expected, as the services sometimes start up before `skaffold` can deploy necessary dependencies (e.g. the `jwt-secret` mount). You're looking for `STATUS: Running` and `READY: 1/1`. If your cluster's namespace has Istio or Anthos Service Mesh, you would see `READY: 2/2`, since each pod would have a sidecar proxy container.

- Make sure all the Kubernetes services are present. Run `kubectl get service`. You should see a service per pod, except for the loadgenerator (8 services total). The `frontend` service should have an `EXTERNAL-IP`. Try to reach that `EXTERNAL-IP` in a web browser, or using `curl`. You should see the Bank of Anthos login screen.

  ```
  NAME                 TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)        AGE
  accounts-db          ClusterIP      10.48.23.153   <none>          5432/TCP       23d
  balancereader        ClusterIP      10.48.26.169   <none>          8080/TCP       23d
  contacts             ClusterIP      10.48.29.96    <none>          8080/TCP       23d
  frontend             LoadBalancer   10.48.19.116   35.xxx.xx.xxx   80:31279/TCP   23d
  ledger-db            ClusterIP      10.48.23.102   <none>          5432/TCP       23d
  ledgerwriter         ClusterIP      10.48.28.89    <none>          8080/TCP       23d
  transactionhistory   ClusterIP      10.48.20.206   <none>          8080/TCP       23d
  userservice          ClusterIP      10.48.19.11    <none>          8080/TCP       23d
  ```

- The next step is to do a clean redeploy of the Bank of Anthos app to your cluster:

  ```
  kubectl delete -f kubernetes-manifests
  kubectl apply -f kubernetes-manifests
  ```

  If your problem persists, proceed to the Common Problems section below. A scripted version of these health checks is sketched just after this list.
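
For convenience, the checks above can be run as one script. This is a minimal sketch, assuming the app is deployed in the default namespace and that `curl` is installed; adjust the namespace if you deployed elsewhere:

```bash
#!/usr/bin/env bash
# Quick health check for a Bank of Anthos deployment (sketch, not exhaustive).

# 1. List all pods, then flag any that are not in the Running phase.
kubectl get pods
kubectl get pods --field-selector=status.phase!=Running

# 2. Confirm all services exist and grab the frontend's external IP.
kubectl get service
EXTERNAL_IP=$(kubectl get service frontend \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

# 3. Hit the frontend; HTTP 200 means the login screen is being served.
curl -s -o /dev/null -w "frontend returned HTTP %{http_code}\n" "http://${EXTERNAL_IP}"
```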

## Common Problems

If a pod is crash-looping, the process inside the container has exited with an error. Run `kubectl logs <pod-name>` to get the container logs. It is likely that a Java or Python exception caused the service to crash. File a GitHub issue if this is happening, as it could correspond to a widespread outage (or an environment problem that could affect other users). When filing your issue, include the crash logs for the failing pods.
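
If the container has already restarted, the useful logs are often from the previous run; `kubectl logs --previous` retrieves them. The pod name below is just the one from the sample output above; substitute your own:

```bash
# Logs from the currently running container.
kubectl logs balancereader-d887fdb78-gdrsd

# Logs from the previous (crashed) container instance, if the pod has restarted.
kubectl logs balancereader-d887fdb78-gdrsd --previous
```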

Run `kubectl describe pod <pod-name>` to get details about the state of the Pod. At the bottom of the output, you should see a set of events:

```
Events:
  Type     Reason       Age                From                                             Message
  ----     ------       ----               ----                                             -------
  Normal   Scheduled    73s                default-scheduler                                Successfully assigned default/balancereader-fb6784fc-9fw2k to gke-toggles-default-pool-28882412-xljt
  Warning  FailedMount  72s (x2 over 72s)  kubelet, gke-toggles-default-pool-28882412-xljt  MountVolume.SetUp failed for volume "publickey" : secret "jwt-key" not found
  Normal   Pulling      70s                kubelet, gke-toggles-default-pool-28882412-xljt  Pulling image "gcr.io/my-cool-project/bank-of-anthos/gcr.io/bank-of-anthos-ci/balancereader:ver.0-171-gd459ddb-dirty@sha256:5b178bd029d04e25bf68df57096b961a28dfb243717d380524a89de994d81ff6"
  Normal   Pulled       69s                kubelet, gke-toggles-default-pool-28882412-xljt  Successfully pulled image "gcr.io/my-cool-project/bank-of-anthos/gcr.io/bank-of-anthos-ci/balancereader:ver.0-171-gd459ddb-dirty@sha256:5b178bd029d04e25bf68df57096b961a28dfb243717d380524a89de994d81ff6"
  Normal   Created      69s                kubelet, gke-toggles-default-pool-28882412-xljt  Created container balancereader
  Normal   Started      69s                kubelet, gke-toggles-default-pool-28882412-xljt  Started container balancereader
  Warning  Unhealthy    4s (x2 over 9s)    kubelet, gke-toggles-default-pool-28882412-xljt  Readiness probe failed: Get http://10.0.1.141:8080/ready: dial tcp 10.0.1.141:8080: connect: connection refused
```

In this case, see the `FailedMount` error that has occurred twice (`x2`). This means that the `jwt-key` Secret -- the JWT public key necessary for the balancereader to authenticate requests -- is not present in the cluster. `skaffold` should automatically deploy this secret; if you have manually deployed the app, follow the README instructions to add the `jwt-key` Secret to your cluster.
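
To confirm whether the Secret exists, and to add it if it's missing, something like the following works. The manifest path below is the one the README points to at the time of writing; double-check it against your checkout of the repo:

```bash
# Check whether the JWT public key Secret exists in the app's namespace.
kubectl get secret jwt-key

# If it's missing, apply the secret manifest from the repo.
# Path per the README deploy instructions; verify it against your checkout.
kubectl apply -f ./extras/jwt/jwt-secret.yaml
```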

Another common problem that you may see in the Events is `Insufficient memory`:

```
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  12s (x3 over 13s)  default-scheduler  0/1 nodes are available: 1 Insufficient memory.
```

This means that your cluster's Node Pools do not have enough capacity to host all the Bank of Anthos workloads, and you either need to use a different cluster, or increase your existing cluster's capacity (see the GKE docs).
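
One way to check and add capacity is sketched below; the cluster name, node pool, zone, and node count are placeholders to replace with your own values:

```bash
# See how much memory each node has and how much is already requested.
kubectl describe nodes | grep -A 5 "Allocated resources"

# Add nodes to the existing node pool (all names and values are placeholders).
gcloud container clusters resize my-cluster \
  --node-pool default-pool \
  --num-nodes 4 \
  --zone us-central1-a
```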

You may see a `503: Service Unavailable` error if you have added Istio or Anthos Service Mesh to the cluster namespace where Bank of Anthos is deployed. A 503 error typically comes from an Envoy proxy - either the IngressGateway proxy, or the sidecar proxy for a service pod. For the frontend specifically, the 503 is likely coming from the IngressGateway.

Make sure you've deployed the `VirtualService` and `Gateway` resources provided in `istio-manifests/`, and that they're deployed into the namespace where the app is running. If you've modified the frontend's Service or Deployment port, make sure the `VirtualService` port is updated, too.

See the Istio troubleshooting docs for more support.
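
A couple of quick checks, assuming the app and the resources from `istio-manifests/` share a namespace (replace `default` if yours differs) and a default Istio or ASM install with the ingress gateway in `istio-system`:

```bash
# Confirm the routing resources exist in the app's namespace.
kubectl get virtualservice,gateway -n default

# Find the external IP of the Istio ingress gateway, then try it directly.
kubectl get service istio-ingressgateway -n istio-system
```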

You may see a `404: Not Found` error if you've added a Kubernetes Ingress resource pointing to the `frontend` service, but have misconfigured that resource. Note that for GKE Ingress to work, the service must be of type `NodePort`. By default in `kubernetes-manifests`, the frontend service is of type `LoadBalancer`, so you'd have to change the service type. See the GKE docs for more info.
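
One way to make that change and then verify the Ingress backend is sketched below; treat it as a starting point rather than a complete configuration:

```bash
# Switch the frontend Service to NodePort so GKE Ingress can use it as a backend.
kubectl patch service frontend -p '{"spec": {"type": "NodePort"}}'

# Inspect your Ingress: confirm its backend points at frontend:80 and that the
# GKE load balancer's backend health checks report healthy.
kubectl describe ingress
```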