This doc describes how to debug a running instance of Bank of Sirius.
The container images used in kubernetes-manifests/ correspond to a tagged, stable release (v0.x.x) that is ready for public consumption. We highly recommend using these stable image tags rather than the "latest" tag, which corresponds to the latest commit on the master branch and may be less stable.
No matter what image tags you're using, you may encounter errors when running the Bank of Sirius app. Use the following steps to debug and fix problems.
- Make sure the pods are running. Make sure you have kubectl access to your cluster, then run kubectl get pods. When the app is healthy, you should see 9 pods:
NAME READY STATUS RESTARTS AGE
accounts-db-0 1/1 Running 0 14m
balancereader-d887fdb78-gdrsd 1/1 Running 1 15m
contacts-7d559f5444-5j7nj 1/1 Running 0 15m
frontend-78f948f946-qc6m9 1/1 Running 0 15m
ledger-db-0 1/1 Running 0 14m
ledgerwriter-7d667cf86f-tvssn 1/1 Running 1 15m
loadgenerator-777bd57f48-6642p 1/1 Running 0 15m
transactionhistory-dd999969f-dgjth 1/1 Running 0 15m
userservice-5765f7bf44-7rs2r 1/1 Running 0 15m
A restart count of one or two is expected, because the services sometimes start up before skaffold has deployed the dependencies they need (e.g. the jwt-secret mount). You're looking for STATUS: Running and READY: 1/1. If your cluster's namespace has Istio or Sirius Service Mesh enabled, you will see READY: 2/2 instead, since each pod then has a sidecar proxy container.
- Make sure all the Kubernetes services are present. Run kubectl get service. You should see one service per pod, except for the loadgenerator (8 services total). The frontend service should have an EXTERNAL-IP. Try to reach that EXTERNAL-IP in a web browser or with curl. You should see the Bank of Sirius login screen.
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
accounts-db ClusterIP 10.48.23.153 <none> 5432/TCP 23d
balancereader ClusterIP 10.48.26.169 <none> 8080/TCP 23d
contacts ClusterIP 10.48.29.96 <none> 8080/TCP 23d
frontend LoadBalancer 10.48.19.116 35.xxx.xx.xxx 80:31279/TCP 23d
ledger-db ClusterIP 10.48.23.102 <none> 5432/TCP 23d
ledgerwriter ClusterIP 10.48.28.89 <none> 8080/TCP 23d
transactionhistory ClusterIP 10.48.20.206 <none> 8080/TCP 23d
userservice ClusterIP 10.48.19.11 <none> 8080/TCP 23d
- Next, check whether the issue persists after a clean redeploy of the Bank of Sirius app to your cluster:
kubectl delete -f kubernetes-manifests
kubectl apply -f kubernetes-manifests
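The pod check in the first step can be scripted. Below is a minimal Python sketch, assuming kubectl's default column layout (NAME READY STATUS RESTARTS AGE); unhealthy_pods is a hypothetical helper name, and the sample output in the usage line is made up. It treats a pod as healthy when STATUS is Running and every container is ready, so 2/2 with a sidecar proxy counts the same as 1/1.

```python
from __future__ import annotations


def unhealthy_pods(kubectl_output: str) -> list[tuple[str, str, str]]:
    """Return (name, ready, status) for each pod that is not healthy.

    Assumes the default `kubectl get pods` columns:
    NAME READY STATUS RESTARTS AGE, separated by whitespace.
    """
    problems = []
    for line in kubectl_output.strip().splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] == "NAME":
            continue  # skip the header row and malformed lines
        name, ready, status = fields[0], fields[1], fields[2]
        # READY is "<ready containers>/<total containers>", so 2/2 with a
        # sidecar proxy is as healthy as 1/1 without one.
        ready_now, _, ready_total = ready.partition("/")
        if status != "Running" or ready_now != ready_total:
            problems.append((name, ready, status))
    return problems


# Hypothetical sample output with one crash-looping pod.
sample = """NAME READY STATUS RESTARTS AGE
accounts-db-0 1/1 Running 0 14m
balancereader-d887fdb78-gdrsd 0/1 CrashLoopBackOff 5 15m
"""
print(unhealthy_pods(sample))  # [('balancereader-d887fdb78-gdrsd', '0/1', 'CrashLoopBackOff')]
```

To check a live cluster, you could pass it subprocess.run(["kubectl", "get", "pods"], capture_output=True, text=True).stdout; an empty list means every pod looks healthy.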
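The service check can be sketched the same way. This assumes the default kubectl get service columns (NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE); EXPECTED_SERVICES and check_services are hypothetical names, and the expected set is copied from the service listing in this guide.

```python
from __future__ import annotations

# Services listed in this guide: one per pod, minus loadgenerator.
EXPECTED_SERVICES = {
    "accounts-db", "balancereader", "contacts", "frontend",
    "ledger-db", "ledgerwriter", "transactionhistory", "userservice",
}


def check_services(kubectl_output: str) -> tuple[set[str], str | None]:
    """Return (missing service names, frontend EXTERNAL-IP or None).

    Assumes the default `kubectl get service` columns:
    NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE.
    """
    external_ips = {}
    for line in kubectl_output.strip().splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] == "NAME":
            continue  # skip the header row and malformed lines
        external_ips[fields[0]] = fields[3]  # EXTERNAL-IP column
    missing = EXPECTED_SERVICES - set(external_ips)
    frontend_ip = external_ips.get("frontend")
    # <none> means no external IP; <pending> means the load balancer
    # has not been provisioned yet.
    if frontend_ip in (None, "<none>", "<pending>"):
        frontend_ip = None
    return missing, frontend_ip
```

If missing comes back empty and an IP is returned, open http://<that IP>/ in a browser or with curl. A None IP usually means the load balancer is still being provisioned.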
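The clean redeploy can also be wrapped in a script. This sketch runs the delete and apply commands from the last step in order, with one assumption beyond the original commands: --ignore-not-found (a standard kubectl delete flag) keeps the delete from failing when some resources were never created. clean_redeploy and its run parameter are hypothetical; run is injectable only so the command sequence can be checked without a cluster.

```python
import subprocess
from typing import Callable


def clean_redeploy(manifest_dir: str = "kubernetes-manifests",
                   run: Callable[..., object] = subprocess.run) -> None:
    """Delete and then re-apply every manifest in manifest_dir.

    With the default run (subprocess.run) and check=True, a failing kubectl
    call raises CalledProcessError, so apply never runs after a failed delete.
    """
    commands = [
        # --ignore-not-found is an addition to the guide's command: it keeps
        # the delete from failing when some resources do not exist.
        ["kubectl", "delete", "--ignore-not-found", "-f", manifest_dir],
        ["kubectl", "apply", "-f", manifest_dir],
    ]
    for command in commands:
        run(command, check=True)
```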
If your problem persists, proceed to the Common Problems section below.
If a pod is crash-looping, the process inside the container has exited with an error. Run kubectl logs <pod-name> to get the container logs. Most likely, a Java or Python exception caused the service to crash. If this happens, file a GitHub issue, as it could indicate a widespread outage (or an environment problem that could affect other users). When filing your issue, include the crash logs for the failing pods.
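To gather crash logs for an issue, this hypothetical helper scans kubectl get pods output for pods whose STATUS is CrashLoopBackOff or Error and builds one kubectl logs command per pod. It adds --previous (a standard kubectl logs flag), on the assumption that the useful stack trace belongs to the container instance that already exited rather than the one currently restarting.

```python
from __future__ import annotations

# STATUS values `kubectl get pods` shows for containers that keep exiting.
CRASH_STATUSES = {"CrashLoopBackOff", "Error"}


def crash_log_commands(get_pods_output: str) -> list[list[str]]:
    """Build a `kubectl logs --previous` command for each crash-looping pod.

    --previous prints the logs of the previous (already exited) container
    instance, which is usually where the crash's exception appears.
    """
    commands = []
    for line in get_pods_output.strip().splitlines():
        fields = line.split()
        # The header row's STATUS cell is not in CRASH_STATUSES, so it is skipped.
        if len(fields) >= 3 and fields[2] in CRASH_STATUSES:
            commands.append(["kubectl", "logs", fields[0], "--previous"])
    return commands
```

Each command can then be run with subprocess.run(cmd, capture_output=True, text=True), and its stdout attached to the GitHub issue.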
Run kubectl describe pod <pod-name> to get details about the state of the Pod. At the bottom of the output, you should see a set of events:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 73s default-scheduler Successfully assigned default/balancereader-fb6784fc-9fw2k to gke-toggles-default-pool-28882412-xljt
Warning FailedMount 72s (x2 over 72s) kubelet, gke-toggles-default-pool-28882412-xljt MountVolume.SetUp failed for volume "publickey" : secret "jwt-key" not found
Normal Pulling 70s kubelet, gke-toggles-default-pool-28882412-xljt Pulling image "gcr.io/my-cool-project/bank-of-sirius/gcr.io/bank-of-sirius/balancereader:v0.2.0-171-gd459ddb-dirty@sha256:5b178bd029d04e25bf68df57096b961a28dfb243717d380524a89de994d81ff6"
Normal Pulled 69s kubelet, gke-toggles-default-pool-28882412-xljt Successfully pulled image "gcr.io/my-cool-projectt/bank-of-sirius/gcr.io/bank-of-sirius/balancereader:v0.2.0-171-gd459ddb-dirty@sha256:5b178bd029d04e25bf68df57096b961a28dfb243717d380524a89de994d81ff6"
Normal Created 69s kubelet, gke-toggles-default-pool-28882412-xljt Created container balancereader
Normal Started 69s kubelet, gke-toggles-default-pool-28882412-xljt Started container balancereader
Warning Unhealthy 4s (x2 over 9s) kubelet, gke-toggles-default-pool-28882412-xljt Readiness probe failed: Get http://10.0.1.141:8080/ready: dial tcp 10.0.1.141:8080: connect: connection refused
In this case, note the FailedMount error, which has occurred twice (x2). It means that the jwt-key Secret (the JWT public key the balancereader needs to authenticate requests) is not present in the cluster. skaffold should deploy this Secret automatically; if you deployed the app manually, follow the README instructions to add the jwt-key Secret to your cluster.
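When the Events list is long, it can help to pull out only the Warning rows. The sketch below is a minimal, hypothetical parser for the Events section of kubectl describe pod output; it assumes each event sits on one line and that repeats are shown as (xN over ...) in the Age column, as in the output above.

```python
from __future__ import annotations

import re

# Matches the repeat counter kubectl adds to the Age column, e.g. "(x2 over 72s)".
REPEAT = re.compile(r"\(x(\d+) over [^)]*\)")


def warning_events(describe_output: str) -> list[tuple[str, int, str]]:
    """Return (reason, count, line) for each Warning row under "Events:"."""
    warnings = []
    in_events = False
    for line in describe_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Events:"):
            in_events = True  # Events is the last section of describe output
            continue
        fields = stripped.split()
        if in_events and len(fields) >= 2 and fields[0] == "Warning":
            match = REPEAT.search(stripped)
            count = int(match.group(1)) if match else 1
            warnings.append((fields[1], count, stripped))
    return warnings
```

Applied to the output above, this would report FailedMount and Unhealthy, each with a count of 2.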
Another common problem you may see in the Events is Insufficient memory:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 12s (x3 over 13s) default-scheduler 0/1 nodes are available: 1 Insufficient memory.
This means your cluster does not have enough capacity to host all the Bank of Sirius workloads, so the affected pods stay in the Pending state. Either use a different cluster or increase your existing cluster's capacity, for example by adding nodes or switching to nodes with more memory.
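The two event messages covered so far can be turned into a small triage table. This is a hypothetical lookup (KNOWN_CAUSES and suggest_fix are illustrative names, not part of Bank of Sirius) that matches message substrings taken from the events above and returns the fix this guide recommends; any other message returns None.

```python
from __future__ import annotations

# Hypothetical triage table: message substrings from the events in this
# guide, paired with the fix the guide recommends for each.
KNOWN_CAUSES = [
    ('secret "jwt-key" not found',
     "Add the jwt-key Secret: skaffold deploys it, manual deploys follow the README."),
    ("Insufficient memory",
     "Use a different cluster or increase the existing cluster's capacity."),
]


def suggest_fix(event_message: str) -> str | None:
    """Return the guide's fix for the first known substring found, else None."""
    for needle, fix in KNOWN_CAUSES:
        if needle in event_message:
            return fix
    return None
```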
You may see a 404: Not Found error if you've added a Kubernetes Ingress resource pointing to the frontend service but misconfigured that resource. Note that for GKE Ingress to work, the service must be of type NodePort (unless you use container-native load balancing, which also supports ClusterIP services). By default in kubernetes-manifests, the frontend service is of type LoadBalancer, so you would have to change the service type. See the GKE docs for more info.
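One way to change the service type on a running cluster is kubectl patch. This sketch builds that command for the frontend service, assuming the service lives in your current kubectl namespace; nodeport_patch_command is a hypothetical helper.

```python
from __future__ import annotations

import json


def nodeport_patch_command(service: str = "frontend") -> list[str]:
    """Build a `kubectl patch` command that sets a Service's type to NodePort.

    Assumes the Service is in the current kubectl namespace.
    """
    patch = {"spec": {"type": "NodePort"}}
    return ["kubectl", "patch", "service", service, "-p", json.dumps(patch)]


print(nodeport_patch_command())
# ['kubectl', 'patch', 'service', 'frontend', '-p', '{"spec": {"type": "NodePort"}}']
```

A patch only changes the live object: a clean redeploy (kubectl delete and kubectl apply on kubernetes-manifests) recreates the service as LoadBalancer. For a lasting change, edit the type in the frontend Service manifest instead.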