Click "Start" when you are done.
Set your project:
gcloud config set project <walkthrough-project-name/>
Run this script to enable the GKE API and create a GKE Autopilot cluster named "ap-demo-cluster":
. ./bootstrap/init.sh
Cluster creation can take several minutes, so grab a coffee and check back shortly.
Now that your cluster is up and running, the first step is deploying the sample app: Online Boutique, a microservices demo with services written in several languages. Check out the manifests in demo-01-deploy-sample-app.
kubectl apply -f demo-01-deploy-sample-app/
Note that we have not yet provisioned node pools or nodes, as Autopilot will do that for you.
Monitor the rollout progress of both pods and nodes:
watch -d kubectl get pods,nodes
(Use Ctrl-C to exit the watch command)
Inspect the nodes Autopilot provisioned under the hood. List each node's default machine type and architecture:
kubectl get nodes -o json|jq -Cjr '.items[] | .metadata.name," ",.metadata.labels."beta.kubernetes.io/instance-type"," ",.metadata.labels."beta.kubernetes.io/arch", "\n"'|sort -k3 -r
Note that Autopilot defaults to E2 series machines for each node.
After a few minutes the external IP will be assigned, and you can confirm everything is up in a different browser tab. Get the frontend URL:
echo http://$(kubectl get svc frontend-external -o=jsonpath={.status.loadBalancer.ingress[0].ip})
Now let's tune our application by specifying compute classes for our workloads. Compute classes let us customize hardware requirements by choosing from a curated subset of Compute Engine machine series.
Note: This is a fictional example with arbitrary compute classes, so don't read into the specific class choices. The point is to show you how to select compute classes.
In this demo, the adservice workload uses the Balanced compute class (currently N2/N2D machine types):
Open the file: demo-02-compute-classes/adservice.yaml and locate the compute class line.
And the checkoutservice workload uses the Scale-Out compute class (currently T2/T2D machine types):
Open the file: demo-02-compute-classes/checkoutservice.yaml and locate the compute class line.
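If you're curious what that line looks like, here is a minimal, hypothetical Deployment (not one of the demo manifests) showing how a compute class is selected on Autopilot: a nodeSelector on the cloud.google.com/compute-class node label. The demo files add the same selector to their existing pod specs.

```yaml
# Minimal illustrative Deployment (hypothetical name and image, not part of the
# demo): the nodeSelector below is what selects an Autopilot compute class.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: compute-class-example
spec:
  replicas: 1
  selector:
    matchLabels:
      app: compute-class-example
  template:
    metadata:
      labels:
        app: compute-class-example
    spec:
      nodeSelector:
        # "Balanced" in adservice.yaml; checkoutservice.yaml uses "Scale-Out".
        cloud.google.com/compute-class: "Balanced"
      containers:
        - name: app
          image: registry.k8s.io/pause:3.9   # stand-in image for illustration
          resources:
            requests:
              cpu: 250m
              memory: 256Mi
```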
kubectl apply -f demo-02-compute-classes/
Watch new nodes spin up (may take a few minutes):
watch -n 1 kubectl get pod -o=custom-columns=NAME:.metadata.name,STATUS:.status.phase,NODE:.spec.nodeName
List node machine types and architectures:
kubectl get nodes -o json|jq -Cjr '.items[] | .metadata.name," ",.metadata.labels."beta.kubernetes.io/instance-type"," ",.metadata.labels."beta.kubernetes.io/arch", "\n"'|sort -k2 -r
The cartservice workload has now been configured to use Spot Pods:
Open the file: demo-02-compute-classes/cartservice.yaml
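The mechanism is the same nodeSelector pattern: on Autopilot, Spot capacity is requested with the cloud.google.com/gke-spot node label. A rough sketch of the relevant fragment of the pod spec (see the demo file for the full manifest):

```yaml
# Fragment only -- the rest of the Deployment is unchanged. This nodeSelector
# asks Autopilot to run the pods on Spot capacity.
spec:
  template:
    spec:
      nodeSelector:
        cloud.google.com/gke-spot: "true"
```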
List the nodes, looking for the Spot label:
kubectl get nodes -o json|jq -Cjr '.items[] | .metadata.name," ",.metadata.labels."cloud.google.com/gke-spot"," ",.metadata.labels."beta.kubernetes.io/arch", "\n"'|sort -k2 -r
Our store has some AI/ML models as well. GKE Autopilot supports the provisioning of hardware accelerators like A100 and T4 GPUs to make machine learning tasks much faster.
Open the config file: demo-03-GPU/tensorflow.yaml and note the GPU configurations.
This demo creates a TensorFlow environment with a Jupyter notebook.
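The important pieces of that manifest are a GPU node selector and a GPU resource limit. A rough sketch of the pod-spec fragment (the exact GPU model and count come from the demo file, which may also request Spot capacity):

```yaml
# Fragment only: cloud.google.com/gke-accelerator picks the GPU model and the
# nvidia.com/gpu limit tells Autopilot how many GPUs to attach to the pod.
spec:
  template:
    spec:
      nodeSelector:
        cloud.google.com/gke-accelerator: nvidia-tesla-t4   # e.g. nvidia-tesla-a100 for A100
      containers:
        - name: tensorflow
          resources:
            limits:
              nvidia.com/gpu: "1"
```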
kubectl apply -f demo-03-GPU/
Watch the TensorFlow pod and GPU node spin up:
watch -n 1 kubectl get pods,nodes
Confirm we're using a GPU (and Spot capacity, if selected):
kubectl get nodes -o json|jq -Cjr '.items[] | .metadata.name," ",.metadata.labels."cloud.google.com/gke-spot"," ",.metadata.labels."cloud.google.com/gke-accelerator", "\n"'|sort -k3 -r
After a few minutes, an external IP should be assigned for your Jupyter notebook. Get the IP:
kubectl get svc tensorflow-jupyter -o=jsonpath={.status.loadBalancer.ingress[0].ip}
Refer to William Denniss's blog post detailing the TensorFlow demo.
The GPU workload we just created won't be used in the rest of the demos, so you can tear it down now to save costs:
kubectl delete -f demo-03-GPU/
One common Kubernetes pattern is to overprovision node resources for spare capacity. Scaling up manually or via HPA will provision new pods, but if there is no spare capacity this may result in a delay while new hardware gets provisioned. In GKE Standard, you can simply spin up extra nodes to act as spare capacity.
Remember that with Autopilot, though, Google manages the nodes. So how do you spin up spare capacity for scaling quickly in Autopilot mode? The answer is balloon pods (see William Denniss's blog post detailing this strategy).
Open the priority class file: demo-04-spare-capacity-balloon/balloon-priority.yaml.
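Conceptually, it is a PriorityClass with a negative priority, so balloon pods are always the first to be evicted, and preemptionPolicy: Never, so they never evict real workloads. A sketch along those lines (the name and value here are illustrative; check the demo file for the actual settings):

```yaml
# Sketch of a balloon priority class (name and value are illustrative; see
# balloon-priority.yaml for the demo's actual settings).
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: balloon-priority
value: -10                  # lower than the default of 0, so these pods are preempted first
preemptionPolicy: Never     # balloon pods never evict other workloads
globalDefault: false
description: "Low-priority placeholder pods used to reserve spare capacity."
```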
Create the balloon priority class:
kubectl apply -f demo-04-spare-capacity-balloon/balloon-priority.yaml
Open the balloon deployment file: demo-04-spare-capacity-balloon/balloon-deploy.yaml.
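The deployment itself is just a set of do-nothing pods that request real CPU and memory while carrying the low priority class. A sketch under those assumptions (replica count, image, and resource sizes are illustrative; the demo file may differ):

```yaml
# Sketch of a balloon Deployment: pods that hold capacity but do no work.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: balloon-deploy
spec:
  replicas: 5                              # how much headroom to reserve
  selector:
    matchLabels:
      app: balloon
  template:
    metadata:
      labels:
        app: balloon
    spec:
      priorityClassName: balloon-priority  # the class created above
      terminationGracePeriodSeconds: 0     # vacate capacity immediately when preempted
      containers:
        - name: sleeper
          image: registry.k8s.io/pause:3.9 # does nothing, just occupies the requests below
          resources:
            requests:
              cpu: 200m
              memory: 250Mi
```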
Create the balloon pods:
kubectl apply -f demo-04-spare-capacity-balloon/balloon-deploy.yaml
Watch the balloon pods scale up:
watch -d kubectl get pods,nodes
Now let's simulate a scaling event where the frontend service goes from 1 to 8 replicas, then watch as the balloon pods yield to the frontend pods for rapid scale-up.
Scale up frontend:
kubectl scale --replicas=8 deployment frontend
Watch the frontend scale up and displace the balloon pods, and new low-priority balloon pods get recreated:
watch -n 1 kubectl get pods,nodes
You should see three things happening:
- The original balloon pods start terminating immediately because they are low priority, making way for...
- The frontend scaling up quickly, with most pods up and running in ~30 seconds.
- New balloon pods spinning up on newly provisioned infrastructure.
If we were to scale up again, the latest balloon pods would get displaced and we'd continue buffering headroom this way.
Another common Kubernetes use case is workload separation: running specific services on separate nodes. On GKE Autopilot, workload separation is achieved with node selectors and tolerations.
In this demo, we want to ensure that both the frontend and paymentservice workloads run on their own nodes, with no other workloads co-mingled. We'll achieve this by requesting a node label via nodeSelector and adding a corresponding toleration.
Open the file: demo-05-workload-separation/frontend.yaml and look for the toleration and nodeSelector. In this case, the node label is "frontend-servers".
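As a rough sketch, the pod-spec fragment looks like this (the taint key "group" is illustrative; use whatever key the demo file defines). paymentservice.yaml follows the same pattern with its own label value:

```yaml
# Fragment only: the toleration and matching nodeSelector tell Autopilot to
# provision dedicated nodes that are both labeled and tainted for this workload.
# The key "group" is illustrative; check frontend.yaml for the demo's actual key.
spec:
  template:
    spec:
      tolerations:
        - key: group
          operator: Equal
          value: frontend-servers
          effect: NoSchedule
      nodeSelector:
        group: frontend-servers
```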
Scale the frontend service to 8 replicas:
kubectl scale --replicas=8 deployment frontend
Open the file: demo-05-workload-separation/paymentservice.yaml and look for the toleration and nodeSelector. In this case, the node label is "PCI" (say we're trying to isolate these workloads for PCI reasons).
Scale up paymentservice to 2 replicas:
kubectl scale --replicas=2 deployment paymentservice
Notice the current "co-mingled" distribution of workloads on nodes:
kubectl get pod -o=custom-columns=NAME:.metadata.name,STATUS:.status.phase,NODE:.spec.nodeName
Redeploy the workloads with workload separation:
kubectl apply -f demo-05-workload-separation
Watch the separation happen, which may take several minutes:
watch -n 1 kubectl get pod -o=custom-columns=NAME:.metadata.name,STATUS:.status.phase,NODE:.spec.nodeName
Sometimes you want to run a service in a specific availability zone; perhaps you have persistent data there and want the workload close to it.
Open the file: demo-06-single-zone/productcatalogservice.yaml and look for the nodeSelector section. In this case, us-west1-b is preset as the zone but you can change this if desired.
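The fragment in question is a single well-known topology label in the pod's nodeSelector, roughly:

```yaml
# Fragment only: pins the workload's nodes to one availability zone.
spec:
  template:
    spec:
      nodeSelector:
        topology.kubernetes.io/zone: us-west1-b
```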
List the nodes and their zones:
kubectl get nodes --label-columns topology.kubernetes.io/zone
You'll see a mix of zones a, b, and possibly others.
Find productcatalogservice and make note of the zone its pod is in by cross-referencing the previous command's output:
kubectl get pod -o=custom-columns=NAME:.metadata.name,STATUS:.status.phase,NODE:.spec.nodeName
Redeploy with the selected zone:
kubectl apply -f demo-06-single-zone/
Watch the pod move to another node (make note of the node name):
watch -n 1 kubectl get pod -o=custom-columns=NAME:.metadata.name,STATUS:.status.phase,NODE:.spec.nodeName
Confirm the pod landed on a node in zone b:
kubectl get nodes --label-columns topology.kubernetes.io/zone
For a more thorough discussion, see William Denniss's blog post on this topic.
That's it! You've made it through all the demos. You can now remove the GKE Autopilot cluster used in this demo as follows:
. ./bootstrap/teardown.sh