All the commands in this guide require both the Azure CLI and `aks-engine`. Follow the quickstart guide before continuing.

This guide assumes you have already deployed a cluster using `aks-engine`. For more details on how to do that, see deploy.

This document provides guidance on how to upgrade the Kubernetes version for an existing AKS Engine cluster, and recommendations for adopting `aks-engine upgrade` as a tool.
In order to ensure that your `aks-engine upgrade` operation runs smoothly, there are a few things you should be aware of before getting started.
- You will need access to the `apimodel.json` that was generated by `aks-engine deploy` or `aks-engine generate` in the `_output/<clustername>/` directory. `aks-engine` will use the `--api-model` argument to introspect the `apimodel.json` file in order to determine the cluster's current Kubernetes version, as well as all other cluster configuration data as defined by `aks-engine` during the last time that `aks-engine` was used to deploy, scale, or upgrade the cluster.

- `aks-engine upgrade` expects a cluster configuration that conforms to the current state of the cluster. In other words, the Azure resources inside the resource group deployed by `aks-engine` should be in the same state as when they were originally created by `aks-engine`. If you perform manual operations on your Azure IaaS resources (other than `aks-engine scale` and `aks-engine upgrade`), DO NOT use `aks-engine upgrade`, as the aks-engine-generated ARM template won't be reconcilable against the state of the Azure resources that reside in the resource group. This includes the naming of resources; `aks-engine upgrade` relies on some resources (such as VMs) being named in accordance with the original `aks-engine` deployment. In summary, the set of Azure resources in the resource group is mutually reconcilable by `aks-engine upgrade` only if they have been exclusively created and managed as the result of a series of successive ARM template deployments originating from `aks-engine`.

- `aks-engine upgrade` allows upgrading the Kubernetes version to any AKS Engine-supported patch release in the current minor release channel that is greater than the current version on the cluster (e.g., from `1.12.7` to `1.12.8`), or to the next AKS Engine-supported minor version (e.g., from `1.12.8` to `1.13.5`). In practice, the next AKS Engine-supported minor version will commonly be a single minor version ahead of the current cluster version. However, if the cluster has not been upgraded in a significant amount of time, the "next" minor version may have actually been deprecated by aks-engine. In such a case, your long-lived cluster will be upgradable to the nearest minor version that `aks-engine` supports at the time of upgrade (e.g., from `1.7.16` to `1.9.11`).

  To get the list of all available Kubernetes versions and upgrades, run the `get-versions` command:

  ```sh
  ./bin/aks-engine get-versions
  ```

  To get the versions of Kubernetes that your particular cluster version is upgradable to, provide its current Kubernetes version in the `--version` arg:

  ```sh
  ./bin/aks-engine get-versions --version 1.12.8
  ```

- If using `aks-engine upgrade` in production, it is recommended to stage an upgrade test on a cluster that was built to the same specifications (built with the same cluster configuration + the same version of the `aks-engine` binary) as your production cluster before performing the upgrade, especially if the cluster configuration is "interesting", or in other words differs significantly from defaults. The reason for this is that AKS Engine supports many different cluster configurations, and the extent of E2E testing that the AKS Engine team runs cannot practically cover every possible configuration. Therefore, it is recommended that you ensure in a staging environment that your specific cluster configuration is upgradable using `aks-engine upgrade` before attempting this potentially destructive operation on your production cluster.

- `aks-engine upgrade` is backwards compatible. If you deployed with `aks-engine` version `0.27.x`, you can run upgrade with version `0.29.y`. In fact, it is recommended that you use the latest available `aks-engine` version when running an upgrade operation. This will ensure that you get the latest available software and bug fixes in your upgraded cluster.
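As a rough illustration of the version rules above, the shell sketch below classifies a proposed upgrade as a patch-level or minor-level jump. The `upgrade_kind` helper is hypothetical (not part of `aks-engine`) and assumes plain `MAJOR.MINOR.PATCH` version strings:

```sh
#!/bin/sh
# Hypothetical helper; not part of aks-engine. Assumes plain
# MAJOR.MINOR.PATCH version strings with no suffixes.
upgrade_kind() {
  current_minor=$(echo "$1" | cut -d. -f1-2)   # e.g. 1.12
  target_minor=$(echo "$2" | cut -d. -f1-2)    # e.g. 1.13
  if [ "$current_minor" = "$target_minor" ]; then
    echo "patch upgrade within the $current_minor channel"
  else
    echo "minor upgrade from $current_minor to $target_minor"
  fi
}

upgrade_kind 1.12.7 1.12.8   # patch upgrade within the 1.12 channel
upgrade_kind 1.12.8 1.13.5   # minor upgrade from 1.12 to 1.13
```

Note that `aks-engine get-versions` remains the authoritative source for which of these jumps is actually supported.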
During the upgrade, `aks-engine` successively visits the virtual machines that constitute the cluster (first the master nodes, then the agent nodes) and performs the following operations:

Master nodes:

- cordon the node and drain existing workloads
- delete the VM
- create a new VM and install the desired Kubernetes version
- add the new VM to the cluster (custom annotations, labels, and taints are retained automatically)

Agent nodes:

- create a new VM and install the desired Kubernetes version
- add the new VM to the cluster
- evict any pods that might have been scheduled onto this node by Kubernetes before the custom node properties are copied
- copy the custom annotations, labels, and taints of the old node to the new node
- cordon the old node and drain existing workloads
- delete the VM
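The two per-node flows above can be sketched as pseudo-shell. The functions here are hypothetical stand-ins (echo only) for the kubectl and ARM operations that `aks-engine` performs internally; the point is the ordering: masters are torn down before the replacement is built, while agents get their replacement first.

```sh
#!/bin/sh
# Hypothetical stand-ins for the operations aks-engine performs internally.
cordon_and_drain() { echo "cordon+drain $1"; }
delete_vm()        { echo "delete VM $1"; }
create_vm()        { echo "create VM $1 (desired Kubernetes version)"; }
join_cluster()     { echo "join $1 (annotations/labels/taints restored)"; }

# Master nodes: tear down first, then rebuild.
upgrade_master() {
  cordon_and_drain "$1"
  delete_vm "$1"
  create_vm "$1"
  join_cluster "$1"
}

# Agent nodes: build the replacement first, then retire the old node.
upgrade_agent() {
  create_vm "$1-new"
  join_cluster "$1-new"
  cordon_and_drain "$1"
  delete_vm "$1"
}

upgrade_master k8s-master-0
upgrade_agent  k8s-agent-0
```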
Once you have read all the requirements, run `aks-engine upgrade` with the appropriate arguments:

```sh
./bin/aks-engine upgrade \
  --subscription-id <subscription id> \
  --api-model <generated apimodel.json> \
  --location <resource group location> \
  --resource-group <resource group name> \
  --upgrade-version <desired Kubernetes version> \
  --auth-method client_secret \
  --client-id <service principal id> \
  --client-secret <service principal secret>
```
For example:

```sh
./bin/aks-engine upgrade \
  --subscription-id xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx \
  --api-model _output/mycluster/apimodel.json \
  --location westus \
  --resource-group test-upgrade \
  --upgrade-version 1.8.7 \
  --client-id xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx \
  --client-secret xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
```
The upgrade operation is a long-running, successive set of ARM deployments, and for large clusters it is more susceptible to one of those deployments failing. This follows from the design principle of upgrade enumerating, one at a time, through each node in the cluster: a transient Azure resource allocation error could interrupt the successful progression of the overall transaction. At present, the upgrade operation is implemented to "fail fast", so if a well-formed upgrade operation fails before completing, it can be manually retried by invoking the exact same command line arguments as were sent originally. The upgrade operation will enumerate through the cluster nodes, skipping any nodes that have already been upgraded to the desired Kubernetes version. Those nodes that still match the original Kubernetes version will then, one at a time, be cordoned, drained, and upgraded to the desired version. Put another way, an upgrade command is designed to be idempotent across retry scenarios.
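Because a failed upgrade can simply be re-invoked with the same arguments, one way to script this is a small retry wrapper. The `retry` function below is a hypothetical sketch; in practice you would pass it your full `./bin/aks-engine upgrade` command line (a stand-in command is shown here):

```sh
#!/bin/sh
# Hypothetical retry wrapper. Safe to use because aks-engine upgrade is
# idempotent across retries: already-upgraded nodes are skipped.
retry() {
  max=$1; shift
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "failed after $max attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    echo "retrying (attempt $attempt of $max)..." >&2
  done
}

# Usage with a stand-in command; substitute your aks-engine invocation,
# with the exact same arguments each time:
retry 3 true && echo "upgrade completed"
```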
There are known limitations with VMSS cluster-autoscaler scenarios and upgrade. Our current guidance is not to use `aks-engine upgrade` on clusters with `cluster-autoscaler` functionality. See here for more information and to track progress of the issues related to these limitations.

We also don't recommend using `aks-engine upgrade` on clusters that have Availability Set (non-VMSS) agent pools with `cluster-autoscaler` at this time.
The upgrade operation takes an optional `--force` argument:

```
-f, --force   force upgrading the cluster to desired version. Allows same version upgrades and downgrades.
```
In some situations, you might want to bypass AKS Engine's validation of your apimodel version and your cluster's node versions. This is at your own risk, and you should assess the potential harm of using this flag.

The `--force` parameter instructs the upgrade process to:

- bypass the usual version validation
- include all of your cluster's nodes (masters and agents) in the upgrade process; nodes that are already on the target version will not be skipped
- allow any Kubernetes version, including versions that have not been whitelisted or that have been deprecated
- accept downgrade operations
Note: If you pass in a version that AKS Engine literally cannot install (e.g., a version of Kubernetes that does not exist), you may break your cluster.
For each node, the cluster will follow the same process described in the section above: Under the hood