AKS min node improvements #2683 (#2697)
* AKS min node improvements #2683

* Bump docs
BernieWhite authored Feb 21, 2024
1 parent 46f537b commit f29a483
Showing 12 changed files with 791 additions and 44 deletions.
1 change: 1 addition & 0 deletions .vscode/settings.json
@@ -75,6 +75,7 @@
"GREATEROREQUALS",
"Hashtable",
"inheritdoc",
"konnectivity",
"kube",
"kubelet",
"kubenet",
14 changes: 14 additions & 0 deletions docs/CHANGELOG-v1.md
@@ -34,6 +34,20 @@ See [upgrade notes][1] for helpful information when upgrading from previous vers

What's changed since v1.33.2:

- New rules:
  - Azure Kubernetes Service:
    - Check that user mode pools have a minimum number of nodes by @BernieWhite.
      [#2683](https://github.com/Azure/PSRule.Rules.Azure/issues/2683)
      - Added configuration to support changing the minimum number of nodes and to exclude node pools.
      - Set `AZURE_AKS_CLUSTER_USER_POOL_MINIMUM_NODES` to set the minimum number of user nodes.
      - Set `AZURE_AKS_CLUSTER_USER_POOL_EXCLUDED_FROM_MINIMUM_NODES` to exclude a specific node pool by name.
- Updated rules:
  - Azure Kubernetes Service:
    - Updated `Azure.AKS.MinNodeCount` to count nodes in system node pools by @BernieWhite.
      [#2683](https://github.com/Azure/PSRule.Rules.Azure/issues/2683)
      - Improved guidance and examples specifically for system node pools.
      - Added configuration to support changing the minimum number of nodes.
      - Set `AZURE_AKS_CLUSTER_MINIMUM_SYSTEM_NODES` to set the minimum number of system nodes.
- Engineering:
  - Bump Microsoft.NET.Test.Sdk to v17.9.0.
    [#2680](https://github.com/Azure/PSRule.Rules.Azure/pull/2680)
266 changes: 257 additions & 9 deletions docs/en/rules/Azure.AKS.MinNodeCount.md
@@ -1,29 +1,277 @@
---
reviewed: 2024-02-21
severity: Important
pillar: Reliability
category: Load balancing and failover
category: RE:05 Redundancy
resource: Azure Kubernetes Service
online version: https://azure.github.io/PSRule.Rules.Azure/en/rules/Azure.AKS.MinNodeCount/
ms-content-id: 320afea5-5c19-45ad-b9a5-c1a63ae6e114
---

# Azure.AKS.MinNodeCount
# Minimum number of system nodes in an AKS cluster

## SYNOPSIS

AKS clusters should have minimum number of nodes for failover and updates.
AKS clusters should have a minimum number of system nodes for failover and updates.

## DESCRIPTION

Kubernetes clusters should have minimum number of three (3) nodes for high availability and planned maintenance.
Azure Kubernetes Service (AKS) clusters support multiple nodes and node pools.
Each node is a virtual machine (VM) that runs Kubernetes components and a container runtime.
A node pool is a grouping of nodes that run the same configuration.
Application or system pods can be scheduled to run across multiple nodes to ensure resiliency and high availability.
AKS supports configuring one or more system node pools, and zero or more user node pools.

System node pools are intended for pods that perform important management and infrastructure functions for cluster operation.
This includes CoreDNS, konnectivity, and Azure Policy to name a few.
The number of pods that are scheduled to run on system node pools varies based on the configuration of your cluster.

User node pools are intended for application pods.
In general, schedule application workloads to run on user node pools to avoid disrupting the operation of system pods.

A minimum number of nodes in each node pool should be maintained to ensure resiliency during node failures or disruptions.
Also consider how your nodes are distributed across availability zones when deploying to a supported region.
Keep in mind that adding new nodes to a node pool can take time.

For example, in a three-node node pool:

- If one node fails, ~33% of capacity is lost until a new node is created to replace the failed node.
- The pods running on the failed node may be rescheduled to run on the remaining two nodes if there is enough capacity.
  However, there are a number of factors that affect which pods will be scheduled to run on the two remaining nodes.

For example, with two two-node node pools:

- If two two-node node pools are both deployed with availability zones `1` and `2`,
  AKS will automatically spread the nodes across the two availability zones as it scales out.
- If availability zone `1` fails, the remaining nodes in availability zone `2` provide 50% of capacity and will continue to run pods.
- Pods running on the failed nodes in availability zone `1` will be rescheduled pending enough capacity.
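
The two scenarios above can be checked with a little arithmetic.
The following is a minimal Python sketch, assuming every node has the same size and that nodes are spread evenly across zones.
The function name `capacity_lost` and the `nodes_per_zone` mapping are hypothetical and used only for illustration.

```python
def capacity_lost(total_nodes: int, failed_nodes: int) -> float:
    """Return the fraction of node capacity lost, assuming equally sized nodes."""
    if total_nodes <= 0:
        raise ValueError("total_nodes must be positive")
    if not 0 <= failed_nodes <= total_nodes:
        raise ValueError("failed_nodes must be between 0 and total_nodes")
    return failed_nodes / total_nodes


# Scenario 1: a three-node node pool where one node fails.
three_node_loss = capacity_lost(total_nodes=3, failed_nodes=1)

# Scenario 2: two two-node pools spread evenly across zones 1 and 2; zone 1 fails.
nodes_per_zone = {"1": 2, "2": 2}
zone_loss = capacity_lost(sum(nodes_per_zone.values()), nodes_per_zone["1"])

print(f"{three_node_loss:.0%}")  # 33%
print(f"{zone_loss:.0%}")        # 50%
```

The sketch only measures lost capacity.
Whether the displaced pods can actually be rescheduled still depends on the free capacity of the remaining nodes.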

## RECOMMENDATION

Use at least three (3) agent nodes.
Consider deploying additional nodes as required to provide enough resiliency during node failures or planned maintenance.
Consider configuring AKS clusters with at least three (3) agent nodes in system node pools.

## EXAMPLES

### Configure with Azure template

To deploy AKS clusters that pass this rule:

- For a single system mode node pool in `properties.agentPoolProfiles`:
  - Set the `minCount` property to at least `3` for node pools with auto-scale. _OR_
  - Set the `count` property to at least `3` for node pools without auto-scale. _OR_
- Deploy an additional system mode node pool so the total number of nodes is at least `3` across all pools.
  For example, two node pools with `minCount` set to `2` totalling _4_ nodes.

For example:

```json
{
  "type": "Microsoft.ContainerService/managedClusters",
  "apiVersion": "2023-11-01",
  "name": "[parameters('name')]",
  "location": "[parameters('location')]",
  "identity": {
    "type": "UserAssigned",
    "userAssignedIdentities": {
      "[format('{0}', resourceId('Microsoft.ManagedIdentity/userAssignedIdentities', parameters('identityName')))]": {}
    }
  },
  "properties": {
    "kubernetesVersion": "[parameters('kubernetesVersion')]",
    "disableLocalAccounts": true,
    "enableRBAC": true,
    "dnsPrefix": "[parameters('dnsPrefix')]",
    "agentPoolProfiles": [
      {
        "name": "system",
        "osDiskSizeGB": 0,
        "minCount": 3,
        "maxCount": 5,
        "enableAutoScaling": true,
        "maxPods": 50,
        "vmSize": "Standard_D4s_v5",
        "type": "VirtualMachineScaleSets",
        "vnetSubnetID": "[parameters('clusterSubnetId')]",
        "mode": "System",
        "osDiskType": "Ephemeral"
      },
      {
        "name": "user",
        "osDiskSizeGB": 0,
        "minCount": 3,
        "maxCount": 20,
        "enableAutoScaling": true,
        "maxPods": 50,
        "vmSize": "Standard_D4s_v5",
        "type": "VirtualMachineScaleSets",
        "vnetSubnetID": "[parameters('clusterSubnetId')]",
        "mode": "User",
        "osDiskType": "Ephemeral"
      }
    ],
    "aadProfile": {
      "managed": true,
      "enableAzureRBAC": true,
      "adminGroupObjectIDs": "[parameters('clusterAdmins')]",
      "tenantID": "[subscription().tenantId]"
    },
    "networkProfile": {
      "networkPlugin": "azure",
      "networkPolicy": "azure",
      "loadBalancerSku": "standard",
      "serviceCidr": "[variables('serviceCidr')]",
      "dnsServiceIP": "[variables('dnsServiceIP')]"
    },
    "apiServerAccessProfile": {
      "authorizedIPRanges": [
        "0.0.0.0/32"
      ]
    },
    "autoUpgradeProfile": {
      "upgradeChannel": "stable"
    },
    "oidcIssuerProfile": {
      "enabled": true
    },
    "addonProfiles": {
      "azurepolicy": {
        "enabled": true
      },
      "omsagent": {
        "enabled": true,
        "config": {
          "logAnalyticsWorkspaceResourceID": "[parameters('workspaceId')]"
        }
      },
      "azureKeyvaultSecretsProvider": {
        "enabled": true,
        "config": {
          "enableSecretRotation": "true"
        }
      }
    }
  },
  "dependsOn": [
    "[resourceId('Microsoft.ManagedIdentity/userAssignedIdentities', parameters('identityName'))]"
  ]
}
```

### Configure with Bicep

To deploy AKS clusters that pass this rule:

- For a single system mode node pool in `properties.agentPoolProfiles`:
  - Set the `minCount` property to at least `3` for node pools with auto-scale. _OR_
  - Set the `count` property to at least `3` for node pools without auto-scale. _OR_
- Deploy an additional system mode node pool so the total number of nodes is at least `3` across all pools.
  For example, two node pools with `minCount` set to `2` totalling _4_ nodes.

For example:

```bicep
resource clusterWithPools 'Microsoft.ContainerService/managedClusters@2023-11-01' = {
  location: location
  name: name
  identity: {
    type: 'UserAssigned'
    userAssignedIdentities: {
      '${identity.id}': {}
    }
  }
  properties: {
    kubernetesVersion: kubernetesVersion
    disableLocalAccounts: true
    enableRBAC: true
    dnsPrefix: dnsPrefix
    agentPoolProfiles: [
      {
        name: 'system'
        osDiskSizeGB: 0
        minCount: 3
        maxCount: 5
        enableAutoScaling: true
        maxPods: 50
        vmSize: 'Standard_D4s_v5'
        type: 'VirtualMachineScaleSets'
        vnetSubnetID: clusterSubnetId
        mode: 'System'
        osDiskType: 'Ephemeral'
      }
      {
        name: 'user'
        osDiskSizeGB: 0
        minCount: 3
        maxCount: 20
        enableAutoScaling: true
        maxPods: 50
        vmSize: 'Standard_D4s_v5'
        type: 'VirtualMachineScaleSets'
        vnetSubnetID: clusterSubnetId
        mode: 'User'
        osDiskType: 'Ephemeral'
      }
    ]
    aadProfile: {
      managed: true
      enableAzureRBAC: true
      adminGroupObjectIDs: clusterAdmins
      tenantID: subscription().tenantId
    }
    networkProfile: {
      networkPlugin: 'azure'
      networkPolicy: 'azure'
      loadBalancerSku: 'standard'
      serviceCidr: serviceCidr
      dnsServiceIP: dnsServiceIP
    }
    apiServerAccessProfile: {
      authorizedIPRanges: [
        '0.0.0.0/32'
      ]
    }
    autoUpgradeProfile: {
      upgradeChannel: 'stable'
    }
    oidcIssuerProfile: {
      enabled: true
    }
    addonProfiles: {
      azurepolicy: {
        enabled: true
      }
      omsagent: {
        enabled: true
        config: {
          logAnalyticsWorkspaceResourceID: workspaceId
        }
      }
      azureKeyvaultSecretsProvider: {
        enabled: true
        config: {
          enableSecretRotation: 'true'
        }
      }
    }
  }
}
```

## NOTES

### Rule configuration

<!-- module:config rule AZURE_AKS_CLUSTER_MINIMUM_SYSTEM_NODES -->

This rule fails by default if you have fewer than three (3) nodes in the cluster across all system node pools.
To change the default, set the `AZURE_AKS_CLUSTER_MINIMUM_SYSTEM_NODES` configuration option.
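
To make the counting behavior concrete, the check described above can be approximated as follows.
This is a hedged Python sketch and not the rule's actual implementation.
It assumes each pool profile states its `mode` explicitly, and that the guaranteed node count is `minCount` for pools with auto-scale and `count` otherwise.
The names `guaranteed_nodes` and `meets_system_node_minimum` are hypothetical.

```python
from typing import Any

# Default threshold described above; AZURE_AKS_CLUSTER_MINIMUM_SYSTEM_NODES changes it.
DEFAULT_MINIMUM_SYSTEM_NODES = 3


def guaranteed_nodes(pool: dict[str, Any]) -> int:
    """Nodes a pool is guaranteed to keep: minCount with auto-scale, count without."""
    if pool.get("enableAutoScaling"):
        return int(pool.get("minCount", 0))
    return int(pool.get("count", 0))


def meets_system_node_minimum(
    pools: list[dict[str, Any]],
    minimum: int = DEFAULT_MINIMUM_SYSTEM_NODES,
) -> bool:
    """True when system mode pools together guarantee at least `minimum` nodes."""
    total = sum(guaranteed_nodes(p) for p in pools if p.get("mode") == "System")
    return total >= minimum


single_pool = [
    {"name": "system", "mode": "System", "enableAutoScaling": True, "minCount": 3},
]
two_small_pools = [
    {"name": "system1", "mode": "System", "enableAutoScaling": True, "minCount": 2},
    {"name": "system2", "mode": "System", "enableAutoScaling": True, "minCount": 2},
]
undersized = [
    {"name": "system", "mode": "System", "count": 2},
    {"name": "user", "mode": "User", "count": 5},
]

print(meets_system_node_minimum(single_pool))            # True
print(meets_system_node_minimum(two_small_pools))        # True (2 + 2 = 4)
print(meets_system_node_minimum(undersized))             # False (user nodes are not counted)
print(meets_system_node_minimum(undersized, minimum=2))  # True
```

The last call mirrors lowering the threshold through the configuration option: the same cluster passes once the minimum is set to `2`.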

## LINKS

- [Baseline architecture for an Azure Kubernetes Service (AKS) cluster](https://docs.microsoft.com/azure/architecture/reference-architectures/containers/aks/secure-baseline-aks)
- [Create an AKS cluster](https://docs.microsoft.com/azure/aks/use-multiple-node-pools#create-an-aks-cluster)
- [Azure deployment reference](https://docs.microsoft.com/azure/templates/microsoft.containerservice/managedclusters)
- [RE:05 Redundancy](https://learn.microsoft.com/azure/well-architected/reliability/redundancy)
- [Azure Well-Architected Framework review - Azure Kubernetes Service (AKS)](https://learn.microsoft.com/azure/well-architected/service-guides/azure-kubernetes-service)
- [Manage node pools for a cluster in Azure Kubernetes Service (AKS)](https://learn.microsoft.com/azure/aks/manage-node-pools)
- [Manage system node pools in Azure Kubernetes Service (AKS)](https://learn.microsoft.com/azure/aks/use-system-pools)
- [Azure deployment reference](https://learn.microsoft.com/azure/templates/microsoft.containerservice/managedclusters)