Our recommended HA setup for production is:
- Galera with at least 3 nodes. Always an odd number of nodes.
- Load balance requests using MaxScale as a database proxy.
- Use dedicated nodes to avoid noisy neighbours.
- Define pod disruption budgets.
Refer to the following sections for further detail.
- Topologies
- Kubernetes Services
- MaxScale
- Pod Anti-Affinity
- Dedicated Nodes
- Pod Disruption Budgets
- Reference
- Multi master HA via Galera: All nodes support reads and writes, but writes are performed on a designated primary.
- Single master HA via SemiSync Replication: The primary node allows both reads and writes, while secondary nodes only allow reads.
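As a rough illustration, the sketch below shows how each topology might be selected on a MariaDB resource. The replicas field and the enabled flags are assumptions that do not appear on this page, and the manifests are deliberately incomplete (storage, credentials and so on are omitted), so check the API reference for your operator version before using them.

```yaml
# Sketch only: multi master HA via Galera, with an odd number of nodes.
apiVersion: k8s.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb-galera
spec:
  replicas: 3        # assumed field; odd number, as recommended above
  galera:
    enabled: true    # assumed flag selecting the Galera topology
---
# Sketch only: single master HA via SemiSync replication.
apiVersion: k8s.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb-repl # hypothetical name
spec:
  replicas: 3        # assumed field
  replication:
    enabled: true    # assumed flag selecting the replication topology
```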
In order to address nodes, mariadb-operator provides you with the following Kubernetes Services:
- <mariadb-name>: To be used for read requests. It will point to all nodes.
- <mariadb-name>-primary: To be used for write requests. It will point to a single node, the primary.
- <mariadb-name>-secondary: To be used for read requests. It will point to all nodes, except the primary.
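To make the naming concrete, here is a minimal sketch of how an application might be pointed at these Services through standard Kubernetes DNS names. It assumes a MariaDB named mariadb-galera in the default namespace and the default cluster.local cluster domain; the ConfigMap name and its keys are hypothetical.

```yaml
# Sketch only: Service hostnames for a MariaDB named 'mariadb-galera'
# in the 'default' namespace (both are assumptions).
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-db-endpoints   # hypothetical name
data:
  # Writes go to the single primary node.
  DB_WRITE_HOST: mariadb-galera-primary.default.svc.cluster.local
  # Reads go to every node except the primary.
  DB_READ_HOST: mariadb-galera-secondary.default.svc.cluster.local
  # Reads that may land on any node, primary included.
  DB_ANY_HOST: mariadb-galera.default.svc.cluster.local
```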
Whenever the primary changes, either by the user or by the operator, both the <mariadb-name>-primary and <mariadb-name>-secondary Services will be automatically updated by the operator to address the right nodes.
The primary may be manually changed by the user at any point by updating the spec.[replication|galera].primary.podIndex field. Alternatively, automatic primary failover can be enabled by setting spec.[replication|galera].primary.automaticFailover, which makes the operator switch the primary whenever the primary Pod goes down.
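For a Galera cluster, the two fields mentioned above could be set as in the following sketch. The field paths come from the text; the index value is hypothetical, and treating automaticFailover as a boolean is an assumption.

```yaml
# Sketch only: choosing the primary and enabling automatic failover.
apiVersion: k8s.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb-galera
spec:
  # other fields omitted
  galera:
    primary:
      podIndex: 1              # hypothetical: Pod index to act as primary
      automaticFailover: true  # assumed boolean; operator switches primary if its Pod goes down
```

For SemiSync replication, the same two fields would sit under spec.replication.primary instead.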
While Kubernetes Services can be used to dynamically address primary and secondary instances, the most robust high availability configuration we recommend relies on MaxScale. Please refer to the MaxScale docs for further details.
Warning
Bear in mind that, when enabling anti-affinity, you need to have at least as many Nodes available as the replicas specified. Otherwise your Pods will remain unscheduled and the cluster won't bootstrap.
To achieve real high availability, we need to run each MariaDB Pod in a different Kubernetes Node. This practice, known as anti-affinity, helps reduce the blast radius of Nodes being unavailable.
By default, anti-affinity is disabled, which means that multiple Pods may be scheduled in the same Node, something not desired in HA scenarios.
You can selectively enable anti-affinity in all the different Pods managed by the MariaDB resource:
apiVersion: k8s.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb-galera
spec:
  bootstrapFrom:
    restoreJob:
      affinity:
        antiAffinityEnabled: true
  ...
  galera:
    initJob:
      affinity:
        antiAffinityEnabled: true
  ...
  metrics:
    exporter:
      affinity:
        antiAffinityEnabled: true
  ...
  affinity:
    antiAffinityEnabled: true
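For orientation, enabling anti-affinity for the main MariaDB Pods amounts, in plain Kubernetes terms, to a rule of roughly the following shape. This is a sketch inferred from the custom rule shown later on this page, not the operator's literal output; the label key and value are assumptions.

```yaml
# Sketch only: keep Pods of the same instance on different Nodes.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app.kubernetes.io/instance   # assumed label key
          operator: In
          values:
          - mariadb-galera
      # One Pod per distinct hostname, i.e. per Node.
      topologyKey: kubernetes.io/hostname
```

Because the rule is required at scheduling time, a Pod with no eligible Node stays pending, which is why the Node count must be at least the replica count.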
Anti-affinity may also be enabled in the resources that have a reference to MariaDB, resulting in their Pods being scheduled in Nodes where MariaDB is not running. For instance, the Backup and Restore processes can run in different Nodes:
apiVersion: k8s.mariadb.com/v1alpha1
kind: Backup
metadata:
  name: backup
spec:
  mariaDbRef:
    name: mariadb-galera
  ...
  affinity:
    antiAffinityEnabled: true

apiVersion: k8s.mariadb.com/v1alpha1
kind: Restore
metadata:
  name: restore
spec:
  mariaDbRef:
    name: mariadb-galera
  ...
  affinity:
    antiAffinityEnabled: true
In the case of MaxScale, the Pods will also be placed in Nodes isolated in terms of compute, ensuring isolation not only among themselves but also from the MariaDB Pods. For example, if you run MariaDB and MaxScale with 3 replicas each, you will need 6 Nodes in total:
apiVersion: k8s.mariadb.com/v1alpha1
kind: MaxScale
metadata:
  name: maxscale-galera
spec:
  mariaDbRef:
    name: mariadb-galera
  ...
  metrics:
    exporter:
      affinity:
        antiAffinityEnabled: true
  ...
  affinity:
    antiAffinityEnabled: true
Default anti-affinity rules generated by the operator might not satisfy your needs, but you can always define your own rules. For example, if you want the MaxScale Pods to be in different Nodes, but you want them to share Nodes with MariaDB:
apiVersion: k8s.mariadb.com/v1alpha1
kind: MaxScale
metadata:
  name: maxscale-galera
spec:
  mariaDbRef:
    name: mariadb-galera
  ...
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app.kubernetes.io/instance
            operator: In
            values:
            - maxscale-galera
            # 'mariadb-galera' instance omitted (default anti-affinity rule)
        topologyKey: kubernetes.io/hostname
If you want to avoid noisy neighbours running in the same Kubernetes Nodes as your MariaDB, you may consider using dedicated Nodes. To achieve this, you will need to:
- Taint your Nodes and add the counterpart toleration in your Pods.
Important
Tainting your Nodes is not covered by this operator; it is something you need to do yourself beforehand. You may take a look at the Kubernetes documentation to understand how to achieve this.
- Select the Nodes to schedule in via a nodeSelector in your Pods.
Note
Although you can use the default Node labels, you may consider adding more meaningful labels to your Nodes, as you will have to refer to them in your Pod nodeSelector. Refer to the Kubernetes documentation.
- Add podAntiAffinity to your Pods as described in the Pod Anti-Affinity section.
Once you have completed the previous steps, you can configure your MariaDB as follows:
apiVersion: k8s.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb-galera
spec:
  ...
  tolerations:
    - key: "k8s.mariadb.com/ha"
      operator: "Exists"
      effect: "NoSchedule"
  nodeSelector:
    "k8s.mariadb.com/node": "ha"
  affinity:
    antiAffinityEnabled: true
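The toleration and nodeSelector above only take effect if the Nodes carry a matching taint and label. The fragment below sketches what such a Node would look like once prepared; the Node name is hypothetical. In practice the taint and label are usually applied with kubectl taint and kubectl label rather than by editing the Node object, so treat this as a description of the target state, not a manifest to apply.

```yaml
# Sketch only: relevant fields of a Node prepared as a dedicated Node.
apiVersion: v1
kind: Node
metadata:
  name: node-1                      # hypothetical Node name
  labels:
    k8s.mariadb.com/node: ha        # matched by the nodeSelector above
spec:
  taints:
    - key: k8s.mariadb.com/ha       # tolerated via operator "Exists" above
      effect: NoSchedule            # repels Pods without the toleration
```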
Important
Take a look at the Kubernetes documentation if you are unfamiliar with PodDisruptionBudgets.
By defining a PodDisruptionBudget, you are telling Kubernetes how many Pods your database tolerates to be down. This is quite important for planned maintenance operations such as Node upgrades.
mariadb-operator creates a default PodDisruptionBudget if you are running in HA, but you are able to define your own by setting:
apiVersion: k8s.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb-galera
spec:
  ...
  podDisruptionBudget:
    maxUnavailable: 33%
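To see what a percentage means in practice: Kubernetes rounds percentage values in a PodDisruptionBudget up to the nearest whole Pod, so with 3 replicas, 33% allows 1 Pod to be voluntarily disrupted at a time. The sketch below shows a roughly equivalent plain Kubernetes PodDisruptionBudget. The object name and the selector label are assumptions for illustration, not necessarily what the operator generates.

```yaml
# Sketch only: a plain Kubernetes equivalent of the setting above.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: mariadb-galera              # hypothetical name
spec:
  maxUnavailable: 33%               # with 3 replicas: ceil(0.33 * 3) = 1 Pod
  selector:
    matchLabels:
      app.kubernetes.io/instance: mariadb-galera   # assumed label
```

With a 3 node Galera cluster, allowing a single disruption at a time keeps 2 of the 3 nodes available during a Node drain.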