Do not start galera as joiner with 1-replica cluster
The mariadb operator checks for available pods in the galera statefulset
to determine whether to start mysqld as a bootstrap or a joiner node on
all the pods that remain to be started.

When galera is deployed as a 1-replica cluster (e.g. in CI), there is
a small time window after the statefulset has been probed and galera marked
as 'bootstrapped', where the single pod can crash before being probed. If so,
the operator will try to restart the pod as a 'joiner', which is invalid.

Add a specific check for 1-replica deployments, so that the operator bails out
and requeues the event when a pod is identified as a joiner. This allows the
operator to reprobe the galera state and restart the pod correctly, avoiding an
unnecessary error in the logs.

Jira: OSPRH-7821
dciabrin committed Aug 5, 2024
1 parent ff694b3 commit 4beccf3
Showing 1 changed file with 11 additions and 2 deletions.
13 changes: 11 additions & 2 deletions controllers/galera_controller.go
@@ -691,7 +691,7 @@ func (r *GaleraReconciler) Reconcile(ctx context.Context, req ctrl.Request) (res
// Note:
// . A pod is available in the statefulset if the pod's readiness
// probe returns true (i.e. galera is running in the pod and clustered)
- // . Cluster is bootstrapped if as soon as one pod is available
+ // . Cluster is bootstrapped as soon as one pod is available
instance.Status.Bootstrapped = statefulset.Status.AvailableReplicas > 0

if instance.Status.Bootstrapped {
@@ -708,8 +708,17 @@ func (r *GaleraReconciler) Reconcile(ctx context.Context, req ctrl.Request) (res
}
}

+ runningPods := getRunningPodsMissingGcomm(ctx, podList.Items, instance, helper, r.config)
+ // Special case for 1-node deployment: if the statefulset reports 1 node is available
+ // but the pod shows up in runningPods (i.e. NotReady), do not consider it a joiner.
+ // Wait for the two statuses to re-sync after another k8s probe is run.
+ if *instance.Spec.Replicas == 1 && len(runningPods) == 1 {
+ log.Info("Galera node no longer running. Requeuing")
+ return ctrl.Result{RequeueAfter: time.Duration(3) * time.Second}, nil
+ }
+
// The other 'Running' pods can join the existing cluster.
- for _, pod := range getRunningPodsMissingGcomm(ctx, podList.Items, instance, helper, r.config) {
+ for _, pod := range runningPods {
name := pod.Name
joinerURI := buildGcommURI(instance)
log.Info("Pushing gcomm URI to joiner", "pod", name)