How to Bootstrap a Cluster

If you bootstrap from any node other than the one that has progressed the furthest in transactions, you will lose data. Always take a backup of the data directory before taking any further steps!

If the cluster lost a majority of its nodes, it won't be able to start again on its own. Since it can't know which node has the latest data, you need to force the bootstrap.

Find the node with the highest value of wsrep_last_committed. You can use the MariaDB Galera dashboard to do so; a sketch for querying the pods directly is shown below as well.

  1. Save the instance ID of the affected cluster:

export INSTANCE_ID=<instance-id>
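
If the dashboard isn't available, a quick alternative is to query wsrep_last_committed on each pod directly. This is a minimal sketch that assumes the default StatefulSet pod names mariadb-0 through mariadb-2 and reuses the mysql invocation shown later in this guide; pods that are down will simply fail the query:

for pod in mariadb-0 mariadb-1 mariadb-2; do
    echo -n "$pod: "
    kubectl exec -n $INSTANCE_ID $pod -c mariadb-galera \
        -- bash -c "mysql -uroot --password=\$MARIADB_ROOT_PASSWORD -Nse \"SHOW STATUS LIKE 'wsrep_last_committed';\""
done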

Force bootstrap from a running node

If the node with the most up-to-date data is still running, you can force it to become the primary by running the following command:

POD_NAME=<pod>
kubectl exec -n $INSTANCE_ID $POD_NAME \
    -it -c mariadb-galera \
    -- bash -c "mysql -uroot --password=\$MARIADB_ROOT_PASSWORD -e \"SET GLOBAL wsrep_provider_options='pc.bootstrap=1';\""
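
To check whether the node now considers itself part of the Primary Component, you can query wsrep_cluster_status on the same pod. A small sketch reusing the variables from above, which should print Primary:

kubectl exec -n $INSTANCE_ID $POD_NAME \
    -it -c mariadb-galera \
    -- bash -c "mysql -uroot --password=\$MARIADB_ROOT_PASSWORD -e \"SHOW STATUS LIKE 'wsrep_cluster_status';\""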

If this doesn't work because the node is crashing or otherwise not running, you might need to use one of the following procedures to force it to become primary.

Force bootstrap from node 0

To force the bootstrap from node 0, follow these steps.

Scale the cluster to 1 replica:

kubectl -n $INSTANCE_ID scale statefulset mariadb \
    --replicas 1

Enable force bootstrap:

kubectl -n $INSTANCE_ID set env statefulset/mariadb \
    -c mariadb-galera \
    MARIADB_GALERA_FORCE_SAFETOBOOTSTRAP="yes"

If the nodes are already crashing or not ready, delete all pods:

kubectl -n $INSTANCE_ID delete pods -l app.kubernetes.io/name=mariadb-galera

Disable force bootstrap again:

kubectl -n $INSTANCE_ID set env statefulset/mariadb -c mariadb-galera \
    MARIADB_GALERA_FORCE_SAFETOBOOTSTRAP="no"

Once the single node is bootstrapped and running, scale the cluster to 3:

kubectl -n $INSTANCE_ID scale statefulset mariadb \
    --replicas 3

Always make sure to disable force bootstrap before scaling a cluster up.
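
Once all three pods are ready, you can optionally confirm that every node rejoined the cluster. A minimal sketch, assuming pod mariadb-0 exists and reusing the mysql invocation from above; wsrep_cluster_size should report 3:

kubectl -n $INSTANCE_ID rollout status statefulset/mariadb

kubectl exec -n $INSTANCE_ID mariadb-0 -c mariadb-galera \
    -- bash -c "mysql -uroot --password=\$MARIADB_ROOT_PASSWORD -e \"SHOW STATUS LIKE 'wsrep_cluster_size';\""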

Force bootstrap from a node other than 0

To restart a Galera cluster, you might need to bootstrap it from a node other than node 0 (after an ungraceful shutdown). You can basically follow the steps described in the Helm chart: Bootstrapping a node other than 0.

Find Helm release

In order to change Helm Values, you first need to find the Helm release object:

Find the release name on the control cluster:

release_name=$(kubectl get compositemariadbinstances.syn.tools $INSTANCE_ID \
    -ojsonpath='{.spec.resourceRefs[1].name}')

Verify that it's the release of the MariaDB Galera cluster:

kubectl get releases.helm.crossplane.io $release_name

NAME                                         CHART            VERSION   SYNCED   READY   STATE      REVISION   DESCRIPTION        AGE
f1600418-cf59-4ec0-b4d9-d0270559dbcf-4xp5n   mariadb-galera   5.2.1     True     True    deployed   19         Upgrade complete   14d
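
If you prefer to verify this programmatically, the chart name can usually be read from the release spec. This is a sketch that assumes the Crossplane provider-helm schema (chart definition under spec.forProvider.chart); the exact path may differ between provider versions:

kubectl get releases.helm.crossplane.io $release_name \
    -ojsonpath='{.spec.forProvider.chart.name}'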

Change Helm release values

To change the values for safe bootstrapping, edit this release and set the values based on your findings (the node number to bootstrap from and the podManagementPolicy). Because podManagementPolicy can't be changed on an existing StatefulSet, first delete the mariadb StatefulSet (--cascade=orphan keeps the Pods in place) and then apply your changes to the release object; this will recreate the StatefulSet.

kubectl -n $INSTANCE_ID delete statefulset mariadb --cascade=orphan
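
Because of --cascade=orphan, only the StatefulSet object is removed; the Pods themselves are left in place. If you want to double-check, list them with the label selector used earlier (a sketch):

kubectl -n $INSTANCE_ID get pods -l app.kubernetes.io/name=mariadb-galera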

Now edit the release:

kubectl edit releases.helm.crossplane.io $release_name

Add the following elements to the Helm release values. Node 2 is used as an example to bootstrap from:

galera:
  bootstrap:
    bootstrapFromNode: 2
    forceSafeToBootstrap: true
podManagementPolicy: Parallel
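
Note that in the release object these values don't sit at the top level. Assuming the Crossplane provider-helm schema, they nest under spec.forProvider.values, roughly as in this sketch; verify the structure against your actual release:

spec:
  forProvider:
    values:
      galera:
        bootstrap:
          bootstrapFromNode: 2
          forceSafeToBootstrap: true
      podManagementPolicy: Parallel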

When your cluster is up and running again (all 3 pods ready), first scale down to 1 replica, wait until only one Pod remains, and then scale down to 0 replicas. This makes sure that you can bootstrap from the first node again:

# Scale to 1 replica
kubectl -n $INSTANCE_ID scale statefulset mariadb \
    --replicas 1

# Wait until only mariadb-0 remains (see the sketch below)
# Scale to 0 replicas
kubectl -n $INSTANCE_ID scale statefulset mariadb \
    --replicas 0
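
For the wait step above you can either watch the pods with kubectl get pods -w, or block until the higher-numbered pods are gone. A minimal sketch, assuming the pods are named mariadb-1 and mariadb-2:

kubectl -n $INSTANCE_ID wait --for=delete pod/mariadb-1 pod/mariadb-2 --timeout=300s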

Delete the StatefulSet again:

kubectl -n $INSTANCE_ID delete statefulset mariadb --cascade=orphan

Revert your changes in the Helm release. Now the Galera service should start again without errors.
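
As a final check, you can wait until all three pods report ready again; a small sketch using the label selector from above:

kubectl -n $INSTANCE_ID wait --for=condition=Ready pod \
    -l app.kubernetes.io/name=mariadb-galera --timeout=600s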