How to Bootstrap a Cluster
If you bootstrap from any node other than the one that has progressed the furthest in transactions, you will lose data. Always take a backup of the data directory before taking any further steps!
If the cluster has lost a majority of its nodes, it won't be able to start again on its own. Since it can't know which node has the latest data, you need to force the bootstrap.
Find the node with the highest value of wsrep_last_committed. You can use the MariaDB Galera dashboard to do so, or query the nodes directly (see the sketch below).
Save the instance ID of the affected cluster:
export INSTANCE_ID=<instance-id>
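If you prefer the command line over the dashboard, here is a minimal sketch for reading wsrep_last_committed from each pod. It assumes the StatefulSet's pods are named mariadb-0 to mariadb-2 and that the root password is available as MARIADB_ROOT_PASSWORD inside the mariadb-galera container, as in the commands below; it only works for pods where MariaDB is still running:
# Print wsrep_last_committed for every pod of the StatefulSet
for i in 0 1 2; do
  echo "mariadb-$i:"
  kubectl exec -n $INSTANCE_ID mariadb-$i \
    -c mariadb-galera \
    -- bash -c "mysql -uroot --password=\$MARIADB_ROOT_PASSWORD -e \"SHOW GLOBAL STATUS LIKE 'wsrep_last_committed';\""
done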
Force bootstrap from a running node
If the node with the most up-to-date data is still running, you can force it to become a primary by running the following command:
POD_NAME=<pod>
kubectl exec -n $INSTANCE_ID $POD_NAME \
-it -c mariadb-galera \
-- bash -c "mysql -uroot --password=\$MARIADB_ROOT_PASSWORD -e \"SET GLOBAL wsrep_provider_options='pc.bootstrap=1';\""
If this doesn't work because the node is crashing or otherwise not running, you might need to use one of the following procedures to force a node to become a primary.
Force bootstrap from node 0
To force the bootstrap from node 0, follow these steps.
# Scale down to a single replica
kubectl -n $INSTANCE_ID scale statefulset mariadb \
  --replicas 1
# Enable force bootstrap on the remaining node
kubectl -n $INSTANCE_ID set env statefulset/mariadb \
  -c mariadb-galera \
  MARIADB_GALERA_FORCE_SAFETOBOOTSTRAP="yes"
# Restart the pod so the new environment takes effect
kubectl -n $INSTANCE_ID delete pods -l app.kubernetes.io/name=mariadb-galera
# Disable force bootstrap again
kubectl -n $INSTANCE_ID set env statefulset/mariadb -c mariadb-galera \
  MARIADB_GALERA_FORCE_SAFETOBOOTSTRAP="no"
# Scale back up to 3 replicas
kubectl -n $INSTANCE_ID scale statefulset mariadb \
  --replicas 3
Always make sure to disable force bootstrap before scaling a cluster up.
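Once all three pods are running again, you can verify that they have rejoined the cluster by checking wsrep_cluster_size. This is a sketch, assuming pod mariadb-0 and the credentials used above; the value should report 3:
kubectl exec -n $INSTANCE_ID mariadb-0 \
  -c mariadb-galera \
  -- bash -c "mysql -uroot --password=\$MARIADB_ROOT_PASSWORD -e \"SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';\""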
Force bootstrap from a node other than 0
To restart a Galera cluster after an ungraceful shutdown, you might need to bootstrap the cluster from a node other than 0. You can basically follow the steps described in the Helm chart documentation: Bootstrapping a node other than 0.
Find Helm release
In order to change the Helm values, you first need to find the Helm release object:
release_name=$(kubectl get compositemariadbinstances.syn.tools $INSTANCE_ID \
-ojsonpath='{.spec.resourceRefs[1].name}')
kubectl get releases.helm.crossplane.io $release_name
NAME CHART VERSION SYNCED READY STATE REVISION DESCRIPTION AGE
f1600418-cf59-4ec0-b4d9-d0270559dbcf-4xp5n mariadb-galera 5.2.1 True True deployed 19 Upgrade complete 14d
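If you want to inspect the current values before changing anything, you can print the release object (read-only, standard kubectl):
kubectl get releases.helm.crossplane.io $release_name -o yaml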
Change Helm release values
To change the values for safe bootstrapping, edit this release and set the values based on your findings (the node number to bootstrap from and the podManagementPolicy). As you can't change the podManagementPolicy of an existing StatefulSet, first delete the mariadb StatefulSet and then apply your changes to the release object (this will recreate the StatefulSet).
kubectl -n $INSTANCE_ID delete statefulset mariadb --cascade=orphan
Now edit the release:
kubectl edit releases.helm.crossplane.io $release_name
Add the following elements to the Helm release. Node 2 is used as an example to bootstrap from:
galera:
  bootstrap:
    bootstrapFromNode: 2
    forceSafeToBootstrap: true
podManagementPolicy: Parallel
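Note that in a Crossplane Helm release object these chart values normally sit under spec.forProvider.values; the exact path can differ by provider version, so treat this as a sketch:
spec:
  forProvider:
    values:
      galera:
        bootstrap:
          bootstrapFromNode: 2
          forceSafeToBootstrap: true
      podManagementPolicy: Parallel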
When your cluster is up and running again (all 3 pods ready), first scale down to 1 replica, wait until only one Pod remains, and then scale down to 0 replicas. This makes sure that you can bootstrap from the first node again:
# Scale to 1 replica
kubectl -n $INSTANCE_ID scale statefulset mariadb \
--replicas 1
# Wait
# Scale to 0 replicas
kubectl -n $INSTANCE_ID scale statefulset mariadb \
--replicas 0
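To watch the Pods while scaling down, you can follow them with the label selector used earlier (a sketch):
kubectl -n $INSTANCE_ID get pods -l app.kubernetes.io/name=mariadb-galera -w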
Delete the StatefulSet again:
kubectl -n $INSTANCE_ID delete statefulset mariadb --cascade=orphan
Revert your changes in the Helm release. Now the Galera service should start again without errors.
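To revert, edit the release object again and remove the bootstrap overrides you added (bootstrapFromNode, forceSafeToBootstrap, and podManagementPolicy):
kubectl edit releases.helm.crossplane.io $release_name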