Replace a master node

Steps to replace a master node of an OpenShift 4 cluster on cloudscale.

Starting situation

  • You already have an OpenShift 4 cluster on cloudscale

  • You have admin-level access to the cluster

  • You want to replace a master node of the cluster

Prerequisites

The following CLI utilities need to be available locally:

  • kubectl

  • oc

  • git

  • jq

  • yq

  • vault CLI

Preparation

  1. Update the master node etcd-N DNS records to have a lower TTL so the Puppet LB HAProxy backends get updated more quickly (you can verify the served TTL with the dig sketch after this list)

    etcd-0    300 IN A     172.18.200.a
    etcd-1    300 IN A     172.18.200.b
    etcd-2    300 IN A     172.18.200.c

    A per-record TTL (here 300) can be added before the record type; the value is in seconds.
    Ideally this is done at least an hour (one regular TTL for the zone) before actually starting to replace nodes, so the records cached with the regular TTL have time to expire.
  2. Set downtime for the "HAProxy socket" check for the cluster’s Puppet LBs in Icinga2.
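
To check that the lowered TTL is actually being served, you can query the records directly. This is only a sketch (it assumes dig is available locally); etcd-N.<cluster-domain> stands for the actual record names in your cluster's zone.

    # Check the TTL currently served for the etcd records.
    # Replace <cluster-domain> with the cluster's DNS zone.
    for i in 0 1 2; do
      dig +noall +answer "etcd-${i}.<cluster-domain>"
    done
    # The second column of each answer line is the remaining TTL in seconds;
    # once the old records have expired it should be at or below 300.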

Set up Terraform credentials

  1. Ensure your KUBECONFIG points to the target cluster

    kubectl cluster-info
    kubectl get nodes
  2. Set up access to VSHN systems

    Access to VSHN GitLab
    # From https://git.vshn.net/-/user_settings/personal_access_tokens, "api" scope is sufficient
    export GITLAB_TOKEN=<gitlab-api-token>
    export GITLAB_USER=<gitlab-user-name>
    Access to VSHN Lieutenant
    # For example: https://api.syn.vshn.net
    # IMPORTANT: do NOT add a trailing `/`. Commands below will fail.
    export COMMODORE_API_URL=<lieutenant-api-endpoint>
    
    # Set Project Syn cluster and tenant ID
    export CLUSTER_ID=<lieutenant-cluster-id> # Looks like: c-<something>
    export TENANT_ID=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${COMMODORE_API_URL}/clusters/${CLUSTER_ID} | jq -r .tenant)
    Configuration for hieradata commits
    export GIT_AUTHOR_NAME=$(git config --global user.name)
    export GIT_AUTHOR_EMAIL=$(git config --global user.email)
    export TF_VAR_control_vshn_net_token=<control-vshn-net-token> # use your personal SERVERS API token from https://control.vshn.net/tokens
  3. Extract the cloudscale token from the cluster

    export CLOUDSCALE_API_TOKEN=$(kubectl --as=system:admin -n openshift-machine-api \
      get secret cloudscale-rw-token -ogo-template='{{.data.token|base64decode}}')

    Alternatively, you can fetch the token from Vault:

    export VAULT_ADDR=https://vault-prod.syn.vshn.net
    vault login -method=oidc
    export CLOUDSCALE_API_TOKEN=$(vault kv get -format=json \
      clusters/kv/${TENANT_ID}/${CLUSTER_ID}/cloudscale | \
      jq -r '.data.data.token')
  4. Compile catalog

    We use commodore catalog compile here to fetch the cluster catalog. You can also use an existing up-to-date catalog checkout.

    commodore catalog compile "${CLUSTER_ID}"
  5. Configure Terraform

    The file created here lists only variable names; docker run --env-file passes the current values of those variables through to the Terraform container set up in the next step.

    cat >terraform.env <<EOF
    CLOUDSCALE_API_TOKEN
    GIT_AUTHOR_NAME
    GIT_AUTHOR_EMAIL
    EOF
  6. Set up Terraform

    Prepare Terraform execution environment
    # Set terraform image and tag to be used
    tf_image=$(\
      yq eval ".parameters.openshift4_terraform.images.terraform.image" \
      dependencies/openshift4-terraform/class/defaults.yml)
    tf_tag=$(\
      yq eval ".parameters.openshift4_terraform.images.terraform.tag" \
      dependencies/openshift4-terraform/class/defaults.yml)
    
    # Generate the terraform alias
    base_dir=$(pwd)
    alias terraform='touch .terraformrc; docker run -it --rm \
      -e REAL_UID=$(id -u) \
      -e TF_CLI_CONFIG_FILE=/tf/.terraformrc \
      --env-file ${base_dir}/terraform.env \
      -w /tf \
      -v $(pwd):/tf \
      --ulimit memlock=-1 \
      "${tf_image}:${tf_tag}" /tf/terraform.sh'
    
    export GITLAB_REPOSITORY_URL=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${COMMODORE_API_URL}/clusters/${CLUSTER_ID} | jq -r '.gitRepo.url' | sed 's|ssh://||; s|/|:|')
    export GITLAB_REPOSITORY_NAME=${GITLAB_REPOSITORY_URL##*/}
    export GITLAB_CATALOG_PROJECT_ID=$(curl -sH "Authorization: Bearer ${GITLAB_TOKEN}" "https://git.vshn.net/api/v4/projects?simple=true&search=${GITLAB_REPOSITORY_NAME/.git}" | jq -r ".[] | select(.ssh_url_to_repo == \"${GITLAB_REPOSITORY_URL}\") | .id")
    export GITLAB_STATE_URL="https://git.vshn.net/api/v4/projects/${GITLAB_CATALOG_PROJECT_ID}/terraform/state/cluster"
    
    pushd catalog/manifests/openshift4-terraform/
    Initialize Terraform (if this fails, see the sanity-check sketch after this list)
    terraform init \
      "-backend-config=address=${GITLAB_STATE_URL}" \
      "-backend-config=lock_address=${GITLAB_STATE_URL}/lock" \
      "-backend-config=unlock_address=${GITLAB_STATE_URL}/lock" \
      "-backend-config=username=${GITLAB_USER}" \
      "-backend-config=password=${GITLAB_TOKEN}" \
      "-backend-config=lock_method=POST" \
      "-backend-config=unlock_method=DELETE" \
      "-backend-config=retry_wait_min=5"

Replace the master node

You can repeat the steps in this section if you need to replace multiple master nodes.

  1. Select the master node you want to replace

    node=<master-XXXX>
    kubectl get node "${node}"
  2. Drain the master node

    kubectl --as=system:admin drain --ignore-daemonsets \
      --delete-emptydir-data --force "${node}"
  3. Stop the master node

    You can also stop the node via the cloudscale control panel.
    oc debug "node/${node}" -n syn-debug-nodes --as=system:admin \
      -- chroot /host shutdown -h now
    kubectl wait --for condition=ready=unknown --timeout=600s \
      "node/${node}"
  4. Remove the master node object in the cluster

    kubectl --as=system:admin delete node "${node}"
  5. Remove the master node from the etcd cluster

    As far as we know, if you skip this step, the old member stays registered in etcd and things will go wrong later on!
    etcd_pod=$(kubectl -n openshift-etcd get pods -l app=etcd -oname | grep -v "${node}" | head -n1)
    member_id=$(kubectl --as=system:admin -n openshift-etcd exec "${etcd_pod}" \
      -- etcdctl member list | grep "${node}" | cut -d, -f1)
    kubectl --as=system:admin -n openshift-etcd exec "${etcd_pod}" \
      -- etcdctl member remove "${member_id}"
  6. Delete etcd secrets for the removed node

    kubectl --as=system:admin -n openshift-etcd \
      delete $(kubectl -n openshift-etcd get secrets --as=system:admin -oname | grep "${node}")
  7. Remove the node and its random ID from the Terraform state (a verification sketch is included after this list)

    terraform state pull > state.json
    state_index=$(jq --arg node "${node}" -r '.resources[]
      | select(.module == "module.cluster.module.master" and .type == "random_id")
      | .instances[]
      | select(.attributes.hex==$node).index_key' \
      state.json)
    terraform state rm "module.cluster.module.master.random_id.node[$state_index]"
    terraform state rm "module.cluster.module.master.cloudscale_server.node[$state_index]"
    rm state.json
  8. Create new node via terraform

    terraform apply
  9. Check the output of the Terraform run and update the appropriate etcd-N DNS record with the IP of the new master node (a sketch for looking up the IP is included after this list)

  10. Wait for the new master node to come online and approve CSRs

    # Once CSRs in state Pending show up, approve them
    # Needs to be run three times; three CSRs need to be approved for each node
    
    kubectl --as=system:admin get csr -w
    
    oc --as=system:admin get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | \
      xargs oc --as=system:admin adm certificate approve
    
    kubectl --as=system:admin get nodes
  11. Wait until the etcd and kube-apiserver cluster-operators are healthy again

    It’s especially important to wait here when replacing multiple master nodes in succession, to ensure the etcd quorum is never destroyed! You can additionally verify etcd health with the sketch after this list.
    kubectl wait --for condition=progressing=false --timeout=15m co etcd kube-apiserver

    If you’re watching the etcd rollout, it’s normal for the new etcd pod to crash initially before it gets added to the cluster correctly.

  12. Delete the old master node in the cloudscale control panel.
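
For step 7: before removing anything from the Terraform state, it's worth confirming that the index lookup actually found the node. A small sanity-check sketch using the variables set in that step:

    # Confirm the index lookup found the node before running terraform state rm.
    echo "state index for ${node}: ${state_index}"

    # List the master-related resources currently tracked in the state.
    terraform state list | grep "module.cluster.module.master"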
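
For step 9: the IP of the replacement node is part of the Terraform output; if you missed it there, you can also look it up via the cloudscale API. A sketch, assuming the cloudscale server name starts with the new node's name (<new-master-name> is a placeholder):

    # Look up the private IP of the new master via the cloudscale API.
    curl -sH "Authorization: Bearer ${CLOUDSCALE_API_TOKEN}" \
      https://api.cloudscale.ch/v1/servers | \
      jq -r --arg node "<new-master-name>" \
        '.[] | select(.name | startswith($node))
           | .interfaces[] | select(.type == "private")
           | .addresses[].address'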
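
For step 11: in addition to waiting for the cluster operators, you can verify etcd membership and health directly, reusing the exec pattern from step 5:

    # Verify that all three etcd members are present and healthy.
    etcd_pod=$(kubectl -n openshift-etcd get pods -l app=etcd -oname | head -n1)
    kubectl --as=system:admin -n openshift-etcd exec "${etcd_pod}" \
      -- etcdctl member list -w table
    kubectl --as=system:admin -n openshift-etcd exec "${etcd_pod}" \
      -- etcdctl endpoint health --cluster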

Finalize replacement

  1. Once you’re done with all master nodes that need to be replaced, revert the short TTL for the etcd-N DNS records

  2. Double-check that you’ve deleted all the old nodes in the cloudscale control panel

  3. Remove downtime for the "HAProxy socket" check for the cluster’s Puppet LBs in Icinga2.