Replace a master node

Steps to replace a master node of an OpenShift 4 cluster on cloudscale.

Starting situation

  • You already have an OpenShift 4 cluster on cloudscale

  • You have admin-level access to the cluster

  • You want to replace a master node of the cluster

Prerequisites

The following CLI utilities need to be available locally:

  • kubectl

  • oc

  • git

  • jq

  • yq

  • vault CLI

Preparation

  1. Update the master node etcd-N DNS records to have a lower TTL so the Puppet LB HAProxy backends get updated more quickly (you can verify the served TTL with the dig sketch after this list)

    etcd-0    300 IN A     172.18.200.a
    etcd-1    300 IN A     172.18.200.b
    etcd-2    300 IN A     172.18.200.c

    A per-record TTL (here 300) can be added before the record type; the value is in seconds.
    Ideally this is done at least an hour (one regular TTL for the zone) before actually starting to replace nodes, so the records cached with the regular TTL have time to expire.
  2. Set downtime for the "HAProxy socket" check for the cluster’s Puppet LBs in Icinga2.
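
To check that the lowered TTL is actually being served, you can query the records directly. This is only a sketch (it assumes dig is available locally); etcd-N.<cluster-domain> stands for the actual record names in your cluster's zone.

    # Check the TTL currently served for the etcd records.
    # Replace <cluster-domain> with the cluster's DNS zone.
    for i in 0 1 2; do
      dig +noall +answer "etcd-${i}.<cluster-domain>"
    done
    # The second column of each answer line is the remaining TTL in seconds;
    # once the old records have expired it should be at or below 300.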

Set up Terraform credentials

  1. Ensure your KUBECONFIG points to the target cluster

    kubectl cluster-info
    kubectl get nodes
  2. Set up access to VSHN systems

    Access to VSHN GitLab
    # From https://git.vshn.net/-/user_settings/personal_access_tokens, "api" scope is sufficient
    export GITLAB_TOKEN=<gitlab-api-token>
    export GITLAB_USER=<gitlab-user-name>
    Access to VSHN Lieutenant
    # For example: https://api.syn.vshn.net
    # IMPORTANT: do NOT add a trailing `/`. Commands below will fail.
    export COMMODORE_API_URL=<lieutenant-api-endpoint>
    
    # Set Project Syn cluster and tenant ID
    export CLUSTER_ID=<lieutenant-cluster-id> # Looks like: c-<something>
    export TENANT_ID=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${COMMODORE_API_URL}/clusters/${CLUSTER_ID} | jq -r .tenant)
    Configuration for hieradata commits
    export GIT_AUTHOR_NAME=$(git config --global user.name)
    export GIT_AUTHOR_EMAIL=$(git config --global user.email)
    export TF_VAR_control_vshn_net_token=<control-vshn-net-token> # use your personal SERVERS API token from https://control.vshn.net/tokens
  3. Extract the cloudscale token from the cluster

    export CLOUDSCALE_API_TOKEN=$(kubectl --as=system:admin -n openshift-machine-api \
      get secret cloudscale-rw-token -ogo-template='{{.data.token|base64decode}}')

    Alternatively, you can fetch the token from Vault:

    export VAULT_ADDR=https://vault-prod.syn.vshn.net
    vault login -method=oidc
    export CLOUDSCALE_API_TOKEN=$(vault kv get -format=json \
      clusters/kv/${TENANT_ID}/${CLUSTER_ID}/cloudscale | \
      jq -r '.data.data.token')
  4. Compile catalog

    We use commodore catalog compile here to fetch the cluster catalog. You can also use an existing up-to-date catalog checkout.

    commodore catalog compile "${CLUSTER_ID}"
  5. Configure Terraform

    The file created here lists only variable names; docker run --env-file passes the current values of those variables through to the Terraform container set up in the next step.

    cat >terraform.env <<EOF
    CLOUDSCALE_API_TOKEN
    GIT_AUTHOR_NAME
    GIT_AUTHOR_EMAIL
    EOF
  6. Set up Terraform

    Prepare Terraform execution environment
    # Set terraform image and tag to be used
    tf_image=$(\
      yq eval ".parameters.openshift4_terraform.images.terraform.image" \
      dependencies/openshift4-terraform/class/defaults.yml)
    tf_tag=$(\
      yq eval ".parameters.openshift4_terraform.images.terraform.tag" \
      dependencies/openshift4-terraform/class/defaults.yml)
    
    # Generate the terraform alias
    base_dir=$(pwd)
    alias terraform='touch .terraformrc; docker run -it --rm \
      -e REAL_UID=$(id -u) \
      -e TF_CLI_CONFIG_FILE=/tf/.terraformrc \
      --env-file ${base_dir}/terraform.env \
      -w /tf \
      -v $(pwd):/tf \
      --ulimit memlock=-1 \
      "${tf_image}:${tf_tag}" /tf/terraform.sh'
    
    export GITLAB_REPOSITORY_URL=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${COMMODORE_API_URL}/clusters/${CLUSTER_ID} | jq -r '.gitRepo.url' | sed 's|ssh://||; s|/|:|')
    export GITLAB_REPOSITORY_NAME=${GITLAB_REPOSITORY_URL##*/}
    export GITLAB_CATALOG_PROJECT_ID=$(curl -sH "Authorization: Bearer ${GITLAB_TOKEN}" "https://git.vshn.net/api/v4/projects?simple=true&search=${GITLAB_REPOSITORY_NAME/.git}" | jq -r ".[] | select(.ssh_url_to_repo == \"${GITLAB_REPOSITORY_URL}\") | .id")
    export GITLAB_STATE_URL="https://git.vshn.net/api/v4/projects/${GITLAB_CATALOG_PROJECT_ID}/terraform/state/cluster"
    
    pushd catalog/manifests/openshift4-terraform/
    Initialize Terraform (if this fails, see the sanity-check sketch after this list)
    terraform init \
      "-backend-config=address=${GITLAB_STATE_URL}" \
      "-backend-config=lock_address=${GITLAB_STATE_URL}/lock" \
      "-backend-config=unlock_address=${GITLAB_STATE_URL}/lock" \
      "-backend-config=username=${GITLAB_USER}" \
      "-backend-config=password=${GITLAB_TOKEN}" \
      "-backend-config=lock_method=POST" \
      "-backend-config=unlock_method=DELETE" \
      "-backend-config=retry_wait_min=5"

Replace the master node

You can repeat the steps in this section if you need to replace multiple master nodes.

  1. Select the master node you want to replace

    node=<master-XXXX>
    kubectl get node "${node}"
  2. Drain the master node

    kubectl --as=system:admin drain --ignore-daemonsets \
      --delete-emptydir-data --force "${node}"
  3. Stop the master node

    You can also stop the node via the cloudscale control panel.
    oc debug "node/${node}" -n syn-debug-nodes --as=system:admin \
      -- chroot /host shutdown -h now
    kubectl wait --for condition=ready=unknown --timeout=600s \
      "node/${node}"
  4. Remove the master node object in the cluster

    kubectl --as=system:admin delete node "${node}"
  5. Remove the master node from the etcd cluster

    As far as we know, if you skip this step, the old member stays registered in etcd and things will go wrong later on!
    etcd_pod=$(kubectl -n openshift-etcd get pods -l app=etcd -oname | grep -v "${node}" | head -n1)
    member_id=$(kubectl --as=system:admin -n openshift-etcd exec "${etcd_pod}" \
      -- etcdctl member list | grep "${node}" | cut -d, -f1)
    kubectl --as=system:admin -n openshift-etcd exec "${etcd_pod}" \
      -- etcdctl member remove "${member_id}"
  6. Delete etcd secrets for the removed node

    kubectl --as=system:admin -n openshift-etcd \
      delete $(kubectl -n openshift-etcd get secrets --as=system:admin -oname | grep "${node}")
  7. Remove the node and its random ID from the Terraform state (a verification sketch is included after this list)

    terraform state pull > state.json
    state_index=$(jq --arg node "${node}" -r '.resources[]
      | select(.module == "module.cluster.module.master" and .type == "random_id")
      | .instances[]
      | select(.attributes.hex==$node).index_key' \
      state.json)
    terraform state rm "module.cluster.module.master.random_id.node[$state_index]"
    terraform state rm "module.cluster.module.master.cloudscale_server.node[$state_index]"
    rm state.json
  8. Create new node via terraform

    terraform apply
  9. Check the output of the Terraform run and update the appropriate etcd-N DNS record with the IP of the new master node (a sketch for looking up the IP is included after this list)

  10. Wait for the new master node to come online and approve CSRs

    # Once CSRs in state Pending show up, approve them
    # Needs to be run three times; three CSRs need to be approved for each node
    
    kubectl --as=system:admin get csr -w
    
    oc --as=system:admin get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | \
      xargs oc --as=system:admin adm certificate approve
    
    kubectl --as=system:admin get nodes
  11. Wait until the etcd and kube-apiserver cluster-operators are healthy again

    It’s especially important to wait here when replacing multiple master nodes in succession, to ensure the etcd quorum is never destroyed! You can additionally verify etcd health with the sketch after this list.
    kubectl wait --for condition=progressing=false --timeout=15m co etcd kube-apiserver

    If you’re watching the etcd rollout, it’s normal for the new etcd pod to crash initially before it gets added to the cluster correctly.

  12. Delete the old master node in the cloudscale control panel.
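
For step 7: before removing anything from the Terraform state, it's worth confirming that the index lookup actually found the node. A small sanity-check sketch using the variables set in that step:

    # Confirm the index lookup found the node before running terraform state rm.
    echo "state index for ${node}: ${state_index}"

    # List the master-related resources currently tracked in the state.
    terraform state list | grep "module.cluster.module.master"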
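
For step 9: the IP of the replacement node is part of the Terraform output; if you missed it there, you can also look it up via the cloudscale API. A sketch, assuming the cloudscale server name starts with the new node's name (<new-master-name> is a placeholder):

    # Look up the private IP of the new master via the cloudscale API.
    curl -sH "Authorization: Bearer ${CLOUDSCALE_API_TOKEN}" \
      https://api.cloudscale.ch/v1/servers | \
      jq -r --arg node "<new-master-name>" \
        '.[] | select(.name | startswith($node))
           | .interfaces[] | select(.type == "private")
           | .addresses[].address'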
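
For step 11: in addition to waiting for the cluster operators, you can verify etcd membership and health directly, reusing the exec pattern from step 5:

    # Verify that all three etcd members are present and healthy.
    etcd_pod=$(kubectl -n openshift-etcd get pods -l app=etcd -oname | head -n1)
    kubectl --as=system:admin -n openshift-etcd exec "${etcd_pod}" \
      -- etcdctl member list -w table
    kubectl --as=system:admin -n openshift-etcd exec "${etcd_pod}" \
      -- etcdctl endpoint health --cluster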

Finalize replacement

  1. Once you’re done with all master nodes that need to be replaced, revert the short TTL for the etcd-N DNS records

  2. Double-check that you’ve deleted all the old nodes in the cloudscale control panel

  3. Remove downtime for the "HAProxy socket" check for the cluster’s Puppet LBs in Icinga2.