Remove a worker node
Steps to remove a worker node from an OpenShift 4 cluster on cloudscale.ch.
Starting situation
- You already have an OpenShift 4 cluster on cloudscale.ch
- You have admin-level access to the cluster
- You want to remove an existing worker node from the cluster
High-level overview
- First we identify the correct node to remove and drain it.
- Then we remove it from Kubernetes.
- Finally we remove the associated VMs.
Prerequisites
The following CLI utilities need to be available locally:
- docker
- curl
- kubectl
- oc
- vault, the Vault CLI
- commodore, see Running Commodore
- jq
- yq, the yq YAML processor (version 4 or higher)
- macOS: gdate from GNU coreutils (brew install coreutils)
Prepare local environment
- Create local directory to work in
We strongly recommend creating an empty directory, unless you already have a work directory for the cluster you’re about to work on. This guide will run Commodore in the directory created in this step.
export WORK_DIR=/path/to/work/dir
mkdir -p "${WORK_DIR}"
pushd "${WORK_DIR}"
- Configure API access
Access to cloud API

# From https://control.cloudscale.ch/service/<your-project>/api-token
export CLOUDSCALE_API_TOKEN=<cloudscale-api-token>
Access to VSHN GitLab

# From https://git.vshn.net/-/profile/personal_access_tokens, "api" scope is sufficient
export GITLAB_TOKEN=<gitlab-api-token>
export GITLAB_USER=<gitlab-user-name>
# For example: https://api.syn.vshn.net
# IMPORTANT: do NOT add a trailing `/`. Commands below will fail otherwise.
export COMMODORE_API_URL=<lieutenant-api-endpoint>
# Set Project Syn cluster and tenant ID
export CLUSTER_ID=<lieutenant-cluster-id> # Looks like: c-<something>
export TENANT_ID=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${COMMODORE_API_URL}/clusters/${CLUSTER_ID} | jq -r .tenant)
export GIT_AUTHOR_NAME=$(git config --global user.name)
export GIT_AUTHOR_EMAIL=$(git config --global user.email)
export TF_VAR_control_vshn_net_token=<control-vshn-net-token> # use your personal SERVERS API token from https://control.vshn.net/tokens
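Optionally, verify that the cluster lookup worked before continuing; both variables below should be non-empty.

# Sanity check: the tenant ID usually looks like t-<something>
echo "Cluster: ${CLUSTER_ID}"
echo "Tenant: ${TENANT_ID}"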
- Get required tokens from Vault
Connect with Vault

export VAULT_ADDR=https://vault-prod.syn.vshn.net
vault login -method=oidc
Grab the LB hieradata repo token from Vault

export HIERADATA_REPO_SECRET=$(vault kv get \
  -format=json "clusters/kv/lbaas/hieradata_repo_token" | jq '.data.data')
export HIERADATA_REPO_USER=$(echo "${HIERADATA_REPO_SECRET}" | jq -r '.user')
export HIERADATA_REPO_TOKEN=$(echo "${HIERADATA_REPO_SECRET}" | jq -r '.token')
Get Floaty credentials

export TF_VAR_lb_cloudscale_api_secret=$(vault kv get \
  -format=json "clusters/kv/${TENANT_ID}/${CLUSTER_ID}/floaty" | jq -r '.data.data.iam_secret')
- Compile the catalog for the cluster. Having the catalog available locally enables us to run Terraform for the cluster to make any required changes.
commodore catalog compile "${CLUSTER_ID}"
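Optionally, check that the compiled catalog contains the Terraform configuration used in the following steps; this is just a quick sanity check.

# The directory should exist and contain the cluster's Terraform files
ls catalog/manifests/openshift4-terraform/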
Prepare Terraform environment
- Configure Terraform secrets
cat <<EOF > ./terraform.env
CLOUDSCALE_API_TOKEN
TF_VAR_ignition_bootstrap
TF_VAR_lb_cloudscale_api_secret
TF_VAR_control_vshn_net_token
GIT_AUTHOR_NAME
GIT_AUTHOR_EMAIL
HIERADATA_REPO_TOKEN
EOF
- Set up Terraform
Prepare Terraform execution environment

# Set terraform image and tag to be used
tf_image=$(\
  yq eval ".parameters.openshift4_terraform.images.terraform.image" \
  dependencies/openshift4-terraform/class/defaults.yml)
tf_tag=$(\
  yq eval ".parameters.openshift4_terraform.images.terraform.tag" \
  dependencies/openshift4-terraform/class/defaults.yml)

# Generate the terraform alias
base_dir=$(pwd)
alias terraform='docker run -it --rm \
  -e REAL_UID=$(id -u) \
  --env-file ${base_dir}/terraform.env \
  -w /tf \
  -v $(pwd):/tf \
  --ulimit memlock=-1 \
  "${tf_image}:${tf_tag}" /tf/terraform.sh'

export GITLAB_REPOSITORY_URL=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${COMMODORE_API_URL}/clusters/${CLUSTER_ID} | jq -r '.gitRepo.url' | sed 's|ssh://||; s|/|:|')
export GITLAB_REPOSITORY_NAME=${GITLAB_REPOSITORY_URL##*/}
export GITLAB_CATALOG_PROJECT_ID=$(curl -sH "Authorization: Bearer ${GITLAB_TOKEN}" "https://git.vshn.net/api/v4/projects?simple=true&search=${GITLAB_REPOSITORY_NAME/.git}" | jq -r ".[] | select(.ssh_url_to_repo == \"${GITLAB_REPOSITORY_URL}\") | .id")
export GITLAB_STATE_URL="https://git.vshn.net/api/v4/projects/${GITLAB_CATALOG_PROJECT_ID}/terraform/state/cluster"

pushd catalog/manifests/openshift4-terraform/
Initialize Terraform

terraform init \
  "-backend-config=address=${GITLAB_STATE_URL}" \
  "-backend-config=lock_address=${GITLAB_STATE_URL}/lock" \
  "-backend-config=unlock_address=${GITLAB_STATE_URL}/lock" \
  "-backend-config=username=${GITLAB_USER}" \
  "-backend-config=password=${GITLAB_TOKEN}" \
  "-backend-config=lock_method=POST" \
  "-backend-config=unlock_method=DELETE" \
  "-backend-config=retry_wait_min=5"
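Optionally, confirm that Terraform can read the cluster state from GitLab before making any changes; listing the state is read-only.

# Existing resources should be listed, including the module.cluster.module.worker instances
terraform state list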
Drain and remove node
- Find the node you want to remove. It must be the one with the highest Terraform index.
# Grab JSON copy of current Terraform state
terraform state pull > .tfstate.json

export NODE_TO_REMOVE=$(jq -r \
  '.resources[] | select(.module=="module.cluster.module.worker" and .type=="cloudscale_server") | .instances[.instances|length-1] | .attributes.name | split(".") | first' \
  .tfstate.json)
echo $NODE_TO_REMOVE
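Before continuing, it's worth double-checking that this name matches a worker node in the cluster; the check below assumes you have cluster-admin access configured for kubectl.

# The node should exist and carry the worker role label
kubectl --as=cluster-admin get node "${NODE_TO_REMOVE}" --show-labels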
- If you are working on a production cluster, you need to schedule the node drain for the next maintenance window.
- If you are working on a non-production cluster, you may drain and remove the node immediately.
Schedule node drain (production clusters)
- Create an adhoc-config for the UpgradeJobHook that will drain the node.
pushd "../../../inventory/classes/$TENANT_ID" cat > manifests/$CLUSTER_ID/drain_node_hook <<EOF --- apiVersion: managedupgrade.appuio.io/v1beta1 kind: UpgradeJobHook metadata: name: drain-node namespace: appuio-openshift-upgrade-controller spec: events: - Finish selector: matchLabels: appuio-managed-upgrade: "true" run: Next template: spec: template: spec: containers: - args: - -c - | #!/bin/sh set -e oc adm drain ${NODE_TO_REMOVE} --delete-emptydir-data --ignore-daemonsets command: - sh image: quay.io/appuio/oc:v4.13 name: remove-nodes env: - name: HOME value: /export volumeMounts: - mountPath: /export name: export workingDir: /export restartPolicy: Never volumes: - emptyDir: {} name: export --- apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: name: drain-nodes-upgrade-controller roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: cluster-admin subjects: - kind: ServiceAccount name: default namespace: appuio-openshift-upgrade-controller EOF git commit -am "Schedule drain of node ${NODE_TO_REMOVE} on cluster $CLUSTER_ID" git push popd
- Wait until after the next maintenance window.
- Confirm the node has been drained.
kubectl get node ${NODE_TO_REMOVE}
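For a more explicit check, you can verify that the node is cordoned and that only DaemonSet-managed pods remain on it; this assumes cluster-admin access via kubectl.

# The node should report Ready,SchedulingDisabled
kubectl --as=cluster-admin get node "${NODE_TO_REMOVE}"

# Any pods still listed here should belong to DaemonSets
kubectl --as=cluster-admin get pods --all-namespaces -o wide \
  --field-selector spec.nodeName="${NODE_TO_REMOVE}"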
- Clean up UpgradeJobHook
# After redoing the local environment and the Terraform preparation:
pushd "../../../inventory/classes/$TENANT_ID"

rm manifests/$CLUSTER_ID/drain_node_hook
git commit -am "Remove UpgradeJobHook to drain node ${NODE_TO_REMOVE} on cluster $CLUSTER_ID"
git push
popd
- Delete the node(s) from the cluster
for node in $(echo -n ${NODE_TO_REMOVE}); do
  kubectl --as=cluster-admin delete node "${node}"
done
Drain and remove node immediately
- Drain the node(s)
for node in $(echo -n ${NODE_TO_REMOVE}); do
  kubectl --as=cluster-admin drain "${node}" \
    --delete-emptydir-data --ignore-daemonsets
done
- Delete the node(s) from the cluster
for node in $(echo -n ${NODE_TO_REMOVE}); do
  kubectl --as=cluster-admin delete node "${node}"
done
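As a quick sanity check, confirm that the deleted node no longer shows up in the node list.

# ${NODE_TO_REMOVE} should no longer appear
kubectl --as=cluster-admin get nodes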
Update cluster config
- Update cluster config.
pushd "inventory/classes/${TENANT_ID}/" yq eval -i ".parameters.openshift4_terraform.terraform_variables.worker_count -= 1" \ ${CLUSTER_ID}.yml
- Review and commit
# Have a look at the file ${CLUSTER_ID}.yml.

git commit -a -m "Remove worker node from cluster ${CLUSTER_ID}"
git push
popd
- Compile and push cluster catalog
commodore catalog compile ${CLUSTER_ID} --push -i