Change worker node type (instance pool)

Steps to change the instance type of the worker or infra nodes of an OpenShift 4 cluster on Exoscale that uses instance pools.

Starting situation

  • You already have an OpenShift 4 cluster on Exoscale

  • Your cluster uses Exoscale instance pools for the worker and infra nodes

  • You have admin-level access to the cluster

  • Your kubectl context points to the cluster you’re modifying

  • You want to change the node type (size) of the worker or infra nodes

High-level overview

  • Update the instance pool with the new desired type

  • Replace each existing node with a new node

Prerequisites

The following CLI utilities need to be available locally (they're all used by the commands in this guide):

  • commodore

  • exo (Exoscale CLI)

  • vault (Vault CLI)

  • docker

  • jq

  • yq (version 4 or later)

  • curl

  • git

  • kubectl and oc
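
As a quick sanity check, you can verify that these tools are on your PATH before starting (a minimal sketch; adjust the list to match your setup):

for tool in commodore exo vault docker jq yq curl git kubectl oc; do
  command -v "$tool" >/dev/null || echo "missing: $tool"
done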

Prepare local environment

  1. Create a local directory to work in

    We strongly recommend creating an empty directory, unless you already have a work directory for the cluster you’re about to work on. This guide will run Commodore in the directory created in this step.

    export WORK_DIR=/path/to/work/dir
    mkdir -p "${WORK_DIR}"
    pushd "${WORK_DIR}"
  2. Configure API access

    Access to cloud API
    export EXOSCALE_API_KEY=<exoscale-key> (1)
    export EXOSCALE_API_SECRET=<exoscale-secret>
    export EXOSCALE_ZONE=<exoscale-zone> (2)
    export EXOSCALE_S3_ENDPOINT="sos-${EXOSCALE_ZONE}.exo.io"
    1 We recommend using the IAMv3 role called Owner for the API Key. This role gives full access to the project.
    2 All lower case. For example ch-dk-2.
    Access to VSHN GitLab
    # From https://git.vshn.net/-/user_settings/personal_access_tokens, "api" scope is sufficient
    export GITLAB_TOKEN=<gitlab-api-token>
    export GITLAB_USER=<gitlab-user-name>
    Access to VSHN Lieutenant
    # For example: https://api.syn.vshn.net
    # IMPORTANT: do NOT add a trailing `/`. Commands below will fail.
    export COMMODORE_API_URL=<lieutenant-api-endpoint>
    
    # Set Project Syn cluster and tenant ID
    export CLUSTER_ID=<lieutenant-cluster-id> # Looks like: c-<something>
    export TENANT_ID=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${COMMODORE_API_URL}/clusters/${CLUSTER_ID} | jq -r .tenant)
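
    To verify that the Exoscale credentials and the cluster ID are set up correctly, you can run a quick check (optional; this assumes the exo CLI picks up EXOSCALE_API_KEY and EXOSCALE_API_SECRET from the environment):

    exo compute instance-pool list
    echo "${TENANT_ID}" # should print a tenant ID, typically t-<something>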
  3. Get required tokens from Vault

    Connect with Vault
    export VAULT_ADDR=https://vault-prod.syn.vshn.net
    vault login -method=oidc
    Grab the LB hieradata repo token from Vault
    export HIERADATA_REPO_SECRET=$(vault kv get \
      -format=json "clusters/kv/lbaas/hieradata_repo_token" | jq '.data.data')
    export HIERADATA_REPO_USER=$(echo "${HIERADATA_REPO_SECRET}" | jq -r '.user')
    export HIERADATA_REPO_TOKEN=$(echo "${HIERADATA_REPO_SECRET}" | jq -r '.token')
  4. Compile the catalog for the cluster. Having the catalog available locally enables us to run Terraform for the cluster to make any required changes.

    commodore catalog compile "${CLUSTER_ID}"

Update Cluster Config

  1. Set new desired node type

    new_type=<exoscale instance type> (1)
    1 An Exoscale instance type, for example standard.huge.
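
    If you're unsure which instance types are available, the Exoscale CLI can list them (assuming a current exo CLI version):

    exo compute instance-type list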
  2. Update cluster config

    pushd "inventory/classes/${TENANT_ID}/"
    
    yq eval -i ".parameters.openshift4_terraform.terraform_variables.worker_type = \"${new_type}\"" \
      ${CLUSTER_ID}.yml
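
    To confirm the change before committing, you can read the value back (optional):

    yq eval ".parameters.openshift4_terraform.terraform_variables.worker_type" \
      "${CLUSTER_ID}.yml"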
  3. Review and commit

    # Have a look at the file ${CLUSTER_ID}.yml.
    
    git commit -a -m "Update worker nodes of cluster ${CLUSTER_ID} to ${new_type}"
    git push
    
    popd
  4. Compile and push cluster catalog

    commodore catalog compile ${CLUSTER_ID} --push -i

Run Terraform

  1. Configure Terraform secrets

    cat <<EOF > ./terraform.env
    EXOSCALE_API_KEY
    EXOSCALE_API_SECRET
    TF_VAR_control_vshn_net_token
    GIT_AUTHOR_NAME
    GIT_AUTHOR_EMAIL
    HIERADATA_REPO_TOKEN
    EOF
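
    Variables listed without a value in this file are passed through from the local shell environment by docker run --env-file, so make sure they're all exported before running Terraform; a sketch (the token placeholder is illustrative, use your actual value):

    export GIT_AUTHOR_NAME=$(git config user.name)
    export GIT_AUTHOR_EMAIL=$(git config user.email)
    export TF_VAR_control_vshn_net_token=<control-vshn-net-token>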
  2. Set up Terraform

    Prepare Terraform execution environment
    # Set terraform image and tag to be used
    tf_image=$(\
      yq eval ".parameters.openshift4_terraform.images.terraform.image" \
      dependencies/openshift4-terraform/class/defaults.yml)
    tf_tag=$(\
      yq eval ".parameters.openshift4_terraform.images.terraform.tag" \
      dependencies/openshift4-terraform/class/defaults.yml)
    
    # Generate the terraform alias
    base_dir=$(pwd)
    alias terraform='touch .terraformrc; docker run -it --rm \
      -e REAL_UID=$(id -u) \
      -e TF_CLI_CONFIG_FILE=/tf/.terraformrc \
      --env-file ${base_dir}/terraform.env \
      -w /tf \
      -v $(pwd):/tf \
      --ulimit memlock=-1 \
      "${tf_image}:${tf_tag}" /tf/terraform.sh'
    
    export GITLAB_REPOSITORY_URL=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${COMMODORE_API_URL}/clusters/${CLUSTER_ID} | jq -r '.gitRepo.url' | sed 's|ssh://||; s|/|:|')
    export GITLAB_REPOSITORY_NAME=${GITLAB_REPOSITORY_URL##*/}
    export GITLAB_CATALOG_PROJECT_ID=$(curl -sH "Authorization: Bearer ${GITLAB_TOKEN}" "https://git.vshn.net/api/v4/projects?simple=true&search=${GITLAB_REPOSITORY_NAME/.git}" | jq -r ".[] | select(.ssh_url_to_repo == \"${GITLAB_REPOSITORY_URL}\") | .id")
    export GITLAB_STATE_URL="https://git.vshn.net/api/v4/projects/${GITLAB_CATALOG_PROJECT_ID}/terraform/state/cluster"
    
    pushd catalog/manifests/openshift4-terraform/
    Initialize Terraform
    terraform init \
      "-backend-config=address=${GITLAB_STATE_URL}" \
      "-backend-config=lock_address=${GITLAB_STATE_URL}/lock" \
      "-backend-config=unlock_address=${GITLAB_STATE_URL}/lock" \
      "-backend-config=username=${GITLAB_USER}" \
      "-backend-config=password=${GITLAB_TOKEN}" \
      "-backend-config=lock_method=POST" \
      "-backend-config=unlock_method=DELETE" \
      "-backend-config=retry_wait_min=5"
  3. Run Terraform

    This doesn’t make changes to existing instances. However, after this step, any new instances created for the instance pool will use the new configuration.

    terraform apply
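
    Once the apply has finished, you can run another plan through the same alias to confirm that the instance pool template was updated and no further changes are pending (it should report no changes):

    terraform plan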

Apply new instance pool configuration

Double-check that your kubectl context points to the cluster you're working on.
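
For example, verify the active context and that the node list matches the cluster you expect:

kubectl config current-context
kubectl --as=cluster-admin get nodes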

Depending on the number of nodes you’re updating, you may want to execute the steps in this section for a subset of the nodes at a time.

On clusters with dedicated hypervisors, you’ll need to execute the steps for each worker instance pool. You can list the worker instance pools with

exo compute instance-pool list -Ojson | jq -r '.[]|select(.name|contains("worker"))|.name'

If you’re using this how-to for changing the instance type of the infra nodes, you must run Terraform again after replacing nodes to ensure that the LB hieradata is updated with the new infra node IPs.

When replacing infra nodes, we strongly recommend doing so in two batches to ensure availability of the cluster ingress.

  1. Select the instance pool

    pool_name="${CLUSTER_ID}_worker-0" (1)
    1 Replace worker-0 with the name of the instance pool you want to update. On clusters with dedicated hypervisors, use one of the pool names listed above.
  2. Compute the new instance count

    new_count=$(exo compute instance-pool show "${pool_name}" -Ojson | jq -r '.size * 2')

    For larger clusters, you’ll probably want to do something like the following to replace nodes in batches. If you do this, you’ll need to repeat the steps below this one for each batch.

    batch_size=3 (1)
    new_count=$(exo compute instance-pool show "${pool_name}" -Ojson | \
      jq --argjson batch "$batch_size" -r '.size + $batch')
    1 Replace with the desired batch size. Adjust the final batch so that you don't provision extra nodes if your node count isn't evenly divisible by the selected batch size.
  3. Get the list of old nodes

    NODES_TO_REMOVE=$(exo compute instance-pool show "${pool_name}" -Ojson | \
      jq -r '.instances|join(" ")')

    If you’re replacing nodes in batches, save the list of old nodes in a file:

    exo compute instance-pool show "${pool_name}" -Ojson | jq -r '.instances' > old-nodes.json (1)
    1 Run this only once before starting to replace nodes.

    Compute a batch of old nodes to remove and drop those from the file:

    NODES_TO_REMOVE=$(jq --argjson batch "$batch_size" -r '.[:$batch]|join(" ")' old-nodes.json)
    jq --argjson batch "$batch_size" -r '.[$batch:]' old-nodes.json > old-nodes-rem.json && \
      mv old-nodes-rem.json old-nodes.json
  4. Scale up the instance pool to create new instances with the new desired type

    exo compute instance-pool scale "${pool_name}" "${new_count}" -z "${EXOSCALE_ZONE}"
  5. Approve CSRs of new nodes

    # Once CSRs in state Pending show up, approve them
    # The approve command needs to be run twice, since two CSRs must be approved for each node
    
    kubectl --as=cluster-admin get csr -w
    
    oc --as=cluster-admin get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | \
      xargs oc --as=cluster-admin adm certificate approve
    
    kubectl --as=cluster-admin get nodes
  6. Label nodes

    kubectl get node -ojson | \
      jq -r '.items[] | select(.metadata.name | test("infra|master|storage-")|not).metadata.name' | \
      xargs -I {} kubectl label node {} node-role.kubernetes.io/app=
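
    As an optional check, the newly added nodes should now show up with the app role:

    kubectl get nodes -l node-role.kubernetes.io/app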
  7. Drain and remove old nodes

    • If you are working on a production cluster, you need to schedule the node drain for the next maintenance.

      Schedule node drain (production clusters)
      1. Create an adhoc-config for the UpgradeJobHook that will drain the node.

        pushd "../../../inventory/classes/$TENANT_ID"
        cat > manifests/$CLUSTER_ID/drain_node_hook.yaml <<EOF
        ---
        apiVersion: managedupgrade.appuio.io/v1beta1
        kind: UpgradeJobHook
        metadata:
          name: drain-node
          namespace: appuio-openshift-upgrade-controller
        spec:
          events:
            - Finish
          selector:
            matchLabels:
              appuio-managed-upgrade: "true"
          run: Next
          template:
            spec:
              template:
                spec:
                  containers:
                    - args:
                        - -c
                        - |
                          #!/bin/sh
                          set -e
                          oc adm drain ${NODES_TO_REMOVE} --delete-emptydir-data --ignore-daemonsets
                      command:
                        - sh
                      image: quay.io/appuio/oc:v4.13
                      name: remove-nodes
                      env:
                        - name: HOME
                          value: /export
                      volumeMounts:
                        - mountPath: /export
                          name: export
                      workingDir: /export
                  restartPolicy: Never
                  volumes:
                    - emptyDir: {}
                      name: export
        ---
        apiVersion: rbac.authorization.k8s.io/v1
        kind: ClusterRoleBinding
        metadata:
          name: drain-nodes-upgrade-controller
        roleRef:
          apiGroup: rbac.authorization.k8s.io
          kind: ClusterRole
          name: cluster-admin
        subjects:
          - kind: ServiceAccount
            name: default
            namespace: appuio-openshift-upgrade-controller
        EOF
        
        git commit -am "Schedule drain of node ${NODES_TO_REMOVE} on cluster $CLUSTER_ID"
        git push
        popd
      2. Wait until after the next maintenance window.

      3. Confirm the node has been drained.

        kubectl get node ${NODES_TO_REMOVE}
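
        A drained node reports SchedulingDisabled in its status. To double-check, you can also list the pods still running on the nodes; only daemonset-managed pods should remain (a sketch):

        for node in $(echo -n ${NODES_TO_REMOVE}); do
          kubectl --as=cluster-admin get pods -A --field-selector spec.nodeName="${node}"
        done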
      4. Clean up UpgradeJobHook

        # After redoing the local environment setup and Terraform preparation:
        pushd "../../../inventory/classes/$TENANT_ID"
        rm manifests/$CLUSTER_ID/drain_node_hook.yaml
        git commit -am "Remove UpgradeJobHook to drain node ${NODES_TO_REMOVE} on cluster $CLUSTER_ID"
        git push
        popd
      5. Delete the node(s) from the cluster

        for node in $(echo -n ${NODES_TO_REMOVE}); do
          kubectl --as=cluster-admin delete node "${node}"
        done
    • If you are working on a non-production cluster, you may drain and remove the nodes immediately.

      Drain and remove node immediately
      1. Drain the node(s)

        for node in $(echo -n ${NODES_TO_REMOVE}); do
          kubectl --as=cluster-admin drain "${node}" \
            --delete-emptydir-data --ignore-daemonsets
        done
      2. Delete the node(s) from the cluster

        for node in $(echo -n ${NODES_TO_REMOVE}); do
          kubectl --as=cluster-admin delete node "${node}"
        done
  8. Remove old VMs from instance pool

    Only do this after the previous step is completed. On production clusters this must happen after the maintenance.

    for node in "$NODES_TO_REMOVE"; do
      exo compute instance-pool evict "${pool_name}" "${node}" -z "${EXOSCALE_ZONE}"
    done
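
    Afterwards, you can verify that the pool is back at its original size and only contains the new instances (optional):

    exo compute instance-pool show "${pool_name}" -Ojson | jq '.size, .instances'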