Remove a worker node

Steps to remove a worker node of an OpenShift 4 cluster on Exoscale.

Starting situation

  • You already have a OpenShift 4 cluster on Exoscale

  • You have admin-level access to the cluster

  • You want to remove an existing worker node in the cluster

Prerequisites

The following CLI utilities need to be available locally:

Prepare local environment

  1. Create local directory to work in

    We strongly recommend creating an empty directory, unless you already have a work directory for the cluster you’re about to work on. This guide will run Commodore in the directory created in this step.

    export WORK_DIR=/path/to/work/dir
    mkdir -p "${WORK_DIR}"
    pushd "${WORK_DIR}"
  2. Configure API access

    Access to cloud API
    export EXOSCALE_API_KEY=<exoscale-key>
    export EXOSCALE_API_SECRET=<exoscale-secret>
    export EXOSCALE_ZONE=<exoscale-zone> (1)
    export EXOSCALE_S3_ENDPOINT="sos-${EXOSCALE_ZONE}.exo.io"
    1 All lower case. For example ch-dk-2.
    Access to various API
    # From https://git.vshn.net/-/profile/personal_access_tokens, "api" scope is sufficient
    export GITLAB_TOKEN=<gitlab-api-token>
    export GITLAB_USER=<gitlab-user-name>
    
    # For example: https://api.syn.vshn.net
    # IMPORTANT: do NOT add a trailing `/`. Commands below will fail.
    export COMMODORE_API_URL=<lieutenant-api-endpoint>
    
    # Set Project Syn cluster and tenant ID
    export CLUSTER_ID=<lieutenant-cluster-id> # Looks like: c-<something>
    export TENANT_ID=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${COMMODORE_API_URL}/clusters/${CLUSTER_ID} | jq -r .tenant)
  3. Get required tokens from Vault

    Connect with Vault
    export VAULT_ADDR=https://vault-prod.syn.vshn.net
    vault login -method=oidc
    Grab the LB hieradata repo token from Vault
    export HIERADATA_REPO_SECRET=$(vault kv get \
      -format=json "clusters/kv/lbaas/hieradata_repo_token" | jq '.data.data')
    export HIERADATA_REPO_USER=$(echo "${HIERADATA_REPO_SECRET}" | jq -r '.user')
    export HIERADATA_REPO_TOKEN=$(echo "${HIERADATA_REPO_SECRET}" | jq -r '.token')
    Get Floaty credentials
    export TF_VAR_lb_exoscale_api_user=$(vault kv get \
      -format=json "clusters/kv/${TENANT_ID}/${CLUSTER_ID}/floaty" | jq '.data.data')
    export TF_VAR_lb_exoscale_api_key=$(echo "${TF_VAR_lb_exoscale_api_user}" | jq -r '.iam_key')
    export TF_VAR_lb_exoscale_api_secret=$(echo "${TF_VAR_lb_exoscale_api_user}" | jq -r '.iam_secret')
  4. Compile the catalog for the cluster. Having the catalog available locally enables us to run Terraform for the cluster to make any required changes.

    commodore catalog compile "${CLUSTER_ID}"

Update Cluster Config

  1. Update cluster config.

    pushd "inventory/classes/${TENANT_ID}/"
    
    yq eval -i ".parameters.openshift4_terraform.terraform_variables.worker_count -= 1" \
      ${CLUSTER_ID}.yml
  2. Review and commit

    # Have a look at the file ${CLUSTER_ID}.yml.
    
    git commit -a -m "Remove worker node from cluster ${CLUSTER_ID}"
    git push
    
    popd
  3. Compile and push cluster catalog

    commodore catalog compile ${CLUSTER_ID} --push -i

Prepare Terraform environment

  1. Configure Terraform secrets

    cat <<EOF > ./terraform.env
    EXOSCALE_API_KEY
    EXOSCALE_API_SECRET
    TF_VAR_lb_exoscale_api_key
    TF_VAR_lb_exoscale_api_secret
    TF_VAR_control_vshn_net_token
    GIT_AUTHOR_NAME
    GIT_AUTHOR_EMAIL
    HIERADATA_REPO_TOKEN
    EOF
  2. Setup Terraform

    Prepare Terraform execution environment
    # Set terraform image and tag to be used
    tf_image=$(\
      yq eval ".parameters.openshift4_terraform.images.terraform.image" \
      dependencies/openshift4-terraform/class/defaults.yml)
    tf_tag=$(\
      yq eval ".parameters.openshift4_terraform.images.terraform.tag" \
      dependencies/openshift4-terraform/class/defaults.yml)
    
    # Generate the terraform alias
    base_dir=$(pwd)
    alias terraform='docker run -it --rm \
      -e REAL_UID=$(id -u) \
      --env-file ${base_dir}/terraform.env \
      -w /tf \
      -v $(pwd):/tf \
      --ulimit memlock=-1 \
      "${tf_image}:${tf_tag}" /tf/terraform.sh'
    
    export GITLAB_REPOSITORY_URL=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${COMMODORE_API_URL}/clusters/${CLUSTER_ID} | jq -r '.gitRepo.url' | sed 's|ssh://||; s|/|:|')
    export GITLAB_REPOSITORY_NAME=${GITLAB_REPOSITORY_URL##*/}
    export GITLAB_CATALOG_PROJECT_ID=$(curl -sH "Authorization: Bearer ${GITLAB_TOKEN}" "https://git.vshn.net/api/v4/projects?simple=true&search=${GITLAB_REPOSITORY_NAME/.git}" | jq -r ".[] | select(.ssh_url_to_repo == \"${GITLAB_REPOSITORY_URL}\") | .id")
    export GITLAB_STATE_URL="https://git.vshn.net/api/v4/projects/${GITLAB_CATALOG_PROJECT_ID}/terraform/state/cluster"
    
    pushd catalog/manifests/openshift4-terraform/
    Initialize Terraform
    terraform init \
      "-backend-config=address=${GITLAB_STATE_URL}" \
      "-backend-config=lock_address=${GITLAB_STATE_URL}/lock" \
      "-backend-config=unlock_address=${GITLAB_STATE_URL}/lock" \
      "-backend-config=username=${GITLAB_USER}" \
      "-backend-config=password=${GITLAB_TOKEN}" \
      "-backend-config=lock_method=POST" \
      "-backend-config=unlock_method=DELETE" \
      "-backend-config=retry_wait_min=5"

Remove Node

  • Find the node you want to remove. It has to be the one with the highest terraform index.

    # Grab JSON copy of current Terraform state
    terraform state pull > .tfstate.json
    
    node_count=$(jq  -r \
      '.resources[] |
       select(.module=="module.cluster.module.worker" and .type=="exoscale_compute") |
       .instances | length' \
       .tfstate.json)
    # Verify that the number of nodes is one more than we configured earlier.
    echo $node_count
    
    export NODE_TO_REMOVE=$(jq --arg index "$node_count" -r \
      '.resources[] |
       select(.module=="module.cluster.module.worker" and .type=="exoscale_compute") |
       .instances[$index|tonumber-1] |
       .attributes.hostname' \
       .tfstate.json)
    echo $NODE_TO_REMOVE

Remove VM

  1. Drain the node(s)

    for node in $(echo -n ${NODE_TO_REMOVE}); do
      kubectl --as=cluster-admin drain "${node}" \
        --delete-emptydir-data --ignore-daemonsets
    done
  2. Delete the node(s) from the cluster

    for node in $(echo -n ${NODE_TO_REMOVE}); do
      kubectl --as=cluster-admin delete node "${node}"
    done
  3. Remove the node(s) by applying Terraform

    Verify that the hostname of the to be deleted node(s) matches ${NODE_TO_REMOVE}

    Ensure that you’re still in directory ${WORK_DIR}/catalog/manifests/openshift4-terraform before executing this command.
    terraform apply