Update compute flavors
Steps to change compute flavors for one or more node groups on cloudscale.ch.
Starting situation
- You already have an OpenShift 4 cluster on cloudscale.ch
- You have admin-level access to the cluster
- You want to change the compute flavors of one or more node groups
Prerequisites
The following CLI utilities need to be available locally:
- oc
- jq
- yq
- docker
- commodore, see Running Commodore
Update cluster config
- Update the cluster config in the syn-tenant-repo on a new branch.

export CLUSTER_ID=
export INFRA_FLAVOR=plus-X-Y
export WORKER_FLAVOR=plus-X-Y
export STORAGE_FLAVOR=plus-X-Y

git checkout -b update-compute-flavors
yq eval -i ".parameters.openshift4_terraform.terraform_variables.infra_flavor = \"${INFRA_FLAVOR}\"" \
  ${CLUSTER_ID}.yml
yq eval -i ".parameters.openshift4_terraform.terraform_variables.worker_flavor = \"${WORKER_FLAVOR}\"" \
  ${CLUSTER_ID}.yml
yq eval -i ".parameters.openshift4_terraform.terraform_variables.additional_worker_groups.storage.flavor = \"${STORAGE_FLAVOR}\"" \
  ${CLUSTER_ID}.yml
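To review the result before committing, read back the values that were just written:

yq eval '.parameters.openshift4_terraform.terraform_variables' ${CLUSTER_ID}.yml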
Note: Ensure at least version 3.12 of terraform-openshift4-cloudscale is used, so that the flavors for the master and load balancer nodes get updated as well.
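One way to verify which module version ends up being used, once the cluster catalog has been compiled (see the steps below), is to grep the generated Terraform code, which pins the module in its source ref. This is a sketch; the exact file layout depends on the component version:

grep -R "terraform-openshift4-cloudscale" catalog/manifests/openshift4-terraform/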
- Commit the change and create an MR for review

git commit -a -m "Update compute flavors on cluster ${CLUSTER_ID}"
git push -u origin update-compute-flavors
- Compile and push the cluster catalog.
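With Commodore this can be done in one step (assuming your Commodore version supports the --push and --interactive flags):

commodore catalog compile "${CLUSTER_ID}" --push --interactive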
Prepare Terraform environment
- Configure API access

export COMMODORE_API_URL=https://api.syn.vshn.net (1)

# Set Project Syn cluster and tenant ID
export CLUSTER_ID=<lieutenant-cluster-id> # Looks like: c-cluster-id-1234
export TENANT_ID=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" \
  ${COMMODORE_API_URL}/clusters/${CLUSTER_ID} | jq -r .tenant)

(1) Replace with the API URL of the desired Lieutenant instance.
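To confirm that the token and the cluster ID resolve correctly, you can fetch the cluster object (a quick sanity check; the displayName field is assumed to be present in the Lieutenant API response):

curl -sH "Authorization: Bearer $(commodore fetch-token)" \
  ${COMMODORE_API_URL}/clusters/${CLUSTER_ID} | jq -r .displayName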
- Create a local directory to work in and compile the cluster catalog

export WORK_DIR=/path/to/work/dir
mkdir -p "${WORK_DIR}"
pushd "${WORK_DIR}"

commodore catalog compile "${CLUSTER_ID}"

We strongly recommend creating an empty directory, unless you already have a work directory for the cluster you're about to work on. This guide will run Commodore in the directory created in this step.
- Configure the Terraform environment

export CLOUDSCALE_API_TOKEN=  # cloudscale.ch API token with write access
export GITLAB_USER=           # GitLab user for the cluster catalog repository
export GITLAB_TOKEN=          # matching GitLab access token
- Configure Terraform secrets

cat <<EOF > ./terraform.env
CLOUDSCALE_API_TOKEN
TF_VAR_ignition_bootstrap
TF_VAR_lb_cloudscale_api_secret
TF_VAR_control_vshn_net_token
GIT_AUTHOR_NAME
GIT_AUTHOR_EMAIL
HIERADATA_REPO_TOKEN
EOF

Entries without an = sign make docker run --env-file pass the value of each variable through from your local shell environment.
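To confirm the passthrough works without printing the secret, a throwaway container can check that the value arrives (any small image works; alpine is just an example):

docker run --rm --env-file ./terraform.env alpine \
  sh -c 'test -n "$CLOUDSCALE_API_TOKEN" && echo ok'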
- Set up Terraform

Prepare the Terraform execution environment:

# Set the Terraform image and tag to be used
tf_image=$(\
  yq eval ".parameters.openshift4_terraform.images.terraform.image" \
  dependencies/openshift4-terraform/class/defaults.yml)
tf_tag=$(\
  yq eval ".parameters.openshift4_terraform.images.terraform.tag" \
  dependencies/openshift4-terraform/class/defaults.yml)

# Generate the terraform alias
base_dir=$(pwd)
alias terraform='docker run --rm \
  -e REAL_UID=$(id -u) \
  --env-file ${base_dir}/terraform.env \
  -w /tf \
  -v $(pwd):/tf \
  --ulimit memlock=-1 \
  "${tf_image}:${tf_tag}" /tf/terraform.sh'

export GITLAB_REPOSITORY_URL=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" \
  ${COMMODORE_API_URL}/clusters/${CLUSTER_ID} | jq -r '.gitRepo.url' | sed 's|ssh://||; s|/|:|')
export GITLAB_REPOSITORY_NAME=${GITLAB_REPOSITORY_URL##*/}
export GITLAB_CATALOG_PROJECT_ID=$(curl -sH "Authorization: Bearer ${GITLAB_TOKEN}" \
  "https://git.vshn.net/api/v4/projects?simple=true&search=${GITLAB_REPOSITORY_NAME/.git}" \
  | jq -r ".[] | select(.ssh_url_to_repo == \"${GITLAB_REPOSITORY_URL}\") | .id")
export GITLAB_STATE_URL="https://git.vshn.net/api/v4/projects/${GITLAB_CATALOG_PROJECT_ID}/terraform/state/cluster"

pushd catalog/manifests/openshift4-terraform/

Initialize Terraform:

terraform init \
  "-backend-config=address=${GITLAB_STATE_URL}" \
  "-backend-config=lock_address=${GITLAB_STATE_URL}/lock" \
  "-backend-config=unlock_address=${GITLAB_STATE_URL}/lock" \
  "-backend-config=username=${GITLAB_USER}" \
  "-backend-config=password=${GITLAB_TOKEN}" \
  "-backend-config=lock_method=POST" \
  "-backend-config=unlock_method=DELETE" \
  "-backend-config=retry_wait_min=5"
Update compute flavors
- Verify the output of the Terraform plan step (for example, check the output of the Terraform CI/CD pipeline in the cluster catalog)
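You can also produce the plan locally through the terraform alias configured above and review the proposed changes before applying anything:

terraform plan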
- Make sure you're logged in to the correct cluster

oc cluster-info
- Allow Terraform to stop nodes

sed -i '/^resource "cloudscale_server" "node"/a allow_stopping_for_update = true' \
  .terraform/modules/cluster/modules/node-group/main.tf
sed -i '/^resource "cloudscale_server" "lb"/a allow_stopping_for_update = true' \
  .terraform/modules/cluster.lb/modules/vshn-lbaas-cloudscale/main.tf
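A quick sanity check that the attribute was actually inserted (plain grep on the same module files the sed commands modify):

grep -A1 '^resource "cloudscale_server"' \
  .terraform/modules/cluster/modules/node-group/main.tf \
  .terraform/modules/cluster.lb/modules/vshn-lbaas-cloudscale/main.tf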
Update master nodes
- Update master node flavors one at a time

terraform state list module.cluster.module.master.cloudscale_server.node | while read node; do
  terraform show -json | jq --raw-output --arg node "$node" \
    '.values.root_module.child_modules[].child_modules[].resources[]
     | select(.address == $node) | .values.name | split(".")[0]' \
  | while read name; do
    oc --as=cluster-admin adm drain "$name" --delete-emptydir-data --ignore-daemonsets --force
    terraform apply -target "$node" -auto-approve
    oc --as=cluster-admin adm uncordon "$name"
    oc wait --timeout=300s node --all --for condition=ready
  done
done
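After the loop completes, verify that all control plane nodes are Ready again (node-role labels are standard on OpenShift 4; the same check with the respective role label applies after the infra and worker loops below):

oc get nodes -l node-role.kubernetes.io/master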
Update infra nodes
- Update infra node flavors one at a time

terraform state list module.cluster.module.infra.cloudscale_server.node | while read node; do
  terraform show -json | jq --raw-output --arg node "$node" \
    '.values.root_module.child_modules[].child_modules[].resources[]
     | select(.address == $node) | .values.name | split(".")[0]' \
  | while read name; do
    oc --as=cluster-admin adm drain "$name" --delete-emptydir-data --ignore-daemonsets --force
    terraform apply -target "$node" -auto-approve
    oc --as=cluster-admin adm uncordon "$name"
    oc wait --timeout=300s node --all --for condition=ready
  done
done
Update worker nodes
- Update worker node flavors one at a time

terraform state list module.cluster.module.worker.cloudscale_server.node | while read node; do
  terraform show -json | jq --raw-output --arg node "$node" \
    '.values.root_module.child_modules[].child_modules[].resources[]
     | select(.address == $node) | .values.name | split(".")[0]' \
  | while read name; do
    oc --as=cluster-admin adm drain "$name" --delete-emptydir-data --ignore-daemonsets --force
    terraform apply -target "$node" -auto-approve
    oc --as=cluster-admin adm uncordon "$name"
    oc wait --timeout=300s node --all --for condition=ready
  done
done
Update storage nodes
- Because the storage cluster needs time to recover between node restarts, check the storage cluster's health manually between the individual flavor updates.

List the storage nodes and their Terraform addresses:

terraform state list 'module.cluster.module.additional_worker["storage"].cloudscale_server.node' | while read node; do
  terraform show -json | jq --raw-output --arg node "$node" \
    '.values.root_module.child_modules[].child_modules[].resources[]
     | select(.address == $node) | .values.name | split(".")[0]'
  echo $node
done

# For each of the storage nodes, with $name and $node set to one pair from the list above:
oc --as=cluster-admin adm drain "$name" --delete-emptydir-data --ignore-daemonsets --force
terraform apply -target "$node" -auto-approve
oc --as=cluster-admin adm uncordon "$name"
# Now make sure the storage cluster is healthy before proceeding with the next node
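If the storage cluster is backed by Rook-Ceph, the health check between nodes could look like this (a sketch; the namespace and the presence of a rook-ceph-tools deployment are assumptions that depend on your setup). Wait until ceph status reports HEALTH_OK before draining the next node:

oc --as=cluster-admin -n syn-rook-ceph-cluster exec deploy/rook-ceph-tools -- ceph status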