Migrate worker nodes to instance pools
Steps to migrate an existing OpenShift 4 cluster on Exoscale to use instance pools for all worker nodes.
Starting situation
- You already have an OpenShift 4 cluster on Exoscale
- Your cluster doesn’t use Exoscale instance pools for the worker nodes
- You have admin-level access to the cluster
- Your kubectl context points to the cluster you’re modifying
- You want to migrate the cluster to use instance pools for all worker nodes
High-level overview
- Enable instance pool provisioning in Terraform
- Remove the existing worker nodes from the Terraform state
- Deploy the instance pool
- Drain and delete the non-instance pool worker nodes
Prerequisites
The following CLI utilities need to be available locally:
- docker
- curl
- kubectl
- oc
- exo >= v1.28.0 (Exoscale CLI)
- vault (Vault CLI)
- commodore, see Running Commodore
- jq
- yq (YAML processor, version 4 or higher)
- macOS only: gdate from GNU coreutils (brew install coreutils)
Prepare local environment
- Create local directory to work in

We strongly recommend creating an empty directory, unless you already have a work directory for the cluster you’re about to work on. This guide will run Commodore in the directory created in this step.

export WORK_DIR=/path/to/work/dir
mkdir -p "${WORK_DIR}"
pushd "${WORK_DIR}"
- Configure API access

Access to cloud API

export EXOSCALE_API_KEY=<exoscale-key>        # (1)
export EXOSCALE_API_SECRET=<exoscale-secret>
export EXOSCALE_ZONE=<exoscale-zone>          # (2)
export EXOSCALE_S3_ENDPOINT="sos-${EXOSCALE_ZONE}.exo.io"

(1) We recommend using the IAMv3 role called Owner for the API Key. This role gives full access to the project.
(2) All lower case. For example ch-dk-2.

Access to VSHN GitLab

# From https://git.vshn.net/-/user_settings/personal_access_tokens, "api" scope is sufficient
export GITLAB_TOKEN=<gitlab-api-token>
export GITLAB_USER=<gitlab-user-name>

Access to VSHN Lieutenant

# For example: https://api.syn.vshn.net
# IMPORTANT: do NOT add a trailing `/`. Commands below will fail.
export COMMODORE_API_URL=<lieutenant-api-endpoint>

# Set Project Syn cluster and tenant ID
export CLUSTER_ID=<lieutenant-cluster-id>  # Looks like: c-<something>
export TENANT_ID=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${COMMODORE_API_URL}/clusters/${CLUSTER_ID} | jq -r .tenant)
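As a quick, optional sanity check (illustrative, not part of the original procedure), confirm that the tenant lookup succeeded before continuing:

# Should print the Project Syn cluster ID (c-<something>) and tenant ID (t-<something>)
echo "CLUSTER_ID=${CLUSTER_ID} TENANT_ID=${TENANT_ID}"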
- Get required tokens from Vault

Connect with Vault

export VAULT_ADDR=https://vault-prod.syn.vshn.net
vault login -method=oidc

Grab the LB hieradata repo token from Vault

export HIERADATA_REPO_SECRET=$(vault kv get \
  -format=json "clusters/kv/lbaas/hieradata_repo_token" | jq '.data.data')
export HIERADATA_REPO_USER=$(echo "${HIERADATA_REPO_SECRET}" | jq -r '.user')
export HIERADATA_REPO_TOKEN=$(echo "${HIERADATA_REPO_SECRET}" | jq -r '.token')
- Compile the catalog for the cluster. Having the catalog available locally enables us to run Terraform for the cluster to make any required changes.

commodore catalog compile "${CLUSTER_ID}"
Update Cluster Config
Make sure that the cluster you want to migrate uses terraform-openshift4-exoscale v7 or newer.
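To double-check the version, you can inspect the Terraform module source pinned in the compiled catalog. This is an illustrative check only; the exact JSON layout may differ between component versions.

# Print the terraform-openshift4-exoscale module source (including the pinned ref)
jq -r '.module.cluster.source' \
  catalog/manifests/openshift4-terraform/main.tf.json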
- Update cluster config

pushd "inventory/classes/${TENANT_ID}/"

yq eval -i ".parameters.openshift4_terraform.terraform_variables.use_instancepools = true" \
  ${CLUSTER_ID}.yml
- Review and commit

# Have a look at the file ${CLUSTER_ID}.yml.
git commit -a -m "Migrate cluster ${CLUSTER_ID} to instance pool worker nodes"
git push
popd
- Compile and push cluster catalog

commodore catalog compile ${CLUSTER_ID} --push -i
Run Terraform
- Configure Terraform secrets

cat <<EOF > ./terraform.env
EXOSCALE_API_KEY
EXOSCALE_API_SECRET
TF_VAR_control_vshn_net_token
GIT_AUTHOR_NAME
GIT_AUTHOR_EMAIL
HIERADATA_REPO_TOKEN
EOF
- Setup Terraform

Prepare Terraform execution environment

# Set terraform image and tag to be used
tf_image=$(\
  yq eval ".parameters.openshift4_terraform.images.terraform.image" \
  dependencies/openshift4-terraform/class/defaults.yml)
tf_tag=$(\
  yq eval ".parameters.openshift4_terraform.images.terraform.tag" \
  dependencies/openshift4-terraform/class/defaults.yml)

# Generate the terraform alias
base_dir=$(pwd)
alias terraform='touch .terraformrc; docker run -it --rm \
  -e REAL_UID=$(id -u) \
  -e TF_CLI_CONFIG_FILE=/tf/.terraformrc \
  --env-file ${base_dir}/terraform.env \
  -w /tf \
  -v $(pwd):/tf \
  --ulimit memlock=-1 \
  "${tf_image}:${tf_tag}" /tf/terraform.sh'

export GITLAB_REPOSITORY_URL=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${COMMODORE_API_URL}/clusters/${CLUSTER_ID} | jq -r '.gitRepo.url' | sed 's|ssh://||; s|/|:|')
export GITLAB_REPOSITORY_NAME=${GITLAB_REPOSITORY_URL##*/}
export GITLAB_CATALOG_PROJECT_ID=$(curl -sH "Authorization: Bearer ${GITLAB_TOKEN}" "https://git.vshn.net/api/v4/projects?simple=true&search=${GITLAB_REPOSITORY_NAME/.git}" | jq -r ".[] | select(.ssh_url_to_repo == \"${GITLAB_REPOSITORY_URL}\") | .id")
export GITLAB_STATE_URL="https://git.vshn.net/api/v4/projects/${GITLAB_CATALOG_PROJECT_ID}/terraform/state/cluster"

pushd catalog/manifests/openshift4-terraform/

Initialize Terraform

terraform init \
  "-backend-config=address=${GITLAB_STATE_URL}" \
  "-backend-config=lock_address=${GITLAB_STATE_URL}/lock" \
  "-backend-config=unlock_address=${GITLAB_STATE_URL}/lock" \
  "-backend-config=username=${GITLAB_USER}" \
  "-backend-config=password=${GITLAB_TOKEN}" \
  "-backend-config=lock_method=POST" \
  "-backend-config=unlock_method=DELETE" \
  "-backend-config=retry_wait_min=5"
- Download a copy of the state before removing non-instance pool worker nodes

terraform state pull > state.json
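If you want to see exactly which resources the next steps will remove, you can list the worker-related entries currently tracked in the state. This is an optional, illustrative check:

# List state entries for the legacy worker nodes before removing them
terraform state list | grep -E 'module\.(worker|additional_worker)'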
- Remove non-instance pool worker nodes from Terraform state

terraform state rm "module.cluster.module.worker.random_id.node_id"
terraform state rm "module.cluster.module.worker.exoscale_compute_instance.nodes"
- Remove non-instance pool additional worker nodes from Terraform state

This step can be skipped if there are no additional worker groups. It only operates on additional worker groups which don’t explicitly enable or disable instance pool provisioning.

for name in $(
  jq '.module.cluster.additional_worker_groups | to_entries[] |
      select(.value.use_instancepool == null) | .key' \
    main.tf.json
); do
  terraform state rm "module.cluster.module.additional_worker[$name].random_id.node_id"
  terraform state rm "module.cluster.module.additional_worker[$name].exoscale_compute_instance.nodes"
done
- Deploy instance pool(s)

terraform apply
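Optionally, verify that the instance pool and its member instances were created. This is a hedged sketch using the Exoscale CLI; the exact subcommand layout and output fields may vary between exo CLI versions.

# List instance pools in the cluster's zone (exo CLI >= v1.28.0)
exo compute instance-pool list -z "${EXOSCALE_ZONE}" -O json | jq .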
Join instance pool nodes to the cluster
- Approve CSRs for new nodes

# Once CSRs in state Pending show up, approve them
# Needs to be run twice, two CSRs for each node need to be approved
kubectl --as=cluster-admin get csr -w

oc --as=cluster-admin get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | \
  xargs oc --as=cluster-admin adm certificate approve

kubectl --as=cluster-admin get nodes
- Label new worker nodes

kubectl get node -ojson | \
  jq -r '.items[] |
    select(.metadata.name | test("app|infra|master|storage-") | not) |
    select(.metadata.labels["node-role.kubernetes.io/app"] == null) |
    .metadata.name' | \
  xargs -I {} kubectl label --as=cluster-admin \
    node {} node-role.kubernetes.io/app=
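To confirm the labels were applied, you can list the nodes that now carry the app role. This is an optional, illustrative check:

# All instance pool workers should show up here with the app role
kubectl --as=cluster-admin get nodes -l node-role.kubernetes.io/app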
Drain and remove non-instance pool nodes
- Drain non-instance pool worker nodes

for node in $(
  jq -r '.resources[] |
    select(.module == "module.cluster.module.worker" and .type == "random_id") |
    .instances[].attributes.hex' \
    state.json
); do
  kubectl drain --as=cluster-admin --ignore-daemonsets \
    --delete-emptydir-data --force "$node"
done
- Drain non-instance pool additional worker nodes

This step can be skipped if there are no additional worker groups. It only operates on additional worker groups which don’t explicitly enable or disable instance pool provisioning.

for name in $(
  jq -r '.module.cluster.additional_worker_groups | to_entries[] |
      select(.value.use_instancepool == null) | .key' \
    main.tf.json
); do
  echo "Draining nodes for additional worker group $name"
  for node in $(
    jq --arg name "$name" -r \
      '.resources[] |
       select(.module == "module.cluster.module.additional_worker[\""+$name+"\"]" and .type == "random_id") |
       .instances[].attributes.hex' \
      state.json
  ); do
    echo "$node"
    kubectl drain --as=cluster-admin --ignore-daemonsets \
      --delete-emptydir-data --force "$node"
  done
done
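Before moving on, you may want to verify that the drained nodes are cordoned (they should be reported as SchedulingDisabled). Also note that the Terraform and drain commands above were run from catalog/manifests/openshift4-terraform/, while the steps below expect to run from the work directory again. Both the check and the popd are illustrative reminders rather than part of the original procedure.

# Drained legacy workers should show SchedulingDisabled
kubectl --as=cluster-admin get nodes

# Return to the work directory if your shell is still in the Terraform directory
popd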
Deploy Exoscale Cloud Controller Manager
- Create restricted API key for the Exoscale cloud-controller-manager

# Create the Exoscale IAM role for the CCM, if it doesn't exist yet in the organization
ccm_role_id=$(exo iam role list -O json | \
  jq -r '.[] | select(.name=="ccm-exoscale") | .key')
if [ -z "${ccm_role_id}" ]; then
  cat <<EOF | exo iam role create ccm-exoscale \
    --description "Exoscale CCM: Allow managing NLBs and reading instances/instance pools" \
    --policy -
{
  "default-service-strategy": "deny",
  "services": {
    "compute": {
      "type": "rules",
      "rules": [
        {
          "expression": "operation in ['add-service-to-load-balancer', 'create-load-balancer', 'delete-load-balancer', 'delete-load-balancer-service', 'get-load-balancer', 'get-load-balancer-service', 'get-operation', 'list-load-balancers', 'reset-load-balancer-field', 'reset-load-balancer-service-field', 'update-load-balancer', 'update-load-balancer-service']",
          "action": "allow"
        },
        {
          "expression": "operation in ['get-instance', 'get-instance-pool', 'get-instance-type', 'list-instances', 'list-instance-pools', 'list-zones']",
          "action": "allow"
        }
      ]
    }
  }
}
EOF
fi

# Create access key
ccm_credentials=$(exo iam api-key create -O json \
  "${CLUSTER_ID}_ccm-exoscale" ccm-exoscale)
export CCM_ACCESSKEY=$(echo "${ccm_credentials}" | jq -r '.key')
export CCM_SECRETKEY=$(echo "${ccm_credentials}" | jq -r '.secret')
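A small, optional sanity check (illustrative) to make sure the credentials were captured before writing them to Vault:

# Both variables must be non-empty; re-run the API key creation if this fails
test -n "${CCM_ACCESSKEY}" && test -n "${CCM_SECRETKEY}" \
  && echo "CCM API key created" \
  || echo "CCM API key creation failed"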
- Store CCM API key in Vault

# Set the CCM Exoscale Credentials
vault kv put clusters/kv/${TENANT_ID}/${CLUSTER_ID}/exoscale/ccm \
  access_key=${CCM_ACCESSKEY} \
  secret_key=${CCM_SECRETKEY}
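Optionally read the secret back to verify it landed at the expected path (this mirrors the vault kv get usage earlier in this guide; note that it prints the secret key in clear text):

vault kv get "clusters/kv/${TENANT_ID}/${CLUSTER_ID}/exoscale/ccm"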
- Deploy Exoscale CCM via Project Syn and enable default instance pool annotation injector

pushd "inventory/classes/${TENANT_ID}/"

# Enable component
yq eval -i ".applications += [\"exoscale-cloud-controller-manager\"]" \
  ${CLUSTER_ID}.yml

# Configure default instance pool annotation injector
curl -fsu "${GITLAB_USER}:${GITLAB_TOKEN}" "$GITLAB_STATE_URL" |\
  jq '[.resources[] | select(.module == "module.cluster.module.worker" and .type == "exoscale_instance_pool")][0].instances[0].attributes.id' |\
  yq ea -i 'select(fileIndex == 0) * (select(fileIndex == 1) | {"parameters": {"exoscale_cloud_controller_manager":{"serviceLoadBalancerDefaultAnnotations":{"service.beta.kubernetes.io/exoscale-loadbalancer-service-instancepool-id": .}}}}) ' \
  "$CLUSTER_ID.yml" -

git commit -a -m "Deploy Exoscale CCM on cluster ${CLUSTER_ID}"
git push
popd