Adopt worker nodes with the cloudscale Machine API Provider

Steps to adopt worker nodes on cloudscale with the cloudscale Machine API Provider.

Starting situation

  • You already have an OpenShift 4 cluster on cloudscale

  • You have admin-level access to the cluster

  • You want the nodes adopted by the cloudscale Machine API Provider

Prerequisites

The following CLI utilities need to be available locally:

  • commodore
  • docker
  • curl
  • git
  • jq
  • yq (mikefarah/yq, version 4 or later)
  • kubectl
  • vault (Vault CLI)
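
A quick way to confirm they're installed (a minimal sketch based on the commands used in this guide):

    for tool in commodore docker curl git jq yq kubectl vault; do
      command -v "$tool" >/dev/null || echo "missing: $tool"
    done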

Prepare local environment

  1. Create a local directory to work in

    We strongly recommend creating an empty directory, unless you already have a work directory for the cluster you’re about to work on. This guide will run Commodore in the directory created in this step.

    export WORK_DIR=/path/to/work/dir
    mkdir -p "${WORK_DIR}"
    pushd "${WORK_DIR}"
  2. Configure API access

    Access to cloud API
    # From https://control.cloudscale.ch/service/<your-project>/api-token
    export CLOUDSCALE_API_TOKEN=<cloudscale-api-token>
    Access to VSHN GitLab
    # From https://git.vshn.net/-/user_settings/personal_access_tokens, "api" scope is sufficient
    export GITLAB_TOKEN=<gitlab-api-token>
    export GITLAB_USER=<gitlab-user-name>
    Access to VSHN Lieutenant
    # For example: https://api.syn.vshn.net
    # IMPORTANT: do NOT add a trailing `/`. Commands below will fail.
    export COMMODORE_API_URL=<lieutenant-api-endpoint>

    # Set Project Syn cluster and tenant ID
    export CLUSTER_ID=<lieutenant-cluster-id> # Looks like: c-<something>
    export TENANT_ID=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${COMMODORE_API_URL}/clusters/${CLUSTER_ID} | jq -r .tenant)
    Configuration for hieradata commits
    export GIT_AUTHOR_NAME=$(git config --global user.name)
    export GIT_AUTHOR_EMAIL=$(git config --global user.email)
    export TF_VAR_control_vshn_net_token=<control-vshn-net-token> # use your personal SERVERS API token from https://control.vshn.net/tokens
  3. Get required tokens from Vault

    Connect with Vault
    export VAULT_ADDR=https://vault-prod.syn.vshn.net
    vault login -method=oidc
    Grab the LB hieradata repo token from Vault
    export HIERADATA_REPO_SECRET=$(vault kv get \
      -format=json "clusters/kv/lbaas/hieradata_repo_token" | jq '.data.data')
    export HIERADATA_REPO_USER=$(echo "${HIERADATA_REPO_SECRET}" | jq -r '.user')
    export HIERADATA_REPO_TOKEN=$(echo "${HIERADATA_REPO_SECRET}" | jq -r '.token')
    Get Floaty credentials
    export TF_VAR_lb_cloudscale_api_secret=$(vault kv get \
      -format=json "clusters/kv/${TENANT_ID}/${CLUSTER_ID}/floaty" | jq -r '.data.data.iam_secret')
  4. Compile the catalog for the cluster. Having the catalog available locally lets us run Terraform for the cluster to make the required changes.

    commodore catalog compile "${CLUSTER_ID}"
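
    Before moving on, it's worth verifying that the exports above are set and that the compiled catalog contains the Terraform configuration. A minimal sketch, assuming a bash shell:

    for var in CLOUDSCALE_API_TOKEN GITLAB_TOKEN GITLAB_USER COMMODORE_API_URL \
               CLUSTER_ID TENANT_ID HIERADATA_REPO_USER HIERADATA_REPO_TOKEN \
               TF_VAR_lb_cloudscale_api_secret TF_VAR_control_vshn_net_token; do
      [ -n "${!var}" ] || echo "not set: $var"
    done
    ls catalog/manifests/openshift4-terraform/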

Update Cluster Config

  1. Update cluster config

    pushd inventory/classes/"${TENANT_ID}"
    
    yq -i '.applications += "machine-api-provider-cloudscale"' \
      ${CLUSTER_ID}.yml
    
    yq eval -i ".parameters.openshift4_terraform.terraform_variables.make_worker_adoptable_by_provider = true" \
      ${CLUSTER_ID}.yml
    yq eval -i '.parameters.machine_api_provider_cloudscale.secrets["cloudscale-user-data"].stringData.ignitionCA = "${openshift4_terraform:terraform_variables:ignition_ca}"' \
      ${CLUSTER_ID}.yml
    
    git commit -m "Allow adoption of worker nodes" "${CLUSTER_ID}.yml"
    git push
    popd
  2. Compile and push the cluster catalog.

    commodore catalog compile "${CLUSTER_ID}" --push
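
    As a spot check, the compiled catalog should now contain the manifests for the newly enabled component. The directory name below assumes the usual Commodore layout of one directory per component:

    ls catalog/manifests/ | grep machine-api-provider-cloudscale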

Prepare Terraform environment

  1. Configure Terraform secrets

    The file lists only variable names; Docker's --env-file option resolves their values from the current shell environment when the Terraform container starts.

    cat <<EOF > ./terraform.env
    CLOUDSCALE_API_TOKEN
    TF_VAR_ignition_bootstrap
    TF_VAR_lb_cloudscale_api_secret
    TF_VAR_control_vshn_net_token
    GIT_AUTHOR_NAME
    GIT_AUTHOR_EMAIL
    HIERADATA_REPO_TOKEN
    EOF
  2. Set up Terraform

    Prepare Terraform execution environment
    # Set terraform image and tag to be used
    tf_image=$(\
      yq eval ".parameters.openshift4_terraform.images.terraform.image" \
      dependencies/openshift4-terraform/class/defaults.yml)
    tf_tag=$(\
      yq eval ".parameters.openshift4_terraform.images.terraform.tag" \
      dependencies/openshift4-terraform/class/defaults.yml)
    
    # Generate the terraform alias
    base_dir=$(pwd)
    alias terraform='touch .terraformrc; docker run -it --rm \
      -e REAL_UID=$(id -u) \
      -e TF_CLI_CONFIG_FILE=/tf/.terraformrc \
      --env-file ${base_dir}/terraform.env \
      -w /tf \
      -v $(pwd):/tf \
      --ulimit memlock=-1 \
      "${tf_image}:${tf_tag}" /tf/terraform.sh'
    
    export GITLAB_REPOSITORY_URL=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${COMMODORE_API_URL}/clusters/${CLUSTER_ID} | jq -r '.gitRepo.url' | sed 's|ssh://||; s|/|:|')
    export GITLAB_REPOSITORY_NAME=${GITLAB_REPOSITORY_URL##*/}
    export GITLAB_CATALOG_PROJECT_ID=$(curl -sH "Authorization: Bearer ${GITLAB_TOKEN}" "https://git.vshn.net/api/v4/projects?simple=true&search=${GITLAB_REPOSITORY_NAME/.git}" | jq -r ".[] | select(.ssh_url_to_repo == \"${GITLAB_REPOSITORY_URL}\") | .id")
    export GITLAB_STATE_URL="https://git.vshn.net/api/v4/projects/${GITLAB_CATALOG_PROJECT_ID}/terraform/state/cluster"
    
    pushd catalog/manifests/openshift4-terraform/
    Initialize Terraform
    terraform init \
      "-backend-config=address=${GITLAB_STATE_URL}" \
      "-backend-config=lock_address=${GITLAB_STATE_URL}/lock" \
      "-backend-config=unlock_address=${GITLAB_STATE_URL}/lock" \
      "-backend-config=username=${GITLAB_USER}" \
      "-backend-config=password=${GITLAB_TOKEN}" \
      "-backend-config=lock_method=POST" \
      "-backend-config=unlock_method=DELETE" \
      "-backend-config=retry_wait_min=5"

Run Terraform

  1. Review the plan shown by terraform apply, and confirm the changes if everything looks good.

    Terraform will tag the existing nodes in preparation for their adoption by the cloudscale Machine API Provider.

    terraform apply
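
    To spot-check that the tags were applied, you can list the servers and their tags through the cloudscale API. This is only a sketch; the exact tag keys depend on the provider and component versions:

    curl -sH "Authorization: Bearer ${CLOUDSCALE_API_TOKEN}" \
      https://api.cloudscale.ch/v1/servers | jq -r '.[] | "\(.name)\t\(.tags)"'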

Apply Machine and MachineSet manifests

Please ensure the terraform apply has completed successfully before proceeding with this step. Without the tags applied by Terraform, nodes will be duplicated under the same names, which can lead to conflicts and unpredictable behavior.

Apply the Machine manifests before the MachineSet manifests; otherwise the MachineSet controller may create new machines instead of adopting the existing ones.

  1. Copy worker-machines_yml from the Terraform output and apply it to the cluster.

    terraform output -raw worker-machines_yml | yq -P > worker-machines.yml
    head worker-machines.yml
    kubectl apply -f worker-machines.yml
  2. Check that all machines are in the Running state.

    kubectl get -f worker-machines.yml
  3. Copy worker-machineset_yml from the Terraform output and apply it to the cluster.

    terraform output -raw worker-machineset_yml | yq -P > worker-machineset.yml
    head worker-machineset.yml
    kubectl apply -f worker-machineset.yml
  4. Copy infra-machines_yml from the Terraform output and apply it to the cluster.

    terraform output -raw infra-machines_yml | yq -P > infra-machines.yml
    head infra-machines.yml
    kubectl apply -f infra-machines.yml
  5. Check that all machines are in the Running state.

    kubectl get -f infra-machines.yml
  6. Copy infra-machineset_yml from the Terraform output and apply it to the cluster.

    terraform output -raw infra-machineset_yml | yq -P > infra-machineset.yml
    head infra-machineset.yml
    kubectl apply -f infra-machineset.yml
  7. Check for additional worker groups and apply them if necessary.

    terraform output -raw additional-worker-machines_yml > /dev/null 2>&1 || echo "No additional worker groups"
  8. If the output shows "No additional worker groups," jump to Remove nodes from the Terraform state.

  9. Copy additional-worker-machines_yml from the Terraform output and apply it to the cluster.

    terraform output -raw additional-worker-machines_yml | yq -P > additional-worker-machines.yml
    head additional-worker-machines.yml
    kubectl apply -f additional-worker-machines.yml
  10. Check that all machines are in the Running state.

    kubectl get -f additional-worker-machines.yml
  11. Copy additional-worker-machinesets_yml from the Terraform output and apply it to the cluster.

    terraform output -raw additional-worker-machinesets_yml | yq -P > additional-worker-machinesets.yml
    head additional-worker-machinesets.yml
    kubectl apply -f additional-worker-machinesets.yml
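
    As a final check across all pools, list the machines in the openshift-machine-api namespace and confirm that every machine is Running and linked to a node. The wait command is optional and assumes kubectl 1.23 or newer:

    kubectl -n openshift-machine-api get machines.machine.openshift.io -o wide
    # Optionally block until all machines report phase "Running"
    kubectl -n openshift-machine-api wait machines.machine.openshift.io --all \
      --for=jsonpath='{.status.phase}'=Running --timeout=15m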

Remove nodes from the Terraform state

  1. Remove the nodes from the Terraform state. The override.tf file sets the node counts to zero, so Terraform no longer manages (or tries to recreate) the machines that are now handled by the Machine API Provider.

    terraform state rm module.cluster.module.worker
    terraform state rm module.cluster.module.infra
    terraform state rm module.cluster.module.additional_worker
    cat > override.tf <<EOF
    module "cluster" {
      infra_count              = 0
      worker_count             = 0
      additional_worker_groups = {}
    }
    EOF
  2. Check the terraform plan output and apply the changes. There must be no server recreation. Ignore any proposed hieradata changes; applying them would make the cluster ingress controller unavailable.

    terraform plan
    terraform apply
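
    To confirm that no node resources remain under Terraform management, list what's left in the state (module paths as used in the state rm commands above):

    terraform state list | grep -E 'module\.(worker|infra|additional_worker)' \
      || echo "OK: no worker or infra resources left in the Terraform state"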

Cleanup

  1. Persist the Terraform changes and start managing the machine sets.

    popd
    pushd "inventory/classes/${TENANT_ID}"
    
    yq -i e '.parameters.openshift4_terraform.terraform_variables.additional_worker_groups = {}' \
      "${CLUSTER_ID}.yml"
    yq -i e '.parameters.openshift4_terraform.terraform_variables.infra_count = 0' \
      "${CLUSTER_ID}.yml"
    yq -i e '.parameters.openshift4_terraform.terraform_variables.worker_count = 0' \
      "${CLUSTER_ID}.yml"
    
    yq -i ea 'select(fileIndex == 0) as $cluster |
              $cluster.parameters.openshift4_nodes.machineSets =
                ([select(fileIndex > 0)][] as $ms ireduce ({};
                  $ms.metadata.name as $msn |
                  del($ms.apiVersion) |
                  del($ms.kind) |
                  del($ms.metadata.name) |
                  del($ms.metadata.labels.name) |
                  del($ms.metadata.namespace) |
                  . * {$msn: $ms}
                )) |
              $cluster' \
      "${CLUSTER_ID}.yml" ../../../catalog/manifests/openshift4-terraform/*machineset*.yml
    
    git commit -am "Persist provider adopted machine and terraform state for ${CLUSTER_ID}"
    git push origin master
    popd
    
    commodore catalog compile "${CLUSTER_ID}" --push
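
    To verify the result of the yq merge above, you can list the machine sets that are now part of the openshift4_nodes configuration; the keys should match the MachineSet names rendered in the catalog:

    yq '.parameters.openshift4_nodes.machineSets | keys' \
      "inventory/classes/${TENANT_ID}/${CLUSTER_ID}.yml"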