Uninstallation on cloudscale.ch

Steps to remove an OpenShift 4 cluster from cloudscale.ch.

  • The commands are idempotent and can be retried if any of the steps fail.

  • In the future, this procedure will be mostly automated.

Prerequisites

The following CLI utilities need to be available locally (all of them are used in the commands below):

  • commodore

  • vault (Vault CLI)

  • curl

  • jq

  • yq (v4 or later; the commands below use the yq eval syntax)

  • docker

  • mc (MinIO client)

  • ssh
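
A quick way to check that everything is in place (a minimal sketch; extend the list if your workflow needs additional tools):

    # Report any missing CLI utility
    for tool in commodore vault curl jq yq docker mc ssh; do
      command -v "$tool" >/dev/null 2>&1 || echo "missing: $tool"
    done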

Cluster Decommission

  1. Export the following vars

    export GITLAB_TOKEN=<gitlab-api-token> # From https://git.vshn.net/-/user_settings/personal_access_tokens
    export GITLAB_USER=<gitlab-user-name>
  2. Grab cluster tokens and facts from Vault and Lieutenant

    Connect to Vault
    export VAULT_ADDR=https://vault-prod.syn.vshn.net
    vault login -method=oidc
  3. Configure API access

    export COMMODORE_API_URL=https://api.syn.vshn.net (1)
    
    # Set Project Syn cluster and tenant ID
    export CLUSTER_ID=<lieutenant-cluster-id> # Looks like: c-cluster-id-1234
    export TENANT_ID=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${COMMODORE_API_URL}/clusters/${CLUSTER_ID} | jq -r .tenant)
    1 Replace with the API URL of the desired Lieutenant instance.
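
    Optionally, verify that the token and the cluster ID are correct before continuing. The call below uses the same Lieutenant endpoint as the following steps and should print the cluster ID back; an error or null output indicates a wrong token or ID.

    # Sanity check: should print the value of ${CLUSTER_ID}
    curl -sH "Authorization: Bearer $(commodore fetch-token)" \
      "${COMMODORE_API_URL}/clusters/${CLUSTER_ID}" | jq -r .id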
  4. Create a local directory to work in and compile the cluster catalog

    export WORK_DIR=/path/to/work/dir
    mkdir -p "${WORK_DIR}"
    pushd "${WORK_DIR}"
    
    commodore catalog compile "${CLUSTER_ID}"

    We strongly recommend creating an empty directory, unless you already have a work directory for the cluster you’re about to work on. This guide will run Commodore in the directory created in this step.

    export CLOUDSCALE_API_TOKEN=$(vault kv get -format=json clusters/kv/$TENANT_ID/$CLUSTER_ID/cloudscale | jq -r .data.data.token)
    export REGION=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${COMMODORE_API_URL}/clusters/${CLUSTER_ID} | jq -r .facts.region)
    export BACKUP_REGION=$(curl -H "Authorization: Bearer ${CLOUDSCALE_API_TOKEN}" https://api.cloudscale.ch/v1/regions | jq -r '.[].slug' | grep -v $REGION)
    export HIERADATA_REPO_SECRET=$(vault kv get \
      -format=json "clusters/kv/lbaas/hieradata_repo_token" | jq -r '.data.data.token')
  5. Configure Terraform secrets

    cat <<EOF > ./terraform.env
    CLOUDSCALE_API_TOKEN
    HIERADATA_REPO_TOKEN=${HIERADATA_REPO_SECRET}
    EOF
  6. Setup Terraform

    Prepare Terraform execution environment
    # Set terraform image and tag to be used
    tf_image=$(\
      yq eval ".parameters.openshift4_terraform.images.terraform.image" \
      dependencies/openshift4-terraform/class/defaults.yml)
    tf_tag=$(\
      yq eval ".parameters.openshift4_terraform.images.terraform.tag" \
      dependencies/openshift4-terraform/class/defaults.yml)
    
    # Generate the terraform alias
    base_dir=$(pwd)
    alias terraform='touch .terraformrc; docker run -it --rm \
      -e REAL_UID=$(id -u) \
      -e TF_CLI_CONFIG_FILE=/tf/.terraformrc \
      --env-file ${base_dir}/terraform.env \
      -w /tf \
      -v $(pwd):/tf \
      --ulimit memlock=-1 \
      "${tf_image}:${tf_tag}" /tf/terraform.sh'
    
    export GITLAB_REPOSITORY_URL=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${COMMODORE_API_URL}/clusters/${CLUSTER_ID} | jq -r '.gitRepo.url' | sed 's|ssh://||; s|/|:|')
    export GITLAB_REPOSITORY_NAME=${GITLAB_REPOSITORY_URL##*/}
    export GITLAB_CATALOG_PROJECT_ID=$(curl -sH "Authorization: Bearer ${GITLAB_TOKEN}" "https://git.vshn.net/api/v4/projects?simple=true&search=${GITLAB_REPOSITORY_NAME/.git}" | jq -r ".[] | select(.ssh_url_to_repo == \"${GITLAB_REPOSITORY_URL}\") | .id")
    export GITLAB_STATE_URL="https://git.vshn.net/api/v4/projects/${GITLAB_CATALOG_PROJECT_ID}/terraform/state/cluster"
    
    pushd catalog/manifests/openshift4-terraform/
    Initialize Terraform
    terraform init \
      "-backend-config=address=${GITLAB_STATE_URL}" \
      "-backend-config=lock_address=${GITLAB_STATE_URL}/lock" \
      "-backend-config=unlock_address=${GITLAB_STATE_URL}/lock" \
      "-backend-config=username=${GITLAB_USER}" \
      "-backend-config=password=${GITLAB_TOKEN}" \
      "-backend-config=lock_method=POST" \
      "-backend-config=unlock_method=DELETE" \
      "-backend-config=retry_wait_min=5"
  7. Grab location of LB backups and potential Icinga2 satellite host before decommissioning VMs.

    declare -a LB_FQDNS
    for id in 1 2; do
      LB_FQDNS[$id]=$(terraform state show "module.cluster.module.lb.cloudscale_server.lb[$(expr $id - 1)]" | grep fqdn | awk '{print $2}' | tr -d ' "\r\n')
    done
    for lb in ${LB_FQDNS[*]}; do
      ssh "${lb}" "sudo grep 'server =' /etc/burp/burp.conf && sudo grep 'ParentZone' /etc/icinga2/constants.conf" | tee "../../../$lb.info"
    done
  8. Set downtimes for both LBs in Icinga2.
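
    If you have access to the Icinga2 API on the monitoring master, the downtimes can also be scheduled from the command line. The sketch below is only an example: the API endpoint and credentials (ICINGA2_API, ICINGA2_CREDS) are placeholders you need to replace, and it schedules a two-hour host downtime including all services for each LB collected in the previous step.

    # Placeholder endpoint and credentials -- replace with your Icinga2 master and API user
    export ICINGA2_API=https://icinga2-master.example.com:5665
    export ICINGA2_CREDS=<icinga2-api-user>:<icinga2-api-password>
    
    now=$(date +%s)
    for lb in ${LB_FQDNS[*]}; do
      # Schedule a 2h downtime for the host and all of its services
      curl -sk -u "${ICINGA2_CREDS}" -H 'Accept: application/json' \
        -X POST "${ICINGA2_API}/v1/actions/schedule-downtime" \
        -d "{\"type\": \"Host\", \"filter\": \"host.name==\\\"${lb}\\\"\", \"all_services\": true, \"author\": \"${GITLAB_USER}\", \"comment\": \"Decommissioning ${CLUSTER_ID}\", \"start_time\": ${now}, \"end_time\": $((now + 7200))}"
    done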

  9. Remove APPUiO hieradata Git repository resource from Terraform state

    terraform state rm module.cluster.module.lb.module.hiera[0].gitfile_checkout.appuio_hieradata
    This step is necessary to ensure the subsequent terraform destroy completes without errors.
  10. Delete resources from cloudscale.ch using Terraform

    # The first destroy run is expected to fail
    terraform destroy
    # Destroy a second time to delete private networks
    terraform destroy
    popd
  11. After all resources are deleted, remove the object storage buckets

    # Use the already existing objects user
    response=$(curl -sH "Authorization: Bearer ${CLOUDSCALE_API_TOKEN}" \
      https://api.cloudscale.ch/v1/objects-users | \
      jq -e ".[] | select(.display_name == \"${CLUSTER_ID}\")")
    
    # configure the MinIO client with the objects user credentials
    mc config host add \
      "${CLUSTER_ID}" "https://objects.${REGION}.cloudscale.ch" \
      $(echo $response | jq -r '.keys[0].access_key') \
      $(echo $response | jq -r '.keys[0].secret_key')
    
    # delete bootstrap-ignition bucket (should already be deleted after setup)
    mc rb "${CLUSTER_ID}/${CLUSTER_ID}-bootstrap-ignition" --force
    
    # delete image-registry bucket
    mc rb "${CLUSTER_ID}/${CLUSTER_ID}-image-registry" --force
  12. Delete the cluster-backup bucket in the cloudscale.ch project

    Verify that the cluster backups aren’t needed anymore before cleaning up the backup bucket. Consider extracting the most recent cluster objects and etcd backups before deleting the bucket. See the Recover objects from backup how-to for instructions. At this point in the decommissioning process, you’ll have to extract the Restic configuration from Vault instead of the cluster itself.

    # configure the MinIO client for the backup region
    mc config host add \
      "${CLUSTER_ID}_backup" "https://objects.${BACKUP_REGION}.cloudscale.ch" \
      $(echo $response | jq -r '.keys[0].access_key') \
      $(echo $response | jq -r '.keys[0].secret_key')
    
    mc rb "${CLUSTER_ID}_backup/${CLUSTER_ID}-cluster-backup" --force
    
    # delete the cloudscale.ch objects user
    curl -i -H "Authorization: Bearer ${CLOUDSCALE_API_TOKEN}" -X DELETE $(echo $response | jq -r '.href')
  13. Delete Vault entries:

    for secret in $(find catalog/refs/ -type f -printf "clusters/kv/%P\n" \
        | sed -r 's#(.*)/.*#\1#' | grep -v '__shared__/__shared__' \
        | sort -u);
    do
      vault kv delete "$secret"
    done
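
    The reads earlier in this guide use .data.data, which indicates that clusters/kv is a KV version 2 secrets engine; on KV v2, vault kv delete only soft-deletes the latest version of each secret. If the entries should be removed permanently, including all versions and metadata, the same loop can be run with vault kv metadata delete instead:

    # Permanently remove all versions and metadata (KV v2, irreversible)
    for secret in $(find catalog/refs/ -type f -printf "clusters/kv/%P\n" \
        | sed -r 's#(.*)/.*#\1#' | grep -v '__shared__/__shared__' \
        | sort -u);
    do
      vault kv metadata delete "$secret"
    done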
  14. Decommission Puppet-managed LBs according to the VSHN documentation (Internal link).

    The documentation linked above requires some information to be retrieved from the already-deleted load balancers.

    If you’ve been following these instructions, you stored this information into a file earlier:

    for lb in ${LB_FQDNS[*]}; do
      echo "$lb"
      cat "$lb.info"
    done
    Don’t forget to remove the LB configuration in the APPUiO hieradata and the nodes hieradata.
  15. Delete cluster from Lieutenant API (via portal)

    • Select the Lieutenant API Endpoint

    • Search cluster name

    • Delete cluster entry using the delete button
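
    Alternatively, if your token is allowed to delete clusters, the cluster object can also be removed directly via the Lieutenant API instead of the portal:

    # Assumes the token from `commodore fetch-token` has delete permission on the cluster
    curl -sH "Authorization: Bearer $(commodore fetch-token)" \
      -X DELETE "${COMMODORE_API_URL}/clusters/${CLUSTER_ID}"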

  16. Delete all remaining volumes which were associated with the cluster in the cloudscale.ch project.

    This step is required because the csi-cloudscale driver doesn’t have time to properly clean up PVs when the cluster is decommissioned with terraform destroy.
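
    The leftover volumes can be cleaned up in the cloudscale.ch UI or via the API. The sketch below assumes the cloudscale.ch project is dedicated to this cluster: it lists all volumes that are still present so you can review them, and shows the delete call for a single volume (replace <volume-uuid> with a UUID from the listing).

    # List all volumes still present in the project (review before deleting anything)
    curl -sH "Authorization: Bearer ${CLOUDSCALE_API_TOKEN}" \
      https://api.cloudscale.ch/v1/volumes | jq -r '.[] | "\(.uuid) \(.name)"'
    
    # Delete a single volume after confirming it belonged to the cluster
    curl -i -H "Authorization: Bearer ${CLOUDSCALE_API_TOKEN}" \
      -X DELETE "https://api.cloudscale.ch/v1/volumes/<volume-uuid>"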
  17. Delete the cluster’s API tokens in the cloudscale UI

  18. Delete Keycloak service (via portal)

    • Search cluster name

    • Delete the cluster’s service entry using the delete button

  19. Delete all DNS records related to the cluster (zone files)

  20. Update any related documentation