Uninstallation on Exoscale

Steps to remove an OpenShift 4 cluster from Exoscale.

  • The commands are idempotent and can be retried if any of the steps fail.

  • In the future, this procedure will be mostly automated.

Always follow the 4-eye data deletion (Internal link) principle when decommissioning production clusters.

Prerequisites

  • A working Commodore setup with access to the Lieutenant API, the VSHN GitLab instance, the VSHN Vault instance, and the cluster's Exoscale organization

  • The CLI tools used throughout this guide: commodore, vault, kubectl, exo, terraform (run via Docker), jq, yq, curl, git, ssh and emergency-credentials-receive

Cluster Decommission

  1. Create a new API key with the Owner role in the cluster's project

    The Owner role is created automatically for each Exoscale project.
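    If you prefer the command line over the Exoscale portal, a sketch along the following lines may work. This is an assumption rather than the documented procedure: verify the exact syntax with exo iam api-key create --help for your exo CLI version, and make sure the CLI is configured for the cluster's Exoscale account.

    # Sketch (hypothetical): list the available IAM roles to find the Owner role,
    # then create a decommissioning key named after the cluster's Project Syn ID.
    exo iam role list
    exo iam api-key create <cluster-id> Owner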
  2. Set up environment variables for the Exoscale, GitLab and Vault credentials

    export GITLAB_TOKEN=<gitlab-api-token> # From https://git.vshn.net/-/user_settings/personal_access_tokens
    export GITLAB_USER=<gitlab-user-name>
    export EXOSCALE_API_KEY=<exoscale api key>
    export EXOSCALE_API_SECRET=<exoscale api secret>
    Connect to Vault
    export VAULT_ADDR=https://vault-prod.syn.vshn.net
    vault login -method=oidc
  3. Configure API access

    export COMMODORE_API_URL=https://api.syn.vshn.net (1)
    
    # Set Project Syn cluster and tenant ID
    export CLUSTER_ID=<lieutenant-cluster-id> # Looks like: c-cluster-id-1234
    export TENANT_ID=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${COMMODORE_API_URL}/clusters/${CLUSTER_ID} | jq -r .tenant)
    1 Replace with the API URL of the desired Lieutenant instance.
  4. Create a local directory to work in and compile the cluster catalog

    export WORK_DIR=/path/to/work/dir
    mkdir -p "${WORK_DIR}"
    pushd "${WORK_DIR}"
    
    commodore catalog compile "${CLUSTER_ID}"

    We strongly recommend creating an empty directory, unless you already have a work directory for the cluster you’re about to work on. This guide will run Commodore in the directory created in this step.

  5. Set up environment variables for the Exoscale zone and object storage endpoint

    export EXOSCALE_ZONE=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${COMMODORE_API_URL}/clusters/${CLUSTER_ID} | jq -r .facts.region)
    export EXOSCALE_S3_ENDPOINT="sos-${EXOSCALE_ZONE}.exo.io"
  6. Get emergency credentials for cluster access

    API_URL=$(yq '.spec.dnsNames[0]' catalog/manifests/openshift4-api/00_certs.yaml)
    export EMR_KUBERNETES_ENDPOINT="https://${API_URL}:6443"
    emergency-credentials-receive $CLUSTER_ID
    export KUBECONFIG="em-${CLUSTER_ID}"
    kubectl cluster-info
  7. Disable Syn

    kubectl -n syn patch apps --type=json \
        -p '[{"op":"replace", "path":"/spec/syncPolicy", "value": {}}]' \
        root argocd
    kubectl -n syn-argocd-operator scale deployment \
        syn-argocd-operator-controller-manager --replicas 0
    kubectl -n syn scale sts syn-argocd-application-controller --replicas 0
  8. Delete all LoadBalancer services on the cluster

    kubectl delete svc --field-selector spec.type=LoadBalancer -A
    This is required so that Terraform can delete the instance pool.
  9. Configure Terraform secrets

    cat <<EOF > terraform.env
    EXOSCALE_API_KEY
    EXOSCALE_API_SECRET
    EOF
  10. Set up Terraform

    Prepare the Terraform execution environment
    # Set terraform image and tag to be used
    tf_image=$(\
      yq eval ".parameters.openshift4_terraform.images.terraform.image" \
      dependencies/openshift4-terraform/class/defaults.yml)
    tf_tag=$(\
      yq eval ".parameters.openshift4_terraform.images.terraform.tag" \
      dependencies/openshift4-terraform/class/defaults.yml)
    
    # Generate the terraform alias
    base_dir=$(pwd)
    alias terraform='touch .terraformrc; docker run -it --rm \
      -e REAL_UID=$(id -u) \
      -e TF_CLI_CONFIG_FILE=/tf/.terraformrc \
      --env-file ${base_dir}/terraform.env \
      -w /tf \
      -v $(pwd):/tf \
      --ulimit memlock=-1 \
      "${tf_image}:${tf_tag}" /tf/terraform.sh'
    
    export GITLAB_REPOSITORY_URL=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${COMMODORE_API_URL}/clusters/${CLUSTER_ID} | jq -r '.gitRepo.url' | sed 's|ssh://||; s|/|:|')
    export GITLAB_REPOSITORY_NAME=${GITLAB_REPOSITORY_URL##*/}
    export GITLAB_CATALOG_PROJECT_ID=$(curl -sH "Authorization: Bearer ${GITLAB_TOKEN}" "https://git.vshn.net/api/v4/projects?simple=true&search=${GITLAB_REPOSITORY_NAME/.git}" | jq -r ".[] | select(.ssh_url_to_repo == \"${GITLAB_REPOSITORY_URL}\") | .id")
    export GITLAB_STATE_URL="https://git.vshn.net/api/v4/projects/${GITLAB_CATALOG_PROJECT_ID}/terraform/state/cluster"
    
    pushd catalog/manifests/openshift4-terraform/
    Initialize Terraform
    terraform init \
      "-backend-config=address=${GITLAB_STATE_URL}" \
      "-backend-config=lock_address=${GITLAB_STATE_URL}/lock" \
      "-backend-config=unlock_address=${GITLAB_STATE_URL}/lock" \
      "-backend-config=username=${GITLAB_USER}" \
      "-backend-config=password=${GITLAB_TOKEN}" \
      "-backend-config=lock_method=POST" \
      "-backend-config=unlock_method=DELETE" \
      "-backend-config=retry_wait_min=5"
  11. Grab the location of the LB backups and the potential Icinga2 satellite host before decommissioning the VMs.

    declare -a LB_FQDNS
    for id in 1 2; do
      LB_FQDNS[$id]=$(terraform state show "module.cluster.module.lb.exoscale_domain_record.lb[$(expr $id - 1)]" | grep hostname | awk '{print $3}' | tr -d ' "\r\n')
    done
    for lb in ${LB_FQDNS[*]}; do
      ssh "${lb}" "sudo grep 'server =' /etc/burp/burp.conf && sudo grep 'ParentZone' /etc/icinga2/constants.conf" | tee "../../../$lb.info"
    done
  12. Set downtimes for both LBs in Icinga2.
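    Downtimes are typically set through the Icinga web interface. If you prefer to script this, a sketch using the Icinga2 REST API could look like the following; the API host, credentials and downtime duration are assumptions, adjust them to the monitoring setup at hand.

    # Sketch (hypothetical): schedule a 4-hour downtime for each LB host and
    # all of its services via the Icinga2 API.
    for lb in ${LB_FQDNS[*]}; do
      now=$(date +%s)
      curl -sk -u "<icinga-api-user>:<icinga-api-password>" \
        -H 'Accept: application/json' \
        -X POST "https://<icinga-api-host>:5665/v1/actions/schedule-downtime" \
        -d "{\"type\": \"Host\",
             \"filter\": \"host.name==\\\"${lb}\\\"\",
             \"all_services\": true,
             \"author\": \"${GITLAB_USER}\",
             \"comment\": \"Decommissioning ${CLUSTER_ID}\",
             \"start_time\": ${now},
             \"end_time\": $((now + 4*3600))}"
    done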

  13. Remove APPUiO hieradata Git repository resource from Terraform state

    terraform state rm "module.cluster.module.lb.module.hiera[0].gitfile_checkout.appuio_hieradata"
    This step is necessary to ensure the subsequent terraform destroy completes without errors.
  14. Delete resources using Terraform

    terraform destroy
    
    popd
  15. Use the Exoscale CLI to empty and remove the cluster's buckets

    # Bootstrap bucket
    exo storage rb -r -f "${CLUSTER_ID}-bootstrap"
    # OpenShift Image Registry bucket
    exo storage rb -r -f "${CLUSTER_ID}-image-registry"
    # OpenShift Loki logstore
    exo storage rb -r -f "${CLUSTER_ID}-logstore"
  16. Delete the cluster-backup bucket

    Verify that the cluster backups aren’t needed anymore before cleaning up the backup bucket. Consider extracting the most recent cluster objects and etcd backups before deleting the bucket. See the Recover objects from backup how-to for instructions. At this point in the decommissioning process, you’ll have to extract the Restic configuration from Vault instead of the cluster itself.

    exo storage rb -r -f "${CLUSTER_ID}-cluster-backup"
  17. Delete the cluster’s API keys and the API key created for decommissioning

    # delete restricted api keys
    exo iam api-key delete -f ${CLUSTER_ID}_appcat-provider-exoscale
    exo iam api-key delete -f ${CLUSTER_ID}_ccm-exoscale
    exo iam api-key delete -f ${CLUSTER_ID}_csi-driver-exoscale
    exo iam api-key delete -f ${CLUSTER_ID}_floaty
    exo iam api-key delete -f ${CLUSTER_ID}_object_storage
    
    # delete decommissioning api key
    exo iam api-key delete -f ${CLUSTER_ID} (1)
    1 This command assumes that the decommissioning API key’s name is the cluster’s Project Syn ID.
  18. Decommission Puppet-managed LBs

    See the VSHN documentation (Internal link) for the full instructions.
    1. Remove both LBs in control.vshn.net

    2. Clean encdata caches

      for lb in ${LB_FQDNS[*]}; do
        ssh nfs1.ch1.puppet.vshn.net \
          "sudo rm /srv/nfs/export/puppetserver-puppetserver-enc-cache-pvc-*/${lb}.yaml"
      done
    3. Clean up LBs in Icinga

      parent_zone=$(grep "ParentZone = " ${LB_FQDNS[1]}.info | cut -d = -f2 | tr -d '" ')
      if [ "$parent_zone" != "master" ]; then
        icinga_host="$parent_zone"
      else
        icinga_host="master2.prod.monitoring.vshn.net"
      fi
      
      prompt="Clean up LBs in Icinga ${icinga_host}? "
      if [ -n "$ZSH_VERSION" ]; then
        read -k 1 "?${prompt}"
      else
        read -p "${prompt}" -n 1 -r && echo
      fi
      if [[ $REPLY =~ ^[Yy] ]]; then
        for lb in ${LB_FQDNS[*]}; do
          ssh "${icinga_host}" "sudo rm -rf /var/lib/icinga2/api/zones/${lb}"
        done
        if [ "$parent_zone" != "master" ]; then
          ssh "${icinga_host}" sudo puppetctl run
        fi
        ssh master2.prod.monitoring.vshn.net sudo puppetctl run
      fi
    4. Remove LBs in nodes hieradata

      git clone git@git.vshn.net:vshn-puppet/nodes_hieradata.git
      pushd nodes_hieradata
      
      for lb in ${LB_FQDNS[*]}; do
        git rm ${lb}.yaml
      done
      
      git commit -m"Decommission LBs for ${CLUSTER_ID}"
      git push origin master
      
      popd
    5. Remove the cluster in the APPUiO hieradata

      git clone git@git.vshn.net:appuio/appuio_hieradata.git
      pushd appuio_hieradata
      
      git rm -rf lbaas/${CLUSTER_ID}*
      
      git commit -m"Decommission ${CLUSTER_ID}"
      git push origin master
      
      popd
    6. Delete LB backup client certs and backups on Burp server

      for lb in ${LB_FQDNS[*]}; do
        backup_server=$(grep "server = " ${lb}.info | cut -d= -f2 | tr -d ' ')
        ssh "$backup_server" "rm /var/lib/burp/CA/${lb}.crt"
        ssh "$backup_server" "rm /var/lib/burp/CA/${lb}.csr"
        ssh "$backup_server" "rm -rf /var/lib/burp/${lb}"
      done
  19. Remove cluster DNS records from VSHN DNS zonefiles

    This step isn’t necessary for clusters where the customer manages DNS.
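    As a cross-check for which records belong to the cluster, you can print the API DNS names from the compiled catalog (the same manifest used in the emergency-credentials step above); the apps wildcard records typically share the same base domain. The location of the zonefiles themselves isn't covered here.

    # List the cluster's API DNS names as a reference for the records to remove.
    yq '.spec.dnsNames[]' catalog/manifests/openshift4-api/00_certs.yaml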
  20. Delete Vault secrets for the cluster

    for secret in $(find catalog/refs/ -type f -printf "clusters/kv/%P\n" \
        | sed -r 's#(.*)/.*#\1#' | grep -v '__shared__/__shared__' \
        | sort -u);
    do
      vault kv delete "$secret"
    done
  21. Delete cluster from Lieutenant API (via portal)

    • Select the Lieutenant API Endpoint

    • Search cluster name

    • Delete cluster entry using the delete button
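
    Alternatively, the cluster object can be removed through the Lieutenant API directly. The following is a sketch, not the portal-based procedure documented here; it assumes the delete endpoint is enabled on the Lieutenant instance and that your token is allowed to delete clusters.

    # Sketch (hypothetical): delete the cluster object via the Lieutenant API,
    # using the same endpoint queried for cluster facts earlier in this guide.
    curl -sH "Authorization: Bearer $(commodore fetch-token)" \
      -X DELETE "${COMMODORE_API_URL}/clusters/${CLUSTER_ID}"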

  22. Delete Keycloak service (via portal)

    • Search cluster name

    • Delete the cluster’s service entry using the delete button

  23. Update any related documentation