Uninstallation on cloudscale
Steps to remove an OpenShift 4 cluster from cloudscale.ch.
Prerequisites
The following CLI utilities need to be available locally:
- docker
- curl
- kubectl
- oc
- vault (Vault CLI)
- commodore, see Running Commodore
- jq
- yq (YAML processor, version 4 or higher)
- macOS: gdate from GNU coreutils (brew install coreutils)
- emergency-credentials-receive, see the install instructions
Cluster Decommission
- Set up environment variables for the GitLab and Vault credentials:

```bash
export GITLAB_TOKEN=<gitlab-api-token>  # From https://git.vshn.net/-/user_settings/personal_access_tokens
export GITLAB_USER=<gitlab-user-name>
```

  Connect with Vault:

```bash
export VAULT_ADDR=https://vault-prod.syn.vshn.net
vault login -method=oidc
```
- Configure API access:

```bash
export COMMODORE_API_URL=https://api.syn.vshn.net  # (1)

# Set Project Syn cluster and tenant ID
export CLUSTER_ID=<lieutenant-cluster-id>  # Looks like: c-cluster-id-1234
export TENANT_ID=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${COMMODORE_API_URL}/clusters/${CLUSTER_ID} | jq -r .tenant)
```

  (1) Replace with the API URL of the desired Lieutenant instance.
- Create a local directory to work in and compile the cluster catalog:

```bash
export WORK_DIR=/path/to/work/dir
mkdir -p "${WORK_DIR}"
pushd "${WORK_DIR}"

commodore catalog compile "${CLUSTER_ID}"
```

  We strongly recommend creating an empty directory, unless you already have a work directory for the cluster you’re about to work on. This guide will run Commodore in the directory created in this step.
- Set up environment variables for the cloudscale credentials:

```bash
export CLOUDSCALE_API_TOKEN=$(vault kv get -format=json clusters/kv/$TENANT_ID/$CLUSTER_ID/cloudscale | jq -r .data.data.token)
export REGION=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${COMMODORE_API_URL}/clusters/${CLUSTER_ID} | jq -r .facts.region)
export BACKUP_REGION=$(curl -sH "Authorization: Bearer ${CLOUDSCALE_API_TOKEN}" https://api.cloudscale.ch/v1/regions | jq -r '.[].slug' | grep -v $REGION)
export HIERADATA_REPO_SECRET=$(vault kv get \
  -format=json "clusters/kv/lbaas/hieradata_repo_token" | jq -r '.data.data.token')
```
- Get emergency credentials for cluster access:

```bash
API_URL=$(yq '.spec.dnsNames[0]' catalog/manifests/openshift4-api/00_certs.yaml)
export EMR_KUBERNETES_ENDPOINT="https://${API_URL}:6443"
emergency-credentials-receive $CLUSTER_ID
export KUBECONFIG="em-${CLUSTER_ID}"
kubectl cluster-info
```
- Disable Syn:

```bash
kubectl -n syn patch apps --type=json \
  -p '[{"op":"replace", "path":"/spec/syncPolicy", "value": {}}]' \
  root argocd
kubectl -n syn-argocd-operator scale deployment \
  syn-argocd-operator-controller-manager --replicas 0
kubectl -n syn scale sts syn-argocd-application-controller --replicas 0
```
- Delete all LB services:

```bash
kubectl delete svc --field-selector spec.type=LoadBalancer -A
```
- Delete all PVs:

```bash
kubectl cordon -l node-role.kubernetes.io/worker
kubectl get po -A -oyaml | \
  yq '.items = [.items[] | select(.spec.nodeName | test("master-") | not) | select(.metadata.namespace != "syn-csi-cloudscale")]' | \
  kubectl delete --wait=false -f-
kubectl delete pvc -A --all --wait=false
kubectl wait --for=delete pv --all --timeout=120s
```

  By cordoning all non-master nodes and deleting all their pods (except the csi driver pods) we ensure that no new PVs are created, while the existing ones can be cleaned up. Deleting all pods has the additional benefit that we don’t have to deal with PDBs when deleting the machinesets in the next step.

  If not all PVs were deleted in the previous step, delete the remaining PVs explicitly:

```bash
kubectl delete pv --all
```
- Delete all machinesets:

```bash
kubectl -n openshift-machine-api delete machinesets --all
kubectl -n openshift-machine-api wait --for=delete \
  machinesets,machines --all --timeout=120s
kubectl get nodes
```
- Configure the Terraform secrets:

```bash
cat <<EOF > ./terraform.env
CLOUDSCALE_API_TOKEN=${CLOUDSCALE_API_TOKEN}
HIERADATA_REPO_TOKEN=${HIERADATA_REPO_SECRET}
EOF
```

  The heredoc expands the exported values, so the hieradata repository token fetched into HIERADATA_REPO_SECRET earlier is passed into the Terraform container as HIERADATA_REPO_TOKEN.
- Set up Terraform.

  Prepare the Terraform execution environment:

```bash
# Set terraform image and tag to be used
tf_image=$(\
  yq eval ".parameters.openshift4_terraform.images.terraform.image" \
  dependencies/openshift4-terraform/class/defaults.yml)
tf_tag=$(\
  yq eval ".parameters.openshift4_terraform.images.terraform.tag" \
  dependencies/openshift4-terraform/class/defaults.yml)

# Generate the terraform alias
base_dir=$(pwd)
alias terraform='touch .terraformrc; docker run -it --rm \
  -e REAL_UID=$(id -u) \
  -e TF_CLI_CONFIG_FILE=/tf/.terraformrc \
  --env-file ${base_dir}/terraform.env \
  -w /tf \
  -v $(pwd):/tf \
  --ulimit memlock=-1 \
  "${tf_image}:${tf_tag}" /tf/terraform.sh'

export GITLAB_REPOSITORY_URL=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${COMMODORE_API_URL}/clusters/${CLUSTER_ID} | jq -r '.gitRepo.url' | sed 's|ssh://||; s|/|:|')
export GITLAB_REPOSITORY_NAME=${GITLAB_REPOSITORY_URL##*/}
export GITLAB_CATALOG_PROJECT_ID=$(curl -sH "Authorization: Bearer ${GITLAB_TOKEN}" "https://git.vshn.net/api/v4/projects?simple=true&search=${GITLAB_REPOSITORY_NAME/.git}" | jq -r ".[] | select(.ssh_url_to_repo == \"${GITLAB_REPOSITORY_URL}\") | .id")
export GITLAB_STATE_URL="https://git.vshn.net/api/v4/projects/${GITLAB_CATALOG_PROJECT_ID}/terraform/state/cluster"

pushd catalog/manifests/openshift4-terraform/
```

  Initialize Terraform:

```bash
terraform init \
  "-backend-config=address=${GITLAB_STATE_URL}" \
  "-backend-config=lock_address=${GITLAB_STATE_URL}/lock" \
  "-backend-config=unlock_address=${GITLAB_STATE_URL}/lock" \
  "-backend-config=username=${GITLAB_USER}" \
  "-backend-config=password=${GITLAB_TOKEN}" \
  "-backend-config=lock_method=POST" \
  "-backend-config=unlock_method=DELETE" \
  "-backend-config=retry_wait_min=5"
```
- Grab the location of the LB backups and the potential Icinga2 satellite host before decommissioning the VMs:

```bash
declare -a LB_FQDNS
for id in 1 2; do
  LB_FQDNS[$id]=$(terraform state show "module.cluster.module.lb.cloudscale_server.lb[$(expr $id - 1)]" | grep fqdn | awk '{print $2}' | tr -d ' "\r\n')
done

for lb in ${LB_FQDNS[*]}; do
  ssh "${lb}" "sudo grep 'server =' /etc/burp/burp.conf && sudo grep 'ParentZone' /etc/icinga2/constants.conf" | tee "../../../$lb.info"
done
```
- Set downtimes for both LBs in Icinga2. One possible approach via the Icinga2 API is sketched below; setting the downtimes through the Icinga web interface works just as well.
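  The following sketch assumes an Icinga2 API user and a reachable API port on the monitoring host; the host name, user, and password are placeholders, not part of the documented setup.

```bash
# Placeholders: adjust host and credentials to the monitoring setup in use
ICINGA_API_HOST=<icinga2-api-host>
ICINGA_API_USER=<icinga2-api-user>
ICINGA_API_PASS=<icinga2-api-password>

START=$(date +%s)
END=$((START + 4*3600))  # 4 hour downtime window

for lb in ${LB_FQDNS[*]}; do
  # Build the schedule-downtime request body and post it to the Icinga2 API
  jq -n --arg host "$lb" --arg author "$GITLAB_USER" \
    --arg comment "Decommissioning ${CLUSTER_ID}" \
    --argjson start "$START" --argjson end "$END" \
    '{type: "Host", filter: "host.name == \"\($host)\"", all_services: true,
      start_time: $start, end_time: $end, fixed: true,
      author: $author, comment: $comment}' | \
  curl -sk -u "${ICINGA_API_USER}:${ICINGA_API_PASS}" \
    -H 'Accept: application/json' \
    -X POST "https://${ICINGA_API_HOST}:5665/v1/actions/schedule-downtime" \
    -d @-
done
```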
- Remove the APPUiO hieradata Git repository resource from the Terraform state:

```bash
terraform state rm "module.cluster.module.lb.module.hiera[0].gitfile_checkout.appuio_hieradata"
```

  This step is necessary to ensure the subsequent terraform destroy completes without errors.
- Delete the resources from cloudscale using Terraform:

```bash
# The first time it will fail
terraform destroy

# Destroy a second time to delete private networks
terraform destroy

popd
```
- After all resources are deleted, remove the cluster’s object storage buckets:

```bash
# Use the already existing bucket user
response=$(curl -sH "Authorization: Bearer ${CLOUDSCALE_API_TOKEN}" \
  https://api.cloudscale.ch/v1/objects-users | \
  jq -e ".[] | select(.display_name == \"${CLUSTER_ID}\")")

# Configure the Minio client to use the bucket
mc config host add \
  "${CLUSTER_ID}" "https://objects.${REGION}.cloudscale.ch" \
  $(echo $response | jq -r '.keys[0].access_key') \
  $(echo $response | jq -r '.keys[0].secret_key')

# Delete the bootstrap-ignition bucket (should already be deleted after setup)
mc rb "${CLUSTER_ID}/${CLUSTER_ID}-bootstrap-ignition" --force

# Delete the image-registry bucket
mc rb "${CLUSTER_ID}/${CLUSTER_ID}-image-registry" --force
```
- Delete the cluster-backup bucket in the cloudscale project.

  Verify that the cluster backups aren’t needed anymore before cleaning up the backup bucket. Consider extracting the most recent cluster objects and etcd backups before deleting the bucket. See the Recover objects from backup how-to for instructions. At this point in the decommissioning process, you’ll have to extract the Restic configuration from Vault instead of the cluster itself (see the sketch after this step for locating it in Vault).

```bash
# Configure the Minio client to use the backup bucket
mc config host add \
  "${CLUSTER_ID}_backup" "https://objects.${BACKUP_REGION}.cloudscale.ch" \
  $(echo $response | jq -r '.keys[0].access_key') \
  $(echo $response | jq -r '.keys[0].secret_key')

mc rb "${CLUSTER_ID}_backup/${CLUSTER_ID}-cluster-backup" --force

# Delete the cloudscale objects user
curl -i -H "Authorization: Bearer ${CLOUDSCALE_API_TOKEN}" -X DELETE $(echo $response | jq -r '.href')
```
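  If you need the Restic configuration at this point, it has to come from Vault. A minimal sketch, assuming the backup secrets live under the cluster’s usual Vault path; the exact key names depend on your configuration, and the global-backup key shown here is a hypothetical example:

```bash
# List the cluster's Vault secrets to locate the backup/Restic entries
vault kv list "clusters/kv/${TENANT_ID}/${CLUSTER_ID}"

# Read the relevant entry once identified (key name below is a placeholder)
vault kv get "clusters/kv/${TENANT_ID}/${CLUSTER_ID}/global-backup"
```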
- Delete all remaining volumes which were associated with the cluster in the cloudscale project.

  This step is required because the csi-cloudscale driver doesn’t have time to properly clean up PVs when the cluster is decommissioned with terraform destroy. One possible way to find and remove leftovers via the cloudscale API is sketched below.
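  This is a minimal sketch, assuming the cloudscale project only contains resources of this cluster: it lists volumes that are no longer attached to any server so you can review them, then deletes them through the API. Deleting the volumes in the cloudscale UI works just as well.

```bash
# List detached volumes left over in the project and review them
curl -sH "Authorization: Bearer ${CLOUDSCALE_API_TOKEN}" \
  https://api.cloudscale.ch/v1/volumes | \
  jq -r '.[] | select(.server_uuids == []) | "\(.name) \(.href)"'

# After reviewing the list, delete each leftover volume
curl -sH "Authorization: Bearer ${CLOUDSCALE_API_TOKEN}" \
  https://api.cloudscale.ch/v1/volumes | \
  jq -r '.[] | select(.server_uuids == []) | .href' | \
while read -r href; do
  curl -sH "Authorization: Bearer ${CLOUDSCALE_API_TOKEN}" -X DELETE "$href"
done
```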
- Delete the cluster’s API tokens in the cloudscale UI.
- Decommission the Puppet-managed LBs. See the VSHN documentation (internal link) for the full instructions.
  - Remove both LBs in control.vshn.net.
  - Clean the encdata caches:

```bash
for lb in ${LB_FQDNS[*]}; do
  ssh nfs1.ch1.puppet.vshn.net \
    "sudo rm /srv/nfs/export/puppetserver-puppetserver-enc-cache-pvc-*/${lb}.yaml"
done
```
  - Clean up the LBs in Icinga:

```bash
parent_zone=$(grep "ParentZone = " ${LB_FQDNS[1]}.info | cut -d = -f2 | tr -d '" ')
if [ "$parent_zone" != "master" ]; then
  icinga_host="$parent_zone"
else
  icinga_host="master2.prod.monitoring.vshn.net"
fi

prompt="Clean up LBs in Icinga ${icinga_host}? "
if [ -n "$ZSH_VERSION" ]; then
  read -k 1 "?${prompt}"
else
  read -p "${prompt}" -n 1 -r && echo
fi
if [[ $REPLY =~ ^[Yy] ]]; then
  for lb in ${LB_FQDNS[*]}; do
    ssh "${icinga_host}" "sudo rm -rf /var/lib/icinga2/api/zones/${lb}"
  done
  if [ "$parent_zone" != "master" ]; then
    ssh "${icinga_host}" sudo puppetctl run
  fi
  ssh master2.prod.monitoring.vshn.net sudo puppetctl run
fi
```
  - Remove the LBs from the nodes hieradata:

```bash
git clone git@git.vshn.net:vshn-puppet/nodes_hieradata.git
pushd nodes_hieradata
for lb in ${LB_FQDNS[*]}; do
  git rm ${lb}.yaml
done
git commit -m "Decommission LBs for ${CLUSTER_ID}"
git push origin master
popd
```
  - Remove the cluster from the APPUiO hieradata:

```bash
git clone git@git.vshn.net:appuio/appuio_hieradata.git
pushd appuio_hieradata
git rm -rf lbaas/${CLUSTER_ID}*
git commit -m "Decommission ${CLUSTER_ID}"
git push origin master
popd
```
  - Delete the LB backup client certs and backups on the Burp server:

```bash
for lb in ${LB_FQDNS[*]}; do
  backup_server=$(grep "server = " ${lb}.info | cut -d= -f2)
  ssh "$backup_server" "rm /var/lib/burp/CA/${lb}.crt"
  ssh "$backup_server" "rm /var/lib/burp/CA/${lb}.csr"
  ssh "$backup_server" "rm -rf /var/lib/burp/${lb}"
done
```
- Remove the cluster DNS records from the VSHN DNS zonefiles. This step isn’t necessary for clusters where the customer manages DNS.
- Delete the Vault secrets for the cluster:

```bash
for secret in $(find catalog/refs/ -type f -printf "clusters/kv/%P\n" \
  | sed -r 's#(.*)/.*#\1#' | grep -v '__shared__/__shared__' \
  | sort -u); do
  vault kv delete "$secret"
done
```
- Delete the cluster from the Lieutenant API (via the portal); an API-based alternative is sketched below.
  - Select the Lieutenant API endpoint.
  - Search for the cluster name.
  - Delete the cluster entry using the delete button.
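  If you prefer to skip the portal, the same result can likely be achieved directly against the Lieutenant API. This is a sketch under the assumption that the API exposes a DELETE operation on the cluster object and that the token returned by commodore fetch-token is allowed to use it:

```bash
# Assumption: the Lieutenant API accepts DELETE on the cluster object
# and the fetched token has sufficient permissions.
curl -sH "Authorization: Bearer $(commodore fetch-token)" \
  -X DELETE "${COMMODORE_API_URL}/clusters/${CLUSTER_ID}"
```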
- Delete the Keycloak service (via the portal):
  - Search for the cluster name.
  - Delete the cluster’s service entry using the delete button.
- Update any related documentation.