Uninstallation on cloudscale
Steps to remove an OpenShift 4 cluster from cloudscale.ch.
Prerequisites
The following CLI utilities need to be available locally:
- docker
- curl
- kubectl
- oc
- vault (Vault CLI)
- commodore, see Running Commodore
- jq
- yq (YAML processor, version 4 or higher)
- macOS: gdate from GNU coreutils (brew install coreutils)
- emergency-credentials-receive, see the install instructions
Cluster Decommission
- Set up environment variables for the GitLab and Vault credentials:

```bash
export GITLAB_TOKEN=<gitlab-api-token>  # From https://git.vshn.net/-/user_settings/personal_access_tokens
export GITLAB_USER=<gitlab-user-name>
```

  Connect with Vault:

```bash
export VAULT_ADDR=https://vault-prod.syn.vshn.net
vault login -method=oidc
```
- Configure API access:

```bash
export COMMODORE_API_URL=https://api.syn.vshn.net  # (1)

# Set Project Syn cluster and tenant ID
export CLUSTER_ID=<lieutenant-cluster-id>  # Looks like: c-cluster-id-1234
export TENANT_ID=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${COMMODORE_API_URL}/clusters/${CLUSTER_ID} | jq -r .tenant)
```

  (1) Replace with the API URL of the desired Lieutenant instance.
- Create a local directory to work in and compile the cluster catalog:

```bash
export WORK_DIR=/path/to/work/dir
mkdir -p "${WORK_DIR}"
pushd "${WORK_DIR}"

commodore catalog compile "${CLUSTER_ID}"
```

  We strongly recommend creating an empty directory, unless you already have a work directory for the cluster you’re about to work on. This guide will run Commodore in the directory created in this step.
- Set up environment variables for the cloudscale credentials:

```bash
export CLOUDSCALE_API_TOKEN=$(vault kv get -format=json clusters/kv/$TENANT_ID/$CLUSTER_ID/cloudscale | jq -r .data.data.token)
export REGION=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${COMMODORE_API_URL}/clusters/${CLUSTER_ID} | jq -r .facts.region)
export BACKUP_REGION=$(curl -sH "Authorization: Bearer ${CLOUDSCALE_API_TOKEN}" https://api.cloudscale.ch/v1/regions | jq -r '.[].slug' | grep -v $REGION)
export HIERADATA_REPO_SECRET=$(vault kv get \
  -format=json "clusters/kv/lbaas/hieradata_repo_token" | jq -r '.data.data.token')
```
- Get emergency credentials for cluster access:

```bash
API_URL=$(yq '.spec.dnsNames[0]' catalog/manifests/openshift4-api/00_certs.yaml)
export EMR_KUBERNETES_ENDPOINT="https://${API_URL}:6443"
emergency-credentials-receive $CLUSTER_ID
export KUBECONFIG="em-${CLUSTER_ID}"
kubectl cluster-info
```
- Disable Syn:

```bash
kubectl -n syn patch apps --type=json \
  -p '[{"op":"replace", "path":"/spec/syncPolicy", "value": {}}]' \
  root argocd
kubectl -n syn-argocd-operator scale deployment \
  syn-argocd-operator-controller-manager --replicas 0
kubectl -n syn scale sts syn-argocd-application-controller --replicas 0
```
- Delete all LB services:

```bash
kubectl delete svc --field-selector spec.type=LoadBalancer -A
```
- Delete all PVs:

```bash
kubectl cordon -l node-role.kubernetes.io/worker
kubectl get po -A -oyaml | \
  yq '.items = [.items[] | select(.spec.nodeName | test("master-") | not) | select(.metadata.namespace != "syn-csi-cloudscale")]' | \
  kubectl delete --wait=false -f-
kubectl delete pvc -A --all --wait=false
kubectl wait --for=delete pv --all --timeout=120s
```

  By cordoning all non-master nodes and deleting all their pods (except the csi driver pods) we ensure that no new PVs are created, while the existing ones can be cleaned up. Deleting all pods has the additional benefit that we don’t have to deal with PDBs when deleting the machinesets in the next step.

  If not all PVs were deleted in the previous step, delete the remaining PVs explicitly:

```bash
kubectl delete pv --all
```
- Delete all machinesets:

```bash
kubectl -n openshift-machine-api delete machinesets --all
kubectl -n openshift-machine-api wait --for=delete \
  machinesets,machines --all --timeout=120s
kubectl get nodes
```
- Configure the Terraform secrets:

```bash
cat <<EOF > ./terraform.env
CLOUDSCALE_API_TOKEN=${CLOUDSCALE_API_TOKEN}
HIERADATA_REPO_TOKEN=${HIERADATA_REPO_SECRET}
EOF
```

  The heredoc expands the exported values, so the hieradata repository token fetched into HIERADATA_REPO_SECRET earlier is passed into the Terraform container as HIERADATA_REPO_TOKEN.
- Set up Terraform.

  Prepare the Terraform execution environment:

```bash
# Set terraform image and tag to be used
tf_image=$(\
  yq eval ".parameters.openshift4_terraform.images.terraform.image" \
  dependencies/openshift4-terraform/class/defaults.yml)
tf_tag=$(\
  yq eval ".parameters.openshift4_terraform.images.terraform.tag" \
  dependencies/openshift4-terraform/class/defaults.yml)

# Generate the terraform alias
base_dir=$(pwd)
alias terraform='touch .terraformrc; docker run -it --rm \
  -e REAL_UID=$(id -u) \
  -e TF_CLI_CONFIG_FILE=/tf/.terraformrc \
  --env-file ${base_dir}/terraform.env \
  -w /tf \
  -v $(pwd):/tf \
  --ulimit memlock=-1 \
  "${tf_image}:${tf_tag}" /tf/terraform.sh'

export GITLAB_REPOSITORY_URL=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${COMMODORE_API_URL}/clusters/${CLUSTER_ID} | jq -r '.gitRepo.url' | sed 's|ssh://||; s|/|:|')
export GITLAB_REPOSITORY_NAME=${GITLAB_REPOSITORY_URL##*/}
export GITLAB_CATALOG_PROJECT_ID=$(curl -sH "Authorization: Bearer ${GITLAB_TOKEN}" "https://git.vshn.net/api/v4/projects?simple=true&search=${GITLAB_REPOSITORY_NAME/.git}" | jq -r ".[] | select(.ssh_url_to_repo == \"${GITLAB_REPOSITORY_URL}\") | .id")
export GITLAB_STATE_URL="https://git.vshn.net/api/v4/projects/${GITLAB_CATALOG_PROJECT_ID}/terraform/state/cluster"

pushd catalog/manifests/openshift4-terraform/
```

  Initialize Terraform:

```bash
terraform init \
  "-backend-config=address=${GITLAB_STATE_URL}" \
  "-backend-config=lock_address=${GITLAB_STATE_URL}/lock" \
  "-backend-config=unlock_address=${GITLAB_STATE_URL}/lock" \
  "-backend-config=username=${GITLAB_USER}" \
  "-backend-config=password=${GITLAB_TOKEN}" \
  "-backend-config=lock_method=POST" \
  "-backend-config=unlock_method=DELETE" \
  "-backend-config=retry_wait_min=5"
```
- Grab the location of the LB backups and the potential Icinga2 satellite host before decommissioning the VMs:

```bash
declare -a LB_FQDNS
for id in 1 2; do
  LB_FQDNS[$id]=$(terraform state show "module.cluster.module.lb.cloudscale_server.lb[$(expr $id - 1)]" | grep fqdn | awk '{print $2}' | tr -d ' "\r\n')
done

for lb in ${LB_FQDNS[*]}; do
  ssh "${lb}" "sudo grep 'server =' /etc/burp/burp.conf && sudo grep 'ParentZone' /etc/icinga2/constants.conf" | tee "../../../$lb.info"
done
```
- Set downtimes for both LBs in Icinga2. One possible approach via the Icinga2 API is sketched below; setting the downtimes through the Icinga web interface works just as well.
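  The following sketch assumes an Icinga2 API user and a reachable API port on the monitoring host; the host name, user, and password are placeholders, not part of the documented setup.

```bash
# Placeholders: adjust host and credentials to the monitoring setup in use
ICINGA_API_HOST=<icinga2-api-host>
ICINGA_API_USER=<icinga2-api-user>
ICINGA_API_PASS=<icinga2-api-password>

START=$(date +%s)
END=$((START + 4*3600))  # 4 hour downtime window

for lb in ${LB_FQDNS[*]}; do
  # Build the schedule-downtime request body and post it to the Icinga2 API
  jq -n --arg host "$lb" --arg author "$GITLAB_USER" \
    --arg comment "Decommissioning ${CLUSTER_ID}" \
    --argjson start "$START" --argjson end "$END" \
    '{type: "Host", filter: "host.name == \"\($host)\"", all_services: true,
      start_time: $start, end_time: $end, fixed: true,
      author: $author, comment: $comment}' | \
  curl -sk -u "${ICINGA_API_USER}:${ICINGA_API_PASS}" \
    -H 'Accept: application/json' \
    -X POST "https://${ICINGA_API_HOST}:5665/v1/actions/schedule-downtime" \
    -d @-
done
```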
- Remove the APPUiO hieradata Git repository resource from the Terraform state:

```bash
terraform state rm "module.cluster.module.lb.module.hiera[0].gitfile_checkout.appuio_hieradata"
```

  This step is necessary to ensure the subsequent terraform destroy completes without errors.
- Delete the resources from cloudscale using Terraform:

```bash
# The first time it will fail
terraform destroy

# Destroy a second time to delete private networks
terraform destroy

popd
```
- After all resources are deleted, remove the cluster’s object storage buckets:

```bash
# Use the already existing bucket user
response=$(curl -sH "Authorization: Bearer ${CLOUDSCALE_API_TOKEN}" \
  https://api.cloudscale.ch/v1/objects-users | \
  jq -e ".[] | select(.display_name == \"${CLUSTER_ID}\")")

# Configure the Minio client to use the bucket
mc config host add \
  "${CLUSTER_ID}" "https://objects.${REGION}.cloudscale.ch" \
  $(echo $response | jq -r '.keys[0].access_key') \
  $(echo $response | jq -r '.keys[0].secret_key')

# Delete the bootstrap-ignition bucket (should already be deleted after setup)
mc rb "${CLUSTER_ID}/${CLUSTER_ID}-bootstrap-ignition" --force

# Delete the image-registry bucket
mc rb "${CLUSTER_ID}/${CLUSTER_ID}-image-registry" --force
```
- Delete the cluster-backup bucket in the cloudscale project.

  Verify that the cluster backups aren’t needed anymore before cleaning up the backup bucket. Consider extracting the most recent cluster objects and etcd backups before deleting the bucket. See the Recover objects from backup how-to for instructions. At this point in the decommissioning process, you’ll have to extract the Restic configuration from Vault instead of the cluster itself (see the sketch after this step for locating it in Vault).

```bash
# Configure the Minio client to use the backup bucket
mc config host add \
  "${CLUSTER_ID}_backup" "https://objects.${BACKUP_REGION}.cloudscale.ch" \
  $(echo $response | jq -r '.keys[0].access_key') \
  $(echo $response | jq -r '.keys[0].secret_key')

mc rb "${CLUSTER_ID}_backup/${CLUSTER_ID}-cluster-backup" --force

# Delete the cloudscale objects user
curl -i -H "Authorization: Bearer ${CLOUDSCALE_API_TOKEN}" -X DELETE $(echo $response | jq -r '.href')
```
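  If you need the Restic configuration at this point, it has to come from Vault. A minimal sketch, assuming the backup secrets live under the cluster’s usual Vault path; the exact key names depend on your configuration, and the global-backup key shown here is a hypothetical example:

```bash
# List the cluster's Vault secrets to locate the backup/Restic entries
vault kv list "clusters/kv/${TENANT_ID}/${CLUSTER_ID}"

# Read the relevant entry once identified (key name below is a placeholder)
vault kv get "clusters/kv/${TENANT_ID}/${CLUSTER_ID}/global-backup"
```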
- Delete all remaining volumes which were associated with the cluster in the cloudscale project.

  This step is required because the csi-cloudscale driver doesn’t have time to properly clean up PVs when the cluster is decommissioned with terraform destroy. One possible way to find and remove leftovers via the cloudscale API is sketched below.
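  This is a minimal sketch, assuming the cloudscale project only contains resources of this cluster: it lists volumes that are no longer attached to any server so you can review them, then deletes them through the API. Deleting the volumes in the cloudscale UI works just as well.

```bash
# List detached volumes left over in the project and review them
curl -sH "Authorization: Bearer ${CLOUDSCALE_API_TOKEN}" \
  https://api.cloudscale.ch/v1/volumes | \
  jq -r '.[] | select(.server_uuids == []) | "\(.name) \(.href)"'

# After reviewing the list, delete each leftover volume
curl -sH "Authorization: Bearer ${CLOUDSCALE_API_TOKEN}" \
  https://api.cloudscale.ch/v1/volumes | \
  jq -r '.[] | select(.server_uuids == []) | .href' | \
while read -r href; do
  curl -sH "Authorization: Bearer ${CLOUDSCALE_API_TOKEN}" -X DELETE "$href"
done
```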
- Delete the cluster’s API tokens in the cloudscale UI.
- Decommission the Puppet-managed LBs. See the VSHN documentation (internal link) for the full instructions.
  - Remove both LBs in control.vshn.net.
  - Clean the encdata caches:

```bash
for lb in ${LB_FQDNS[*]}; do
  ssh nfs1.ch1.puppet.vshn.net \
    "sudo rm /srv/nfs/export/puppetserver-puppetserver-enc-cache-pvc-*/${lb}.yaml"
done
```
  - Clean up the LBs in Icinga:

```bash
parent_zone=$(grep "ParentZone = " ${LB_FQDNS[1]}.info | cut -d = -f2 | tr -d '" ')
if [ "$parent_zone" != "master" ]; then
  icinga_host="$parent_zone"
else
  icinga_host="master2.prod.monitoring.vshn.net"
fi

prompt="Clean up LBs in Icinga ${icinga_host}? "
if [ -n "$ZSH_VERSION" ]; then
  read -k 1 "?${prompt}"
else
  read -p "${prompt}" -n 1 -r && echo
fi
if [[ $REPLY =~ ^[Yy] ]]; then
  for lb in ${LB_FQDNS[*]}; do
    ssh "${icinga_host}" "sudo rm -rf /var/lib/icinga2/api/zones/${lb}"
  done
  if [ "$parent_zone" != "master" ]; then
    ssh "${icinga_host}" sudo puppetctl run
  fi
  ssh master2.prod.monitoring.vshn.net sudo puppetctl run
fi
```
  - Remove the LBs from the nodes hieradata:

```bash
git clone git@git.vshn.net:vshn-puppet/nodes_hieradata.git
pushd nodes_hieradata
for lb in ${LB_FQDNS[*]}; do
  git rm ${lb}.yaml
done
git commit -m "Decommission LBs for ${CLUSTER_ID}"
git push origin master
popd
```
  - Remove the cluster from the APPUiO hieradata:

```bash
git clone git@git.vshn.net:appuio/appuio_hieradata.git
pushd appuio_hieradata
git rm -rf lbaas/${CLUSTER_ID}*
git commit -m "Decommission ${CLUSTER_ID}"
git push origin master
popd
```
  - Delete the LB backup client certs and backups on the Burp server:

```bash
for lb in ${LB_FQDNS[*]}; do
  backup_server=$(grep "server = " ${lb}.info | cut -d= -f2)
  ssh "$backup_server" "rm /var/lib/burp/CA/${lb}.crt"
  ssh "$backup_server" "rm /var/lib/burp/CA/${lb}.csr"
  ssh "$backup_server" "rm -rf /var/lib/burp/${lb}"
done
```
- Remove the cluster DNS records from the VSHN DNS zonefiles. This step isn’t necessary for clusters where the customer manages DNS.
- Delete the Vault secrets for the cluster:

```bash
for secret in $(find catalog/refs/ -type f -printf "clusters/kv/%P\n" \
  | sed -r 's#(.*)/.*#\1#' | grep -v '__shared__/__shared__' \
  | sort -u); do
  vault kv delete "$secret"
done
```
- Delete the cluster from the Lieutenant API (via the portal); an API-based alternative is sketched below.
  - Select the Lieutenant API endpoint.
  - Search for the cluster name.
  - Delete the cluster entry using the delete button.
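  If you prefer to skip the portal, the same result can likely be achieved directly against the Lieutenant API. This is a sketch under the assumption that the API exposes a DELETE operation on the cluster object and that the token returned by commodore fetch-token is allowed to use it:

```bash
# Assumption: the Lieutenant API accepts DELETE on the cluster object
# and the fetched token has sufficient permissions.
curl -sH "Authorization: Bearer $(commodore fetch-token)" \
  -X DELETE "${COMMODORE_API_URL}/clusters/${CLUSTER_ID}"
```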
- Delete the Keycloak service (via the portal):
  - Search for the cluster name.
  - Delete the cluster’s service entry using the delete button.
- Update any related documentation.