Install OpenShift 4 on cloudscale.ch
Steps to install an OpenShift 4 cluster on cloudscale.ch.
These steps follow the Installing a cluster on bare metal docs to set up a user-provisioned infrastructure (UPI) installation. Terraform is used to provision the cloud infrastructure.
The commands are idempotent and can be retried if any of the steps fail. The certificates created during bootstrap are only valid for 24 hours, so make sure you complete these steps within that time.
This how-to guide is exported from the Guided Setup automation tool. It’s highly recommended to run these instructions with that tool rather than manually.
Starting situation
- You already have a Tenant and its Git repository
- You have a CCSP Red Hat login and are logged into the Red Hat OpenShift Cluster Manager.
  Don’t use your personal account to log in to the cluster manager for installation.
- You want to register a new cluster in Lieutenant and are about to install OpenShift 4 on cloudscale.ch
Prerequisites
- jq
- yq YAML processor (version 4 or higher - use the Go version by mikefarah, not the jq wrapper by kislyuk)
- vault Vault CLI
- curl
- emergency-credentials-receive (install instructions at github.com/vshn/emergency-credentials-receive)
- commodore, see Installing Commodore
- kapitan (should automatically be available in $PATH if commodore is installed with uv as described in the link above)
- gzip
- docker
- mc >= RELEASE.2024-01-18T07-03-39Z MinIO client (aliased to mc if necessary)
- aws CLI, official install instructions. You can also install the Python package with your favorite package manager (we recommend uv: uv tool install awscli).
- oc OpenShift CLI
- glab GitLab CLI
- host DNS lookup utility
- restic backup CLI
Make sure the minor version of openshift-install matches the OpenShift version you want to install; the workflow below checks this.
Workflow
Given I have all prerequisites installed
This step checks if all necessary prerequisites are installed on your system, including 'yq' (version 4 or higher, by Mike Farah) and 'oc' (OpenShift CLI).
Script
OUTPUT=$(mktemp)
set -euo pipefail
echo "Checking prerequisites..."
if which yq >/dev/null 2>&1 ; then { echo "✅ yq is installed."; } ; else { echo "❌ yq is not installed. Please install yq to proceed."; exit 1; } ; fi
if yq --version | grep -E 'version v[4-9]\.' | grep 'mikefarah' >/dev/null 2>&1 ; then { echo "✅ yq by mikefarah version 4 or higher is installed."; } ; else { echo "❌ yq version 4 or higher is required. Please upgrade yq to proceed."; exit 1; } ; fi
if which jq >/dev/null 2>&1 ; then { echo "✅ jq is installed."; } ; else { echo "❌ jq is not installed. Please install jq to proceed."; exit 1; } ; fi
if which oc >/dev/null 2>&1 ; then { echo "✅ oc (OpenShift CLI) is installed."; } ; else { echo "❌ oc (OpenShift CLI) is not installed. Please install oc to proceed."; exit 1; } ; fi
if which vault >/dev/null 2>&1 ; then { echo "✅ vault (HashiCorp Vault) is installed."; } ; else { echo "❌ vault (HashiCorp Vault) is not installed. Please install vault to proceed."; exit 1; } ; fi
if which curl >/dev/null 2>&1 ; then { echo "✅ curl is installed."; } ; else { echo "❌ curl is not installed. Please install curl to proceed."; exit 1; } ; fi
if which docker >/dev/null 2>&1 ; then { echo "✅ docker is installed."; } ; else { echo "❌ docker is not installed. Please install docker to proceed."; exit 1; } ; fi
if which glab >/dev/null 2>&1 ; then { echo "✅ glab (GitLab CLI) is installed."; } ; else { echo "❌ glab (GitLab CLI) is not installed. Please install glab to proceed."; exit 1; } ; fi
if which host >/dev/null 2>&1 ; then { echo "✅ host (DNS lookup utility) is installed."; } ; else { echo "❌ host (DNS lookup utility) is not installed. Please install host to proceed."; exit 1; } ; fi
if which mc >/dev/null 2>&1 ; then { echo "✅ mc (MinIO Client) is installed."; } ; else { echo "❌ mc (MinIO Client) is not installed. Please install mc >= RELEASE.2024-01-18T07-03-39Z to proceed."; exit 1; } ; fi
mc_version=$(mc --version | grep -Eo 'RELEASE[^ ]+')
if echo "$mc_version" | grep -E 'RELEASE\.202[4-9]-' >/dev/null 2>&1 ; then { echo "✅ mc version ${mc_version} is sufficient."; } ; else { echo "❌ mc version ${mc_version} is insufficient. Please upgrade mc to >= RELEASE.2024-01-18T07-03-39Z to proceed."; exit 1; } ; fi
if which aws >/dev/null 2>&1 ; then { echo "✅ aws (AWS CLI) is installed."; } ; else { echo "❌ aws (AWS CLI) is not installed. Please install aws to proceed. Our recommended installer is uv: 'uv tool install awscli'"; exit 1; } ; fi
if which restic >/dev/null 2>&1 ; then { echo "✅ restic (Backup CLI) is installed."; } ; else { echo "❌ restic (Backup CLI) is not installed. Please install restic to proceed."; exit 1; } ; fi
if which emergency-credentials-receive >/dev/null 2>&1 ; then { echo "✅ emergency-credentials-receive (Cluster emergency access helper) is installed."; } ; else { echo "❌ emergency-credentials-receive is not installed. Please install it from https://github.com/vshn/emergency-credentials-receive ."; exit 1; } ; fi
echo "✅ All prerequisites are met."
# echo "# Outputs"
# cat "$OUTPUT"
# rm -f "$OUTPUT"
And I have the openshift-install binary for version "4.20"
This step checks if the openshift-install binary for the specified OpenShift version is available in your PATH.
If not found, it provides instructions on how to download it.
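If you need to fetch the binary, a minimal sketch for Linux follows (assuming /usr/local/bin is on your PATH; adjust OCP_VERSION and the platform-specific tarball name as needed):
OCP_VERSION="4.20"
curl -L "https://mirror.openshift.com/pub/openshift-v4/clients/ocp/stable-${OCP_VERSION}/openshift-install-linux.tar.gz" \
  | tar -xzf - openshift-install
sudo mv openshift-install /usr/local/bin/
openshift-install version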
Script
OUTPUT=$(mktemp)
set -euo pipefail
if command -v openshift-install >/dev/null 2>&1; then
INSTALLED_VERSION=$(openshift-install version | grep 'openshift-install' | awk '{print $2}' | sed 's/^v//' | sed -E 's/\.[0-9]{1,2}$//')
if [ "$INSTALLED_VERSION" = "$MATCH_ocp_version" ]; then
echo "✅ openshift-install version ${MATCH_ocp_version}.XX is installed."
exit 0
else
echo "❌ openshift-install version $INSTALLED_VERSION is installed, but version $MATCH_ocp_version is required. Please download the openshift-install binary for version $MATCH_ocp_version from https://mirror.openshift.com/pub/openshift-v4/clients/ocp/stable-${MATCH_ocp_version}/ and add it to your PATH."
exit 1
fi
else
echo "❌ openshift-install binary not found in PATH. Please download the openshift-install binary for version $MATCH_ocp_version"
echo "from https://mirror.openshift.com/pub/openshift-v4/clients/ocp/stable-${MATCH_ocp_version}/ and add it to your PATH."
exit 1
fi
# echo "# Outputs"
# cat "$OUTPUT"
# rm -f "$OUTPUT"
And a lieutenant cluster
This step retrieves the Commodore tenant ID associated with the given lieutenant cluster ID.
Use api.syn.vshn.net as the Commodore API URL for production clusters. You might use the WebUI at control.vshn.net/syn/lieutenantapiendpoints to create and manage your clusters.
For customer clusters ensure the following facts are set:
- sales_order: Name of the sales order to which the cluster is billed, such as S10000
- service_level: Name of the service level agreement for this cluster, such as guaranteed-availability
- access_policy: Access policy of the cluster, such as regular or swissonly
- release_channel: Name of the Syn component release channel to use, such as stable
- maintenance_window: Pick the appropriate upgrade schedule, such as monday-1400 for test clusters, tuesday-1000 for prod, or custom to not (yet) enable maintenance
- cilium_addons: Comma-separated list of Cilium addons the customer gets billed for, such as advanced_networking or tetragon. Set to NONE if no addons should be billed.
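Once the cluster is registered, a quick hedged spot check of the configured facts might look like this (it reuses the same Lieutenant API endpoint and token mechanism as the script below, so the INPUT_ variables must already be exported):
curl -sH "Authorization: Bearer $(commodore fetch-token)" \
  "${INPUT_commodore_api_url}/clusters/${INPUT_commodore_cluster_id}" | jq .facts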
This step checks that you have access to the Commodore API and the cluster ID is valid.
Inputs
- commodore_api_url: URL of the Commodore API to use for retrieving cluster information.
  Use api.syn.vshn.net as the Commodore API URL for production clusters. Use api-int.syn.vshn.net for test clusters.
  You might use the WebUI at control.vshn.net/syn/lieutenantapiendpoints to create and manage your clusters.
- commodore_cluster_id: Project Syn cluster ID for the cluster to be set up.
  In the form of c-example-infra-prod1.
  You might use the WebUI at control.vshn.net/syn/lieutenantapiendpoints to create and manage your clusters.
Script
OUTPUT=$(mktemp)
# export INPUT_commodore_api_url=
# export INPUT_commodore_cluster_id=
set -euo pipefail
export COMMODORE_API_URL="${INPUT_commodore_api_url}"
echo "Retrieving Commodore tenant ID for cluster ID '$INPUT_commodore_cluster_id' from API at '$INPUT_commodore_api_url'..."
tenant_id=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${COMMODORE_API_URL}/clusters/${INPUT_commodore_cluster_id} | jq -r .tenant)
if echo "$tenant_id" | grep 't-' >/dev/null 2>&1 ; then { echo "✅ Retrieved tenant ID '$tenant_id' for cluster ID '$INPUT_commodore_cluster_id'."; } else { echo "❌ Failed to retrieve valid tenant ID for cluster ID '$INPUT_commodore_cluster_id'. Got '$tenant_id'. Please check your Commodore API access and cluster ID."; exit 1; } ; fi
env -i "commodore_tenant_id=$tenant_id" >> "$OUTPUT"
region=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${COMMODORE_API_URL}/clusters/${INPUT_commodore_cluster_id} | jq -r .facts.region)
if test -z "$region" || test "$region" = "null" ; then { echo "❌ Failed to retrieve cloudscale region for cluster ID '$INPUT_commodore_cluster_id'."; exit 1; } ; else { echo "✅ Retrieved cloudscale region '$region' for cluster ID '$INPUT_commodore_cluster_id'."; } ; fi
env -i "cloudscale_region=$region" >> "$OUTPUT"
# echo "# Outputs"
# cat "$OUTPUT"
# rm -f "$OUTPUT"
And a Keycloak service
In this step, you have to create a Keycloak service for the new cluster via the VSHN Control Web UI at control.vshn.net/vshn/services/_create
Script
OUTPUT=$(mktemp)
# export INPUT_commodore_cluster_id=
echo '#########################################################'
echo '# #'
echo "# Please create a Keycloak service with the cluster's #"
echo '# ID as Service Name via the VSHN Control Web UI. #'
echo '# #'
echo '#########################################################'
echo
echo "The name and ID of the service should be ${INPUT_commodore_cluster_id}."
echo "You can go to https://control.vshn.net/vshn/services/_create"
sleep 2
# echo "# Outputs"
# cat "$OUTPUT"
# rm -f "$OUTPUT"
And Cloudscale API tokens
Create 2 new cloudscale API tokens with read+write permissions and name them <cluster_id> and <cluster_id>_floaty on control.cloudscale.ch/service/<your-project>/api-token.
This step currently does not validate whether the tokens have read permission.
Inputs
- cloudscale_token: Cloudscale API token with read+write permissions.
  Used for setting up the cluster and for the machine API provider.
- cloudscale_token_floaty: Cloudscale API token with read+write permissions.
  Used for managing the floating IPs.
Script
OUTPUT=$(mktemp)
# export INPUT_cloudscale_token=
# export INPUT_cloudscale_token_floaty=
set -euo pipefail
if [[ $( curl -sH "Authorization: Bearer ${INPUT_cloudscale_token}" https://api.cloudscale.ch/v1/flavors -o /dev/null -w"%{http_code}" ) != 200 ]]
then
echo "Cloudscale token not valid!"
exit 1
fi
if [[ $( curl -sH "Authorization: Bearer ${INPUT_cloudscale_token_floaty}" https://api.cloudscale.ch/v1/flavors -o /dev/null -w"%{http_code}" ) != 200 ]]
then
echo "Cloudscale Floaty token not valid!"
exit 1
fi
# echo "# Outputs"
# cat "$OUTPUT"
# rm -f "$OUTPUT"
And a personal VSHN GitLab access token
This step ensures that you have provided a personal access token for VSHN GitLab.
Create the token at git.vshn.net/-/user_settings/personal_access_tokens with the "api" scope.
This step currently does not validate the token’s scope.
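If you want to verify the scope yourself, a hedged sketch follows; it assumes git.vshn.net runs a GitLab release that exposes the personal access token self-inspection endpoint:
curl -sH "Authorization: Bearer ${INPUT_gitlab_api_token}" \
  "https://git.vshn.net/api/v4/personal_access_tokens/self" | jq '{name, scopes, expires_at}'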
Inputs
- gitlab_api_token: Personal access token for VSHN GitLab with the "api" scope.
  Create the token at git.vshn.net/-/user_settings/personal_access_tokens with the "api" scope.
Script
OUTPUT=$(mktemp)
# export INPUT_gitlab_api_token=
set -euo pipefail
user="$( curl -sH "Authorization: Bearer ${INPUT_gitlab_api_token}" "https://git.vshn.net/api/v4/user" | jq -r .username )"
if [[ "$user" == "null" ]]
then
echo "Error validating GitLab token. Are you sure it is valid?"
exit 1
fi
env -i "gitlab_user_name=$user" >> "$OUTPUT"
echo "Token is valid."
# echo "# Outputs"
# cat "$OUTPUT"
# rm -f "$OUTPUT"
And a control.vshn.net Servers API token
This step ensures that you have provided an API token for control.vshn.net Servers API.
Create the token at control.vshn.net/tokens/_create/servers and ensure your IP is allowlisted.
Inputs
- control_vshn_api_token: API token for the control.vshn.net Servers API.
  Used to create the Puppet-based LBs.
  Be extra careful with the IP allowlist.
Script
OUTPUT=$(mktemp)
# export INPUT_control_vshn_api_token=
set -euo pipefail
AUTH="X-AccessToken: ${INPUT_control_vshn_api_token}"
code="$( curl -H"$AUTH" https://control.vshn.net/api/servers/1/appuio/ -o /dev/null -w"%{http_code}" )"
if [[ "$code" != 200 ]]
then
echo "ERROR: could not access Server API (Status $code)"
echo "Please ensure your token is valid and your IP is on the allowlist."
exit 1
fi
# echo "# Outputs"
# cat "$OUTPUT"
# rm -f "$OUTPUT"
And basic cluster information
This step collects two essential pieces of information required for cluster setup: the base domain and the Red Hat pull secret.
See kb.vshn.ch/oc4/explanations/dns_scheme.html for more information about the base domain. Get a pull secret from cloud.redhat.com/openshift/install/pull-secret.
Inputs
- base_domain: The base domain for the cluster without the cluster ID prefix and the last dot.
  Example: appuio-beta.ch
  See kb.vshn.ch/oc4/explanations/dns_scheme.html for more information about the base domain.
- redhat_pull_secret: Red Hat pull secret for accessing Red Hat container images.
  Get a pull secret from cloud.redhat.com/openshift/install/pull-secret.
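To make the naming concrete, here is an illustrative sketch (hypothetical values) of how the base domain and cluster ID compose into the cluster domain and the API/ingress hostnames used later in this guide:
base_domain="appuio-beta.ch"          # example value from above
cluster_id="c-example-infra-prod1"    # hypothetical Project Syn cluster ID
cluster_domain="${cluster_id}.${base_domain}"
echo "API endpoint:     api.${cluster_domain}"
echo "Ingress wildcard: *.apps.${cluster_domain}"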
Then I download the OpenShift image for version "4.20.0"
This step downloads the OpenShift (RHCOS) image for the version specified in the step.
If the image already exists locally, it skips the download.
Script
OUTPUT=$(mktemp)
set -euo pipefail
. ./workflows/cloudscale/scripts/semver.sh
MAJOR=0
MINOR=0
PATCH=0
SPECIAL=""
semverParseInto "$MATCH_image_name" MAJOR MINOR PATCH SPECIAL
image_path="rhcos-$MAJOR.$MINOR.qcow2"
env -i "image_major=$MAJOR" >> "$OUTPUT"
env -i "image_minor=$MINOR" >> "$OUTPUT"
env -i "image_patch=$PATCH" >> "$OUTPUT"
echo "Image is $image_path"
if [ -f "$image_path" ]; then
echo "Image $image_path already exists, skipping download."
env -i "image_path=$image_path" >> "$OUTPUT"
exit 0
fi
echo Downloading OpenShift image "$MATCH_image_name" to "$image_path"
curl -L "https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/${MAJOR}.${MINOR}/${MATCH_image_name}/rhcos-${MATCH_image_name}-x86_64-openstack.x86_64.qcow2.gz" | gzip -d > "$image_path"
env -i "image_path=$image_path" >> "$OUTPUT"
# echo "# Outputs"
# cat "$OUTPUT"
# rm -f "$OUTPUT"
And I set up required S3 buckets
This step sets up the required S3 buckets for the OpenShift cluster installation.
It uses the MinIO Client (mc) to create the necessary buckets if they do not already exist.
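After the script below has run, a hedged spot check of the result could look like this (run in the same shell so the mc alias and AWS_* credentials it sets are still available):
mc ls "${INPUT_commodore_cluster_id}"
aws s3api get-public-access-block \
  --endpoint-url "https://objects.${INPUT_cloudscale_region}.cloudscale.ch" \
  --bucket "${INPUT_commodore_cluster_id}-image-registry"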
Script
OUTPUT=$(mktemp)
# export INPUT_cloudscale_token=
# export INPUT_commodore_cluster_id=
# export INPUT_cloudscale_region=
set -euo pipefail
response=$(curl -sH "Authorization: Bearer ${INPUT_cloudscale_token}" \
https://api.cloudscale.ch/v1/objects-users | \
jq -e ".[] | select(.display_name == \"${INPUT_commodore_cluster_id}\")" ||:)
if [ -z "$response" ]; then
echo "Creating Cloudscale S3 user for cluster ID '${INPUT_commodore_cluster_id}'..."
response=$(curl -sH "Authorization: Bearer ${INPUT_cloudscale_token}" \
-F display_name=${INPUT_commodore_cluster_id} \
https://api.cloudscale.ch/v1/objects-users)
echo "Created user with id $(echo "$response" | jq -r .id)"
else
echo "Cloudscale S3 user for cluster ID '${INPUT_commodore_cluster_id}' already exists. id: $(echo "$response" | jq -r .id)"
fi
echo -n "Waiting for S3 credentials to become available ..."
until mc alias set \
"${INPUT_commodore_cluster_id}" "https://objects.${INPUT_cloudscale_region}.cloudscale.ch" \
"$(echo "$response" | jq -r '.keys[0].access_key')" \
"$(echo "$response" | jq -r '.keys[0].secret_key')"
do
echo -n .
sleep 5
done
echo "OK"
mc mb --ignore-existing \
"${INPUT_commodore_cluster_id}/${INPUT_commodore_cluster_id}-bootstrap-ignition"
mc mb --ignore-existing \
"${INPUT_commodore_cluster_id}/${INPUT_commodore_cluster_id}-image-registry"
mc mb --ignore-existing \
"${INPUT_commodore_cluster_id}/${INPUT_commodore_cluster_id}-logstore"
keyid=$(mc alias list ${INPUT_commodore_cluster_id} -json | jq -r .accessKey)
export AWS_ACCESS_KEY_ID="${keyid}"
secretkey=$(mc alias list ${INPUT_commodore_cluster_id} -json | jq -r .secretKey)
export AWS_SECRET_ACCESS_KEY="${secretkey}"
echo "Configuring S3 bucket policies..."
aws s3api put-public-access-block \
--endpoint-url "https://objects.${INPUT_cloudscale_region}.cloudscale.ch" \
--bucket "${INPUT_commodore_cluster_id}-image-registry" \
--public-access-block-configuration BlockPublicAcls=false
aws s3api put-bucket-lifecycle-configuration \
--endpoint-url "https://objects.${INPUT_cloudscale_region}.cloudscale.ch" \
--bucket "${INPUT_commodore_cluster_id}-image-registry" \
--lifecycle-configuration '{
"Rules": [
{
"ID": "cleanup-incomplete-multipart-registry-uploads",
"Prefix": "",
"Status": "Enabled",
"AbortIncompleteMultipartUpload": {
"DaysAfterInitiation": 1
}
}
]
}'
echo "S3 buckets are set up."
env -i "bucket_user=$(echo "$response" | jq -c .)" >> "$OUTPUT"
# echo "# Outputs"
# cat "$OUTPUT"
# rm -f "$OUTPUT"
And I import the image in Cloudscale
This step uploads the Red Hat CoreOS image to the cluster's bootstrap-ignition S3 bucket.
It then imports the image into Cloudscale as a custom image.
It uses the MinIO Client (mc) to perform the upload.
Inputs
- image_path
- commodore_cluster_id
- cloudscale_region
- bucket_user
- image_major
- image_minor
- cloudscale_token
Script
OUTPUT=$(mktemp)
# export INPUT_image_path=
# export INPUT_commodore_cluster_id=
# export INPUT_cloudscale_region=
# export INPUT_bucket_user=
# export INPUT_image_major=
# export INPUT_image_minor=
# export INPUT_cloudscale_token=
set -euo pipefail
auth_header="Authorization: Bearer ${INPUT_cloudscale_token}"
slug=$(curl -sH "$auth_header" https://api.cloudscale.ch/v1/custom-images | jq -r ".[] | select(.slug == \"rhcos-${INPUT_image_major}.${INPUT_image_minor}\") | .zones[].slug")
if [ -n "$slug" ] && [ "$slug" != "null" ]; then
echo "Image 'rhcos-${INPUT_image_major}.${INPUT_image_minor}' already exists in Cloudscale, skipping upload."
exit 0
fi
mc alias set \
"${INPUT_commodore_cluster_id}" "https://objects.${INPUT_cloudscale_region}.cloudscale.ch" \
"$(echo "$INPUT_bucket_user" | jq -r '.keys[0].access_key')" \
"$(echo "$INPUT_bucket_user" | jq -r '.keys[0].secret_key')"
echo "Uploading Red Hat CoreOS image '$INPUT_image_path' to S3 bucket '${INPUT_commodore_cluster_id}-image-registry'..."
mc cp "rhcos-${INPUT_image_major}.${INPUT_image_minor}.qcow2" "${INPUT_commodore_cluster_id}/${INPUT_commodore_cluster_id}-bootstrap-ignition/"
echo "Upload completed."
mc anonymous set download "${INPUT_commodore_cluster_id}/${INPUT_commodore_cluster_id}-bootstrap-ignition/rhcos-${INPUT_image_major}.${INPUT_image_minor}.qcow2"
echo "Importing image into Cloudscale..."
curl -i -H "$auth_header" \
-F url="$(mc share download --json "${INPUT_commodore_cluster_id}/${INPUT_commodore_cluster_id}-bootstrap-ignition/rhcos-${INPUT_image_major}.${INPUT_image_minor}.qcow2" | jq -r .url)" \
-F name="RHCOS ${INPUT_image_major}.${INPUT_image_minor}" \
-F zones="${INPUT_cloudscale_region}1" \
-F slug="rhcos-${INPUT_image_major}.${INPUT_image_minor}" \
-F source_format=qcow2 \
-F user_data_handling=pass-through \
https://api.cloudscale.ch/v1/custom-images/import
echo "Image import initiated. ⚠️ TODO: Poll for completion."
# echo "# Outputs"
# cat "$OUTPUT"
# rm -f "$OUTPUT"
Then I set secrets in Vault
This step stores the collected secrets and tokens in the ProjectSyn Vault.
Inputs
- vault_address: Address of the Vault server associated with the Lieutenant API to store cluster secrets.
  vault-prod.syn.vshn.net/ for production clusters.
- commodore_cluster_id
- commodore_tenant_id
- bucket_user
- cloudscale_token
- cloudscale_token_floaty
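To verify afterwards that the secrets landed where the components expect them, a hedged read-back sketch (run after the script below, with the same Vault login; it only lists key names, not values):
export VAULT_ADDR="${INPUT_vault_address}"
vault kv get -format=json "clusters/kv/${INPUT_commodore_tenant_id}/${INPUT_commodore_cluster_id}/cloudscale" | jq '.data.data | keys'
vault kv get -format=json "clusters/kv/${INPUT_commodore_tenant_id}/${INPUT_commodore_cluster_id}/floaty" | jq '.data.data | keys'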
Script
OUTPUT=$(mktemp)
# export INPUT_vault_address=
# export INPUT_commodore_cluster_id=
# export INPUT_commodore_tenant_id=
# export INPUT_bucket_user=
# export INPUT_cloudscale_token=
# export INPUT_cloudscale_token_floaty=
set -euo pipefail
export VAULT_ADDR=${INPUT_vault_address}
vault login -method=oidc
# Set the cloudscale.ch access secrets
vault kv put clusters/kv/${INPUT_commodore_tenant_id}/${INPUT_commodore_cluster_id}/cloudscale \
token=${INPUT_cloudscale_token} \
s3_access_key="$(echo "${INPUT_bucket_user}" | jq -r '.keys[0].access_key')" \
s3_secret_key="$(echo "${INPUT_bucket_user}" | jq -r '.keys[0].secret_key')"
# Put LB API key in Vault
vault kv put clusters/kv/${INPUT_commodore_tenant_id}/${INPUT_commodore_cluster_id}/floaty \
iam_secret="${INPUT_cloudscale_token_floaty}"
# Generate an HTTP secret for the registry
vault kv put clusters/kv/${INPUT_commodore_tenant_id}/${INPUT_commodore_cluster_id}/registry \
httpSecret="$(LC_ALL=C tr -cd "A-Za-z0-9" </dev/urandom | head -c 128)"
# Generate a master password for K8up backups
vault kv put clusters/kv/${INPUT_commodore_tenant_id}/${INPUT_commodore_cluster_id}/global-backup \
password="$(LC_ALL=C tr -cd "A-Za-z0-9" </dev/urandom | head -c 32)"
# Generate a password for the cluster object backups
vault kv put clusters/kv/${INPUT_commodore_tenant_id}/${INPUT_commodore_cluster_id}/cluster-backup \
password="$(LC_ALL=C tr -cd "A-Za-z0-9" </dev/urandom | head -c 32)"
hieradata_repo_secret=$(vault kv get \
-format=json "clusters/kv/lbaas/hieradata_repo_token" | jq '.data.data')
env -i "hieradata_repo_user=$(echo "${hieradata_repo_secret}" | jq -r '.user')" >> "$OUTPUT"
env -i "hieradata_repo_token=$(echo "${hieradata_repo_secret}" | jq -r '.token')" >> "$OUTPUT"
# echo "# Outputs"
# cat "$OUTPUT"
# rm -f "$OUTPUT"
And I check the cluster domain
Please verify that the base domain generated is correct for your setup.
Script
OUTPUT=$(mktemp)
# export INPUT_commodore_cluster_id=
# export INPUT_base_domain=
set -euo pipefail
cluster_domain="${INPUT_commodore_cluster_id}.${INPUT_base_domain}"
echo "Cluster domain is set to '$cluster_domain'"
echo "cluster_domain=$cluster_domain" >> "$OUTPUT"
# echo "# Outputs"
# cat "$OUTPUT"
# rm -f "$OUTPUT"
And I prepare the cluster repository
This step prepares the local cluster repository by cloning the tenant configuration repository into inventory/classes and setting up the necessary configuration for the specified cluster.
Inputs
- commodore_api_url
- commodore_cluster_id
- commodore_tenant_id
- hieradata_repo_user
- cluster_domain
- hieradata_repo_token
- image_major
- image_minor
Script
OUTPUT=$(mktemp)
# export INPUT_commodore_api_url=
# export INPUT_commodore_cluster_id=
# export INPUT_commodore_tenant_id=
# export INPUT_hieradata_repo_user=
# export INPUT_cluster_domain=
# export INPUT_hieradata_repo_token=
# export INPUT_image_major=
# export INPUT_image_minor=
set -euo pipefail
export COMMODORE_API_URL="${INPUT_commodore_api_url}"
rm -rf inventory/classes/
mkdir -p inventory/classes/
git clone "$(curl -sH"Authorization: Bearer $(commodore fetch-token)" "${INPUT_commodore_api_url}/tenants/${INPUT_commodore_tenant_id}" | jq -r '.gitRepo.url')" inventory/classes/${INPUT_commodore_tenant_id}
pushd "inventory/classes/${INPUT_commodore_tenant_id}/"
yq eval -i ".parameters.openshift.baseDomain = \"${INPUT_cluster_domain}\"" \
${INPUT_commodore_cluster_id}.yml
git diff --exit-code --quiet || git commit -a -m "Configure cluster domain for ${INPUT_commodore_cluster_id}"
if ls openshift4.y*ml 1>/dev/null 2>&1; then
yq eval -i '.classes += ".openshift4"' ${INPUT_commodore_cluster_id}.yml;
git diff --exit-code --quiet || git commit -a -m "Include openshift4 class for ${INPUT_commodore_cluster_id}"
fi
yq eval -i '.parameters.openshift.cloudscale.subnet_uuid = "TO_BE_DEFINED"' ${INPUT_commodore_cluster_id}.yml
yq eval -i ".parameters.openshift.cloudscale.rhcos_image_slug = \"rhcos-${INPUT_image_major}.${INPUT_image_minor}\"" \
${INPUT_commodore_cluster_id}.yml
yq eval -i ".parameters.openshift4_terraform.terraform_variables.ignition_ca = \"TO_BE_DEFINED\"" \
${INPUT_commodore_cluster_id}.yml
git diff --exit-code --quiet || git commit -a -m "Configure Cloudscale metaparameters on ${INPUT_commodore_cluster_id}"
yq eval -i '.applications += ["cloudscale-loadbalancer-controller"]' ${INPUT_commodore_cluster_id}.yml
yq eval -i '.applications = (.applications | unique)' ${INPUT_commodore_cluster_id}.yml
cat ${INPUT_commodore_cluster_id}.yml
git diff --exit-code --quiet || git commit -a -m "Enable cloudscale loadbalancer controller for ${INPUT_commodore_cluster_id}"
yq eval -i '.applications += ["cilium"]' ${INPUT_commodore_cluster_id}.yml
yq eval -i '.applications = (.applications | unique)' ${INPUT_commodore_cluster_id}.yml
yq eval -i '.parameters.networkpolicy.networkPlugin = "cilium"' ${INPUT_commodore_cluster_id}.yml
yq eval -i '.parameters.openshift.infraID = "TO_BE_DEFINED"' ${INPUT_commodore_cluster_id}.yml
yq eval -i '.parameters.openshift.clusterID = "TO_BE_DEFINED"' ${INPUT_commodore_cluster_id}.yml
yq eval -i '.parameters.cilium.olm.generate_olm_deployment = true' ${INPUT_commodore_cluster_id}.yml
git diff --exit-code --quiet || git commit -a -m "Add Cilium addon to ${INPUT_commodore_cluster_id}"
git push
popd
commodore catalog compile ${INPUT_commodore_cluster_id} --push \
--dynamic-fact kubernetesVersion.major=1 \
--dynamic-fact kubernetesVersion.minor="$((INPUT_image_minor+13))" \
--dynamic-fact openshiftVersion.Major=${INPUT_image_major} \
--dynamic-fact openshiftVersion.Minor=${INPUT_image_minor}
# echo "# Outputs"
# cat "$OUTPUT"
# rm -f "$OUTPUT"
Then I configure the OpenShift installer
This step configures the OpenShift installer for the Cloudscale cluster by generating the necessary installation files using Commodore.
Inputs
- commodore_cluster_id
- commodore_tenant_id
- base_domain
- cluster_domain
- vault_address
- redhat_pull_secret
- cloudscale_region
- bucket_user
- cloudscale_token
Script
OUTPUT=$(mktemp)
# export INPUT_commodore_cluster_id=
# export INPUT_commodore_tenant_id=
# export INPUT_base_domain=
# export INPUT_cluster_domain=
# export INPUT_vault_address=
# export INPUT_redhat_pull_secret=
# export INPUT_cloudscale_region=
# export INPUT_bucket_user=
# export INPUT_cloudscale_token=
set -euo pipefail
export VAULT_ADDR="${INPUT_vault_address}"
vault login -method=oidc
ssh_private_key="$(pwd)/ssh_${INPUT_commodore_cluster_id}"
ssh_public_key="${ssh_private_key}.pub"
env -i "ssh_public_key_path=$ssh_public_key" >> "$OUTPUT"
if vault kv get -format=json clusters/kv/${INPUT_commodore_tenant_id}/${INPUT_commodore_cluster_id}/cloudscale/ssh >/dev/null 2>&1; then
echo "SSH keypair for cluster ${INPUT_commodore_cluster_id} already exists in Vault, skipping generation."
vault kv get -format=json clusters/kv/${INPUT_commodore_tenant_id}/${INPUT_commodore_cluster_id}/cloudscale/ssh | \
jq -r '.data.data.private_key|@base64d' > "${ssh_private_key}"
chmod 600 "${ssh_private_key}"
ssh-keygen -f "${ssh_private_key}" -y > "${ssh_public_key}"
else
echo "Generating new SSH keypair for cluster ${INPUT_commodore_cluster_id}."
ssh-keygen -C "vault@${INPUT_commodore_cluster_id}" -t ed25519 -f "$ssh_private_key" -N ''
base64_no_wrap='base64'
if [[ "$OSTYPE" == "linux"* ]]; then
base64_no_wrap='base64 --wrap 0'
fi
vault kv put clusters/kv/${INPUT_commodore_tenant_id}/${INPUT_commodore_cluster_id}/cloudscale/ssh \
private_key="$(cat "$ssh_private_key" | eval "$base64_no_wrap")"
fi
echo Adding SSH private key to ssh-agent...
echo You might need to start the ssh-agent first using: eval "\$(ssh-agent)"
echo ssh-add "$ssh_private_key"
ssh-add "$ssh_private_key"
installer_dir="$(pwd)/target"
rm -rf "${installer_dir}"
mkdir -p "${installer_dir}"
cat > "${installer_dir}/install-config.yaml" <<EOF
apiVersion: v1
metadata:
name: ${INPUT_commodore_cluster_id}
baseDomain: ${INPUT_base_domain}
platform:
external:
platformName: cloudscale
cloudControllerManager: External
networking:
networkType: Cilium
pullSecret: |
${INPUT_redhat_pull_secret}
sshKey: "$(cat "$ssh_public_key")"
EOF
echo Running OpenShift installer to create manifests...
openshift-install --dir "${installer_dir}" create manifests
echo Copying machineconfigs...
machineconfigs=catalog/manifests/openshift4-nodes/10_machineconfigs.yaml
if [ -f $machineconfigs ]; then
yq --no-doc -s \
"\"${installer_dir}/openshift/99x_openshift-machineconfig_\" + .metadata.name" \
$machineconfigs
fi
echo Copying Cloudscale CCM manifests...
for f in catalog/manifests/cloudscale-cloud-controller-manager/*; do
cp "$f" "${installer_dir}/manifests/cloudscale_ccm_$(basename "$f")"
done
yq -i e ".stringData.access-token=\"${INPUT_cloudscale_token}\"" \
"${installer_dir}/manifests/cloudscale_ccm_01_secret.yaml"
echo Copying Cilium OLM manifests...
for f in catalog/manifests/cilium/olm/[a-z]*; do
cp "$f" "${installer_dir}/manifests/cilium_$(basename "$f")"
done
# shellcheck disable=2016
# We don't want the shell to execute network.operator.openshift.io as a
# command, so we need single quotes here.
echo 'Generating initial `network.operator.openshift.io` resource...'
yq '{
"apiVersion": "operator.openshift.io/v1",
"kind": "Network",
"metadata": {
"name": "cluster"
},
"spec": {
"deployKubeProxy": false,
"clusterNetwork": .spec.clusterNetwork,
"externalIP": {
"policy": {}
},
"networkType": "Cilium",
"serviceNetwork": .spec.serviceNetwork
}}' "${installer_dir}/manifests/cluster-network-02-config.yml" \
> "${installer_dir}/manifests/cilium_cluster-network-operator.yaml"
gen_cluster_domain=$(yq e '.spec.baseDomain' \
"${installer_dir}/manifests/cluster-dns-02-config.yml")
if [ "$gen_cluster_domain" != "$INPUT_cluster_domain" ]; then
echo -e "\033[0;31mGenerated cluster domain doesn't match expected cluster domain: Got '$gen_cluster_domain', want '$INPUT_cluster_domain'\033[0;0m"
exit 1
else
echo -e "\033[0;32mGenerated cluster domain matches expected cluster domain.\033[0;0m"
fi
echo Running OpenShift installer to create ignition configs...
openshift-install --dir "${installer_dir}" \
create ignition-configs
mc alias set \
"${INPUT_commodore_cluster_id}" "https://objects.${INPUT_cloudscale_region}.cloudscale.ch" \
"$(echo "$INPUT_bucket_user" | jq -r '.keys[0].access_key')" \
"$(echo "$INPUT_bucket_user" | jq -r '.keys[0].secret_key')"
mc cp "${installer_dir}/bootstrap.ign" "${INPUT_commodore_cluster_id}/${INPUT_commodore_cluster_id}-bootstrap-ignition/"
ignition_bootstrap=$(mc share download \
--json --expire=4h \
"${INPUT_commodore_cluster_id}/${INPUT_commodore_cluster_id}-bootstrap-ignition/bootstrap.ign" | jq -r '.share')
env -i "ignition_bootstrap=$ignition_bootstrap" >> "$OUTPUT"
echo "✅ OpenShift installer configured successfully."
# echo "# Outputs"
# cat "$OUTPUT"
# rm -f "$OUTPUT"
And I configure Terraform for team "aldebaran"
This step configures Terraform using the Commodore-rendered Terraform configuration.
Inputs
- commodore_api_url
- commodore_cluster_id
- commodore_tenant_id
- ssh_public_key_path
- hieradata_repo_user
- base_domain
- image_major
- image_minor
Script
OUTPUT=$(mktemp)
# export INPUT_commodore_api_url=
# export INPUT_commodore_cluster_id=
# export INPUT_commodore_tenant_id=
# export INPUT_ssh_public_key_path=
# export INPUT_hieradata_repo_user=
# export INPUT_base_domain=
# export INPUT_image_major=
# export INPUT_image_minor=
set -euo pipefail
export COMMODORE_API_URL="${INPUT_commodore_api_url}"
installer_dir="$(pwd)/target"
pushd "inventory/classes/${INPUT_commodore_tenant_id}/"
yq eval -i '.classes += ["global.distribution.openshift4.no-opsgenie"]' ${INPUT_commodore_cluster_id}.yml;
yq eval -i '.classes = (.classes | unique)' ${INPUT_commodore_cluster_id}.yml
yq eval -i ".parameters.openshift.infraID = \"$(jq -r .infraID "${installer_dir}/metadata.json")\"" \
${INPUT_commodore_cluster_id}.yml
yq eval -i ".parameters.openshift.clusterID = \"$(jq -r .clusterID "${installer_dir}/metadata.json")\"" \
${INPUT_commodore_cluster_id}.yml
yq eval -i 'del(.parameters.cilium.olm.generate_olm_deployment)' \
${INPUT_commodore_cluster_id}.yml
yq eval -i ".parameters.openshift.ssh_key = \"$(cat ${INPUT_ssh_public_key_path})\"" \
${INPUT_commodore_cluster_id}.yml
ca_cert=$(jq -r '.ignition.security.tls.certificateAuthorities[0].source' \
"${installer_dir}/master.ign" | \
awk -F ',' '{ print $2 }' | \
base64 --decode)
yq eval -i ".parameters.openshift4_terraform.terraform_variables.base_domain = \"${INPUT_base_domain}\"" \
${INPUT_commodore_cluster_id}.yml
yq eval -i ".parameters.openshift4_terraform.terraform_variables.ignition_ca = \"${ca_cert}\"" \
${INPUT_commodore_cluster_id}.yml
yq eval -i ".parameters.openshift4_terraform.terraform_variables.ssh_keys = [\"$(cat ${INPUT_ssh_public_key_path})\"]" \
${INPUT_commodore_cluster_id}.yml
yq eval -i ".parameters.openshift4_terraform.terraform_variables.allocate_router_vip_for_lb_controller = true" \
${INPUT_commodore_cluster_id}.yml
yq eval -i ".parameters.openshift4_terraform.terraform_variables.team = \"${MATCH_team_name}\"" \
${INPUT_commodore_cluster_id}.yml
yq eval -i ".parameters.openshift4_terraform.terraform_variables.hieradata_repo_user = \"${INPUT_hieradata_repo_user}\"" \
${INPUT_commodore_cluster_id}.yml
git commit -a -m "Setup cluster ${INPUT_commodore_cluster_id}"
git push
popd
commodore catalog compile ${INPUT_commodore_cluster_id} --push \
--dynamic-fact kubernetesVersion.major=1 \
--dynamic-fact kubernetesVersion.minor="$((INPUT_image_minor+13))" \
--dynamic-fact openshiftVersion.Major=${INPUT_image_major} \
--dynamic-fact openshiftVersion.Minor=${INPUT_image_minor}
# echo "# Outputs"
# cat "$OUTPUT"
# rm -f "$OUTPUT"
Then I provision the loadbalancers
This step provisions the load balancers for the Cloudscale OpenShift cluster using Terraform.
Inputs
- cloudscale_token
- cloudscale_token_floaty
- control_vshn_api_token
- ignition_bootstrap
- hieradata_repo_token
- gitlab_user_name
- gitlab_api_token
- commodore_cluster_id
- commodore_api_url
- cluster_domain
Script
OUTPUT=$(mktemp)
# export INPUT_cloudscale_token=
# export INPUT_cloudscale_token_floaty=
# export INPUT_control_vshn_api_token=
# export INPUT_ignition_bootstrap=
# export INPUT_hieradata_repo_token=
# export INPUT_gitlab_user_name=
# export INPUT_gitlab_api_token=
# export INPUT_commodore_cluster_id=
# export INPUT_commodore_api_url=
# export INPUT_cluster_domain=
set -euo pipefail
export COMMODORE_API_URL="${INPUT_commodore_api_url}"
cat <<EOF > ./terraform.env
CLOUDSCALE_API_TOKEN=${INPUT_cloudscale_token}
TF_VAR_ignition_bootstrap=${INPUT_ignition_bootstrap}
TF_VAR_lb_cloudscale_api_secret=${INPUT_cloudscale_token_floaty}
TF_VAR_control_vshn_net_token=${INPUT_control_vshn_api_token}
GIT_AUTHOR_NAME=$(git config --global user.name)
GIT_AUTHOR_EMAIL=$(git config --global user.email)
HIERADATA_REPO_TOKEN=${INPUT_hieradata_repo_token}
EOF
tf_image=$(\
yq eval ".parameters.openshift4_terraform.images.terraform.image" \
dependencies/openshift4-terraform/class/defaults.yml)
tf_tag=$(\
yq eval ".parameters.openshift4_terraform.images.terraform.tag" \
dependencies/openshift4-terraform/class/defaults.yml)
echo "Using Terraform image: ${tf_image}:${tf_tag}"
base_dir=$(pwd)
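# Note: when running this step manually as a plain (non-interactive) bash script, alias
# expansion requires 'shopt -s expand_aliases'; otherwise run the alias definition and the
# terraform commands in the same interactive shell.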
alias terraform='touch .terraformrc; docker run --rm -e REAL_UID=$(id -u) -e TF_CLI_CONFIG_FILE=/tf/.terraformrc --env-file ${base_dir}/terraform.env -w /tf -v $(pwd):/tf --ulimit memlock=-1 "${tf_image}:${tf_tag}" /tf/terraform.sh'
gitlab_repository_url=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${INPUT_commodore_api_url}/clusters/${INPUT_commodore_cluster_id} | jq -r '.gitRepo.url' | sed 's|ssh://||; s|/|:|')
gitlab_repository_name=${gitlab_repository_url##*/}
gitlab_catalog_project_id=$(curl -sH "Authorization: Bearer ${INPUT_gitlab_api_token}" "https://git.vshn.net/api/v4/projects?simple=true&search=${gitlab_repository_name/.git}" | jq -r ".[] | select(.ssh_url_to_repo == \"${gitlab_repository_url}\") | .id")
gitlab_state_url="https://git.vshn.net/api/v4/projects/${gitlab_catalog_project_id}/terraform/state/cluster"
pushd catalog/manifests/openshift4-terraform/
terraform init \
"-backend-config=address=${gitlab_state_url}" \
"-backend-config=lock_address=${gitlab_state_url}/lock" \
"-backend-config=unlock_address=${gitlab_state_url}/lock" \
"-backend-config=username=${INPUT_gitlab_user_name}" \
"-backend-config=password=${INPUT_gitlab_api_token}" \
"-backend-config=lock_method=POST" \
"-backend-config=unlock_method=DELETE" \
"-backend-config=retry_wait_min=5"
cat > override.tf <<EOF
module "cluster" {
bootstrap_count = 0
master_count = 0
infra_count = 0
worker_count = 0
additional_worker_groups = {}
}
EOF
terraform apply -auto-approve -target "module.cluster.module.lb.module.hiera"
echo "@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@"
echo "@ @"
echo "@ Please review and merge the LB hieradata MR listed in Terraform output hieradata_mr. @"
echo "@ @"
echo "@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@"
while (GITLAB_HOST=git.vshn.net GITLAB_TOKEN="${INPUT_gitlab_api_token}" glab mr list -R=appuio/appuio_hieradata | grep "${INPUT_commodore_cluster_id}")
do
sleep 10
done
echo PR merged, waiting for CI to finish...
sleep 10
while (GITLAB_HOST=git.vshn.net GITLAB_TOKEN="${INPUT_gitlab_api_token}" glab mr list -R=appuio/appuio_hieradata | grep "running")
do
sleep 10
done
terraform apply -auto-approve
dnstmp=$(mktemp)
terraform output -raw cluster_dns > "$dnstmp"
echo "@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@"
echo "@ @"
echo "@ Please add the DNS records shown in the Terraform output to your DNS provider. @"
echo "@ Most probably in https://git.vshn.net/vshn/vshn_zonefiles @"
echo "@ @"
echo "@ If terminal selection does not work the entries can also be copied from @"
echo "@ $dnstmp @"
echo "@ @"
echo "@ Waiting for record to propagate... @"
echo "@ @"
echo "@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@"
while ! (host "api.${INPUT_cluster_domain}")
do
sleep 15
done
rm -f "$dnstmp"
lb1=$(terraform state show "module.cluster.module.lb.cloudscale_server.lb[0]" | grep fqdn | awk '{print $2}' | tr -d ' "\r\n')
lb2=$(terraform state show "module.cluster.module.lb.cloudscale_server.lb[1]" | grep fqdn | awk '{print $2}' | tr -d ' "\r\n')
echo "Loadbalancer FQDNs: $lb1 , $lb2"
echo "Waiting for HAproxy ..."
while true; do
exit_code=0
curl --connect-timeout 1 "http://api.${INPUT_cluster_domain}:6443" &>/dev/null || exit_code=$?
# curl exit code 52 (empty reply) means HAproxy accepted the connection but no backend answered yet.
if [ "$exit_code" -eq 52 ]; then
echo " HAproxy up!"
break
else
echo -n "."
sleep 5
fi
done
echo "updating ssh config..."
ssh management2.corp.vshn.net "sshop --output-archive /dev/stdout" | tar -C ~ -xzf -
echo "done"
echo "waiting for ssh access ..."
ssh "${lb1}" hostname -f
ssh "${lb2}" hostname -f
env -i "lb_fqdn_1=$lb1" >> "$OUTPUT"
env -i "lb_fqdn_2=$lb2" >> "$OUTPUT"
# echo "# Outputs"
# cat "$OUTPUT"
# rm -f "$OUTPUT"
And I provision the bootstrap node
This step provisions the bootstrap node for the Cloudscale OpenShift cluster using Terraform.
Inputs
- cloudscale_token
- cloudscale_token_floaty
- control_vshn_api_token
- ignition_bootstrap
- hieradata_repo_token
- gitlab_user_name
- gitlab_api_token
- commodore_cluster_id
- commodore_api_url
- lb_fqdn_1
- lb_fqdn_2
Script
OUTPUT=$(mktemp)
# export INPUT_cloudscale_token=
# export INPUT_cloudscale_token_floaty=
# export INPUT_control_vshn_api_token=
# export INPUT_ignition_bootstrap=
# export INPUT_hieradata_repo_token=
# export INPUT_gitlab_user_name=
# export INPUT_gitlab_api_token=
# export INPUT_commodore_cluster_id=
# export INPUT_commodore_api_url=
# export INPUT_lb_fqdn_1=
# export INPUT_lb_fqdn_2=
set -euo pipefail
export COMMODORE_API_URL="${INPUT_commodore_api_url}"
installer_dir="$(pwd)/target"
cat <<EOF > ./terraform.env
CLOUDSCALE_API_TOKEN=${INPUT_cloudscale_token}
TF_VAR_ignition_bootstrap=${INPUT_ignition_bootstrap}
TF_VAR_lb_cloudscale_api_secret=${INPUT_cloudscale_token_floaty}
TF_VAR_control_vshn_net_token=${INPUT_control_vshn_api_token}
GIT_AUTHOR_NAME=$(git config --global user.name)
GIT_AUTHOR_EMAIL=$(git config --global user.email)
HIERADATA_REPO_TOKEN=${INPUT_hieradata_repo_token}
EOF
tf_image=$(\
yq eval ".parameters.openshift4_terraform.images.terraform.image" \
dependencies/openshift4-terraform/class/defaults.yml)
tf_tag=$(\
yq eval ".parameters.openshift4_terraform.images.terraform.tag" \
dependencies/openshift4-terraform/class/defaults.yml)
echo "Using Terraform image: ${tf_image}:${tf_tag}"
base_dir=$(pwd)
alias terraform='touch .terraformrc; docker run --rm -e REAL_UID=$(id -u) -e TF_CLI_CONFIG_FILE=/tf/.terraformrc --env-file ${base_dir}/terraform.env -w /tf -v $(pwd):/tf --ulimit memlock=-1 "${tf_image}:${tf_tag}" /tf/terraform.sh'
gitlab_repository_url=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${INPUT_commodore_api_url}/clusters/${INPUT_commodore_cluster_id} | jq -r '.gitRepo.url' | sed 's|ssh://||; s|/|:|')
gitlab_repository_name=${gitlab_repository_url##*/}
gitlab_catalog_project_id=$(curl -sH "Authorization: Bearer ${INPUT_gitlab_api_token}" "https://git.vshn.net/api/v4/projects?simple=true&search=${gitlab_repository_name/.git}" | jq -r ".[] | select(.ssh_url_to_repo == \"${gitlab_repository_url}\") | .id")
gitlab_state_url="https://git.vshn.net/api/v4/projects/${gitlab_catalog_project_id}/terraform/state/cluster"
pushd catalog/manifests/openshift4-terraform/
terraform init \
"-backend-config=address=${gitlab_state_url}" \
"-backend-config=lock_address=${gitlab_state_url}/lock" \
"-backend-config=unlock_address=${gitlab_state_url}/lock" \
"-backend-config=username=${INPUT_gitlab_user_name}" \
"-backend-config=password=${INPUT_gitlab_api_token}" \
"-backend-config=lock_method=POST" \
"-backend-config=unlock_method=DELETE" \
"-backend-config=retry_wait_min=5"
cat > override.tf <<EOF
module "cluster" {
bootstrap_count = 1
master_count = 0
infra_count = 0
worker_count = 0
additional_worker_groups = {}
}
EOF
terraform apply -auto-approve
echo "@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@"
echo "@ @"
echo "@ Please review and merge the LB hieradata MR listed in Terraform output hieradata_mr. @"
echo "@ @"
echo "@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@"
while (GITLAB_HOST=git.vshn.net GITLAB_TOKEN="${INPUT_gitlab_api_token}" glab mr list -R=appuio/appuio_hieradata | grep "${INPUT_commodore_cluster_id}")
do
sleep 10
done
echo PR merged, waiting for CI to finish...
sleep 10
while (GITLAB_HOST=git.vshn.net GITLAB_TOKEN="${INPUT_gitlab_api_token}" glab mr list -R=appuio/appuio_hieradata | grep "running")
do
sleep 10
done
ssh "${INPUT_lb_fqdn_1}" sudo puppetctl run
ssh "${INPUT_lb_fqdn_2}" sudo puppetctl run
echo -n "Waiting for Bootstrap API to become available .."
API_URL=$(yq e '.clusters[0].cluster.server' "${installer_dir}/auth/kubeconfig")
while ! curl --connect-timeout 1 "${API_URL}/healthz" -k &>/dev/null; do
echo -n "."
sleep 5
done && echo "✅ API is up"
env -i "kubeconfig_path=${installer_dir}/auth/kubeconfig" >> "$OUTPUT"
# echo "# Outputs"
# cat "$OUTPUT"
# rm -f "$OUTPUT"
And I store the subnet ID and floating IP in the Syn hierarchy
This step retrieves the subnet ID and ingress floating IP from Terraform and stores them in the Syn hierarchy.
Inputs
- cloudscale_token
- cloudscale_token_floaty
- control_vshn_api_token
- ignition_bootstrap
- hieradata_repo_token
- gitlab_user_name
- gitlab_api_token
- commodore_cluster_id
- commodore_tenant_id
- commodore_api_url
- image_major
- image_minor
Script
OUTPUT=$(mktemp)
# export INPUT_cloudscale_token=
# export INPUT_cloudscale_token_floaty=
# export INPUT_control_vshn_api_token=
# export INPUT_ignition_bootstrap=
# export INPUT_hieradata_repo_token=
# export INPUT_gitlab_user_name=
# export INPUT_gitlab_api_token=
# export INPUT_commodore_cluster_id=
# export INPUT_commodore_tenant_id=
# export INPUT_commodore_api_url=
# export INPUT_image_major=
# export INPUT_image_minor=
set -euo pipefail
export COMMODORE_API_URL="${INPUT_commodore_api_url}"
cat <<EOF > ./terraform.env
CLOUDSCALE_API_TOKEN=${INPUT_cloudscale_token}
TF_VAR_ignition_bootstrap=${INPUT_ignition_bootstrap}
TF_VAR_lb_cloudscale_api_secret=${INPUT_cloudscale_token_floaty}
TF_VAR_control_vshn_net_token=${INPUT_control_vshn_api_token}
GIT_AUTHOR_NAME=$(git config --global user.name)
GIT_AUTHOR_EMAIL=$(git config --global user.email)
HIERADATA_REPO_TOKEN=${INPUT_hieradata_repo_token}
EOF
tf_image=$(\
yq eval ".parameters.openshift4_terraform.images.terraform.image" \
dependencies/openshift4-terraform/class/defaults.yml)
tf_tag=$(\
yq eval ".parameters.openshift4_terraform.images.terraform.tag" \
dependencies/openshift4-terraform/class/defaults.yml)
echo "Using Terraform image: ${tf_image}:${tf_tag}"
base_dir=$(pwd)
alias terraform='touch .terraformrc; docker run --rm -e REAL_UID=$(id -u) -e TF_CLI_CONFIG_FILE=/tf/.terraformrc --env-file ${base_dir}/terraform.env -w /tf -v $(pwd):/tf --ulimit memlock=-1 "${tf_image}:${tf_tag}" /tf/terraform.sh'
gitlab_repository_url=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${INPUT_commodore_api_url}/clusters/${INPUT_commodore_cluster_id} | jq -r '.gitRepo.url' | sed 's|ssh://||; s|/|:|')
gitlab_repository_name=${gitlab_repository_url##*/}
gitlab_catalog_project_id=$(curl -sH "Authorization: Bearer ${INPUT_gitlab_api_token}" "https://git.vshn.net/api/v4/projects?simple=true&search=${gitlab_repository_name/.git}" | jq -r ".[] | select(.ssh_url_to_repo == \"${gitlab_repository_url}\") | .id")
gitlab_state_url="https://git.vshn.net/api/v4/projects/${gitlab_catalog_project_id}/terraform/state/cluster"
pushd catalog/manifests/openshift4-terraform/
terraform init \
"-backend-config=address=${gitlab_state_url}" \
"-backend-config=lock_address=${gitlab_state_url}/lock" \
"-backend-config=unlock_address=${gitlab_state_url}/lock" \
"-backend-config=username=${INPUT_gitlab_user_name}" \
"-backend-config=password=${INPUT_gitlab_api_token}" \
"-backend-config=lock_method=POST" \
"-backend-config=unlock_method=DELETE" \
"-backend-config=retry_wait_min=5"
SUBNET_UUID="$(terraform output -raw subnet_uuid)"
INGRESS_FLOATING_IP="$(terraform output -raw router_vip)"
pushd ../../../inventory/classes/${INPUT_commodore_tenant_id}
yq eval -i '.parameters.openshift.cloudscale.subnet_uuid = "'"$SUBNET_UUID"'"' \
${INPUT_commodore_cluster_id}.yml
yq eval -i '.parameters.openshift.cloudscale.ingress_floating_ip_v4 = "'"$INGRESS_FLOATING_IP"'"' \
${INPUT_commodore_cluster_id}.yml
if ! git diff-index --quiet HEAD
then
git commit -am "Configure cloudscale subnet UUID and ingress floating IP for ${INPUT_commodore_cluster_id}"
git push origin master
fi || true
popd
popd # yes, twice.
# Recompile the catalog
commodore catalog compile ${INPUT_commodore_cluster_id} --push \
--dynamic-fact kubernetesVersion.major=1 \
--dynamic-fact kubernetesVersion.minor="$((INPUT_image_minor+13))" \
--dynamic-fact openshiftVersion.Major=${INPUT_image_major} \
--dynamic-fact openshiftVersion.Minor=${INPUT_image_minor}
# echo "# Outputs"
# cat "$OUTPUT"
# rm -f "$OUTPUT"
And I provision the control plane
This step provisions the control plane nodes with Terraform.
Inputs
- cloudscale_token
- cloudscale_token_floaty
- control_vshn_api_token
- ignition_bootstrap
- hieradata_repo_token
- gitlab_user_name
- gitlab_api_token
- commodore_cluster_id
- commodore_api_url
- kubeconfig_path
- cluster_domain
Script
OUTPUT=$(mktemp)
# export INPUT_cloudscale_token=
# export INPUT_cloudscale_token_floaty=
# export INPUT_control_vshn_api_token=
# export INPUT_ignition_bootstrap=
# export INPUT_hieradata_repo_token=
# export INPUT_gitlab_user_name=
# export INPUT_gitlab_api_token=
# export INPUT_commodore_cluster_id=
# export INPUT_commodore_api_url=
# export INPUT_kubeconfig_path=
# export INPUT_cluster_domain=
set -euo pipefail
export COMMODORE_API_URL="${INPUT_commodore_api_url}"
cat <<EOF > ./terraform.env
CLOUDSCALE_API_TOKEN=${INPUT_cloudscale_token}
TF_VAR_ignition_bootstrap=${INPUT_ignition_bootstrap}
TF_VAR_lb_cloudscale_api_secret=${INPUT_cloudscale_token_floaty}
TF_VAR_control_vshn_net_token=${INPUT_control_vshn_api_token}
GIT_AUTHOR_NAME=$(git config --global user.name)
GIT_AUTHOR_EMAIL=$(git config --global user.email)
HIERADATA_REPO_TOKEN=${INPUT_hieradata_repo_token}
EOF
tf_image=$(\
yq eval ".parameters.openshift4_terraform.images.terraform.image" \
dependencies/openshift4-terraform/class/defaults.yml)
tf_tag=$(\
yq eval ".parameters.openshift4_terraform.images.terraform.tag" \
dependencies/openshift4-terraform/class/defaults.yml)
echo "Using Terraform image: ${tf_image}:${tf_tag}"
base_dir=$(pwd)
alias terraform='touch .terraformrc; docker run --rm -e REAL_UID=$(id -u) -e TF_CLI_CONFIG_FILE=/tf/.terraformrc --env-file ${base_dir}/terraform.env -w /tf -v $(pwd):/tf --ulimit memlock=-1 "${tf_image}:${tf_tag}" /tf/terraform.sh'
gitlab_repository_url=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${INPUT_commodore_api_url}/clusters/${INPUT_commodore_cluster_id} | jq -r '.gitRepo.url' | sed 's|ssh://||; s|/|:|')
gitlab_repository_name=${gitlab_repository_url##*/}
gitlab_catalog_project_id=$(curl -sH "Authorization: Bearer ${INPUT_gitlab_api_token}" "https://git.vshn.net/api/v4/projects?simple=true&search=${gitlab_repository_name/.git}" | jq -r ".[] | select(.ssh_url_to_repo == \"${gitlab_repository_url}\") | .id")
gitlab_state_url="https://git.vshn.net/api/v4/projects/${gitlab_catalog_project_id}/terraform/state/cluster"
pushd catalog/manifests/openshift4-terraform/
terraform init \
"-backend-config=address=${gitlab_state_url}" \
"-backend-config=lock_address=${gitlab_state_url}/lock" \
"-backend-config=unlock_address=${gitlab_state_url}/lock" \
"-backend-config=username=${INPUT_gitlab_user_name}" \
"-backend-config=password=${INPUT_gitlab_api_token}" \
"-backend-config=lock_method=POST" \
"-backend-config=unlock_method=DELETE" \
"-backend-config=retry_wait_min=5"
cat > override.tf <<EOF
module "cluster" {
bootstrap_count = 1
infra_count = 0
worker_count = 0
additional_worker_groups = {}
}
EOF
echo "Running Terraform ..."
terraform apply -auto-approve
dnstmp=$(mktemp)
terraform output -raw cluster_dns > "$dnstmp"
echo "@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@"
echo "@ @"
echo "@ Please add the etcd DNS records shown in the Terraform output to your DNS provider. @"
echo "@ Most probably in https://git.vshn.net/vshn/vshn_zonefiles @"
echo "@ @"
echo "@ If terminal selection does not work the entries can also be copied from @"
echo "@ $dnstmp @"
echo "@ @"
echo "@ Waiting for record to propagate... @"
echo "@ @"
echo "@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@"
while ! (host "etcd-0.${INPUT_cluster_domain}")
do
sleep 15
done
rm -f "$dnstmp"
export KUBECONFIG="${INPUT_kubeconfig_path}"
echo "Waiting for masters to become ready ..."
kubectl wait --for create --timeout=600s node -l node-role.kubernetes.io/master
kubectl wait --for condition=ready --timeout=600s node -l node-role.kubernetes.io/master
popd
# echo "# Outputs"
# cat "$OUTPUT"
# rm -f "$OUTPUT"
Then I deploy initial manifests
This step deploys some manifests required during bootstrap, including cert-manager, machine-api-provider, machinesets, loadbalancer controller, and ingress loadbalancer.
Script
OUTPUT=$(mktemp)
# export INPUT_commodore_api_url=
# export INPUT_vault_address=
# export INPUT_kubeconfig_path=
set -euo pipefail
export COMMODORE_API_URL="${INPUT_commodore_api_url}"
export KUBECONFIG="${INPUT_kubeconfig_path}"
export VAULT_ADDR=${INPUT_vault_address}
vault login -method=oidc
echo '# Applying cert-manager ... #'
kubectl apply -f catalog/manifests/cert-manager/00_namespace.yaml
kubectl apply -Rf catalog/manifests/cert-manager/10_cert_manager
# shellcheck disable=2046
# we need word splitting here
kubectl -n syn-cert-manager patch --type=merge \
$(kubectl -n syn-cert-manager get deploy -oname) \
-p '{"spec":{"template":{"spec":{"tolerations":[{"operator":"Exists"}]}}}}'
echo '# Applied cert-manager. #'
echo
echo '# Applying machine-api-provider ... #'
VAULT_TOKEN=$(vault token lookup -format=json | jq -r .data.id)
export VAULT_TOKEN
kapitan refs --reveal --refs-path catalog/refs -f catalog/manifests/machine-api-provider-cloudscale/00_secrets.yaml | kubectl apply -f -
kubectl apply -f catalog/manifests/machine-api-provider-cloudscale/10_clusterRoleBinding.yaml
kubectl apply -f catalog/manifests/machine-api-provider-cloudscale/10_serviceAccount.yaml
kubectl apply -f catalog/manifests/machine-api-provider-cloudscale/11_deployment.yaml
echo '# Applied machine-api-provider. #'
echo
echo '# Applying machinesets ... #'
for f in catalog/manifests/openshift4-nodes/machineset-*.yaml;
do kubectl apply -f "$f";
done
echo '# Applied machinesets. #'
echo
echo '# Applying loadbalancer controller ... #'
kubectl apply -f catalog/manifests/cloudscale-loadbalancer-controller/00_namespace.yaml
kapitan refs --reveal --refs-path catalog/refs -f catalog/manifests/cloudscale-loadbalancer-controller/10_secrets.yaml | kubectl apply -f -
# TODO(aa): This fails on the first attempt because likely some of the previous resources need time to come online; figure out what to wait for
until kubectl apply -Rf catalog/manifests/cloudscale-loadbalancer-controller/10_kustomize
do
echo "Manifests didn't apply, waiting a moment to try again ..."
sleep 20
done
echo "Waiting for load balancer to become available ..."
kubectl -n appuio-cloudscale-loadbalancer-controller \
wait --for condition=available --timeout 3m \
deploy cloudscale-loadbalancer-controller-controller-manager
echo '# Applied loadbalancer controller. #'
echo
echo '# Applying ingress loadbalancer ... #'
kubectl apply -f catalog/manifests/cloudscale-loadbalancer-controller/20_loadbalancers.yaml
echo '# Applied ingress loadbalancer. #'
echo
# echo "# Outputs"
# cat "$OUTPUT"
# rm -f "$OUTPUT"
And I wait for bootstrap to complete
This step waits for OpenShift bootstrap to complete successfully.
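No script is exported for this step. As a hedged sketch, waiting for bootstrap with the installer state in the target directory created earlier typically looks like this:
set -euo pipefail
installer_dir="$(pwd)/target"
# Blocks until the bootstrap-complete event is reported (or the installer times out).
openshift-install --dir "${installer_dir}" wait-for bootstrap-complete --log-level=info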
Then I remove the bootstrap node
After successful bootstrapping, this step removes the bootstrap node again.
Inputs
- cloudscale_token
- cloudscale_token_floaty
- control_vshn_api_token
- ignition_bootstrap
- hieradata_repo_token
- gitlab_user_name
- gitlab_api_token
- commodore_cluster_id
- commodore_api_url
- lb_fqdn_1
- lb_fqdn_2
Script
OUTPUT=$(mktemp)
# export INPUT_cloudscale_token=
# export INPUT_cloudscale_token_floaty=
# export INPUT_control_vshn_api_token=
# export INPUT_ignition_bootstrap=
# export INPUT_hieradata_repo_token=
# export INPUT_gitlab_user_name=
# export INPUT_gitlab_api_token=
# export INPUT_commodore_cluster_id=
# export INPUT_commodore_api_url=
# export INPUT_lb_fqdn_1=
# export INPUT_lb_fqdn_2=
set -euo pipefail
export COMMODORE_API_URL="${INPUT_commodore_api_url}"
cat <<EOF > ./terraform.env
CLOUDSCALE_API_TOKEN=${INPUT_cloudscale_token}
TF_VAR_ignition_bootstrap=${INPUT_ignition_bootstrap}
TF_VAR_lb_cloudscale_api_secret=${INPUT_cloudscale_token_floaty}
TF_VAR_control_vshn_net_token=${INPUT_control_vshn_api_token}
GIT_AUTHOR_NAME=$(git config --global user.name)
GIT_AUTHOR_EMAIL=$(git config --global user.email)
HIERADATA_REPO_TOKEN=${INPUT_hieradata_repo_token}
EOF
tf_image=$(\
yq eval ".parameters.openshift4_terraform.images.terraform.image" \
dependencies/openshift4-terraform/class/defaults.yml)
tf_tag=$(\
yq eval ".parameters.openshift4_terraform.images.terraform.tag" \
dependencies/openshift4-terraform/class/defaults.yml)
echo "Using Terraform image: ${tf_image}:${tf_tag}"
base_dir=$(pwd)
alias terraform='touch .terraformrc; docker run --rm -e REAL_UID=$(id -u) -e TF_CLI_CONFIG_FILE=/tf/.terraformrc --env-file ${base_dir}/terraform.env -w /tf -v $(pwd):/tf --ulimit memlock=-1 "${tf_image}:${tf_tag}" /tf/terraform.sh'
gitlab_repository_url=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${INPUT_commodore_api_url}/clusters/${INPUT_commodore_cluster_id} | jq -r '.gitRepo.url' | sed 's|ssh://||; s|/|:|')
gitlab_repository_name=${gitlab_repository_url##*/}
gitlab_catalog_project_id=$(curl -sH "Authorization: Bearer ${INPUT_gitlab_api_token}" "https://git.vshn.net/api/v4/projects?simple=true&search=${gitlab_repository_name/.git}" | jq -r ".[] | select(.ssh_url_to_repo == \"${gitlab_repository_url}\") | .id")
gitlab_state_url="https://git.vshn.net/api/v4/projects/${gitlab_catalog_project_id}/terraform/state/cluster"
pushd catalog/manifests/openshift4-terraform/
terraform init \
"-backend-config=address=${gitlab_state_url}" \
"-backend-config=lock_address=${gitlab_state_url}/lock" \
"-backend-config=unlock_address=${gitlab_state_url}/lock" \
"-backend-config=username=${INPUT_gitlab_user_name}" \
"-backend-config=password=${INPUT_gitlab_api_token}" \
"-backend-config=lock_method=POST" \
"-backend-config=unlock_method=DELETE" \
"-backend-config=retry_wait_min=5"
rm override.tf
terraform apply --auto-approve
echo "@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@"
echo "@ @"
echo "@ Please review and merge the LB hieradata MR listed in Terraform output hieradata_mr. @"
echo "@ @"
echo "@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@"
while (GITLAB_HOST=git.vshn.net GITLAB_TOKEN="${INPUT_gitlab_api_token}" glab mr list -R=appuio/appuio_hieradata | grep "${INPUT_commodore_cluster_id}")
do
sleep 10
done
echo "MR merged, waiting for CI to finish ..."
sleep 10
while (GITLAB_HOST=git.vshn.net GITLAB_TOKEN="${INPUT_gitlab_api_token}" glab ci list -R=appuio/appuio_hieradata | grep "running")
do
sleep 10
done
ssh "${INPUT_lb_fqdn_1}" sudo puppetctl run
ssh "${INPUT_lb_fqdn_2}" sudo puppetctl run
popd
# echo "# Outputs"
# cat "$OUTPUT"
# rm -f "$OUTPUT"
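If the MR URL printed by Terraform scrolls out of view, the output can be re-read at any time (a sketch; run it from catalog/manifests/openshift4-terraform/ using the terraform alias defined in the script above):
terraform output hieradata_mr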
And I configure initial deployments
This step configures deployments that require manual changes after cluster bootstrap: it enables the PROXY protocol on the ingress controller, schedules the ingress controller on the infra nodes (unless the cluster is an OKE cluster), and removes the temporary cert-manager tolerations added during bootstrap.
Script
OUTPUT=$(mktemp)
# export INPUT_commodore_cluster_id=
# export INPUT_commodore_api_url=
# export INPUT_kubeconfig_path=
set -euo pipefail
export COMMODORE_API_URL="${INPUT_commodore_api_url}"
export KUBECONFIG="${INPUT_kubeconfig_path}"
echo '# Enabling proxy protocol ... #'
kubectl -n openshift-ingress-operator patch ingresscontroller default --type=json \
-p '[{
"op":"replace",
"path":"/spec/endpointPublishingStrategy",
"value": {"type": "HostNetwork", "hostNetwork": {"protocol": "PROXY"}}
}]'
echo '# Enabled proxy protocol. #'
echo
distribution="$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${COMMODORE_API_URL}/clusters/${INPUT_commodore_cluster_id} | jq -r .facts.distribution)"
if [[ "$distribution" != "oke" ]]
then
echo '# Scheduling ingress controller on infra nodes ... #'
kubectl -n openshift-ingress-operator patch ingresscontroller default --type=json \
-p '[{
"op":"replace",
"path":"/spec/nodePlacement",
"value":{"nodeSelector":{"matchLabels":{"node-role.kubernetes.io/infra":""}}}
}]'
echo '# Scheduled ingress controller on infra nodes. #'
echo
fi
echo '# Removing temporary cert-manager tolerations ... #'
# shellcheck disable=2046
# we need word splitting here
kubectl -n syn-cert-manager patch --type=json \
$(kubectl -n syn-cert-manager get deploy -oname) \
-p '[{"op":"remove","path":"/spec/template/spec/tolerations"}]'
echo '# Removed temporary cert-manager tolerations. #'
# echo "# Outputs"
# cat "$OUTPUT"
# rm -f "$OUTPUT"
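Optionally, the applied patches can be read back to confirm they took effect (plain kubectl, not part of the exported step; assumes KUBECONFIG is still exported as above):
kubectl -n openshift-ingress-operator get ingresscontroller default \
  -o jsonpath='{.spec.endpointPublishingStrategy}{"\n"}{.spec.nodePlacement}{"\n"}'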
And I wait for installation to complete
This step waits for OpenShift installation to complete successfully.
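No script is exported for this step either. Manually, the installer offers an equivalent wait command (a sketch, with the same assumptions about openshift-install and the asset directory as in the bootstrap step):
# Blocks until the ClusterVersion reports the installation as complete
openshift-install --dir . wait-for install-complete --log-level=info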
Then I synthesize the cluster
This step enables Project Syn on the cluster.
Script
OUTPUT=$(mktemp)
# export INPUT_commodore_api_url=
# export INPUT_commodore_cluster_id=
# export INPUT_kubeconfig_path=
set -euo pipefail
export COMMODORE_API_URL="${INPUT_commodore_api_url}"
export KUBECONFIG="${INPUT_kubeconfig_path}"
LIEUTENANT_AUTH="Authorization: Bearer $(commodore fetch-token)"
if ! kubectl get deploy -n syn steward > /dev/null; then
INSTALL_URL=$(curl -H "${LIEUTENANT_AUTH}" "${COMMODORE_API_URL}/clusters/${INPUT_commodore_cluster_id}" | jq -r ".installURL")
if [[ $INSTALL_URL == "null" ]]
# TODO(aa): consider doing this programmatically - especially if, at a later point, we add the lieutenant kubeconfig to the inputs anyway
then
echo '###################################################################################'
echo '# #'
echo '# Could not fetch install URL! Please reset the bootstrap token and try again. #'
echo '# #'
echo '###################################################################################'
echo
echo 'See https://kb.vshn.ch/corp-tech/projectsyn/explanation/bootstrap-token.html#_resetting_the_bootstrap_token'
exit 1
fi
echo "# Deploying steward ..."
kubectl create -f "$INSTALL_URL"
fi
echo "# Waiting for ArgoCD resource to exist ..."
kubectl wait --for=create crds/argocds.argoproj.io --timeout=5m
echo "# Waiting for ArgoCD instance to exist ..."
kubectl wait --for=create argocd/syn-argocd -nsyn --timeout=90s
echo "# Waiting for ArgoCD instance to be ready ..."
kubectl wait --for=jsonpath='{.status.phase}'=Available argocd/syn-argocd -nsyn --timeout=5m
echo "Done."
# echo "# Outputs"
# cat "$OUTPUT"
# rm -f "$OUTPUT"
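Optionally, the Project Syn components can be inspected once the waits above have passed (plain kubectl, not part of the exported step):
kubectl -n syn get deploy steward
kubectl -n syn get argocds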
Then I set acme-dns CNAME records
This step ensures CNAME records exist for ACME challenges once cert-manager is properly deployed.
Script
OUTPUT=$(mktemp)
# export INPUT_kubeconfig_path=
set -euo pipefail
export KUBECONFIG="${INPUT_kubeconfig_path}"
echo '# Waiting for cert-manager namespace ...'
kubectl wait --for=create ns/syn-cert-manager
echo '# Waiting for cert-manager secret ...'
kubectl wait --for=create secret/acme-dns-client -nsyn-cert-manager
fulldomain=""
while [[ -z "$fulldomain" ]]
do
fulldomain=$(kubectl -n syn-cert-manager \
get secret acme-dns-client \
-o jsonpath='{.data.acmedns\.json}' | \
base64 -d | \
jq -r '[.[]][0].fulldomain')
echo "$fulldomain"
done
dnstmp=$(mktemp)
echo "_acme-challenge.api IN CNAME $fulldomain." > "$dnstmp"
echo "_acme-challenge.apps IN CNAME $fulldomain." >> "$dnstmp"
echo "@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@"
echo "@ @"
echo "@ Please add the acme DNS records below to your DNS provider. @"
echo "@ Most probably in https://git.vshn.net/vshn/vshn_zonefiles @"
echo "@ @"
echo "@ If terminal selection does not work the entries can also be copied from @"
echo "@ $dnstmp @"
echo "@ @"
echo "@ Waiting for record to propagate... @"
echo "@ @"
echo "@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@"
echo
echo "The following entry must be created in the same origin as the api record:"
echo "_acme-challenge.api IN CNAME $fulldomain."
echo "The following entry must be created in the same origin as the apps record:"
echo "_acme-challenge.apps IN CNAME $fulldomain."
echo
# back up kubeconfig, just in case
cp "${INPUT_kubeconfig_path}" "${INPUT_kubeconfig_path}2"
yq -i e 'del(.clusters[0].cluster.certificate-authority-data)' "${INPUT_kubeconfig_path}"
echo "Waiting for cluster certificate to be issued ..."
echo "If you need to debug things on the cluster, run:"
echo "export KUBECONFIG=${INPUT_kubeconfig_path}2"
echo
until kubectl get nodes
do
sleep 20
done
# echo "# Outputs"
# cat "$OUTPUT"
# rm -f "$OUTPUT"
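Propagation of the CNAME records can also be checked with the host utility from the prerequisites (a sketch; <cluster domain> is a placeholder for the cluster's real DNS zone):
host -t CNAME "_acme-challenge.api.<cluster domain>"
host -t CNAME "_acme-challenge.apps.<cluster domain>"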
And I verify emergency access
This step ensures the emergency credentials for the cluster can be retrieved.
Inputs
- kubeconfig_path
- cluster_domain
- commodore_cluster_id
- passbolt_passphrase: Your password for Passbolt. This is required to access the encrypted emergency credentials.
Script
OUTPUT=$(mktemp)
# export INPUT_kubeconfig_path=
# export INPUT_cluster_domain=
# export INPUT_commodore_cluster_id=
# export INPUT_passbolt_passphrase=
set -euo pipefail
export KUBECONFIG="${INPUT_kubeconfig_path}"
echo '# Waiting for emergency-credentials-controller namespace ...'
kubectl wait --for=create ns/appuio-emergency-credentials-controller
echo '# Waiting for emergency-credentials-controller CRD ...'
kubectl wait --for=create crds/emergencyaccounts.cluster.appuio.io --timeout=5m
echo '# Waiting for emergency credential tokens ...'
until kubectl -n appuio-emergency-credentials-controller get emergencyaccounts.cluster.appuio.io -o=jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.lastTokenCreationTimestamp}{"\n"}{end}' | grep "$( date '+%Y' )" >/dev/null
do
echo -n .
sleep 5
done
export EMR_KUBERNETES_ENDPOINT=https://api.${INPUT_cluster_domain}:6443
export EMR_PASSPHRASE="${INPUT_passbolt_passphrase}"
emergency-credentials-receive "${INPUT_commodore_cluster_id}"
yq -i e '.clusters[0].cluster.insecure-skip-tls-verify = true' "em-${INPUT_commodore_cluster_id}"
export KUBECONFIG="em-${INPUT_commodore_cluster_id}"
kubectl get nodes
oc whoami | grep system:serviceaccount:appuio-emergency-credentials-controller: || exit 1
env -i "kubeconfig_path=$(pwd)/em-${INPUT_commodore_cluster_id}" >> "$OUTPUT"
echo "# Invalidating 10-year admin kubeconfig ..."
kubectl -n openshift-config patch cm admin-kubeconfig-client-ca --type=merge -p '{"data": {"ca-bundle.crt": ""}}'
# echo "# Outputs"
# cat "$OUTPUT"
# rm -f "$OUTPUT"
And I configure the cluster alerts
This step configures monitoring alerts on the cluster.
Script
OUTPUT=$(mktemp)
# export INPUT_kubeconfig_path=
set -euo pipefail
export KUBECONFIG="${INPUT_kubeconfig_path}"
echo '# Installing default alert silence ...'
oc --as=system:admin -n openshift-monitoring create job --from=cronjob/silence silence-manual
oc wait -n openshift-monitoring --for=condition=complete job/silence-manual
oc --as=system:admin -n openshift-monitoring delete job/silence-manual
echo '# Retrieving active alerts ...'
kubectl --as=system:admin -n openshift-monitoring exec sts/alertmanager-main -- \
amtool --alertmanager.url=http://localhost:9093 alert --active
echo
echo '#######################################################'
echo '# #'
echo '# Please review the list of open alerts above, #'
echo '# address any that require action before proceeding. #'
echo '# #'
echo '#######################################################'
sleep 2
# echo "# Outputs"
# cat "$OUTPUT"
# rm -f "$OUTPUT"
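The silence created by the job can be verified directly as well (a sketch, reusing the amtool invocation pattern from the script above):
kubectl --as=system:admin -n openshift-monitoring exec sts/alertmanager-main -- \
  amtool --alertmanager.url=http://localhost:9093 silence query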
And I enable Opsgenie alerting
This step enables Opsgenie alerting for the cluster via Project Syn.
Script
OUTPUT=$(mktemp)
# export INPUT_commodore_cluster_id=
# export INPUT_commodore_tenant_id=
set -euo pipefail
pushd "inventory/classes/${INPUT_commodore_tenant_id}/"
yq eval -i 'del(.classes[] | select(. == "*.no-opsgenie"))' "${INPUT_commodore_cluster_id}.yml"
git commit -a -m "Enable opsgenie alerting on cluster ${INPUT_commodore_cluster_id}"
git push
popd
# echo "# Outputs"
# cat "$OUTPUT"
# rm -f "$OUTPUT"
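For illustration, the del filter removes every class entry matching the *.no-opsgenie wildcard; a minimal sketch against a scratch file (the class names are hypothetical examples, not taken from a real tenant repo):
cat > /tmp/cluster-example.yml <<'EOF'
classes:
  - global.commodore
  - global.distribution.openshift4.no-opsgenie
EOF
yq eval -i 'del(.classes[] | select(. == "*.no-opsgenie"))' /tmp/cluster-example.yml
yq eval '.classes' /tmp/cluster-example.yml   # only global.commodore remains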
And I verify the image registry config
This step verifies that the image registry config has bootstrapped correctly.
Script
OUTPUT=$(mktemp)
# export INPUT_kubeconfig_path=
set -euo pipefail
export KUBECONFIG="${INPUT_kubeconfig_path}"
echo '# Checking image registry status conditions ...'
status="$( kubectl get config.imageregistry/cluster -oyaml --as system:admin | yq '.status.conditions[] | select(.type == "Available").status' )"
if [[ $status != "True" ]]
then
kubectl get config.imageregistry/cluster -oyaml --as system:admin | yq '.status.conditions'
echo
echo ERROR: image registry is not available.
echo Please review the status reports above and manually fix the registry.
echo
echo '> kubectl get config.imageregistry/cluster'
exit 1
fi
echo '# Checking image registry pods ...'
numpods="$( kubectl -n openshift-image-registry get pods -l docker-registry=default --field-selector=status.phase==Running -oyaml | yq '.items | length' )"
if (( numpods != 2 ))
then
kubectl -n openshift-image-registry get pods -l docker-registry=default
echo
echo ERROR: unexpected number of registry pods
echo Please review the running pods above and ensure the 2 registry pods are running.
echo
echo '> kubectl -n openshift-image-registry get pods -l docker-registry=default'
exit 1
fi
echo '# Ensuring openshift-samples operator is enabled ...'
mgstate="$( kubectl get config.samples cluster -ojsonpath='{.spec.managementState}' )"
if [[ $mgstate != "Managed" ]]
then
kubectl patch config.samples cluster -p '{"spec":{"managementState":"Managed"}}'
fi
# echo "# Outputs"
# cat "$OUTPUT"
# rm -f "$OUTPUT"
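If any of the checks above fail, the operator-level status can provide additional hints (standard OpenShift command, not part of the exported step):
oc get clusteroperator image-registry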
And I schedule the first maintenance
This step verifies that the UpgradeConfig object is present on the cluster, and schedules a first maintenance.
Script
OUTPUT=$(mktemp)
# export INPUT_kubeconfig_path=
set -euo pipefail
export KUBECONFIG="${INPUT_kubeconfig_path}"
numconfs="$( kubectl -n appuio-openshift-upgrade-controller get upgradeconfig -oyaml | yq '.items | length' )"
if (( numconfs < 1 ))
then
kubectl -n appuio-openshift-upgrade-controller get upgradeconfig
echo
echo ERROR: did not find an upgradeconfig
echo Please review the output above and ensure an upgradeconfig is present.
echo
echo "Double check the cluster's maintenance_window fact."
exit 1
fi
echo '# Scheduling a first maintenance ...'
uc="$(yq .parameters.facts.maintenance_window inventory/classes/params/cluster.yml)"
kubectl -n appuio-openshift-upgrade-controller get upgradeconfig "$uc" -oyaml | \
yq '
.metadata.name = "first" |
.metadata.labels = {} |
.spec.jobTemplate.metadata.labels."upgradeconfig/name" = "first" |
.spec.schedule.cron = ((now+"1m")|format_datetime("4 15")) + " * * *" |
.spec.pinVersionWindow = "0m"
' | \
kubectl create -f - --as=system:admin
# echo "# Outputs"
# cat "$OUTPUT"
# rm -f "$OUTPUT"
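Shortly before the scheduled time, the upgrade controller should create a matching UpgradeJob; it can be watched with (a sketch, assuming KUBECONFIG is still exported as above):
kubectl -n appuio-openshift-upgrade-controller get upgradejobs \
  -l "upgradeconfig/name=first" -w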
Then I configure apt-dater groups for the LoadBalancers
This step configures the apt-dater groups for the LoadBalancers via Puppet.
Script
OUTPUT=$(mktemp)
# export INPUT_lb_fqdn_1=
# export INPUT_lb_fqdn_2=
# export INPUT_gitlab_api_token=
# export INPUT_commodore_cluster_id=
set -euo pipefail
if [ -e nodes_hieradata ]
then
rm -rf nodes_hieradata
fi
git clone git@git.vshn.net:vshn-puppet/nodes_hieradata.git
pushd nodes_hieradata
if ! grep -q "s_apt_dater::host::group" "${INPUT_lb_fqdn_1}.yaml" 2>/dev/null
then
# NOTE(aa): the here-document body must not be indented
cat >"${INPUT_lb_fqdn_1}.yaml" <<EOF
---
s_apt_dater::host::group: '2200_20_night_main'
EOF
fi
if ! grep -q "s_apt_dater::host::group" "${INPUT_lb_fqdn_2}.yaml" 2>/dev/null
then
# NOTE(aa): the here-document body must not be indented
cat >"${INPUT_lb_fqdn_2}.yaml" <<EOF
---
s_apt_dater::host::group: '2200_40_night_second'
EOF
fi
git add ./*.yaml
if ! git diff-index --quiet HEAD
then
git commit -m"Configure apt-dater groups for LBs for OCP4 cluster ${INPUT_commodore_cluster_id}"
git push origin master
echo Waiting for CI to finish...
sleep 10
while (GITLAB_HOST=git.vshn.net GITLAB_TOKEN="${INPUT_gitlab_api_token}" glab ci list -R=vshn-puppet/nodes_hieradata | grep "running")
do
sleep 10
done
echo Running puppet ...
for fqdn in "${INPUT_lb_fqdn_1}" "${INPUT_lb_fqdn_2}"
do
ssh "${fqdn}" sudo puppetctl run
done
fi || true
popd
# echo "# Outputs"
# cat "$OUTPUT"
# rm -f "$OUTPUT"
And I remove the bootstrap bucket
This step deletes the S3 bucket with the bootstrap ignition config.
Script
OUTPUT=$(mktemp)
# export INPUT_commodore_cluster_id=
# export INPUT_vault_address=
set -euo pipefail
export VAULT_ADDR=${INPUT_vault_address}
vault login -method=oidc
mc rm -r --force "${INPUT_commodore_cluster_id}/${INPUT_commodore_cluster_id}-bootstrap-ignition"
mc rb "${INPUT_commodore_cluster_id}/${INPUT_commodore_cluster_id}-bootstrap-ignition"
# echo "# Outputs"
# cat "$OUTPUT"
# rm -f "$OUTPUT"
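A quick listing confirms the bucket is gone (a sketch, reusing the mc alias named after the cluster ID; the bootstrap-ignition bucket must no longer appear):
mc ls "${INPUT_commodore_cluster_id}/"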
And I add the cluster to openshift4-clusters
This step adds the cluster to git.vshn.net/vshn/openshift4-clusters.
Inputs
- commodore_cluster_id
- kubeconfig_path
- jumphost_fqdn: FQDN of the jumphost used to connect to this cluster, if any. If no jumphost is used, enter "NONE".
- socks5_port: SOCKS5 port number to use for this cluster, of the form 120XX. If the cluster shares a proxy jumphost with another cluster, use the same port. If the cluster uses a brand new jumphost, choose a new unique port. If the cluster does not use a proxy jumphost, enter "NONE".
Script
OUTPUT=$(mktemp)
# export INPUT_commodore_cluster_id=
# export INPUT_kubeconfig_path=
# export INPUT_jumphost_fqdn=
# export INPUT_socks5_port=
set -euo pipefail
if [ -e openshift4-clusters ]
then
rm -rf openshift4-clusters
fi
git clone git@git.vshn.net:vshn/openshift4-clusters.git
pushd openshift4-clusters
if [[ -d "${INPUT_commodore_cluster_id}" ]]
then
echo "Cluster entry already exists - not touching that!"
exit 0
else
API_URL=$(yq e '.clusters[0].cluster.server' "${INPUT_kubeconfig_path}")
mkdir -p "${INPUT_commodore_cluster_id}"
pushd "${INPUT_commodore_cluster_id}"
ln -s ../base_envrc .envrc
cat >.connection_facts <<EOF
API=${API_URL}
EOF
popd
port="$( echo "${INPUT_socks5_port}" | tr '[:upper:]' '[:lower:]' )"
jumphost="$( echo "${INPUT_jumphost_fqdn}" | tr '[:upper:]' '[:lower:]' )"
if [[ "$port" != "none" ]] && [[ "$jumphost" != "none" ]]
then
cat >> "${INPUT_commodore_cluster_id}/.connection_facts" <<EOF
JUMPHOST=${INPUT_jumphost_fqdn}
SOCKS5_PORT=${INPUT_socks5_port}
EOF
python foxyproxy_generate.py
fi
git add --force "${INPUT_commodore_cluster_id}"
git add .
if ! git diff-index --quiet HEAD
then
git commit -am "Add cluster ${INPUT_commodore_cluster_id}"
fi || true
fi
popd
echo
echo '#########################################################'
echo '# #'
echo '# Please test the cluster connection, and if it works #'
echo '# as expected, push the commit to the repository. #'
echo '# #'
echo '#########################################################'
echo
echo "Run the following:"
echo "cd $(pwd)/openshift4-clusters/${INPUT_commodore_cluster_id}"
echo "direnv allow"
echo "oc whoami"
echo "git push origin main # only if everything is OK"
sleep 2
# echo "# Outputs"
# cat "$OUTPUT"
# rm -f "$OUTPUT"
And I wait for maintenance to complete
This step waits for the first maintenance to complete, and then removes the initial UpgradeConfig.
Script
OUTPUT=$(mktemp)
# export INPUT_kubeconfig_path=
set -euo pipefail
export KUBECONFIG="${INPUT_kubeconfig_path}"
echo "# Waiting for initial maintenance to complete ..."
oc get clusterversion
until kubectl wait --for=condition=Succeeded upgradejob -l "upgradeconfig/name=first" -n appuio-openshift-upgrade-controller 2>/dev/null
do
oc get clusterversion | grep -v NAME
sleep 10
done
echo "# Deleting initial UpgradeConfig ..."
kubectl --as=system:admin -n appuio-openshift-upgrade-controller \
delete upgradeconfig first
# echo "# Outputs"
# cat "$OUTPUT"
# rm -f "$OUTPUT"
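Once the UpgradeJob has succeeded, the applied update should appear at the top of the cluster version history (a sketch, not part of the exported step):
oc get clusterversion version \
  -o jsonpath='{.status.history[0].version}{"\n"}{.status.history[0].state}{"\n"}'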