Installation on Exoscale
Steps to install an OpenShift 4 cluster on Exoscale.
These steps follow the Installing a cluster on bare metal docs to set up a user-provisioned infrastructure (UPI) installation. Terraform is used to provision the cloud infrastructure.
This how-to guide is still a work in progress and will change. It's currently very specific to VSHN and needs further changes to be more generic.
This guide currently assumes that you're using terraform-openshift4-exoscale v5 (component openshift4-terraform v7).
Starting situation
- You already have a Tenant and its Git repository
- You have a CCSP Red Hat login and are logged into the Red Hat OpenShift Cluster Manager. Don't use your personal account to log in to the cluster manager for installation.
- You want to register a new cluster in Lieutenant and are about to install OpenShift 4 on Exoscale
Prerequisites
- jq
- yq: YAML processor (version 4 or higher; use the Go version by mikefarah, not the jq wrapper by kislyuk)
- vault: Vault CLI
- curl
- emergency-credentials-receive (see the install instructions)
- gzip
- docker
- md5sum
- virt-edit
- cpio
- exo: Exoscale CLI, version v1.76.0 or newer
- An Exoscale API key with full permissions
- DNS subscription activated in the Exoscale organisation
Make sure that the versions of openshift-install and the RHCOS image match, otherwise Ignition will fail.
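One way to check which OpenShift release your installer targets (a minimal sketch; it assumes the openshift-install binary is already on your PATH, and the printed version should match the RHCOS_VERSION used in the "Upload Red Hat CoreOS image" step below):

# Print the installer version and the release image it installs
openshift-install version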
Cluster Installation
Register the new OpenShift 4 cluster in Lieutenant.
Set cluster facts
For customer clusters, set the following cluster facts in Lieutenant:
- access_policy: Access policy of the cluster, such as regular or swissonly
- service_level: Name of the service level agreement for this cluster, such as guaranteed-availability
- sales_order: Name of the sales order to which the cluster is billed, such as S10000
- release_channel: Name of the Syn component release channel to use, such as stable
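Once COMMODORE_API_URL and CLUSTER_ID have been exported (see "Configure input" below), you can double-check the facts through the Lieutenant API. This is a minimal sketch and assumes the API response exposes the cluster facts in a facts field:

# Show the cluster facts as stored in Lieutenant
curl -sH "Authorization: Bearer $(commodore fetch-token)" \
  "${COMMODORE_API_URL}/clusters/${CLUSTER_ID}" | jq .facts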
Set up Keycloak service
- Create a Keycloak service: use control.vshn.net/vshn/services/_create to create a service. The name and ID must be the cluster's name. For the optional URL, use the OpenShift console URL.
Configure input
export EXOSCALE_API_KEY=<exoscale-key> (1)
export EXOSCALE_API_SECRET=<exoscale-secret>
export EXOSCALE_ZONE=<exoscale-zone> (2)
export EXOSCALE_S3_ENDPOINT="sos-${EXOSCALE_ZONE}.exo.io"
(1) We recommend setting up an IAMv3 role called unrestricted with "Default Service Strategy" set to allow if it doesn't exist yet.
(2) All lower case. For example ch-dk-2.
# From https://git.vshn.net/-/profile/personal_access_tokens, "api" scope is sufficient
export GITLAB_TOKEN=<gitlab-api-token>
export GITLAB_USER=<gitlab-user-name>
# For example: https://api.syn.vshn.net
# IMPORTANT: do NOT add a trailing `/`. Commands below will fail.
export COMMODORE_API_URL=<lieutenant-api-endpoint>
# Set Project Syn cluster and tenant ID
export CLUSTER_ID=<lieutenant-cluster-id> # Looks like: c-<something>
export TENANT_ID=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${COMMODORE_API_URL}/clusters/${CLUSTER_ID} | jq -r .tenant)
export GIT_AUTHOR_NAME=$(git config --global user.name)
export GIT_AUTHOR_EMAIL=$(git config --global user.email)
export TF_VAR_control_vshn_net_token=<control-vshn-net-token> # use your personal SERVERS API token from https://control.vshn.net/tokens
export BASE_DOMAIN=<your-base-domain> # customer-provided base domain without cluster name, e.g. "zrh.customer.vshnmanaged.net"
export PULL_SECRET='<redhat-pull-secret>' # As copied from https://cloud.redhat.com/openshift/install/pull-secret "Copy pull secret". The value must be inside quotes.
For an explanation of BASE_DOMAIN, see DNS Scheme.
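As an illustration only (assuming the default DNS scheme, where the cluster domain is ${CLUSTER_ID}.${BASE_DOMAIN}), the cluster ends up being reachable at:

# Hypothetical check of the resulting FQDNs
echo "API:  api.${CLUSTER_ID}.${BASE_DOMAIN}"
echo "Apps: *.apps.${CLUSTER_ID}.${BASE_DOMAIN}"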
Create restricted Exoscale IAM keys for the LBs and object storage
If creating the API key fails, please retry the commands starting from the command which creates the IAM role.
- Create restricted API key for Exoscale object storage

# Create SOS IAM role, if it doesn't exist yet in the organization
sos_iam_role_id=$(exo iam role list -O json | \
  jq -r '.[] | select(.name=="sos-full-access") | .key')
if [ -z "${sos_iam_role_id}" ]; then
  echo '{
    "default-service-strategy": "deny",
    "services": {
      "sos": {"type": "allow"}
    }
  }' | \
  exo iam role create sos-full-access \
    --description "Full access to object storage service" \
    --policy -
fi

# Create access key
exoscale_s3_credentials=$(exo iam api-key create -O json \
  "${CLUSTER_ID}_object_storage" sos-full-access)
export EXOSCALE_S3_ACCESSKEY=$(echo "${exoscale_s3_credentials}" | jq -r '.key')
export EXOSCALE_S3_SECRETKEY=$(echo "${exoscale_s3_credentials}" | jq -r '.secret')
- Create restricted API key for AppCat Provider Exoscale

# Create AppCat Provider Exoscale IAM role, if it doesn't exist yet in the organization
appcat_role_id=$(exo iam role list -O json | \
  jq -r '.[] | select(.name=="appcat-provider-exoscale") | .key')
if [ -z "${appcat_role_id}" ]; then
  echo '{
    "default-service-strategy": "deny",
    "services": {
      "sos": {"type": "allow"},
      "dbaas": {"type": "allow"},
      "iam": {"type": "allow"}
    }
  }' | \
  exo iam role create appcat-provider-exoscale \
    --description "AppCat provider role: Full access to SOS, DBaaS and IAM" \
    --policy -
fi

# Create access key
appcat_credentials=$(exo iam api-key create -O json \
  appcat-provider-exoscale appcat-provider-exoscale)
export APPCAT_ACCESSKEY=$(echo "${appcat_credentials}" | jq -r '.key')
export APPCAT_SECRETKEY=$(echo "${appcat_credentials}" | jq -r '.secret')
- Create restricted API key for Exoscale CSI driver

# Create Exoscale CSI driver IAM role, if it doesn't exist yet in the organization
csidriver_role_id=$(exo iam role list -O json | \
  jq -r '.[] | select(.name=="csi-driver-exoscale") | .key')
if [ -z "${csidriver_role_id}" ]; then
  echo '{
    "default-service-strategy": "deny",
    "services": {
      "compute": {
        "type": "rules",
        "rules": [
          {
            "expression": "operation in ['list-zones', 'get-block-storage-volume', 'list-block-storage-volumes', 'create-block-storage-volume', 'delete-block-storage-volume', 'attach-block-storage-volume-to-instance', 'detach-block-storage-volume', 'update-block-storage-volume-labels', 'resize-block-storage-volume', 'get-block-storage-snapshot', 'list-block-storage-snapshots', 'create-block-storage-snapshot', 'delete-block-storage-snapshot']",
            "action": "allow"
          }
        ]
      }
    }
  }' | \
  exo iam role create csi-driver-exoscale \
    --description "Exoscale CSI Driver: Access to storage operations and zone list" \
    --policy -
fi

# Create access key
csi_credentials=$(exo iam api-key create -O json \
  csi-driver-exoscale csi-driver-exoscale)
export CSI_ACCESSKEY=$(echo "${csi_credentials}" | jq -r '.key')
export CSI_SECRETKEY=$(echo "${csi_credentials}" | jq -r '.secret')
Set up S3 bucket for cluster bootstrap
- Create S3 bucket

exo storage create "sos://${CLUSTER_ID}-bootstrap" --zone "${EXOSCALE_ZONE}"
Upload Red Hat CoreOS image
- Fetch and convert the latest Red Hat CoreOS image

RHCOS_VERSION="4.15.23"

curl -L "https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.15/${RHCOS_VERSION}/rhcos-${RHCOS_VERSION}-x86_64-openstack.x86_64.qcow2.gz" | gunzip > rhcos-${RHCOS_VERSION}.qcow2

sudo virt-edit -a rhcos-${RHCOS_VERSION}.qcow2 \
  -m /dev/sda3:/ /loader/entries/ostree-1.conf \
  -e 's/openstack/exoscale/'

exo storage upload rhcos-${RHCOS_VERSION}.qcow2 "sos://${CLUSTER_ID}-bootstrap" --acl public-read

exo compute instance-template register "rhcos-${RHCOS_VERSION}" \
  "https://${EXOSCALE_S3_ENDPOINT}/${CLUSTER_ID}-bootstrap/rhcos-${RHCOS_VERSION}.qcow2" \
  "$(md5sum rhcos-${RHCOS_VERSION}.qcow2 | awk '{ print $1 }')" \
  --zone "${EXOSCALE_ZONE}" \
  --boot-mode uefi \
  --disable-password \
  --username core \
  --description "Red Hat Enterprise Linux CoreOS (RHCOS) ${RHCOS_VERSION}"

exo storage delete "sos://${CLUSTER_ID}-bootstrap/rhcos-${RHCOS_VERSION}.qcow2"

export RHCOS_TEMPLATE="rhcos-${RHCOS_VERSION}"
Set secrets in Vault
export VAULT_ADDR=https://vault-prod.syn.vshn.net
vault login -method=oidc
# Set the Exoscale object storage API key
vault kv put clusters/kv/${TENANT_ID}/${CLUSTER_ID}/exoscale/storage_iam \
s3_access_key=${EXOSCALE_S3_ACCESSKEY} \
s3_secret_key=${EXOSCALE_S3_SECRETKEY}
# Generate an HTTP secret for the registry
vault kv put clusters/kv/${TENANT_ID}/${CLUSTER_ID}/registry \
httpSecret=$(LC_ALL=C tr -cd "A-Za-z0-9" </dev/urandom | head -c 128)
# Generate a master password for K8up backups
vault kv put clusters/kv/${TENANT_ID}/${CLUSTER_ID}/global-backup \
password=$(LC_ALL=C tr -cd "A-Za-z0-9" </dev/urandom | head -c 32)
# Generate a password for the cluster object backups
vault kv put clusters/kv/${TENANT_ID}/${CLUSTER_ID}/cluster-backup \
password=$(LC_ALL=C tr -cd "A-Za-z0-9" </dev/urandom | head -c 32)
# Set the AppCat Provider Exoscale Credentials
vault kv put clusters/kv/${TENANT_ID}/${CLUSTER_ID}/appcat/provider-exoscale \
access-key=${APPCAT_ACCESSKEY} \
secret-key=${APPCAT_SECRETKEY}
# Set the CSI Driver Exoscale Credentials
vault kv put clusters/kv/${TENANT_ID}/${CLUSTER_ID}/exoscale/csi_driver \
s3_access_key=${CSI_ACCESSKEY} \
s3_secret_key=${CSI_SECRETKEY}
export HIERADATA_REPO_SECRET=$(vault kv get \
-format=json "clusters/kv/lbaas/hieradata_repo_token" | jq '.data.data')
export HIERADATA_REPO_USER=$(echo "${HIERADATA_REPO_SECRET}" | jq -r '.user')
export HIERADATA_REPO_TOKEN=$(echo "${HIERADATA_REPO_SECRET}" | jq -r '.token')
Prepare Cluster Repository
Starting with this section, we recommend that you change into a clean directory, for example a directory in your home, as sketched below.
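A minimal sketch (the path is hypothetical; pick any clean directory you like):

mkdir -p ~/ocp4-installs/${CLUSTER_ID} && cd ~/ocp4-installs/${CLUSTER_ID}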
Check Running Commodore for details on how to run commodore.
- Prepare Commodore inventory.

mkdir -p inventory/classes/
git clone $(curl -sH"Authorization: Bearer $(commodore fetch-token)" "${COMMODORE_API_URL}/tenants/${TENANT_ID}" | jq -r '.gitRepo.url') inventory/classes/${TENANT_ID}
- Configure the cluster's domain in Project Syn

export CLUSTER_DOMAIN="${CLUSTER_ID}.${BASE_DOMAIN}" (1)

(1) Adjust this as necessary if you're using a non-standard cluster domain. The cluster domain configured here must be correct. The value is used to configure how Cilium connects to the cluster's K8s API.

pushd "inventory/classes/${TENANT_ID}/"

yq eval -i ".parameters.openshift.baseDomain = \"${CLUSTER_DOMAIN}\"" \
  ${CLUSTER_ID}.yml

git commit -a -m "Configure cluster domain for ${CLUSTER_ID}"
- Add Cilium to cluster configuration

These instructions assume that Cilium is configured to use api-int.${CLUSTER_DOMAIN}:6443 to connect to the cluster's K8s API. To ensure that that's the case, add the configuration shown below somewhere in the Project Syn config hierarchy.

parameters:
  cilium:
    cilium_helm_values:
      k8sServiceHost: api-int.${openshift:baseDomain}
      k8sServicePort: "6443"

For VSHN, this configuration is set in the Commodore global defaults (internal).

yq eval -i '.applications += ["cilium"]' ${CLUSTER_ID}.yml
yq eval -i '.parameters.networkpolicy.networkPlugin = "cilium"' ${CLUSTER_ID}.yml
yq eval -i '.parameters.networkpolicy.ignoredNamespaces = ["openshift-oauth-apiserver"]' ${CLUSTER_ID}.yml
yq eval -i '.parameters.openshift4_monitoring.upstreamRules.networkPlugin = "cilium"' ${CLUSTER_ID}.yml
yq eval -i '.parameters.openshift.infraID = "TO_BE_DEFINED"' ${CLUSTER_ID}.yml
yq eval -i '.parameters.openshift.clusterID = "TO_BE_DEFINED"' ${CLUSTER_ID}.yml
git commit -a -m "Add Cilium addon to ${CLUSTER_ID}"
git push
popd
- Compile catalog

commodore catalog compile ${CLUSTER_ID} --push -i \
  --dynamic-fact kubernetesVersion.major=$(echo "1.28" | awk -F. '{print $1}') \
  --dynamic-fact kubernetesVersion.minor=$(echo "1.28" | awk -F. '{print $2}') \
  --dynamic-fact openshiftVersion.Major=$(echo "4.15" | awk -F. '{print $1}') \
  --dynamic-fact openshiftVersion.Minor=$(echo "4.15" | awk -F. '{print $2}')

This commodore call requires Commodore v1.5.0 or newer. Please make sure to update your local installation.
Configure the OpenShift Installer
- Generate SSH key

We generate a unique SSH key pair for the cluster as this gives us troubleshooting access.

SSH_PRIVATE_KEY="$(pwd)/ssh_$CLUSTER_ID"
export SSH_PUBLIC_KEY="${SSH_PRIVATE_KEY}.pub"

ssh-keygen -C "vault@$CLUSTER_ID" -t ed25519 -f $SSH_PRIVATE_KEY -N ''

BASE64_NO_WRAP='base64'
if [[ "$OSTYPE" == "linux"* ]]; then
  BASE64_NO_WRAP='base64 --wrap 0'
fi

vault kv put clusters/kv/${TENANT_ID}/${CLUSTER_ID}/exoscale/ssh \
  private_key=$(cat $SSH_PRIVATE_KEY | eval "$BASE64_NO_WRAP")

ssh-add $SSH_PRIVATE_KEY
- Prepare install-config.yaml

You can add more options to the install-config.yaml file. Have a look at the config example for more information. For example, you could change the SDN from the default value to something a customer requests due to specific network requirements.

export INSTALLER_DIR="$(pwd)/target"
mkdir -p "${INSTALLER_DIR}"

cat > "${INSTALLER_DIR}/install-config.yaml" <<EOF
apiVersion: v1
metadata:
  name: ${CLUSTER_ID} (1)
baseDomain: ${BASE_DOMAIN} (1)
platform:
  none: {}
networking:
  networkType: Cilium
pullSecret: |
  ${PULL_SECRET}
sshKey: "$(cat $SSH_PUBLIC_KEY)"
EOF
(1) Make sure that the values here match the value of $CLUSTER_DOMAIN when combined as <metadata.name>.<baseDomain>. Otherwise, the installation will most likely fail.

If setting a custom CIDR for the OpenShift networking, the corresponding values should be updated in your Commodore cluster definitions. See Cilium Component Defaults and Parameter Reference. Verify with less catalog/manifests/cilium/olm/*ciliumconfig.yaml.
Run OpenShift Installer
The steps in this section aren't idempotent and have to be completed uninterrupted in one go. If you have to recreate the install config or any of the generated manifests you need to rerun all of the subsequent steps.
- Render install manifests (this will consume the install-config.yaml)

openshift-install --dir "${INSTALLER_DIR}" \
  create manifests
- If you want to change the default "apps" domain for the cluster:

yq eval -i '.spec.domain = "apps.example.com"' \
  "${INSTALLER_DIR}/manifests/cluster-ingress-02-config.yml"
- Copy pre-rendered extra machine configs

machineconfigs=catalog/manifests/openshift4-nodes/10_machineconfigs.yaml
if [ -f $machineconfigs ]; then
  yq --no-doc -s \
    "\"${INSTALLER_DIR}/openshift/99x_openshift-machineconfig_\" + .metadata.name" \
    $machineconfigs
fi
- Copy pre-rendered Cilium manifests

cp catalog/manifests/cilium/olm/* ${INSTALLER_DIR}/manifests/
- Verify that the generated cluster domain matches the desired cluster domain

GEN_CLUSTER_DOMAIN=$(yq e '.spec.baseDomain' \
  "${INSTALLER_DIR}/manifests/cluster-dns-02-config.yml")
if [ "$GEN_CLUSTER_DOMAIN" != "$CLUSTER_DOMAIN" ]; then
  echo -e "\033[0;31mGenerated cluster domain doesn't match expected cluster domain: Got '$GEN_CLUSTER_DOMAIN', want '$CLUSTER_DOMAIN'\033[0;0m"
else
  echo -e "\033[0;32mGenerated cluster domain matches expected cluster domain.\033[0;0m"
fi
- Prepare install manifests and ignition config

openshift-install --dir "${INSTALLER_DIR}" \
  create ignition-configs
- Upload ignition config

exo storage upload "${INSTALLER_DIR}/bootstrap.ign" "sos://${CLUSTER_ID}-bootstrap" --acl public-read
Terraform Cluster Config
- Switch to the tenant repo

pushd "inventory/classes/${TENANT_ID}/"
- Include openshift4.yml if it exists

if ls openshift4.y*ml 1>/dev/null 2>&1; then
  yq eval -i '.classes += ".openshift4"' ${CLUSTER_ID}.yml
fi
- Update cluster config

yq eval -i ".parameters.openshift.infraID = \"$(jq -r .infraID "${INSTALLER_DIR}/metadata.json")\"" \
  ${CLUSTER_ID}.yml
yq eval -i ".parameters.openshift.clusterID = \"$(jq -r .clusterID "${INSTALLER_DIR}/metadata.json")\"" \
  ${CLUSTER_ID}.yml
yq eval -i ".parameters.openshift.ssh_key = \"$(cat ${SSH_PUBLIC_KEY})\"" \
  ${CLUSTER_ID}.yml
If you use a custom "apps" domain, make sure to set parameters.openshift.appsDomain accordingly.

APPS_DOMAIN=your.custom.apps.domain
yq eval -i ".parameters.openshift.appsDomain = \"${APPS_DOMAIN}\"" \
  ${CLUSTER_ID}.yml
By default, the cluster's update channel is derived from the cluster's reported OpenShift version. If you want to use a custom update channel, make sure to set parameters.openshift4_version.spec.channel accordingly.

# Configure the OpenShift update channel as `fast`
yq eval -i ".parameters.openshift4_version.spec.channel = \"fast-{ocp-minor-version}\"" \
  ${CLUSTER_ID}.yml
- Set team responsible for handling Icinga alerts

# use lower case for team name.
# e.g. TEAM=aldebaran
TEAM=<team-name>
- Prepare Terraform cluster config

CA_CERT=$(jq -r '.ignition.security.tls.certificateAuthorities[0].source' \
  "${INSTALLER_DIR}/master.ign" | \
  awk -F ',' '{ print $2 }' | \
  base64 --decode)

yq eval -i ".parameters.openshift4_terraform.terraform_variables.base_domain = \"${BASE_DOMAIN}\"" \
  ${CLUSTER_ID}.yml
yq eval -i ".parameters.openshift4_terraform.terraform_variables.ignition_ca = \"${CA_CERT}\"" \
  ${CLUSTER_ID}.yml
yq eval -i ".parameters.openshift4_terraform.terraform_variables.ssh_key = \"$(cat ${SSH_PUBLIC_KEY})\"" \
  ${CLUSTER_ID}.yml
yq eval -i ".parameters.openshift4_terraform.terraform_variables.team = \"${TEAM}\"" \
  ${CLUSTER_ID}.yml
yq eval -i ".parameters.openshift4_terraform.terraform_variables.hieradata_repo_user = \"${HIERADATA_REPO_USER}\"" \
  ${CLUSTER_ID}.yml
- Configure Exoscale-specific Terraform variables

yq eval -i ".parameters.openshift4_terraform.terraform_variables.rhcos_template = \"${RHCOS_TEMPLATE}\"" \
  ${CLUSTER_ID}.yml
You now have the option to further customize the cluster by editing the cluster's config file, for example as sketched below. Please look at the configuration reference for the available options.
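A hypothetical example of such a customization (worker_count is the variable also used in the Terraform overrides later in this guide; adjust the value to the sizing you actually need):

# Example: provision four worker nodes instead of the component default
yq eval -i ".parameters.openshift4_terraform.terraform_variables.worker_count = 4" \
  ${CLUSTER_ID}.yml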
Commit changes and compile cluster catalog
- Review changes. Have a look at the file ${CLUSTER_ID}.yml. Override default parameters or add more component configurations as required for your cluster.
  Ensure that you're using component openshift4-terraform v7.0.0 or newer. Otherwise the instructions in this how-to might not apply.
- Commit changes

git commit -a -m "Setup cluster ${CLUSTER_ID}"
git push
popd
- Compile and push cluster catalog

commodore catalog compile ${CLUSTER_ID} --push -i \
  --dynamic-fact kubernetesVersion.major=$(echo "1.28" | awk -F. '{print $1}') \
  --dynamic-fact kubernetesVersion.minor=$(echo "1.28" | awk -F. '{print $2}') \
  --dynamic-fact openshiftVersion.Major=$(echo "4.15" | awk -F. '{print $1}') \
  --dynamic-fact openshiftVersion.Minor=$(echo "4.15" | awk -F. '{print $2}')
This commodore call requires Commodore v1.5.0 or newer. Please make sure to update your local installation.
Provision Infrastructure
If Terraform is unable to create a new DNS domain because the DNS subscription limit has been reached, you need to manually upgrade the DNS subscription in the Exoscale console (DNS tab). Similarly, if Terraform is unable to create new instances because the usage of resource 'instance' has been exceeded, you can request a quota increase via the Exoscale console (Organization → Quotas). This usually gets processed within 5-10 minutes. You can then retry the failed Terraform command.
- Configure Terraform secrets

cat <<EOF > ./terraform.env
EXOSCALE_API_KEY
EXOSCALE_API_SECRET
TF_VAR_control_vshn_net_token
GIT_AUTHOR_NAME
GIT_AUTHOR_EMAIL
HIERADATA_REPO_TOKEN
EOF
- Set up Terraform

Prepare the Terraform execution environment:

# Set terraform image and tag to be used
tf_image=$(\
  yq eval ".parameters.openshift4_terraform.images.terraform.image" \
  dependencies/openshift4-terraform/class/defaults.yml)
tf_tag=$(\
  yq eval ".parameters.openshift4_terraform.images.terraform.tag" \
  dependencies/openshift4-terraform/class/defaults.yml)

# Generate the terraform alias
base_dir=$(pwd)
alias terraform='docker run -it --rm \
  -e REAL_UID=$(id -u) \
  --env-file ${base_dir}/terraform.env \
  -w /tf \
  -v $(pwd):/tf \
  --ulimit memlock=-1 \
  "${tf_image}:${tf_tag}" /tf/terraform.sh'

export GITLAB_REPOSITORY_URL=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${COMMODORE_API_URL}/clusters/${CLUSTER_ID} | jq -r '.gitRepo.url' | sed 's|ssh://||; s|/|:|')
export GITLAB_REPOSITORY_NAME=${GITLAB_REPOSITORY_URL##*/}
export GITLAB_CATALOG_PROJECT_ID=$(curl -sH "Authorization: Bearer ${GITLAB_TOKEN}" "https://git.vshn.net/api/v4/projects?simple=true&search=${GITLAB_REPOSITORY_NAME/.git}" | jq -r ".[] | select(.ssh_url_to_repo == \"${GITLAB_REPOSITORY_URL}\") | .id")
export GITLAB_STATE_URL="https://git.vshn.net/api/v4/projects/${GITLAB_CATALOG_PROJECT_ID}/terraform/state/cluster"

pushd catalog/manifests/openshift4-terraform/

Initialize Terraform:

terraform init \
  "-backend-config=address=${GITLAB_STATE_URL}" \
  "-backend-config=lock_address=${GITLAB_STATE_URL}/lock" \
  "-backend-config=unlock_address=${GITLAB_STATE_URL}/lock" \
  "-backend-config=username=${GITLAB_USER}" \
  "-backend-config=password=${GITLAB_TOKEN}" \
  "-backend-config=lock_method=POST" \
  "-backend-config=unlock_method=DELETE" \
  "-backend-config=retry_wait_min=5"
- Provision Domain and security groups

cat > override.tf <<EOF
module "cluster" {
  bootstrap_count = 0
  lb_count = 0
  master_count = 0
  infra_count = 0
  storage_count = 0
  worker_count = 0
  additional_worker_groups = {}
}
EOF
terraform apply
- Create LB hieradata

cat > override.tf <<EOF
module "cluster" {
  bootstrap_count = 0
  master_count = 0
  infra_count = 0
  storage_count = 0
  worker_count = 0
  additional_worker_groups = {}
}
EOF
terraform apply -target "module.cluster.module.lb.module.hiera"
- Set up DNS NS records on the parent zone using the data from the Terraform output variable ns_records from the previous step (see the sketch below for displaying the output).
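A minimal sketch for displaying those records (assuming the terraform alias set up earlier is still active):

# Show the NS records that need to be created on the parent zone
terraform output ns_records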
- Review and merge the LB hieradata MR (listed in Terraform output hieradata_mr) and wait until the deploy pipeline after the merge is completed.
- Create LBs

terraform apply
- Make LB FQDNs available for later steps

Store LB FQDNs in environment:

declare -a LB_FQDNS
for id in 1 2; do
  LB_FQDNS[$id]=$(terraform state show "module.cluster.module.lb.exoscale_domain_record.lb[$(expr $id - 1)]" | grep hostname | cut -d'=' -f2 | tr -d ' "\r\n')
done

Verify FQDNs:

for lb in "${LB_FQDNS[@]}"; do echo $lb; done
- Check LB connectivity

for lb in "${LB_FQDNS[@]}"; do
  ping -c1 "${lb}"
done
- Wait until LBs are fully initialized by Puppet

# Wait for Puppet provisioning to complete
while true; do
  curl --connect-timeout 1 "http://api.${CLUSTER_DOMAIN}:6443" &>/dev/null
  if [ $? -eq 52 ]; then
    echo -e "\nHAproxy up"
    break
  else
    echo -n "."
    sleep 5
  fi
done

# Update sshop config, see https://wiki.vshn.net/pages/viewpage.action?pageId=40108094
sshop_update

# Check that you can access the LBs using your usual SSH config
for lb in "${LB_FQDNS[@]}"; do
  ssh "${lb}" hostname -f
done
While you're waiting for the LBs to be provisioned, you can check the cloud-init logs with the following SSH commands:

ssh ubuntu@"${LB_FQDNS[1]}" tail -f /var/log/cloud-init-output.log
ssh ubuntu@"${LB_FQDNS[2]}" tail -f /var/log/cloud-init-output.log
- Check the "Server created" tickets for the LBs and link them to the cluster setup ticket.
- Deploy bootstrap node

cat > override.tf <<EOF
module "cluster" {
  bootstrap_count = 1
  master_count = 0
  infra_count = 0
  storage_count = 0
  worker_count = 0
  additional_worker_groups = {}
}
EOF
terraform apply
- Review and merge the LB hieradata MR (listed in Terraform output hieradata_mr) and run Puppet on the LBs after the deploy job has completed

for fqdn in "${LB_FQDNS[@]}"; do
  ssh "${fqdn}" sudo puppetctl run
done
- Wait for bootstrap API to come up

API_URL=$(yq e '.clusters[0].cluster.server' "${INSTALLER_DIR}/auth/kubeconfig")
while ! curl --connect-timeout 1 "${API_URL}/healthz" -k &>/dev/null; do
  echo -n "."
  sleep 5
done && echo -e "\nAPI is up"
- Patch Cilium config to allow control plane bootstrap to succeed

We need to temporarily adjust the Cilium config to not use full kube-proxy replacement, since we currently don't have a way to disable the initial OpenShift-managed kube-proxy deployment. Additionally, because the Exoscale Cloud Controller Manager accesses the K8s API via the service IP, we need to configure Cilium to provide partial kube-proxy replacement so that the CCM can start and untaint the control plane nodes, allowing other pods to be scheduled.
export KUBECONFIG="${INSTALLER_DIR}/auth/kubeconfig"
while ! kubectl get ciliumconfig -A &>/dev/null; do
  echo -n "."
  sleep 2
done && echo -e "\nCiliumConfig CR is present"

kubectl patch -n cilium ciliumconfig cilium-enterprise --type=merge \
  -p '{
    "spec": {
      "cilium": {
        "kubeProxyReplacement": "false",
        "nodePort": { "enabled": true },
        "socketLB": { "enabled": true },
        "sessionAffinity": true,
        "externalIPs": { "enabled": true },
        "hostPort": { "enabled": true }
      }
    }
  }'
- Deploy control plane nodes

cat > override.tf <<EOF
module "cluster" {
  bootstrap_count = 1
  infra_count = 0
  storage_count = 0
  worker_count = 0
  additional_worker_groups = {}
}
EOF
terraform apply
- Wait for bootstrap to complete

openshift-install --dir "${INSTALLER_DIR}" \
  wait-for bootstrap-complete --log-level debug
If you’re using a CNI other than Cilium you may need to remove the following taint from the nodes to allow the network to come up:
kubectl taint no --all node.cloudprovider.kubernetes.io/uninitialized:NoSchedule-
Once the bootstrap is complete, taint the master nodes again to ensure that they’re properly initialized by the cloud-controller-manager.
kubectl taint no -l node-role.kubernetes.io/master node.cloudprovider.kubernetes.io/uninitialized=:NoSchedule
- Remove bootstrap node and provision remaining nodes

rm override.tf
terraform apply
popd
- Disable OpenShift kube-proxy deployment and revert Cilium patch

kubectl patch network.operator cluster --type=merge \
  -p '{"spec":{"deployKubeProxy":false}}'
kubectl -n cilium replace -f catalog/manifests/cilium/olm/cluster-network-07-cilium-ciliumconfig.yaml
while ! kubectl -n cilium get cm cilium-config -oyaml | grep 'kube-proxy-replacement: "true"' &>/dev/null; do
  echo -n "."
  sleep 2
done && echo -e "\nCilium config updated"
kubectl -n cilium rollout restart ds/cilium
- Review and merge the LB hieradata MR (listed in Terraform output hieradata_mr) and run Puppet on the LBs after the deploy job has completed

for fqdn in "${LB_FQDNS[@]}"; do
  ssh "${fqdn}" sudo puppetctl run
done
- Approve node certs

# Once CSRs in state Pending show up, approve them
# Needs to be run twice, two CSRs for each node need to be approved
kubectl get csr -w

oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | \
  xargs oc adm certificate approve

kubectl get nodes
- Label infra nodes

kubectl get node -ojson | \
  jq -r '.items[] | select(.metadata.name | test("infra-")).metadata.name' | \
  xargs -I {} kubectl label node {} node-role.kubernetes.io/infra=
- Label and taint storage nodes

kubectl get node -ojson | \
  jq -r '.items[] | select(.metadata.name | test("storage-")).metadata.name' | \
  xargs -I {} kubectl label node {} node-role.kubernetes.io/storage=
kubectl taint node -lnode-role.kubernetes.io/storage \
  storagenode=True:NoSchedule
- Label worker nodes

kubectl get node -ojson | \
  jq -r '.items[] | select(.metadata.name | test("infra|master|storage-")|not).metadata.name' | \
  xargs -I {} kubectl label node {} node-role.kubernetes.io/app=

At this point you may want to add extra labels to the additional worker groups, if there are any (see the sketch below).
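A hypothetical sketch, assuming an additional worker group whose node names contain cpu-worker and a label of your choosing (both the group name and the label are examples, not part of this guide's defaults):

# Label nodes of a hypothetical additional worker group "cpu-worker"
kubectl get node -ojson | \
  jq -r '.items[] | select(.metadata.name | test("cpu-worker-")).metadata.name' | \
  xargs -I {} kubectl label node {} node-role.kubernetes.io/cpu-worker=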
- Enable proxy protocol on ingress controller

kubectl -n openshift-ingress-operator patch ingresscontroller default --type=json \
  -p '[{
    "op":"replace",
    "path":"/spec/endpointPublishingStrategy",
    "value": {"type": "HostNetwork", "hostNetwork": {"protocol": "PROXY"}}
  }]'
This step isn’t necessary if you’ve disabled the proxy protocol on the load-balancers manually during setup.
By default, PROXY protocol is enabled through the VSHN Commodore global defaults.
- Wait for installation to complete

openshift-install --dir ${INSTALLER_DIR} \
  wait-for install-complete --log-level debug
- Create secret with S3 credentials for the registry

oc create secret generic image-registry-private-configuration-user \
  --namespace openshift-image-registry \
  --from-literal=REGISTRY_STORAGE_S3_ACCESSKEY=${EXOSCALE_S3_ACCESSKEY} \
  --from-literal=REGISTRY_STORAGE_S3_SECRETKEY=${EXOSCALE_S3_SECRETKEY}
If the registry S3 credentials are created too long after the initial cluster setup, it's possible that the openshift-samples operator has disabled itself because it couldn't find a working in-cluster registry.
If the samples operator is disabled, no templates and builder images will be available on the cluster.
You can check the samples-operator’s state with the following command:
kubectl get config.samples cluster -ojsonpath='{.spec.managementState}'
If the command returns Removed, verify that the in-cluster registry pods are now running, and enable the samples operator again:

kubectl patch config.samples cluster -p '{"spec":{"managementState":"Managed"}}'
See the upstream documentation for more details on the samples operator.
Set up acme-dns CNAME records for the cluster
You can skip this section if you're not using Let's Encrypt for the cluster's API and default wildcard certificates.
- Extract the acme-dns subdomain for the cluster after cert-manager has been deployed via Project Syn.

fulldomain=$(kubectl -n syn-cert-manager \
  get secret acme-dns-client \
  -o jsonpath='{.data.acmedns\.json}' | \
  base64 -d | \
  jq -r '[.[]][0].fulldomain')
echo "$fulldomain"
- Set up the _acme-challenge CNAME records in the cluster's DNS zone

The _acme-challenge records must be created in the same zone as the cluster's api and apps records respectively. The snippet below assumes that the cluster is configured to use the default "apps" domain in the cluster's zone.

for cname in "api" "apps"; do
  exo dns add CNAME "${CLUSTER_DOMAIN}" -n "_acme-challenge.${cname}" -a "${fulldomain}." -t 600
done
Ensure emergency admin access to the cluster
- Check that emergency credentials were uploaded and are accessible:

emergency-credentials-receive "${CLUSTER_ID}"
# Follow the instructions to use the downloaded kubeconfig file
You need to be in the passbolt group VSHN On-Call.
If the command fails, check if the controller is already deployed and running, and if the credentials are uploaded:
kubectl -n appuio-emergency-credentials-controller get emergencyaccounts.cluster.appuio.io -o=jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.lastTokenCreationTimestamp}{"\n"}{end}'
- Follow the instructions from emergency-credentials-receive to use the downloaded kubeconfig file.

export KUBECONFIG="em-${CLUSTER_ID}"
kubectl get nodes
oc whoami # should output system:serviceaccount:appuio-emergency-credentials-controller:*
- Invalidate the 10 year admin kubeconfig.
kubectl -n openshift-config patch cm admin-kubeconfig-client-ca --type=merge -p '{"data": {"ca-bundle.crt": ""}}'
Finalize installation
- Configure the apt-dater groups for the LBs.

git clone git@git.vshn.net:vshn-puppet/nodes_hieradata.git
pushd nodes_hieradata

cat >"${LB_FQDNS[1]}.yaml" <<EOF
---
s_apt_dater::host::group: '2200_20_night_main'
EOF
cat >"${LB_FQDNS[2]}.yaml" <<EOF
---
s_apt_dater::host::group: '2200_40_night_second'
EOF

git add *.yaml
git commit -m"Configure apt-dater groups for LBs for OCP4 cluster ${CLUSTER_ID}"
git push origin master
popd
This how-to defaults to the night maintenance window on Tuesday at 22:00. Adjust the apt-dater groups according to the documented groups (VSHN-internal only) if the cluster requires a different maintenance window.
- Wait for deploy job on nodes hieradata to complete and run Puppet on the LBs to update the apt-dater groups.

for fqdn in "${LB_FQDNS[@]}"; do
  ssh "${fqdn}" sudo puppetctl run
done
- Delete local config files

rm -r ${INSTALLER_DIR}/
Post tasks
VSHN
- Add the cluster to the maintenance template, if necessary
Generic
- Do a first maintenance