Install OpenShift 4 on cloudscale.ch
Steps to install an OpenShift 4 cluster on cloudscale.ch.
These steps follow the Installing a cluster on bare metal docs to set up a user-provisioned infrastructure (UPI) installation. Terraform is used to provision the cloud infrastructure.
The commands are idempotent and can be retried if any of the steps fail. The certificates created during bootstrap are only valid for 24 hours, so make sure you complete these steps within 24 hours.
This how-to guide is still a work in progress and will change. It’s currently very specific to VSHN and needs further changes to be more generic.
Starting situation
- You already have a Tenant and its git repository.
- You have a CCSP Red Hat login and are logged into the Red Hat OpenShift Cluster Manager. Don’t use your personal account to log in to the cluster manager for installation.
- You want to register a new cluster in Lieutenant and are about to install OpenShift 4 on cloudscale.ch.
Prerequisites
- jq
- yq (YAML processor, version 4 or higher; use the Go version by mikefarah, not the jq wrapper by kislyuk)
- vault (Vault CLI)
- curl
- gzip
- docker
- mc (Minio client, aliased to mc if necessary)
Make sure the minor version of openshift-install matches the OpenShift version you’re installing (4.11 in this guide).
Cluster Installation
Register the new OpenShift 4 cluster in Lieutenant.
Set up LDAP service
- Create an LDAP service
Use control.vshn.net/vshn/services/_create to create a service. The name must contain the customer and the cluster name. Then put the LDAP service ID and password in the following variables:
export LDAP_ID="Your_LDAP_ID_here"
export LDAP_PASSWORD="Your_LDAP_pw_here"
Configure input
# https://control.cloudscale.ch/service/<your-project>/api-token
export CLOUDSCALE_API_TOKEN=<cloudscale-api-token>
export TF_VAR_lb_cloudscale_api_secret=<cloudscale-api-token-for-Floaty>
# From https://git.vshn.net/-/profile/personal_access_tokens, "api" scope is sufficient
export GITLAB_TOKEN=<gitlab-api-token>
export GITLAB_USER=<gitlab-user-name>
# For example: https://api.syn.vshn.net
# IMPORTANT: do NOT add a trailing `/`. Commands below will fail.
export COMMODORE_API_URL=<lieutenant-api-endpoint>
# Set Project Syn cluster and tenant ID
export CLUSTER_ID=<lieutenant-cluster-id> # Looks like: c-<something>
export TENANT_ID=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${COMMODORE_API_URL}/clusters/${CLUSTER_ID} | jq -r .tenant)
export GIT_AUTHOR_NAME=$(git config --global user.name)
export GIT_AUTHOR_EMAIL=$(git config --global user.email)
export TF_VAR_control_vshn_net_token=<control-vshn-net-token> # use your personal SERVERS API token from https://control.vshn.net/tokens
export BASE_DOMAIN=<your-base-domain> # customer-provided base domain without cluster name, e.g. "zrh.customer.vshnmanaged.net"
export PULL_SECRET='<redhat-pull-secret>' # As copied from https://cloud.redhat.com/openshift/install/pull-secret ("Copy pull secret"). The value must be inside quotes.
For an explanation of BASE_DOMAIN, see DNS Scheme.
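Before continuing, it can save a retry later to verify that every variable from this section is actually set. A minimal sketch; the check_vars helper is an illustration, not part of the official tooling:

```shell
# Hypothetical helper (not part of the tooling): report which of the given
# variable names are unset or empty, using bash indirect expansion.
check_vars() {
  local rc=0
  for var in "$@"; do
    if [ -z "${!var:-}" ]; then
      echo "Missing: $var"
      rc=1
    fi
  done
  return $rc
}

# Check the exports from this section before moving on
check_vars CLOUDSCALE_API_TOKEN TF_VAR_lb_cloudscale_api_secret \
  GITLAB_TOKEN GITLAB_USER COMMODORE_API_URL CLUSTER_ID TENANT_ID \
  BASE_DOMAIN PULL_SECRET || echo "Set the variables above before continuing"
```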
Set up S3 bucket for cluster bootstrap
- Create S3 bucket
- If a bucket user already exists for this cluster:
# Use already existing bucket user
response=$(curl -sH "Authorization: Bearer ${CLOUDSCALE_API_TOKEN}" \
  https://api.cloudscale.ch/v1/objects-users | \
  jq -e ".[] | select(.display_name == \"${CLUSTER_ID}\")")
- To create a new bucket user:
# Create a new user
response=$(curl -sH "Authorization: Bearer ${CLOUDSCALE_API_TOKEN}" \
  -F display_name=${CLUSTER_ID} \
  https://api.cloudscale.ch/v1/objects-users)
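Either variant leaves a single objects-user JSON object in `$response`. As an illustration of what the jq filter does, here is the same selection run against a canned sample response with hypothetical values:

```shell
# Canned sample of a cloudscale.ch objects-users response (hypothetical values)
CLUSTER_ID="c-example-1234"
sample='[
  {"display_name": "c-other", "keys": [{"access_key": "AAA", "secret_key": "BBB"}]},
  {"display_name": "c-example-1234", "keys": [{"access_key": "KEY123", "secret_key": "SECRET456"}]}
]'
# Same filter as above: pick the user whose display_name matches the cluster ID
response=$(echo "$sample" | jq -e ".[] | select(.display_name == \"${CLUSTER_ID}\")")
# The Minio client setup below reads the first key pair out of that object
access_key=$(echo "$response" | jq -r '.keys[0].access_key')
secret_key=$(echo "$response" | jq -r '.keys[0].secret_key')
echo "$access_key"   # KEY123
```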
- Configure the Minio client
export REGION=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${COMMODORE_API_URL}/clusters/${CLUSTER_ID} | jq -r .facts.region)
mc config host add \
  "${CLUSTER_ID}" "https://objects.${REGION}.cloudscale.ch" \
  $(echo $response | jq -r '.keys[0].access_key') \
  $(echo $response | jq -r '.keys[0].secret_key')
mc mb --ignore-existing \
  "${CLUSTER_ID}/${CLUSTER_ID}-bootstrap-ignition"
Upload Red Hat CoreOS image
- Export the Authorization header for the cloudscale.ch API.
export AUTH_HEADER="Authorization: Bearer ${CLOUDSCALE_API_TOKEN}"
The variable CLOUDSCALE_API_TOKEN could be used directly. Exporting the variable AUTH_HEADER keeps the commands compatible with the cloudscale.ch API documentation.
- Check if the image already exists in the correct zone
curl -sH "$AUTH_HEADER" https://api.cloudscale.ch/v1/custom-images | jq -r '.[] | select(.slug == "rhcos-4.11") | .zones[].slug'
If a zone slug is printed to the output, the image already exists and you can skip the next steps and jump directly to the next section.
- Fetch the latest Red Hat CoreOS image
curl -L https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.11/4.11.9/rhcos-4.11.9-x86_64-openstack.x86_64.qcow2.gz | gzip -d > rhcos-4.11.qcow2
- Upload the image to S3 and make it public
mc cp rhcos-4.11.qcow2 "${CLUSTER_ID}/${CLUSTER_ID}-bootstrap-ignition/"
mc policy set download "${CLUSTER_ID}/${CLUSTER_ID}-bootstrap-ignition/rhcos-4.11.qcow2"
You can check that the download policy is applied successfully with
mc policy get "${CLUSTER_ID}/${CLUSTER_ID}-bootstrap-ignition/rhcos-4.11.qcow2"
The output should be: Access permission for `[…]-bootstrap-ignition/rhcos-4.11.qcow2` is `download`
- Import the image to cloudscale.ch
curl -i -H "$AUTH_HEADER" \
  -F url="$(mc share download --json "${CLUSTER_ID}/${CLUSTER_ID}-bootstrap-ignition/rhcos-4.11.qcow2" | jq -r .url)" \
  -F name='RHCOS 4.11' \
  -F zones="${REGION}1" \
  -F slug=rhcos-4.11 \
  -F source_format=qcow2 \
  -F user_data_handling=pass-through \
  https://api.cloudscale.ch/v1/custom-images/import
Set secrets in Vault
export VAULT_ADDR=https://vault-prod.syn.vshn.net
vault login -method=oidc
# Set the cloudscale.ch access secrets
vault kv put clusters/kv/${TENANT_ID}/${CLUSTER_ID}/cloudscale \
token=${CLOUDSCALE_API_TOKEN} \
s3_access_key=$(mc config host ls ${CLUSTER_ID} -json | jq -r .accessKey) \
s3_secret_key=$(mc config host ls ${CLUSTER_ID} -json | jq -r .secretKey)
# Put LB API key in Vault
vault kv put clusters/kv/${TENANT_ID}/${CLUSTER_ID}/floaty \
iam_secret=${TF_VAR_lb_cloudscale_api_secret}
# Generate an HTTP secret for the registry
vault kv put clusters/kv/${TENANT_ID}/${CLUSTER_ID}/registry \
httpSecret=$(LC_ALL=C tr -cd "A-Za-z0-9" </dev/urandom | head -c 128)
# Set the LDAP password
vault kv put clusters/kv/${TENANT_ID}/${CLUSTER_ID}/vshn-ldap \
bindPassword=${LDAP_PASSWORD}
# Generate a master password for K8up backups
vault kv put clusters/kv/${TENANT_ID}/${CLUSTER_ID}/global-backup \
password=$(LC_ALL=C tr -cd "A-Za-z0-9" </dev/urandom | head -c 32)
# Generate a password for the cluster object backups
vault kv put clusters/kv/${TENANT_ID}/${CLUSTER_ID}/cluster-backup \
password=$(LC_ALL=C tr -cd "A-Za-z0-9" </dev/urandom | head -c 32)
# Copy the Dagobert OpenShift Node Collector Credentials
vault kv get -format=json "clusters/kv/template/dagobert" | jq '.data.data' \
| vault kv put -cas=0 "clusters/kv/${TENANT_ID}/${CLUSTER_ID}/dagobert" -
# Copy the VSHN acme-dns registration password
vault kv get -format=json "clusters/kv/template/cert-manager" | jq '.data.data' \
| vault kv put -cas=0 "clusters/kv/${TENANT_ID}/${CLUSTER_ID}/cert-manager" -
export HIERADATA_REPO_SECRET=$(vault kv get \
-format=json "clusters/kv/lbaas/hieradata_repo_token" | jq '.data.data')
export HIERADATA_REPO_USER=$(echo "${HIERADATA_REPO_SECRET}" | jq -r '.user')
export HIERADATA_REPO_TOKEN=$(echo "${HIERADATA_REPO_SECRET}" | jq -r '.token')
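Several of the secrets above are generated with a `tr -cd "A-Za-z0-9" </dev/urandom | head -c N` pipeline. A quick sketch to convince yourself it produces fixed-length alphanumeric strings; the gen_secret wrapper is illustrative, not part of the tooling:

```shell
# Illustrative wrapper around the secret-generation pipeline used above
gen_secret() {
  LC_ALL=C tr -cd "A-Za-z0-9" </dev/urandom | head -c "$1"
}

pw=$(gen_secret 32)
# Expected properties: exactly 32 characters, alphanumeric only
echo "${#pw}"   # 32
```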
Prepare Cluster Repository
Starting with this section, we recommend that you change into a clean directory (for example a directory in your home).
Check Running Commodore for details on how to run commodore.
- Prepare Commodore inventory.
mkdir -p inventory/classes/
git clone $(curl -sH "Authorization: Bearer $(commodore fetch-token)" "${COMMODORE_API_URL}/tenants/${TENANT_ID}" | jq -r '.gitRepo.url') inventory/classes/${TENANT_ID}
- Add Cilium to cluster configuration
pushd "inventory/classes/${TENANT_ID}/"
yq eval -i '.applications += ["cilium"]' ${CLUSTER_ID}.yml
yq eval -i '.parameters.networkpolicy.networkPlugin = "cilium"' ${CLUSTER_ID}.yml
yq eval -i '.parameters.networkpolicy.ignoredNamespaces = ["openshift-oauth-apiserver"]' ${CLUSTER_ID}.yml
yq eval -i '.parameters.openshift4_monitoring.upstreamRules.networkPlugin = "cilium"' ${CLUSTER_ID}.yml
yq eval -i '.parameters.openshift.infraID = "TO_BE_DEFINED"' ${CLUSTER_ID}.yml
yq eval -i '.parameters.openshift.clusterID = "TO_BE_DEFINED"' ${CLUSTER_ID}.yml
git commit -a -m "Add Cilium addon to ${CLUSTER_ID}"
git push
popd
- Compile catalog
commodore catalog compile ${CLUSTER_ID} --push -i \
  --dynamic-fact kubernetesVersion.major=$(echo "1.24" | awk -F. '{print $1}') \
  --dynamic-fact kubernetesVersion.minor=$(echo "1.24" | awk -F. '{print $2}') \
  --dynamic-fact openshiftVersion.Major=$(echo "4.11" | awk -F. '{print $1}') \
  --dynamic-fact openshiftVersion.Minor=$(echo "4.11" | awk -F. '{print $2}')
This commodore call requires Commodore v1.5.0 or newer. Please make sure to update your local installation.
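The --dynamic-fact values are derived by splitting version strings on dots with awk. As a minimal illustration of that mechanic:

```shell
# Split a version string into major/minor the same way the
# --dynamic-fact arguments above do.
version="4.11"
major=$(echo "$version" | awk -F. '{print $1}')
minor=$(echo "$version" | awk -F. '{print $2}')
echo "$major $minor"   # prints: 4 11
```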
Configure the OpenShift Installer
- Generate SSH key
We generate a unique SSH key pair for the cluster as this gives us troubleshooting access.
SSH_PRIVATE_KEY="$(pwd)/ssh_$CLUSTER_ID"
export SSH_PUBLIC_KEY="${SSH_PRIVATE_KEY}.pub"
ssh-keygen -C "vault@$CLUSTER_ID" -t ed25519 -f $SSH_PRIVATE_KEY -N ''
BASE64_NO_WRAP='base64'
if [[ "$OSTYPE" == "linux"* ]]; then
  BASE64_NO_WRAP='base64 --wrap 0'
fi
vault kv put clusters/kv/${TENANT_ID}/${CLUSTER_ID}/cloudscale/ssh \
  private_key=$(cat $SSH_PRIVATE_KEY | eval "$BASE64_NO_WRAP")
ssh-add $SSH_PRIVATE_KEY
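The BASE64_NO_WRAP switch exists because GNU base64 wraps its output at 76 columns by default, which would mangle the single-line value stored in Vault, while BSD/macOS base64 emits a single line and has no --wrap flag. A small round-trip sketch on Linux, using throwaway sample data instead of the real key:

```shell
# 120 bytes of sample key material (stands in for the private key)
payload=$(head -c 120 /dev/zero | tr '\0' 'A')
# GNU base64 would wrap this across three lines by default; --wrap 0 keeps one line
encoded=$(printf '%s' "$payload" | base64 --wrap 0)
echo "$encoded" | wc -l   # 1 (no embedded newlines)
# And it round-trips cleanly
decoded=$(printf '%s' "$encoded" | base64 --decode)
[ "$decoded" = "$payload" ] && echo "round-trip ok"
```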
- Prepare install-config.yaml
You can add more options to the install-config.yaml file. Have a look at the config example for more information. For example, you could change the SDN from the default value to something a customer requests due to network requirements.
export INSTALLER_DIR="$(pwd)/target"
mkdir -p "${INSTALLER_DIR}"
cat > "${INSTALLER_DIR}/install-config.yaml" <<EOF
apiVersion: v1
metadata:
  name: ${CLUSTER_ID}
baseDomain: ${BASE_DOMAIN}
platform:
  none: {}
networking:
  networkType: Cilium
pullSecret: |
  ${PULL_SECRET}
sshKey: "$(cat $SSH_PUBLIC_KEY)"
capabilities:
  baselineCapabilitySet: v4.11
EOF
If setting a custom CIDR for the OpenShift networking, the corresponding values should be updated in your Commodore cluster definitions. See Cilium Component Defaults and Parameter Reference. Verify with less catalog/manifests/cilium/olm/*ciliumconfig.yaml.
Run the OpenShift Installer
The steps in this section aren’t idempotent and have to be completed uninterrupted in one go. If you have to recreate the install config or any of the generated manifests, you need to rerun all of the subsequent steps.
- Render install manifests (this will consume the install-config.yaml)
openshift-install --dir "${INSTALLER_DIR}" \
  create manifests
- If you want to change the default "apps" domain for the cluster:
yq eval -i '.spec.domain = "apps.example.com"' \
  "${INSTALLER_DIR}/manifests/cluster-ingress-02-config.yml"
- Copy pre-rendered Cilium manifests
cp catalog/manifests/cilium/olm/* target/manifests/
- Extract the cluster domain from the generated manifests
export CLUSTER_DOMAIN=$(yq e '.spec.baseDomain' \
  "${INSTALLER_DIR}/manifests/cluster-dns-02-config.yml")
- Prepare install manifests and ignition config
openshift-install --dir "${INSTALLER_DIR}" \
  create ignition-configs
- Upload ignition config
mc cp "${INSTALLER_DIR}/bootstrap.ign" "${CLUSTER_ID}/${CLUSTER_ID}-bootstrap-ignition/"
export TF_VAR_ignition_bootstrap=$(mc share download \
  --json --expire=4h \
  "${CLUSTER_ID}/${CLUSTER_ID}-bootstrap-ignition/bootstrap.ign" | jq -r '.share')
Terraform Cluster Config
- Switch to the tenant repo
pushd "inventory/classes/${TENANT_ID}/"
- Update cluster config
yq eval -i ".parameters.openshift.baseDomain = \"${CLUSTER_DOMAIN}\"" \
  ${CLUSTER_ID}.yml
yq eval -i ".parameters.openshift.infraID = \"$(jq -r .infraID "${INSTALLER_DIR}/metadata.json")\"" \
  ${CLUSTER_ID}.yml
yq eval -i ".parameters.openshift.clusterID = \"$(jq -r .clusterID "${INSTALLER_DIR}/metadata.json")\"" \
  ${CLUSTER_ID}.yml
yq eval -i ".parameters.openshift.ssh_key = \"$(cat ${SSH_PUBLIC_KEY})\"" \
  ${CLUSTER_ID}.yml
yq eval -i ".parameters.vshnLdap.serviceId = \"${LDAP_ID}\"" \
  ${CLUSTER_ID}.yml
If you use a custom "apps" domain, make sure to set parameters.openshift.appsDomain accordingly.
APPS_DOMAIN=your.custom.apps.domain
yq eval -i ".parameters.openshift.appsDomain = \"${APPS_DOMAIN}\"" \
  ${CLUSTER_ID}.yml
By default, the cluster’s update channel is derived from the cluster’s reported OpenShift version. If you want to use a custom update channel, make sure to set parameters.openshift4_version.spec.channel accordingly.
# Configure the OpenShift update channel as `fast`
yq eval -i ".parameters.openshift4_version.spec.channel = \"fast-4.11\"" \
  ${CLUSTER_ID}.yml
- Set the team responsible for handling Icinga alerts
# Use lower case for the team name.
# e.g. TEAM=tarazed
TEAM=<team-name>
- Prepare Terraform cluster config
CA_CERT=$(jq -r '.ignition.security.tls.certificateAuthorities[0].source' \
  "${INSTALLER_DIR}/master.ign" | \
  awk -F ',' '{ print $2 }' | \
  base64 --decode)
yq eval -i ".parameters.openshift4_terraform.terraform_variables.base_domain = \"${BASE_DOMAIN}\"" \
  ${CLUSTER_ID}.yml
yq eval -i ".parameters.openshift4_terraform.terraform_variables.ignition_ca = \"${CA_CERT}\"" \
  ${CLUSTER_ID}.yml
yq eval -i ".parameters.openshift4_terraform.terraform_variables.ssh_keys = [\"$(cat ${SSH_PUBLIC_KEY})\"]" \
  ${CLUSTER_ID}.yml
yq eval -i ".parameters.openshift4_terraform.terraform_variables.team = \"${TEAM}\"" \
  ${CLUSTER_ID}.yml
yq eval -i ".parameters.openshift4_terraform.terraform_variables.hieradata_repo_user = \"${HIERADATA_REPO_USER}\"" \
  ${CLUSTER_ID}.yml
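The CA_CERT pipeline pulls a base64 data URL out of master.ign and decodes everything after the comma. The same mechanics can be seen with a hypothetical certificate value instead of a real ignition file:

```shell
# Hypothetical CA material (a real master.ign carries the actual cluster CA)
ca_pem="-----BEGIN CERTIFICATE-----
MIIDEXAMPLEONLY
-----END CERTIFICATE-----"
# Ignition stores it as a data URL: data:text/plain;charset=utf-8;base64,<payload>
source_url="data:text/plain;charset=utf-8;base64,$(printf '%s' "$ca_pem" | base64 | tr -d '\n')"
# Same extraction as above: take the part after the comma and decode it
extracted=$(echo "$source_url" | awk -F ',' '{ print $2 }' | base64 --decode)
[ "$extracted" = "$ca_pem" ] && echo "CA round-trip ok"
```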
- Configure cloudscale.ch-specific Terraform variables
yq eval -i ".parameters.openshift4_terraform.terraform_variables.image_slug = \"custom:rhcos-4.11\"" \
  ${CLUSTER_ID}.yml
You now have the option to further customize the cluster by editing ${CLUSTER_ID}.yml. Please look at the configuration reference for the available options.
Commit changes and compile cluster catalog
- Review changes. Have a look at the file ${CLUSTER_ID}.yml. Override default parameters or add more component configurations as required for your cluster.
- Commit changes
git commit -a -m "Setup cluster ${CLUSTER_ID}"
git push
popd
- Compile and push cluster catalog
commodore catalog compile ${CLUSTER_ID} --push -i \
  --dynamic-fact kubernetesVersion.major=$(echo "1.24" | awk -F. '{print $1}') \
  --dynamic-fact kubernetesVersion.minor=$(echo "1.24" | awk -F. '{print $2}') \
  --dynamic-fact openshiftVersion.Major=$(echo "4.11" | awk -F. '{print $1}') \
  --dynamic-fact openshiftVersion.Minor=$(echo "4.11" | awk -F. '{print $2}')
This commodore call requires Commodore v1.5.0 or newer. Please make sure to update your local installation.
Provision Infrastructure
- Configure Terraform secrets
cat <<EOF > ./terraform.env
CLOUDSCALE_API_TOKEN
TF_VAR_ignition_bootstrap
TF_VAR_lb_cloudscale_api_secret
TF_VAR_control_vshn_net_token
GIT_AUTHOR_NAME
GIT_AUTHOR_EMAIL
HIERADATA_REPO_TOKEN
EOF
- Setup Terraform
Prepare Terraform execution environment:
# Set terraform image and tag to be used
tf_image=$(\
  yq eval ".parameters.openshift4_terraform.images.terraform.image" \
  dependencies/openshift4-terraform/class/defaults.yml)
tf_tag=$(\
  yq eval ".parameters.openshift4_terraform.images.terraform.tag" \
  dependencies/openshift4-terraform/class/defaults.yml)

# Generate the terraform alias
base_dir=$(pwd)
alias terraform='docker run -it --rm \
  -e REAL_UID=$(id -u) \
  --env-file ${base_dir}/terraform.env \
  -w /tf \
  -v $(pwd):/tf \
  --ulimit memlock=-1 \
  "${tf_image}:${tf_tag}" /tf/terraform.sh'

export GITLAB_REPOSITORY_URL=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${COMMODORE_API_URL}/clusters/${CLUSTER_ID} | jq -r '.gitRepo.url' | sed 's|ssh://||; s|/|:|')
export GITLAB_REPOSITORY_NAME=${GITLAB_REPOSITORY_URL##*/}
export GITLAB_CATALOG_PROJECT_ID=$(curl -sH "Authorization: Bearer ${GITLAB_TOKEN}" "https://git.vshn.net/api/v4/projects?simple=true&search=${GITLAB_REPOSITORY_NAME/.git}" | jq -r ".[] | select(.ssh_url_to_repo == \"${GITLAB_REPOSITORY_URL}\") | .id")
export GITLAB_STATE_URL="https://git.vshn.net/api/v4/projects/${GITLAB_CATALOG_PROJECT_ID}/terraform/state/cluster"

pushd catalog/manifests/openshift4-terraform/
Initialize Terraform:
terraform init \
  "-backend-config=address=${GITLAB_STATE_URL}" \
  "-backend-config=lock_address=${GITLAB_STATE_URL}/lock" \
  "-backend-config=unlock_address=${GITLAB_STATE_URL}/lock" \
  "-backend-config=username=${GITLAB_USER}" \
  "-backend-config=password=${GITLAB_TOKEN}" \
  "-backend-config=lock_method=POST" \
  "-backend-config=unlock_method=DELETE" \
  "-backend-config=retry_wait_min=5"
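The GITLAB_REPOSITORY_URL pipeline above turns Lieutenant’s ssh:// URL into the scp-like form that GitLab reports as ssh_url_to_repo: sed drops the scheme and swaps the first slash for a colon. Illustrated with a hypothetical catalog URL:

```shell
# Hypothetical catalog repository URL as returned by the Lieutenant API
url="ssh://git@git.vshn.net/syn-dev/cluster-catalogs/c-example-1234.git"
# Drop the ssh:// scheme, replace the first "/" with ":" (scp-like syntax)
scp_url=$(echo "$url" | sed 's|ssh://||; s|/|:|')
echo "$scp_url"   # git@git.vshn.net:syn-dev/cluster-catalogs/c-example-1234.git
# The repository name is then everything after the last slash
name="${scp_url##*/}"
echo "$name"      # c-example-1234.git
```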
- Create LB hieradata
cat > override.tf <<EOF
module "cluster" {
  bootstrap_count          = 0
  master_count             = 0
  infra_count              = 0
  worker_count             = 0
  additional_worker_groups = {}
}
EOF
terraform apply -target "module.cluster.module.lb.module.hiera"
- Review and merge the LB hieradata MR (listed in Terraform output hieradata_mr) and wait until the deploy pipeline after the merge is completed.
- Create LBs
terraform apply
- Set up the DNS records shown in output variable dns_entries from the previous step in the cluster’s parent zone. If you use a custom apps domain, make the necessary changes to the DNS record for *.apps.
- Make LB FQDNs available for later steps
Store LB FQDNs in environment:
declare -a LB_FQDNS
for id in 1 2; do
  LB_FQDNS[$id]=$(terraform state show "module.cluster.module.lb.cloudscale_server.lb[$(expr $id - 1)]" | grep fqdn | awk '{print $2}' | tr -d ' "\r\n')
done
Verify FQDNs:
for lb in "${LB_FQDNS[@]}"; do echo $lb; done
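Note the index arithmetic in that loop: LB_FQDNS is populated 1-indexed to match the lb1/lb2 host naming, while Terraform’s resource list is 0-indexed, hence the `expr $id - 1`. A self-contained sketch with hypothetical FQDNs in place of the Terraform state:

```shell
# Stands in for the 0-indexed Terraform resource list (hypothetical FQDNs)
tf_list=("lb1.example.com" "lb2.example.com")
declare -a LB_FQDNS
for id in 1 2; do
  # 1-indexed array entry, 0-indexed source list
  LB_FQDNS[$id]=${tf_list[$(expr $id - 1)]}
done
echo "${LB_FQDNS[1]}"   # lb1.example.com
echo "${LB_FQDNS[2]}"   # lb2.example.com
```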
- Check LB connectivity
for lb in "${LB_FQDNS[@]}"; do
  ping -c1 "${lb}"
done
- Wait until LBs are fully initialized by Puppet
# Wait for Puppet provisioning to complete
while true; do
  curl --connect-timeout 1 "http://api.${CLUSTER_DOMAIN}:6443" &>/dev/null
  if [ $? -eq 52 ]; then
    echo -e "\nHAproxy up"
    break
  else
    echo -n "."
    sleep 5
  fi
done
# Update sshop config, see https://wiki.vshn.net/pages/viewpage.action?pageId=40108094
sshop_update
# Check that you can access the LBs using your usual SSH config
for lb in "${LB_FQDNS[@]}"; do
  ssh "${lb}" hostname -f
done
While you’re waiting for the LBs to be provisioned, you can check the cloud-init logs with the following SSH commands:
ssh ubuntu@"${LB_FQDNS[1]}" tail -f /var/log/cloud-init-output.log
ssh ubuntu@"${LB_FQDNS[2]}" tail -f /var/log/cloud-init-output.log
-
Check the "Server created" tickets for the LBs and link them to the cluster setup ticket.
- Deploy bootstrap node
cat > override.tf <<EOF
module "cluster" {
  bootstrap_count          = 1
  master_count             = 0
  infra_count              = 0
  worker_count             = 0
  additional_worker_groups = {}
}
EOF
terraform apply
- Review and merge the LB hieradata MR (listed in Terraform output hieradata_mr) and run Puppet on the LBs after the deploy job has completed
for fqdn in "${LB_FQDNS[@]}"; do
  ssh "${fqdn}" sudo puppetctl run
done
- Wait for bootstrap API to come up
API_URL=$(yq e '.clusters[0].cluster.server' "${INSTALLER_DIR}/auth/kubeconfig")
while ! curl --connect-timeout 1 "${API_URL}/healthz" -k &>/dev/null; do
  echo -n "."
  sleep 5
done && echo -e "\nAPI is up"
- Deploy control plane nodes
cat > override.tf <<EOF
module "cluster" {
  bootstrap_count          = 1
  infra_count              = 0
  worker_count             = 0
  additional_worker_groups = {}
}
EOF
terraform apply
- Add the DNS records for etcd shown in output variable dns_entries from the previous step to the cluster’s parent zone
- Wait for bootstrap to complete
openshift-install --dir "${INSTALLER_DIR}" \
  wait-for bootstrap-complete --log-level debug
- Remove bootstrap node and provision infra nodes
cat > override.tf <<EOF
module "cluster" {
  worker_count             = 0
  additional_worker_groups = {}
}
EOF
terraform apply
- Approve infra certs
export KUBECONFIG="${INSTALLER_DIR}/auth/kubeconfig"
# Once CSRs in state Pending show up, approve them
# Needs to be run twice, two CSRs for each node need to be approved
kubectl get csr -w
oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | \
  xargs oc adm certificate approve
kubectl get nodes
- Label infra nodes
kubectl get nodes -lnode-role.kubernetes.io/worker
kubectl label node -lnode-role.kubernetes.io/worker \
  node-role.kubernetes.io/infra=""
- Enable proxy protocol on ingress controller
kubectl -n openshift-ingress-operator patch ingresscontroller default --type=json \
  -p '[{
    "op": "replace",
    "path": "/spec/endpointPublishingStrategy",
    "value": {"type": "HostNetwork", "hostNetwork": {"protocol": "PROXY"}}
  }]'
This step isn’t necessary if you’ve disabled the proxy protocol on the load-balancers manually during setup. By default, the PROXY protocol is enabled through the VSHN Commodore global defaults.
- Review and merge the LB hieradata MR (listed in Terraform output hieradata_mr) and run Puppet on the LBs after the deploy job has completed
for fqdn in "${LB_FQDNS[@]}"; do
  ssh "${fqdn}" sudo puppetctl run
done
- Wait for installation to complete
openshift-install --dir ${INSTALLER_DIR} \
  wait-for install-complete --log-level debug
- Provision worker nodes
rm override.tf
terraform apply
popd
- Approve worker certs
# Once CSRs in state Pending show up, approve them
# Needs to be run twice, two CSRs for each node need to be approved
kubectl get csr -w
oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | \
  xargs oc adm certificate approve
kubectl get nodes
- Label worker nodes
kubectl label --overwrite node -lnode-role.kubernetes.io/worker \
  node-role.kubernetes.io/app=""
kubectl label node -lnode-role.kubernetes.io/infra \
  node-role.kubernetes.io/app-
# This should show the worker nodes only
kubectl get nodes -l node-role.kubernetes.io/app
At this point you may want to add extra labels to the additional worker groups, if there are any.
- Create secret with S3 credentials for the registry
oc create secret generic image-registry-private-configuration-user \
  --namespace openshift-image-registry \
  --from-literal=REGISTRY_STORAGE_S3_ACCESSKEY=$(mc config host ls ${CLUSTER_ID} -json | jq -r .accessKey) \
  --from-literal=REGISTRY_STORAGE_S3_SECRETKEY=$(mc config host ls ${CLUSTER_ID} -json | jq -r .secretKey)
If the registry S3 credentials are created too long after the initial cluster setup, it’s possible that the openshift-samples operator has disabled itself because it couldn’t find a working in-cluster registry. If the samples operator is disabled, no templates and builder images will be available on the cluster.
You can check the samples operator’s state with the following command:
kubectl get config.samples cluster -ojsonpath='{.spec.managementState}'
If the command returns Removed, verify that the in-cluster registry pods are now running, and enable the samples operator again:
kubectl patch config.samples cluster -p '{"spec":{"managementState":"Managed"}}'
See the upstream documentation for more details on the samples operator.
Setup acme-dns CNAME records for the cluster
You can skip this section if you’re not using Let’s Encrypt for the cluster’s API and default wildcard certificates.
- Extract the acme-dns subdomain for the cluster after cert-manager has been deployed via Project Syn.
fulldomain=$(kubectl -n syn-cert-manager \
  get secret acme-dns-client \
  -o jsonpath='{.data.acmedns\.json}' | \
  base64 -d | \
  jq -r '[.[]][0].fulldomain')
echo "$fulldomain"
- Add the following CNAME records to the cluster’s DNS zone
The _acme-challenge records must be created in the same zone as the cluster’s api and apps records respectively.
$ORIGIN <cluster-zone> (2)
_acme-challenge.api IN CNAME <fulldomain>. (1)
$ORIGIN <apps-base-domain> (3)
_acme-challenge.apps IN CNAME <fulldomain>. (1)
(1) Replace <fulldomain> with the output of the previous step.
(2) The _acme-challenge.api record must be created in the same origin as the api record.
(3) The _acme-challenge.apps record must be created in the same origin as the apps record.
Store the cluster’s admin credentials in the password manager
- Once the cluster’s production API certificate has been deployed, edit the cluster’s admin kubeconfig file to remove the initial API certificate CA.
You may see the error "Unable to connect to the server: x509: certificate signed by unknown authority" when executing kubectl or oc commands after the cluster’s production API certificate has been deployed by Project Syn. This error can be addressed by removing the initial CA certificate data from the admin kubeconfig as shown in this step.
yq e -i 'del(.clusters[0].cluster.certificate-authority-data)' \
  "${INSTALLER_DIR}/auth/kubeconfig"
- Save the admin credentials in the password manager. You can find the password in the file target/auth/kubeadmin-password and the kubeconfig in target/auth/kubeconfig.
ls -l ${INSTALLER_DIR}/auth/
Configure access for registry bucket
OpenShift configures a PublicAccessBlockConfiguration on the registry bucket. Ceph currently has a bug that prevents pushing objects into the S3 bucket. The error message in the docker-registry logs is `s3aws: AccessDenied: \n\tstatus code: 403, request id: tx00000000000003ea93fa6-00112504a0-4fa9e750e-rma1, host id: `. See tracker.ceph.com/issues/49135 for more information.
- Install the aws cli tool
pip install awscli
- Check the current S3 bucket configuration after openshift4-registry has been deployed via Project Syn.
export AWS_ACCESS_KEY_ID=$(mc config host ls ${CLUSTER_ID} -json | jq -r .accessKey)
export AWS_SECRET_ACCESS_KEY=$(mc config host ls ${CLUSTER_ID} -json | jq -r .secretKey)
export REGION=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${COMMODORE_API_URL}/clusters/${CLUSTER_ID} | jq -r .facts.region)
aws --endpoint-url "https://objects.${REGION}.cloudscale.ch" s3api get-public-access-block --bucket "${CLUSTER_ID}-image-registry"
- Configure BlockPublicAcls to false
aws s3api put-public-access-block --endpoint-url "https://objects.${REGION}.cloudscale.ch" --bucket "${CLUSTER_ID}-image-registry" --public-access-block-configuration BlockPublicAcls=false
- Verify that BlockPublicAcls is false
aws s3api get-public-access-block --endpoint-url "https://objects.${REGION}.cloudscale.ch" --bucket "${CLUSTER_ID}-image-registry"
The final configuration should look like this:
{
  "PublicAccessBlockConfiguration": {
    "BlockPublicAcls": false,
    "IgnorePublicAcls": false,
    "BlockPublicPolicy": false,
    "RestrictPublicBuckets": false
  }
}
Finalize installation
- Configure the apt-dater groups for the LBs.
git clone git@git.vshn.net:vshn-puppet/nodes_hieradata.git
pushd nodes_hieradata
cat >"${LB_FQDNS[1]}.yaml" <<EOF
---
s_apt_dater::host::group: '2200_20_night_main'
EOF
cat >"${LB_FQDNS[2]}.yaml" <<EOF
---
s_apt_dater::host::group: '2200_40_night_second'
EOF
git add *.yaml
git commit -m "Configure apt-dater groups for LBs for OCP4 cluster ${CLUSTER_ID}"
git push origin master
popd
This how-to defaults to the night maintenance window on Tuesday at 22:00. Adjust the apt-dater groups according to the documented groups (VSHN-internal only) if the cluster requires a different maintenance window.
- Wait for the deploy job on nodes hieradata to complete, then run Puppet on the LBs to update the apt-dater groups.
for fqdn in "${LB_FQDNS[@]}"; do
  ssh "${fqdn}" sudo puppetctl run
done
- Delete local config files
rm -r ${INSTALLER_DIR}/
Post tasks
- Do a first maintenance
OR
- Add the cluster to the maintenance template
- Remove bootstrap bucket
mc rm -r --force "${CLUSTER_ID}/${CLUSTER_ID}-bootstrap-ignition"
mc rb "${CLUSTER_ID}/${CLUSTER_ID}-bootstrap-ignition"