Installation on Exoscale

Steps to install an OpenShift 4 cluster on Exoscale.

These steps follow the "Installing a cluster on bare metal" docs to set up a user-provisioned installation (UPI). Terraform is used to provision the cloud infrastructure.

This guide describes the installation with Cilium as the network provider. Cilium is an optional add-on. Sections marked with Cilium Optional should be skipped for clusters which don’t require Cilium.

This how-to guide is still a work in progress and will change. It’s currently very specific to VSHN and needs further changes to be more generic.

Starting situation

  • You already have a Tenant and its Git repository

  • You have a CCSP Red Hat login and are logged into Red Hat OpenShift Cluster Manager

  • You want to register a new cluster in Lieutenant and are about to install OpenShift 4 on Exoscale

Prerequisites

Make sure the versions of openshift-install and the RHCOS image are the same, otherwise ignition will fail.
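A quick pre-flight check can catch a mismatch before ignition fails. The sketch below compares only the major.minor part of two version strings. Whether a matching minor release is sufficient for your setup is an assumption here, and same_minor, INSTALLER_VERSION and RHCOS_IMAGE_VERSION are hypothetical names that don't appear elsewhere in this guide. The versions are entered by hand rather than parsed from tool output.

```shell
# Hypothetical helper: succeed if two versions share the same major.minor.
# Assumption: versions look like X.Y.Z (for example 4.11.9).
same_minor() {
  local a="${1%.*}" b="${2%.*}"
  [ "$a" = "$b" ]
}

INSTALLER_VERSION="4.11.13"   # example value, replace with your openshift-install version
RHCOS_IMAGE_VERSION="4.11.9"  # example value, replace with your RHCOS image version

if same_minor "$INSTALLER_VERSION" "$RHCOS_IMAGE_VERSION"; then
  echo "versions match on major.minor"
else
  echo "version mismatch: ${INSTALLER_VERSION} vs ${RHCOS_IMAGE_VERSION}" >&2
fi
```

If you need an exact match, compare the full strings instead of the stripped ones.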

Cluster Installation

Register the new OpenShift 4 cluster in Lieutenant.

Lieutenant API endpoint

Use the following endpoint for Lieutenant:

Set up LDAP service

  1. Create an LDAP service

    Use control.vshn.net/vshn/services/_create to create a service. The name must contain the customer and the cluster name. Then put the LDAP service ID and password in the following variables:

    export LDAP_ID="Your_LDAP_ID_here"
    export LDAP_PASSWORD="Your_LDAP_pw_here"

Configure input

Access to cloud API
export EXOSCALE_API_KEY=<exoscale-key>
export EXOSCALE_API_SECRET=<exoscale-secret>
export EXOSCALE_ZONE=<exoscale-zone> # All lower case. For example ch-dk-2.
export EXOSCALE_S3_ENDPOINT="sos-${EXOSCALE_ZONE}.exo.io"
Access to various APIs
# From https://git.vshn.net/-/profile/personal_access_tokens, "api" scope is sufficient
export GITLAB_TOKEN=<gitlab-api-token>
export GITLAB_USER=<gitlab-user-name>

# For example: https://api.syn.vshn.net
# IMPORTANT: do NOT add a trailing `/`. Commands below will fail.
export COMMODORE_API_URL=<lieutenant-api-endpoint>

# Set Project Syn cluster and tenant ID
export CLUSTER_ID=<lieutenant-cluster-id> # Looks like: c-<something>
export TENANT_ID=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${COMMODORE_API_URL}/clusters/${CLUSTER_ID} | jq -r .tenant)
Configuration for hieradata commits
export GIT_AUTHOR_NAME=$(git config --global user.name)
export GIT_AUTHOR_EMAIL=$(git config --global user.email)
export TF_VAR_control_vshn_net_token=<control-vshn-net-token>
OpenShift configuration
export BASE_DOMAIN=<your-base-domain>
export PULL_SECRET='<redhat-pull-secret>' # As copied from https://cloud.redhat.com/openshift/install/pull-secret "Copy pull secret". value must be inside quotes.

For an explanation of BASE_DOMAIN, see DNS Scheme.
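Many later commands fail in confusing ways if one of these variables is empty or if COMMODORE_API_URL ends in a slash. The following is a minimal sketch of a sanity check. require_vars and strip_trailing_slash are hypothetical helpers, not part of any tool used in this guide.

```shell
# Hypothetical helper: report every listed variable that is unset or empty.
require_vars() {
  local missing=0 name
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable called $name
    if [ -z "${!name:-}" ]; then
      echo "missing: ${name}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Hypothetical helper: drop a single trailing slash from a URL.
strip_trailing_slash() {
  printf '%s\n' "${1%/}"
}

# Example usage:
# require_vars EXOSCALE_API_KEY EXOSCALE_API_SECRET EXOSCALE_ZONE \
#   GITLAB_TOKEN GITLAB_USER COMMODORE_API_URL CLUSTER_ID TENANT_ID \
#   BASE_DOMAIN PULL_SECRET
# COMMODORE_API_URL=$(strip_trailing_slash "${COMMODORE_API_URL}")
```

The check only tests for empty values. It can't tell whether a token or key is actually valid.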

Create restricted Exoscale IAM keys for the LBs and object storage

  1. Create restricted API key for Exoscale object storage

    exoscale_s3_credentials=$(exo iam access-key create "${CLUSTER_ID}_object_storage" \
      --tag sos -O json)
    export EXOSCALE_S3_ACCESSKEY=$(echo "${exoscale_s3_credentials}" | jq -r '.api_key')
    export EXOSCALE_S3_SECRETKEY=$(echo "${exoscale_s3_credentials}" | jq -r '.api_secret')

Set up S3 bucket for cluster bootstrap

  1. Create S3 bucket

    exo storage create "sos://${CLUSTER_ID}-bootstrap" --zone "${EXOSCALE_ZONE}"

Upload Red Hat CoreOS image

  1. Fetch and convert the latest Red Hat CoreOS image

    RHCOS_VERSION="4.11.9"
    
    curl "https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.11/${RHCOS_VERSION}/rhcos-${RHCOS_VERSION}-x86_64-openstack.x86_64.qcow2.gz" | gunzip > rhcos-${RHCOS_VERSION}.qcow2
    
    virt-edit -a rhcos-${RHCOS_VERSION}.qcow2 \
      -m /dev/sda3:/ /loader/entries/ostree-1-rhcos.conf \
      -e 's/openstack/exoscale/'
    
    exo storage upload rhcos-${RHCOS_VERSION}.qcow2 "sos://${CLUSTER_ID}-bootstrap" --acl public-read
    
    exo compute instance-template register "rhcos-${RHCOS_VERSION}" \
      "https://${EXOSCALE_S3_ENDPOINT}/${CLUSTER_ID}-bootstrap/rhcos-${RHCOS_VERSION}.qcow2" \
      "$(md5sum rhcos-${RHCOS_VERSION}.qcow2 | awk '{ print $1 }')" \
      --zone "${EXOSCALE_ZONE}" \
      --boot-mode uefi \
      --disable-password \
      --username core \
      --description "Red Hat Enterprise Linux CoreOS (RHCOS) ${RHCOS_VERSION}"
    
    exo storage delete "sos://${CLUSTER_ID}-bootstrap/rhcos-${RHCOS_VERSION}.qcow2"
    
    export RHCOS_TEMPLATE="rhcos-${RHCOS_VERSION}"
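The download URL above contains both the minor release (4.11) and the full RHCOS version, so both have to be changed together when you pick a different image. As a sketch, the URL can be derived from the version alone. rhcos_url is a hypothetical helper, and it assumes that the mirror directory is always named after the major.minor part of the version, as in the command above.

```shell
# Hypothetical helper: build the mirror URL for a given RHCOS version.
# Assumption: the mirror directory is the major.minor part of the version.
rhcos_url() {
  local version="$1"
  local minor="${version%.*}"
  echo "https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/${minor}/${version}/rhcos-${version}-x86_64-openstack.x86_64.qcow2.gz"
}

rhcos_url "4.11.9"
```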

Set secrets in Vault

Connect with Vault
export VAULT_ADDR=https://vault-prod.syn.vshn.net
vault login -method=oidc
Store various secrets in Vault
# Set the Exoscale object storage API key
vault kv put clusters/kv/${TENANT_ID}/${CLUSTER_ID}/exoscale/storage_iam \
  s3_access_key=${EXOSCALE_S3_ACCESSKEY} \
  s3_secret_key=${EXOSCALE_S3_SECRETKEY}

# Generate an HTTP secret for the registry
vault kv put clusters/kv/${TENANT_ID}/${CLUSTER_ID}/registry \
  httpSecret=$(LC_ALL=C tr -cd "A-Za-z0-9" </dev/urandom | head -c 128)

# Set the LDAP password
vault kv put clusters/kv/${TENANT_ID}/${CLUSTER_ID}/vshn-ldap \
  bindPassword=${LDAP_PASSWORD}

# Generate a master password for K8up backups
vault kv put clusters/kv/${TENANT_ID}/${CLUSTER_ID}/global-backup \
  password=$(LC_ALL=C tr -cd "A-Za-z0-9" </dev/urandom | head -c 32)

# Generate a password for the cluster object backups
vault kv put clusters/kv/${TENANT_ID}/${CLUSTER_ID}/cluster-backup \
  password=$(LC_ALL=C tr -cd "A-Za-z0-9" </dev/urandom | head -c 32)

# Copy the Dagobert OpenShift Node Collector Credentials
vault kv get -format=json "clusters/kv/template/dagobert" | jq '.data.data' \
  | vault kv put -cas=0 "clusters/kv/${TENANT_ID}/${CLUSTER_ID}/dagobert" -

# Copy the VSHN acme-dns registration password
vault kv get -format=json "clusters/kv/template/cert-manager" | jq '.data.data' \
  | vault kv put -cas=0 "clusters/kv/${TENANT_ID}/${CLUSTER_ID}/cert-manager" -
Grab the LB hieradata repo token from Vault
export HIERADATA_REPO_SECRET=$(vault kv get \
  -format=json "clusters/kv/lbaas/hieradata_repo_token" | jq '.data.data')
export HIERADATA_REPO_USER=$(echo "${HIERADATA_REPO_SECRET}" | jq -r '.user')
export HIERADATA_REPO_TOKEN=$(echo "${HIERADATA_REPO_SECRET}" | jq -r '.token')
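The generated passwords stored in Vault above all use the same pipeline: tr keeps only alphanumeric characters from /dev/urandom and head cuts the stream at the desired length. The sketch below wraps that pipeline in a hypothetical helper called gen_secret so that you can check the length of a value before writing it to Vault.

```shell
# Hypothetical helper: print N random alphanumeric characters (default 32).
gen_secret() {
  local length="${1:-32}"
  LC_ALL=C tr -cd 'A-Za-z0-9' </dev/urandom | head -c "$length"
}

secret=$(gen_secret 32)
echo "generated ${#secret} characters"
```

LC_ALL=C makes tr treat the input as raw bytes, which avoids "illegal byte sequence" errors on some platforms.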

Prepare Cluster Repository

For the following steps, change into a clean directory (for example a directory in your home).

See "Running Commodore" for details on how to run Commodore.

  1. Prepare Commodore inventory.

    mkdir -p inventory/classes/
    git clone $(curl -sH"Authorization: Bearer $(commodore fetch-token)" "${COMMODORE_API_URL}/tenants/${TENANT_ID}" | jq -r '.gitRepo.url') inventory/classes/${TENANT_ID}

Cilium Optional: Prepare Cilium Configuration

  1. Add Cilium to cluster configuration

    pushd "inventory/classes/${TENANT_ID}/"
    
    yq eval -i '.applications += ["cilium"]' ${CLUSTER_ID}.yml
    
    yq eval -i '.parameters.networkpolicy.networkPlugin = "cilium"' ${CLUSTER_ID}.yml
    yq eval -i '.parameters.networkpolicy.ignoredNamespaces = ["openshift-oauth-apiserver"]' ${CLUSTER_ID}.yml
    
    yq eval -i '.parameters.openshift4_monitoring.upstreamRules.networkPlugin = "cilium"' ${CLUSTER_ID}.yml
    
    yq eval -i '.parameters.openshift.infraID = "TO_BE_DEFINED"' ${CLUSTER_ID}.yml
    yq eval -i '.parameters.openshift.clusterID = "TO_BE_DEFINED"' ${CLUSTER_ID}.yml
    
    git commit -a -m "Add Cilium addon to ${CLUSTER_ID}"
    git push
    popd
  2. Compile catalog

    commodore catalog compile ${CLUSTER_ID} --push -i \
      --dynamic-fact kubernetesVersion.major=$(echo "1.24" | awk -F. '{print $1}') \
      --dynamic-fact kubernetesVersion.minor=$(echo "1.24" | awk -F. '{print $2}') \
      --dynamic-fact openshiftVersion.Major=$(echo "4.11" | awk -F. '{print $1}') \
      --dynamic-fact openshiftVersion.Minor=$(echo "4.11" | awk -F. '{print $2}')
    This commodore call requires Commodore v1.5.0 or newer. Please make sure to update your local installation.
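The compile command above splits each version string into its major and minor part with awk. The following sketch does the same split with the shell's read builtin and prints the resulting arguments, which can help when checking what will be passed to Commodore. version_facts is a hypothetical helper. The fact names are copied from the command above.

```shell
# Hypothetical helper: print "--dynamic-fact" arguments for a version string.
# $1 = fact prefix, $2 = key for the major part, $3 = key for the minor part,
# $4 = version such as 1.24 (anything after the minor part is ignored).
version_facts() {
  local prefix="$1" major_key="$2" minor_key="$3" version="$4"
  local major minor _rest
  IFS=. read -r major minor _rest <<<"$version"
  echo "--dynamic-fact ${prefix}.${major_key}=${major}"
  echo "--dynamic-fact ${prefix}.${minor_key}=${minor}"
}

version_facts kubernetesVersion major minor "1.24"
version_facts openshiftVersion Major Minor "4.11"
```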

Configure the OpenShift Installer

Starting with this section, we recommend that you change into a clean directory (for example a directory in your home).

  1. Generate SSH key

    We generate a unique SSH key pair for the cluster as this gives us troubleshooting access.

    SSH_PRIVATE_KEY="$(pwd)/ssh_$CLUSTER_ID"
    export SSH_PUBLIC_KEY="${SSH_PRIVATE_KEY}.pub"
    
    ssh-keygen -C "vault@$CLUSTER_ID" -t ed25519 -f $SSH_PRIVATE_KEY -N ''
    
    BASE64_NO_WRAP='base64'
    if [[ "$OSTYPE" == "linux"* ]]; then
      BASE64_NO_WRAP='base64 --wrap 0'
    fi
    
    vault kv put clusters/kv/${TENANT_ID}/${CLUSTER_ID}/exoscale/ssh \
      private_key=$(cat $SSH_PRIVATE_KEY | eval "$BASE64_NO_WRAP")
    
    ssh-add $SSH_PRIVATE_KEY
  2. Prepare install-config.yaml

    You can add more options to the install-config.yaml file. Have a look at the config example for more information.

    For example, you could change the SDN from the default to a different network provider if a customer has specific network requirements.

    export INSTALLER_DIR="$(pwd)/target"
    mkdir -p "${INSTALLER_DIR}"
    
    cat > "${INSTALLER_DIR}/install-config.yaml" <<EOF
    apiVersion: v1
    metadata:
      name: ${CLUSTER_ID}
    baseDomain: ${BASE_DOMAIN}
    platform:
      none: {}
    networking:
      networkType: OVNKubernetes
    pullSecret: |
      ${PULL_SECRET}
    sshKey: "$(cat $SSH_PUBLIC_KEY)"
    capabilities:
      baselineCapabilitySet: v4.11
    EOF
  3. Cilium Optional: Add Cilium

    yq eval -i '.networking.networkType = "Cilium"' "${INSTALLER_DIR}/install-config.yaml"
    If you set custom CIDRs for the OpenShift networking, update the corresponding values in your Commodore cluster definitions. See Cilium Component Defaults and Parameter Reference. Verify with less catalog/manifests/cilium/olm/*ciliumconfig.yaml.
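The SSH key step above switches between base64 and base64 --wrap 0 depending on the platform, because GNU base64 wraps its output by default. A sketch of an alternative that doesn't depend on platform-specific flags is to strip the newlines after encoding. b64_oneline is a hypothetical helper.

```shell
# Hypothetical helper: base64-encode stdin onto a single line.
# Removing newlines after encoding avoids platform-specific wrap flags.
b64_oneline() {
  base64 | tr -d '\n'
}

encoded=$(printf 'line one\nline two\n' | b64_oneline)
echo "$encoded"
```

With this helper, the Vault command would read private_key=$(b64_oneline < "$SSH_PRIVATE_KEY").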

Run OpenShift Installer

The steps in this section aren’t idempotent and have to be completed uninterrupted in one go. If you have to recreate the install config or any of the generated manifests you need to rerun all of the subsequent steps.
  1. Render install manifests (this will consume the install-config.yaml)

    openshift-install --dir "${INSTALLER_DIR}" \
      create manifests
    1. If you want to change the default "apps" domain for the cluster:

      yq eval -i '.spec.domain = "apps.example.com"' \
        "${INSTALLER_DIR}/manifests/cluster-ingress-02-config.yml"
  2. Cilium Optional: Copy pre-rendered Cilium manifests

    cp catalog/manifests/cilium/olm/* "${INSTALLER_DIR}/manifests/"
  3. Extract the cluster domain from the generated manifests

    export CLUSTER_DOMAIN=$(yq e '.spec.baseDomain' \
      "${INSTALLER_DIR}/manifests/cluster-dns-02-config.yml")
  4. Prepare install manifests and ignition config

    openshift-install --dir "${INSTALLER_DIR}" \
      create ignition-configs
  5. Upload ignition config

    exo storage upload "${INSTALLER_DIR}/bootstrap.ign" "sos://${CLUSTER_ID}-bootstrap" --acl public-read
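Because these steps can't be repeated individually, it can be worth confirming that the installer produced the files which later steps of this guide read before you continue. The sketch below checks only the files referenced in this guide. check_installer_output is a hypothetical helper and the file list is an assumption based on those references, not a complete list of installer output.

```shell
# Hypothetical helper: verify that the installer output used later in this
# guide exists and is not empty.
check_installer_output() {
  local dir="$1" file missing=0
  for file in bootstrap.ign master.ign metadata.json auth/kubeconfig; do
    if [ ! -s "${dir}/${file}" ]; then
      echo "missing or empty: ${dir}/${file}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage:
# check_installer_output "${INSTALLER_DIR}" && echo "installer output looks complete"
```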

Terraform Cluster Config

  1. Set team responsible for handling Icinga alerts

    # use lower case for team name.
    # e.g. TEAM=tarazed
    TEAM=<team-name>
  2. Prepare Terraform cluster config

    CA_CERT=$(jq -r '.ignition.security.tls.certificateAuthorities[0].source' \
      "${INSTALLER_DIR}/master.ign" | \
      awk -F ',' '{ print $2 }' | \
      base64 --decode)
    
    pushd "inventory/classes/${TENANT_ID}/"
    
    yq eval -i ".parameters.openshift.infraID = \"$(jq -r .infraID "${INSTALLER_DIR}/metadata.json")\"" \
      ${CLUSTER_ID}.yml
    
    yq eval -i ".parameters.openshift.clusterID = \"$(jq -r .clusterID "${INSTALLER_DIR}/metadata.json")\"" \
      ${CLUSTER_ID}.yml
    
    yq eval -i ".parameters.openshift.baseDomain = \"${CLUSTER_DOMAIN}\"" \
      ${CLUSTER_ID}.yml
    
    yq eval -i ".parameters.openshift4_terraform.terraform_variables.base_domain = \"${BASE_DOMAIN}\"" \
      ${CLUSTER_ID}.yml
    
    yq eval -i ".parameters.openshift4_terraform.terraform_variables.ignition_ca = \"${CA_CERT}\"" \
      ${CLUSTER_ID}.yml
    
    yq eval -i ".parameters.openshift4_terraform.terraform_variables.ssh_key = \"$(cat ${SSH_PUBLIC_KEY})\"" \
      ${CLUSTER_ID}.yml
    
    yq eval -i ".parameters.openshift4_terraform.terraform_variables.hieradata_repo_user = \"${HIERADATA_REPO_USER}\"" \
      ${CLUSTER_ID}.yml
    
    yq eval -i ".parameters.openshift4_terraform.terraform_variables.team = \"${TEAM}\"" \
      ${CLUSTER_ID}.yml
    
    yq eval -i ".parameters.vshnLdap.serviceId = \"${LDAP_ID}\"" \
      ${CLUSTER_ID}.yml

    If you use a custom "apps" domain, make sure to set parameters.openshift.appsDomain accordingly.

    APPS_DOMAIN=your.custom.apps.domain
    yq eval -i ".parameters.openshift.appsDomain = \"${APPS_DOMAIN}\"" \
      ${CLUSTER_ID}.yml

    By default, the cluster’s update channel is derived from the cluster’s reported OpenShift version. If you want to use a custom update channel, make sure to set parameters.openshift4_version.spec.channel accordingly.

    # Configure the OpenShift update channel as `fast` for OpenShift 4.11
    yq eval -i ".parameters.openshift4_version.spec.channel = \"fast-4.11\"" \
      ${CLUSTER_ID}.yml
  3. Configure Exoscale-specific Terraform variables

    yq eval -i ".parameters.openshift4_terraform.terraform_variables.rhcos_template = \"${RHCOS_TEMPLATE}\"" \
      ${CLUSTER_ID}.yml

You now have the option to further customize the cluster by editing terraform_variables. Most importantly you have the option to change node sizes or add additional specialized worker nodes.

Please look at the configuration reference for the available options.
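The CA_CERT extraction in step 2 relies on the ignition file storing the certificate as a base64 data URL: the pipeline takes everything after the comma and decodes it. The sketch below shows that decoding in isolation with a made-up payload. decode_data_url is a hypothetical helper, and the data URL layout is inferred from the awk and base64 calls above.

```shell
# Hypothetical helper: decode the payload of a base64 data URL.
# Everything after the first comma is treated as base64 data.
decode_data_url() {
  printf '%s' "${1#*,}" | base64 --decode
}

# Example with a made-up payload ("hello" encoded as base64):
decode_data_url "data:text/plain;charset=utf-8;base64,aGVsbG8="
```

If the decoded value doesn't start with -----BEGIN CERTIFICATE-----, the ignition_ca Terraform variable is most likely wrong.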

Commit changes and compile cluster catalog

  1. Review changes. Have a look at the file ${CLUSTER_ID}.yml. Override default parameters or add more component configurations as required for your cluster.

    Ensure that you’re using component openshift4-terraform v5.0.0 or newer. Otherwise the instructions in this how-to might not apply.

  2. Commit changes

    git commit -a -m "Setup cluster ${CLUSTER_ID}"
    git push
    
    popd
  3. Compile and push cluster catalog

    commodore catalog compile ${CLUSTER_ID} --push -i \
      --dynamic-fact kubernetesVersion.major=$(echo "1.24" | awk -F. '{print $1}') \
      --dynamic-fact kubernetesVersion.minor=$(echo "1.24" | awk -F. '{print $2}') \
      --dynamic-fact openshiftVersion.Major=$(echo "4.11" | awk -F. '{print $1}') \
      --dynamic-fact openshiftVersion.Minor=$(echo "4.11" | awk -F. '{print $2}')
    This commodore call requires Commodore v1.5.0 or newer. Please make sure to update your local installation.

Provision Infrastructure

  1. Configure Terraform secrets

    cat <<EOF > ./terraform.env
    EXOSCALE_API_KEY
    EXOSCALE_API_SECRET
    TF_VAR_control_vshn_net_token
    GIT_AUTHOR_NAME
    GIT_AUTHOR_EMAIL
    HIERADATA_REPO_TOKEN
    EOF
  2. Set up Terraform

    Prepare Terraform execution environment
    # Set terraform image and tag to be used
    tf_image=$(\
      yq eval ".parameters.openshift4_terraform.images.terraform.image" \
      dependencies/openshift4-terraform/class/defaults.yml)
    tf_tag=$(\
      yq eval ".parameters.openshift4_terraform.images.terraform.tag" \
      dependencies/openshift4-terraform/class/defaults.yml)
    
    # Generate the terraform alias
    base_dir=$(pwd)
    alias terraform='docker run -it --rm \
      -e REAL_UID=$(id -u) \
      --env-file ${base_dir}/terraform.env \
      -w /tf \
      -v $(pwd):/tf \
      --ulimit memlock=-1 \
      "${tf_image}:${tf_tag}" /tf/terraform.sh'
    
    export GITLAB_REPOSITORY_URL=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${COMMODORE_API_URL}/clusters/${CLUSTER_ID} | jq -r '.gitRepo.url' | sed 's|ssh://||; s|/|:|')
    export GITLAB_REPOSITORY_NAME=${GITLAB_REPOSITORY_URL##*/}
    export GITLAB_CATALOG_PROJECT_ID=$(curl -sH "Authorization: Bearer ${GITLAB_TOKEN}" "https://git.vshn.net/api/v4/projects?simple=true&search=${GITLAB_REPOSITORY_NAME/.git}" | jq -r ".[] | select(.ssh_url_to_repo == \"${GITLAB_REPOSITORY_URL}\") | .id")
    export GITLAB_STATE_URL="https://git.vshn.net/api/v4/projects/${GITLAB_CATALOG_PROJECT_ID}/terraform/state/cluster"
    
    pushd catalog/manifests/openshift4-terraform/
    Initialize Terraform
    terraform init \
      "-backend-config=address=${GITLAB_STATE_URL}" \
      "-backend-config=lock_address=${GITLAB_STATE_URL}/lock" \
      "-backend-config=unlock_address=${GITLAB_STATE_URL}/lock" \
      "-backend-config=username=${GITLAB_USER}" \
      "-backend-config=password=${GITLAB_TOKEN}" \
      "-backend-config=lock_method=POST" \
      "-backend-config=unlock_method=DELETE" \
      "-backend-config=retry_wait_min=5"
  3. Provision Domain and security groups

    cat > override.tf <<EOF
    module "cluster" {
      bootstrap_count          = 0
      lb_count                 = 0
      master_count             = 0
      infra_count              = 0
      storage_count            = 0
      worker_count             = 0
      additional_worker_groups = {}
    }
    EOF
    terraform apply
  4. Set Exoscale Elastic IP reverse DNS

    terraform state pull > state.json
    
    ingress_id=$(jq -r '.resources[] | select(.type == "exoscale_elastic_ip" and .name == "ingress") | .instances[0].attributes.id' state.json)
    api_id=$(jq -r '.resources[] | select(.type == "exoscale_elastic_ip" and .name == "api") | .instances[0].attributes.id' state.json)
    # Use auto-generated Exoscale API V2 OpenAPI command to set EIP reverse DNS
    echo "{\"domain-name\": \"ingress.${CLUSTER_DOMAIN}.\"}" | exo x --zone "${EXOSCALE_ZONE}" update-reverse-dns-elastic-ip "$ingress_id"
    echo "{\"domain-name\": \"api.${CLUSTER_DOMAIN}.\"}" | exo x --zone "${EXOSCALE_ZONE}" update-reverse-dns-elastic-ip "$api_id"
    
    rm state.json
  5. Create LB hieradata

    cat > override.tf <<EOF
    module "cluster" {
      bootstrap_count          = 0
      master_count             = 0
      infra_count              = 0
      storage_count            = 0
      worker_count             = 0
      additional_worker_groups = {}
    }
    EOF
    terraform apply -target "module.cluster.module.lb.module.hiera"
  6. Set up DNS NS records on parent zone using the data from the Terraform output variable ns_records from the previous step

  7. Review and merge the LB hieradata MR (listed in Terraform output hieradata_mr) and wait until the deploy pipeline after the merge is completed.

  8. Create LBs

    terraform apply
  9. Make LB FQDNs available for later steps

    Store LB FQDNs in environment
    declare -a LB_FQDNS
    for id in 1 2; do
      LB_FQDNS[$id]=$(terraform state show "module.cluster.module.lb.exoscale_domain_record.lb[$((id - 1))]" | grep hostname | cut -d'=' -f2 | tr -d ' "\r\n')
    done
    Verify FQDNs
    for lb in "${LB_FQDNS[@]}"; do echo $lb; done
  10. Check LB connectivity

    for lb in "${LB_FQDNS[@]}"; do
      ping -c1 "${lb}"
    done
  11. Wait until LBs are fully initialized by Puppet

    # Wait for Puppet provisioning to complete
    while true; do
      curl --connect-timeout 1 "http://api.${CLUSTER_DOMAIN}:6443" &>/dev/null
      if [ $? -eq 52 ]; then
        echo -e "\nHAproxy up"
        break
      else
        echo -n "."
        sleep 5
      fi
    done
    # Update sshop config, see https://wiki.vshn.net/pages/viewpage.action?pageId=40108094
    sshop_update
    # Check that you can access the LBs using your usual SSH config
    for lb in "${LB_FQDNS[@]}"; do
      ssh "${lb}" hostname -f
    done

    While you’re waiting for the LBs to be provisioned, you can check the cloud-init logs with the following SSH commands:

    ssh ubuntu@"${LB_FQDNS[1]}" tail -f /var/log/cloud-init-output.log
    ssh ubuntu@"${LB_FQDNS[2]}" tail -f /var/log/cloud-init-output.log
  12. Check the "Server created" tickets for the LBs and link them to the cluster setup ticket.

  13. Deploy bootstrap node

    cat > override.tf <<EOF
    module "cluster" {
      bootstrap_count          = 1
      master_count             = 0
      infra_count              = 0
      storage_count            = 0
      worker_count             = 0
      additional_worker_groups = {}
    }
    EOF
    terraform apply
  14. Review and merge the LB hieradata MR (listed in Terraform output hieradata_mr) and run Puppet on the LBs after the deploy job has completed

    for fqdn in "${LB_FQDNS[@]}"; do
      ssh "${fqdn}" sudo puppetctl run
    done
  15. Wait for bootstrap API to come up

    API_URL=$(yq e '.clusters[0].cluster.server' "${INSTALLER_DIR}/auth/kubeconfig")
    while ! curl --connect-timeout 1 "${API_URL}/healthz" -k &>/dev/null; do
      echo -n "."
      sleep 5
    done && echo -e "\nAPI is up"
  16. Deploy control plane nodes

    cat > override.tf <<EOF
    module "cluster" {
      bootstrap_count          = 1
      infra_count              = 0
      storage_count            = 0
      worker_count             = 0
      additional_worker_groups = {}
    }
    EOF
    terraform apply
  17. Wait for bootstrap to complete

    openshift-install --dir "${INSTALLER_DIR}" \
      wait-for bootstrap-complete --log-level debug
  18. Remove bootstrap node and provision infra nodes

    cat > override.tf <<EOF
    module "cluster" {
      storage_count            = 0
      worker_count             = 0
      additional_worker_groups = {}
    }
    EOF
    terraform apply
  19. Approve infra certs

    export KUBECONFIG="${INSTALLER_DIR}/auth/kubeconfig"
    # Once CSRs in state Pending show up, approve them
    # Needs to be run twice, two CSRs for each node need to be approved
    
    kubectl get csr -w
    
    oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | \
      xargs oc adm certificate approve
    
    kubectl get nodes
  20. Label infra nodes

    kubectl get nodes -lnode-role.kubernetes.io/worker
    kubectl label node -lnode-role.kubernetes.io/worker \
      node-role.kubernetes.io/infra=""
  21. Enable proxy protocol on ingress controller

    kubectl -n openshift-ingress-operator patch ingresscontroller default --type=json \
      -p '[{
        "op":"replace",
        "path":"/spec/endpointPublishingStrategy",
        "value": {"type": "HostNetwork", "hostNetwork": {"protocol": "PROXY"}}
      }]'

    This step isn’t necessary if you’ve disabled the proxy protocol on the load-balancers manually during setup.

    By default, PROXY protocol is enabled through the VSHN Commodore global defaults.

  22. Review and merge the LB hieradata MR (listed in Terraform output hieradata_mr) and run Puppet on the LBs after the deploy job has completed

    for fqdn in "${LB_FQDNS[@]}"; do
      ssh "${fqdn}" sudo puppetctl run
    done
  23. Wait for installation to complete

    openshift-install --dir "${INSTALLER_DIR}" \
      wait-for install-complete --log-level debug
  24. Provision storage nodes

    cat > override.tf <<EOF
    module "cluster" {
      worker_count             = 0
      additional_worker_groups = {}
    }
    EOF
    terraform apply
  25. Approve storage certs

    # Once CSRs in state Pending show up, approve them
    # Needs to be run twice, two CSRs for each node need to be approved
    
    kubectl get csr -w
    
    oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | \
      xargs oc adm certificate approve
    
    kubectl get nodes
  26. Label and taint storage nodes

    kubectl label --overwrite node -lnode-role.kubernetes.io/worker \
      node-role.kubernetes.io/storage=""
    kubectl label node -lnode-role.kubernetes.io/infra \
      node-role.kubernetes.io/storage-
    
    kubectl taint node -lnode-role.kubernetes.io/storage \
      storagenode=True:NoSchedule
  27. Provision worker nodes

    rm override.tf
    terraform apply
    
    popd
  28. Approve worker certs

    # Once CSRs in state Pending show up, approve them
    # Needs to be run twice, two CSRs for each node need to be approved
    
    kubectl get csr -w
    
    oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | \
      xargs oc adm certificate approve
    
    kubectl get nodes
  29. Label worker nodes

    kubectl label --overwrite node -lnode-role.kubernetes.io/worker \
      node-role.kubernetes.io/app=""
    kubectl label node -lnode-role.kubernetes.io/infra \
      node-role.kubernetes.io/app-
    kubectl label node -lnode-role.kubernetes.io/storage \
      node-role.kubernetes.io/app-
    
    # This should show the worker nodes only
    kubectl get nodes -l node-role.kubernetes.io/app
    At this point you may want to add extra labels to the additional worker groups, if there are any.
  30. Create secret with S3 credentials for the registry

    oc create secret generic image-registry-private-configuration-user \
    --namespace openshift-image-registry \
    --from-literal=REGISTRY_STORAGE_S3_ACCESSKEY=${EXOSCALE_S3_ACCESSKEY} \
    --from-literal=REGISTRY_STORAGE_S3_SECRETKEY=${EXOSCALE_S3_SECRETKEY}
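The wait loops in the steps above (for example the wait for the bootstrap API) poll forever if something went wrong earlier. The following is a minimal sketch of a polling loop with a timeout. wait_for is a hypothetical helper and the timeout value in the usage example is arbitrary. It treats any zero exit status as success, so it doesn't cover the HAproxy check, which waits for curl exit code 52.

```shell
# Hypothetical helper: retry a command until it succeeds or a timeout expires.
# $1 = timeout in seconds, $2 = pause between attempts, rest = command to run.
wait_for() {
  local timeout="$1" pause="$2"
  shift 2
  # SECONDS is a bash builtin counting seconds since the shell started
  local deadline=$(( SECONDS + timeout ))
  until "$@" &>/dev/null; do
    if [ "$SECONDS" -ge "$deadline" ]; then
      echo "timed out after ${timeout}s: $*" >&2
      return 1
    fi
    sleep "$pause"
  done
}

# Example usage (API_URL as set in the bootstrap API step):
# wait_for 1800 5 curl --connect-timeout 1 -k "${API_URL}/healthz" && echo "API is up"
```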

Set up acme-dns CNAME records for the cluster

You can skip this section if you’re not using Let’s Encrypt for the cluster’s API and default wildcard certificates.
  1. Extract the acme-dns subdomain for the cluster after cert-manager has been deployed via Project Syn.

    fulldomain=$(kubectl -n syn-cert-manager \
      get secret acme-dns-client \
      -o jsonpath='{.data.acmedns\.json}' | \
      base64 -d  | \
      jq -r '[.[]][0].fulldomain')
    echo "$fulldomain"
  2. Set up the _acme-challenge CNAME records in the cluster’s DNS zone

    The _acme-challenge records must be created in the same zone as the cluster’s api and apps records respectively. The snippet below assumes that the cluster is configured to use the default "apps" domain in the cluster’s zone.

    for cname in "api" "apps"; do
      exo dns add CNAME "${CLUSTER_DOMAIN}" -n "_acme-challenge.${cname}" -a "${fulldomain}." -t 600
    done
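To review the records before they're created, the loop above can be mirrored by a dry run that only prints what would be added. This is a sketch with made-up example values. acme_cname_records is a hypothetical helper and its zone-file style output is for reading only, it isn't consumed by exo.

```shell
# Hypothetical helper: print the CNAME records the loop above would create.
# $1 = cluster DNS zone, $2 = acme-dns fulldomain.
acme_cname_records() {
  local zone="$1" target="$2" name
  for name in api apps; do
    printf '%s.%s. 600 IN CNAME %s.\n' "_acme-challenge.${name}" "$zone" "$target"
  done
}

# Example with made-up values:
acme_cname_records "c-example.example.com" "abc123.acme-dns.example.net"
```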

Store the cluster’s admin credentials in the password manager

  1. Once the cluster’s production API certificate has been deployed, edit the cluster’s admin kubeconfig file to remove the initial API certificate CA.

    You may see the error Unable to connect to the server: x509: certificate signed by unknown authority when executing kubectl or oc commands after the cluster’s production API certificate has been deployed by Project Syn.

    This error can be addressed by removing the initial CA certificate data from the admin kubeconfig as shown in this step.

    yq e -i 'del(.clusters[0].cluster.certificate-authority-data)' \
      "${INSTALLER_DIR}/auth/kubeconfig"
  2. Save the admin credentials in the password manager. You can find the password in the file target/auth/kubeadmin-password and the kubeconfig in target/auth/kubeconfig

    ls -l ${INSTALLER_DIR}/auth/

Finalize installation

  1. Configure the apt-dater groups for the LBs.

    git clone git@git.vshn.net:vshn-puppet/nodes_hieradata.git
    pushd nodes_hieradata
    cat >"${LB_FQDNS[1]}.yaml" <<EOF
    ---
    s_apt_dater::host::group: '2200_20_night_main'
    EOF
    cat >"${LB_FQDNS[2]}.yaml" <<EOF
    ---
    s_apt_dater::host::group: '2200_40_night_second'
    EOF
    git add *.yaml
    git commit -m"Configure apt-dater groups for LBs for OCP4 cluster ${CLUSTER_ID}"
    git push origin master
    popd

    This how-to defaults to the night maintenance window on Tuesday at 22:00. Adjust the apt-dater groups according to the documented groups (VSHN-internal only) if the cluster requires a different maintenance window.

  2. Wait for deploy job on nodes hieradata to complete and run Puppet on the LBs to update the apt-dater groups.

    for fqdn in "${LB_FQDNS[@]}"; do
      ssh "${fqdn}" sudo puppetctl run
    done
  3. Delete local config files

    rm -r ${INSTALLER_DIR}/

Post tasks

  1. Do a first maintenance

  2. Add the cluster to the maintenance template