Installation on cloudscale.ch

Steps to install an OpenShift 4 cluster on cloudscale.ch.

These steps follow the Installing a cluster on bare metal docs to set up a user-provisioned installation (UPI). Terraform is used to provision the cloud infrastructure.

The commands are idempotent and can be retried if any of the steps fail.

The certificates created during bootstrap are only valid for 24 hours, so make sure you complete these steps within 24 hours.

This how-to guide is still a work in progress and will change. It’s currently very specific to VSHN and needs further changes to be more generic.

Starting situation

  • You already have a Tenant and its git repository

  • You have a CCSP Red Hat login and are logged in to the Red Hat OpenShift Cluster Manager

  • You want to register a new cluster in Lieutenant and are about to install OpenShift 4 on cloudscale.ch

Prerequisites

Make sure the versions of openshift-install and the RHCOS image match, otherwise ignition will fail.
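
You can check the installer side with the following command; it also prints the release image it will install (the RHCOS image fetched later in this guide is 4.7.0):

openshift-install version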

Cluster Installation

Register the new OpenShift 4 cluster in Lieutenant.

Lieutenant API endpoint

Use the following endpoint for Lieutenant:

Set up LDAP service

  1. Create an LDAP service

    Use control.vshn.net/vshn/services/_create to create a service. The name must contain the customer and the cluster name. Then export the LDAP service ID and password in the following variables:

    export LDAP_ID="Your_LDAP_ID_here"
    export LDAP_PASSWORD="Your_LDAP_pw_here"

Configure input

Access to cloud API
# https://control.cloudscale.ch/service/<your-project>/api-token
export CLOUDSCALE_TOKEN=<cloudscale-api-token>
export TF_VAR_lb_cloudscale_api_secret=<cloudscale-api-token-for-Floaty>
Access to various APIs
# From https://git.vshn.net/profile/personal_access_tokens, "api" scope is sufficient
export GITLAB_TOKEN=<gitlab-api-token>
export GITLAB_USER=<gitlab-user-name>

# For example: https://api.syn.vshn.net
# IMPORTANT: do NOT add a trailing `/`, or the commands below will fail.
export COMMODORE_API_URL=<lieutenant-api-endpoint>
export COMMODORE_API_TOKEN=<lieutenant-api-token>

# Set Project Syn cluster and tenant ID
export CLUSTER_ID=<lieutenant-cluster-id> # Looks like: c-<something>
export TENANT_ID=$(curl -sH "Authorization: Bearer ${COMMODORE_API_TOKEN}" ${COMMODORE_API_URL}/clusters/${CLUSTER_ID} | jq -r .tenant)
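
# Optional check: verify that the tenant ID was resolved
# (Project Syn tenant IDs look like t-<something>)
echo $TENANT_ID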
Configuration for hieradata commits
export GIT_AUTHOR_NAME=$(git config --global user.name)
export GIT_AUTHOR_EMAIL=$(git config --global user.email)
export TF_VAR_control_vshn_net_token=<control-vshn-net-token>
OpenShift configuration
export BASE_DOMAIN=<your-base-domain>
export PULL_SECRET='<redhat-pull-secret>' # As copied from https://cloud.redhat.com/openshift/install/pull-secret ("Copy pull secret"). The value must be inside quotes.

For BASE_DOMAIN explanation, see DNS Scheme.
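
As an optional sanity check, you can verify that the variables used in the remaining steps are set (the exact list here is an assumption based on the exports above):

for var in CLOUDSCALE_TOKEN TF_VAR_lb_cloudscale_api_secret GITLAB_TOKEN \
    GITLAB_USER COMMODORE_API_URL COMMODORE_API_TOKEN CLUSTER_ID \
    TENANT_ID BASE_DOMAIN PULL_SECRET; do
  [ -z "${!var}" ] && echo "${var} is not set"
done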

Set up S3 bucket for cluster bootstrap

  1. Create S3 bucket

    1. If a bucket user already exists for this cluster:

      # Use already existing bucket user
      response=$(curl -sH "Authorization: Bearer ${CLOUDSCALE_TOKEN}" \
        https://api.cloudscale.ch/v1/objects-users | \
        jq -e ".[] | select(.display_name == \"${CLUSTER_ID}\")")
    2. To create a new bucket user:

      # Create a new user
      response=$(curl -sH "Authorization: Bearer ${CLOUDSCALE_TOKEN}" \
        -F display_name=${CLUSTER_ID} \
        https://api.cloudscale.ch/v1/objects-users)
  2. Configure the Minio client

    export REGION=$(curl -sH "Authorization: Bearer ${COMMODORE_API_TOKEN}" ${COMMODORE_API_URL}/clusters/${CLUSTER_ID} | jq -r .facts.region)
    mc config host add \
      "${CLUSTER_ID}" "https://objects.${REGION}.cloudscale.ch" \
      $(echo $response | jq -r '.keys[0].access_key') \
      $(echo $response | jq -r '.keys[0].secret_key')
    
    mc mb --ignore-existing \
      "${CLUSTER_ID}/${CLUSTER_ID}-bootstrap-ignition"

Upload Red Hat CoreOS image

  1. Export the Authorization header for the Cloudscale API.

    export AUTH_HEADER="Authorization: Bearer ${CLOUDSCALE_TOKEN}"

    The variable CLOUDSCALE_TOKEN could be used directly. Exporting the separate AUTH_HEADER variable keeps the commands consistent with the Cloudscale API documentation.
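
    To verify that the token works, you can issue a simple authenticated request (optional check; it prints the number of existing bucket users):

    curl -sH "$AUTH_HEADER" https://api.cloudscale.ch/v1/objects-users | jq length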

  2. Check if image already exists in the correct zone

    curl -sH "$AUTH_HEADER" https://api.cloudscale.ch/v1/custom-images | jq -r '.[] | select(.slug == "rhcos-4.7") | .zones[].slug'

    If a zone slug (for example rma1) is printed, the image already exists in that zone; you can skip the next steps and jump directly to the next section.

  3. Fetch and convert the latest Red Hat CoreOS image

    curl -sL https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.7/4.7.0/rhcos-4.7.0-x86_64-openstack.x86_64.qcow2.gz | gzip -d > rhcos-4.7.qcow2
    qemu-img convert rhcos-4.7.qcow2 rhcos-4.7.raw
  4. Upload the image to S3 and make it public

    mc cp rhcos-4.7.raw "${CLUSTER_ID}/${CLUSTER_ID}-bootstrap-ignition/"
    mc policy set download "${CLUSTER_ID}/${CLUSTER_ID}-bootstrap-ignition/rhcos-4.7.raw"

    The output of the mc policy set command looks like an error, but checking the policy with the following command shows that the result is as expected.

    mc policy get "${CLUSTER_ID}/${CLUSTER_ID}-bootstrap-ignition/rhcos-4.7.raw"

    The output should be `Access permission for […]-bootstrap-ignition/rhcos-4.7.raw is download`.

  5. Import the image to Cloudscale

    curl -i -H "$AUTH_HEADER" \
      -F url="$(mc share download --json "${CLUSTER_ID}/${CLUSTER_ID}-bootstrap-ignition/rhcos-4.7.raw" | jq -r .url)" \
      -F name='RHCOS 4.7' \
      -F zones=rma1 \
      -F slug=rhcos-4.7 \
      -F source_format=raw \
      -F user_data_handling=pass-through \
      https://api.cloudscale.ch/v1/custom-images/import

Set secrets in Vault

Connect with Vault
export VAULT_ADDR=https://vault-prod.syn.vshn.net
vault login -method=ldap username=<your.name>
Store various secrets in Vault
# Set the cloudscale.ch access secrets
vault kv put clusters/kv/${TENANT_ID}/${CLUSTER_ID}/cloudscale \
  token=${CLOUDSCALE_TOKEN} \
  s3_access_key=$(mc config host ls ${CLUSTER_ID} -json | jq -r .accessKey) \
  s3_secret_key=$(mc config host ls ${CLUSTER_ID} -json | jq -r .secretKey)

# Put LB API key in Vault
vault kv put clusters/kv/${TENANT_ID}/${CLUSTER_ID}/floaty \
  iam_secret=${TF_VAR_lb_cloudscale_api_secret}

# Generate an HTTP secret for the registry
vault kv put clusters/kv/${TENANT_ID}/${CLUSTER_ID}/registry \
  httpSecret=$(LC_ALL=C tr -cd "A-Za-z0-9" </dev/urandom | head -c 128)

# Set the LDAP password
vault kv put clusters/kv/${TENANT_ID}/${CLUSTER_ID}/vshn-ldap \
  bindPassword=${LDAP_PASSWORD}

# Generate a master password for K8up backups
vault kv put clusters/kv/${TENANT_ID}/${CLUSTER_ID}/global-backup \
  password=$(LC_ALL=C tr -cd "A-Za-z0-9" </dev/urandom | head -c 32)

# Generate a password for the cluster object backups
vault kv put clusters/kv/${TENANT_ID}/${CLUSTER_ID}/cluster-backup \
  password=$(LC_ALL=C tr -cd "A-Za-z0-9" </dev/urandom | head -c 32)

# Copy the Dagobert OpenShift Node Collector Credentials
vault kv get -format=json "clusters/kv/template/dagobert" | jq '.data.data' \
  | vault kv put -cas=0 "clusters/kv/${TENANT_ID}/${CLUSTER_ID}/dagobert" -
Grab the LB hieradata repo token from Vault
export HIERADATA_REPO_SECRET=$(vault kv get \
  -format=json "clusters/kv/lbaas/hieradata_repo_token" | jq '.data.data')
export HIERADATA_REPO_USER=$(echo "${HIERADATA_REPO_SECRET}" | jq -r '.user')
export HIERADATA_REPO_TOKEN=$(echo "${HIERADATA_REPO_SECRET}" | jq -r '.token')

OpenShift Installer Setup

For the following steps, change into a clean directory (for example a directory in your home).

These are the only steps which aren't idempotent and have to be completed uninterrupted in one go. If you have to recreate the install config or any of the generated manifests, you need to rerun all of the subsequent steps.

You can add more options to the install-config.yaml file. Have a look at the config example for more information.

For example, you could change the SDN from the default to a different network plugin because of a customer's network requirements, as illustrated below.
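
As an illustration only (not part of the standard setup), such a change would be a networking section in the install-config.yaml; OVNKubernetes is one possible value:

networking:
  networkType: OVNKubernetes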

  1. Prepare SSH key

    We generate a unique SSH key pair for the cluster as this gives us troubleshooting access.

    SSH_PRIVATE_KEY="$(pwd)/ssh_$CLUSTER_ID"
    export SSH_PUBLIC_KEY="${SSH_PRIVATE_KEY}.pub"
    
    ssh-keygen -C "vault@$CLUSTER_ID" -t ed25519 -f $SSH_PRIVATE_KEY -N ''
    
    vault kv put clusters/kv/${TENANT_ID}/${CLUSTER_ID}/cloudscale/ssh \
      private_key=$(cat $SSH_PRIVATE_KEY | base64 --wrap 0)
    
    ssh-add $SSH_PRIVATE_KEY
  2. Prepare install-config.yaml

    export INSTALLER_DIR="$(pwd)/target"
    mkdir -p "${INSTALLER_DIR}"
    
    cat > "${INSTALLER_DIR}/install-config.yaml" <<EOF
    apiVersion: v1
    metadata:
      name: ${CLUSTER_ID}
    baseDomain: ${BASE_DOMAIN}
    platform:
      none: {}
    pullSecret: |
      ${PULL_SECRET}
    sshKey: "$(cat $SSH_PUBLIC_KEY)"
    EOF
  3. Render install manifests (this will consume the install-config.yaml)

    openshift-install --dir "${INSTALLER_DIR}" \
      create manifests
    1. If you want to change the default "apps" domain for the cluster:

      yq w -i "${INSTALLER_DIR}/manifests/cluster-ingress-02-config.yml" \
        spec.domain apps.example.com
  4. Render the ignition configs (this will consume the manifests)

    openshift-install --dir "${INSTALLER_DIR}" \
      create ignition-configs
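
    The installer directory now contains the ignition files (bootstrap.ign, master.ign, worker.ign), metadata.json and the auth/ directory, all of which are used in later steps. A quick listing (optional):

    ls "${INSTALLER_DIR}"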
  5. Upload ignition config

    mc cp "${INSTALLER_DIR}/bootstrap.ign" "${CLUSTER_ID}/${CLUSTER_ID}-bootstrap-ignition/"
    
    export TF_VAR_ignition_bootstrap=$(mc share download \
      --json --expire=4h \
      "${CLUSTER_ID}/${CLUSTER_ID}-bootstrap-ignition/bootstrap.ign" | jq -r '.share')

Terraform Cluster Config

Check Running Commodore for details on how to run commodore.

  1. Prepare Commodore inventory.

    mkdir -p inventory/classes/
    git clone $(curl -sH"Authorization: Bearer ${COMMODORE_API_TOKEN}" "${COMMODORE_API_URL}/tenants/${TENANT_ID}" | jq -r '.gitRepo.url') inventory/classes/${TENANT_ID}
  2. Prepare Terraform cluster config

    CA_CERT=$(jq -r '.ignition.security.tls.certificateAuthorities[0].source' \
      "${INSTALLER_DIR}/master.ign" | \
      awk -F ',' '{ print $2 }' | \
      base64 --decode)
    
    pushd "inventory/classes/${TENANT_ID}/"
    
    yq eval -i '.applications += ["openshift4-terraform"]' ${CLUSTER_ID}.yml
    
    yq eval -i ".parameters.openshift.infraID = \"$(jq -r .infraID "${INSTALLER_DIR}/metadata.json")\"" \
      ${CLUSTER_ID}.yml
    
    yq eval -i ".parameters.openshift.clusterID = \"$(jq -r .clusterID "${INSTALLER_DIR}/metadata.json")\"" \
      ${CLUSTER_ID}.yml
    
    yq eval -i ".parameters.openshift.appsDomain = \"apps.${CLUSTER_ID}.${BASE_DOMAIN}\"" \
      ${CLUSTER_ID}.yml
    
    yq eval -i ".parameters.openshift4_terraform.terraform_variables.base_domain = \"${BASE_DOMAIN}\"" \
      ${CLUSTER_ID}.yml
    
    yq eval -i ".parameters.openshift4_terraform.terraform_variables.ignition_ca = \"${CA_CERT}\"" \
      ${CLUSTER_ID}.yml
    
    yq eval -i ".parameters.openshift4_terraform.terraform_variables.ssh_key = \"$(cat ${SSH_PUBLIC_KEY})\"" \
      ${CLUSTER_ID}.yml
    
    yq eval -i ".parameters.openshift4_terraform.terraform_variables.hieradata_repo_user = \"${HIERADATA_REPO_USER}\"" \
      ${CLUSTER_ID}.yml
    
    # Configure default ingress controller with 3 replicas, so that the
    # VSHN-managed LB HAproxy health check isn't complaining about a missing backend
    yq eval -i ".parameters.openshift4_ingress.ingressControllers.default.replicas = 3" \
      ${CLUSTER_ID}.yml
    
    yq eval -i ".parameters.vshnLdap.serviceId = \"${LDAP_ID}\"" \
      ${CLUSTER_ID}.yml
    
    # Configure Git author information for the CI pipeline
    yq eval -i ".parameters.openshift4_terraform.gitlab_ci.git.username = \"GitLab CI\"" \
      ${CLUSTER_ID}.yml
    yq eval -i ".parameters.openshift4_terraform.gitlab_ci.git.email = \"tech+${CLUSTER_ID}@vshn.ch\"" \
      ${CLUSTER_ID}.yml
  3. Review and commit

    # Have a look at the file ${CLUSTER_ID}.yml.
    # Override any default parameters or add more component configuration.
    
    git commit -a -m "Setup cluster ${CLUSTER_ID}"
    git push
    
    popd
  4. Compile and push Terraform setup

    commodore catalog compile ${CLUSTER_ID} --push -i

Provision Infrastructure

  1. Configure Terraform secrets

    cat <<EOF > ./terraform.env
    CLOUDSCALE_TOKEN
    TF_VAR_ignition_bootstrap
    TF_VAR_lb_cloudscale_api_secret
    TF_VAR_control_vshn_net_token
    GIT_AUTHOR_NAME
    GIT_AUTHOR_EMAIL
    HIERADATA_REPO_TOKEN
    EOF
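
    The file intentionally lists only variable names: when it's passed to docker run via --env-file (see the alias below), each name is resolved from your current environment, so the secret values never end up on disk.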
  2. Setup Terraform

    Prepare Terraform execution environment
    # Set terraform image and tag to be used
    tf_image=$(\
      yq eval ".parameters.openshift4_terraform.images.terraform.image" \
      dependencies/openshift4-terraform/class/defaults.yml)
    tf_tag=$(\
      yq eval ".parameters.openshift4_terraform.images.terraform.tag" \
      dependencies/openshift4-terraform/class/defaults.yml)
    
    # Generate the terraform alias
    base_dir=$(pwd)
    alias terraform='docker run -it --rm \
      -e REAL_UID=$(id -u) \
      --env-file ${base_dir}/terraform.env \
      -w /tf \
      -v $(pwd):/tf \
      --ulimit memlock=-1 \
      "${tf_image}:${tf_tag}" /tf/terraform.sh'
    
    export GITLAB_REPOSITORY_URL=$(curl -sH "Authorization: Bearer ${COMMODORE_API_TOKEN}" ${COMMODORE_API_URL}/clusters/${CLUSTER_ID} | jq -r '.gitRepo.url' | sed 's|ssh://||; s|/|:|')
    export GITLAB_REPOSITORY_NAME=${GITLAB_REPOSITORY_URL##*/}
    export GITLAB_CATALOG_PROJECT_ID=$(curl -sH "Authorization: Bearer ${GITLAB_TOKEN}" "https://git.vshn.net/api/v4/projects?simple=true&search=${GITLAB_REPOSITORY_NAME/.git}" | jq -r ".[] | select(.ssh_url_to_repo == \"${GITLAB_REPOSITORY_URL}\") | .id")
    export GITLAB_STATE_URL="https://git.vshn.net/api/v4/projects/${GITLAB_CATALOG_PROJECT_ID}/terraform/state/cluster"
    
    pushd catalog/manifests/openshift4-terraform/
    Initialize Terraform
    terraform init \
      "-backend-config=address=${GITLAB_STATE_URL}" \
      "-backend-config=lock_address=${GITLAB_STATE_URL}/lock" \
      "-backend-config=unlock_address=${GITLAB_STATE_URL}/lock" \
      "-backend-config=username=${GITLAB_USER}" \
      "-backend-config=password=${GITLAB_TOKEN}" \
      "-backend-config=lock_method=POST" \
      "-backend-config=unlock_method=DELETE" \
      "-backend-config=retry_wait_min=5"
  3. Create LB hieradata

    cat > override.tf <<EOF
    module "cluster" {
      bootstrap_count = 0
      master_count    = 0
      infra_count     = 0
      worker_count    = 0
    }
    EOF
    terraform apply -target "module.cluster.local_file.lb_hieradata[0]"
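
    Terraform treats files named override.tf specially: their contents are merged over the existing configuration, so the node counts set here take precedence over the values from the checked-in catalog.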
  4. Review and merge the LB hieradata MR (listed in Terraform output hieradata_mr) and wait until the deploy pipeline triggered by the merge has completed.

  5. Create LBs

    terraform apply
  6. Ensure that all the DNS records shown in output variable dns_entries from the previous step are configured in the cluster’s parent zone.
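
    You can print the records again at any time:

    terraform output dns_entries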

  7. Make LB FQDNs available for later steps

    Store LB FQDNs in environment
    declare -a LB_FQDNS
    for id in 1 2; do
      LB_FQDNS[$id]=$(terraform state show "module.cluster.cloudscale_server.lb[$(expr $id - 1)]" | grep fqdn | awk '{print $2}' | sed -e 's/"//g')
    done
    Verify FQDNs
    for lb in "${LB_FQDNS[@]}"; do echo $lb; done
  8. Check LB connectivity

    for lb in "${LB_FQDNS[@]}"; do
      ping -c1 "${lb}"
    done
  9. Wait until LBs are fully initialized by Puppet

    # Wait for Puppet provisioning to complete
    while true; do
      curl --connect-timeout 1 "http://api.${CLUSTER_ID}.${BASE_DOMAIN}:6443"
      # Exit code 52 ("empty reply from server") means HAproxy accepted the
      # connection, so the LB is up even though no backend answered yet.
      if [ $? -eq 52 ]; then
        echo -e "\nHAproxy up"
        break
      else
        echo -n "."
        sleep 5
      fi
    done
    # Update sshop config, see https://wiki.vshn.net/pages/viewpage.action?pageId=40108094
    sshop_update
    # Check that you can access the LBs using your usual SSH config
    for lb in "${LB_FQDNS[@]}"; do
      ssh "${lb}" hostname -f
    done

    While you’re waiting for the LBs to be provisioned, you can check the cloud-init logs with the following SSH commands

    ssh ubuntu@"${LB_FQDNS[1]}" tail -f /var/log/cloud-init-output.log
    ssh ubuntu@"${LB_FQDNS[2]}" tail -f /var/log/cloud-init-output.log
  10. Check the "Server created" tickets for the LBs and link them to the cluster setup ticket.

  11. Deploy bootstrap node

    cat > override.tf <<EOF
    module "cluster" {
      bootstrap_count = 1
      master_count    = 0
      infra_count     = 0
      worker_count    = 0
    }
    EOF
    terraform apply
  12. Review and merge the LB hieradata MR (listed in Terraform output hieradata_mr) and run Puppet on the LBs after the deploy job has completed

    for fqdn in "${LB_FQDNS[@]}"; do
      ssh "${fqdn}" sudo puppetctl run
    done
  13. Wait for bootstrap API to come up

    API_URL=$(yq e '.clusters[0].cluster.server' "${INSTALLER_DIR}/auth/kubeconfig")
    while ! curl --connect-timeout 1 "${API_URL}/healthz" -k &>/dev/null; do
      echo -n "."
      sleep 5
    done && echo -e "\nAPI is up"
  14. Deploy control plane nodes

    cat > override.tf <<EOF
    module "cluster" {
      bootstrap_count = 1
      infra_count     = 0
      worker_count    = 0
    }
    EOF
    terraform apply
  15. Review and merge the LB hieradata MR (listed in Terraform output hieradata_mr) and run Puppet on the LBs after the deploy job has completed

    for fqdn in "${LB_FQDNS[@]}"; do
      ssh "${fqdn}" sudo puppetctl run
    done
  16. Wait for bootstrap to complete

    openshift-install --dir "${INSTALLER_DIR}" \
      wait-for bootstrap-complete
  17. Remove bootstrap node and provision infra nodes

    cat > override.tf <<EOF
    module "cluster" {
      worker_count  = 0
    }
    EOF
    terraform apply
  18. Approve infra certs

    export KUBECONFIG="${INSTALLER_DIR}/auth/kubeconfig"
    # Once CSRs in state Pending show up, approve them.
    # The approval needs to be run twice: each node first requests a client
    # CSR and, once that's approved, a serving CSR.
    kubectl get csr -w
    oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | \
      xargs oc adm certificate approve
    
    kubectl get nodes
  19. Label infra nodes

    kubectl get nodes -lnode-role.kubernetes.io/worker
    kubectl label node -lnode-role.kubernetes.io/worker \
      node-role.kubernetes.io/infra=""
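
    To verify that the labels were applied, list the nodes that now carry the infra role (optional check):

    kubectl get nodes -l node-role.kubernetes.io/infra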
  20. Review and merge the LB hieradata MR (listed in Terraform output hieradata_mr) and run Puppet on the LBs after the deploy job has completed

    for fqdn in "${LB_FQDNS[@]}"; do
      ssh "${fqdn}" sudo puppetctl run
    done
  21. Wait for installation to complete

    openshift-install --dir ${INSTALLER_DIR} \
      wait-for install-complete
  22. Provision worker nodes

    rm override.tf
    terraform apply
  23. Approve worker certs

    # Once CSRs in state Pending show up, approve them.
    # The approval needs to be run twice: each node first requests a client
    # CSR and, once that's approved, a serving CSR.
    kubectl get csr -w
    oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | \
      xargs oc adm certificate approve
    
    kubectl get nodes
  24. Label worker nodes

    kubectl label --overwrite node -lnode-role.kubernetes.io/worker \
      node-role.kubernetes.io/app=""
    kubectl label node -lnode-role.kubernetes.io/infra \
      node-role.kubernetes.io/app-
    
    # This should show the worker nodes only
    kubectl get nodes -l node-role.kubernetes.io/app
  25. Create secret with S3 credentials for the registry

    oc create secret generic image-registry-private-configuration-user \
      --namespace openshift-image-registry \
      --from-literal=REGISTRY_STORAGE_S3_ACCESSKEY=$(mc config host ls ${CLUSTER_ID} -json | jq -r .accessKey) \
      --from-literal=REGISTRY_STORAGE_S3_SECRETKEY=$(mc config host ls ${CLUSTER_ID} -json | jq -r .secretKey)
  26. Create wildcard cert for router

    kubectl get secret router-certs-default \
      -n openshift-ingress \
      -o json | \
        jq 'del(.metadata.ownerReferences) | .metadata.name = "router-certs-snakeoil"' | \
      kubectl -n openshift-ingress apply -f -
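
    To confirm that the copied secret exists (optional check):

    kubectl -n openshift-ingress get secret router-certs-snakeoil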
  27. Save the admin credentials in the password manager. You can find the password in the file target/auth/kubeadmin-password and the kubeconfig in target/auth/kubeconfig.

    popd
    ls -l ${INSTALLER_DIR}/auth/

Finalize installation

  1. Configure the apt-dater groups for the LBs.

    git clone git@git.vshn.net:vshn-puppet/nodes_hieradata.git
    pushd nodes_hieradata
    cat >"${LB_FQDNS[1]}.yaml" <<EOF
    ---
    s_apt_dater::host::group: '2200_20_night_main'
    EOF
    cat >"${LB_FQDNS[2]}.yaml" <<EOF
    ---
    s_apt_dater::host::group: '2200_40_night_second'
    EOF
    git add *.yaml
    git commit -m "Configure apt-dater groups for LBs for OCP4 cluster ${CLUSTER_ID}"
    git push origin master
    popd

    This how-to defaults to the night maintenance window on Tuesday at 22:00. Adjust the apt-dater groups according to the documented groups (VSHN-internal only) if the cluster requires a different maintenance window.

  2. Wait for deploy job on nodes hieradata to complete and run Puppet on the LBs to update the apt-dater groups.

    for fqdn in "${LB_FQDNS[@]}"; do
      ssh "${fqdn}" sudo puppetctl run
    done
  3. Delete local config files

    rm -r ${INSTALLER_DIR}/