Install OpenShift 4 on OpenStack

Steps to install an OpenShift 4 cluster on Red Hat OpenStack.

These steps follow the Installing a cluster on OpenStack docs to set up an installer-provisioned infrastructure (IPI) installation.

This how-to guide is an early draft. So far, we’ve set up only one cluster using the instructions in this guide.

The certificates created during bootstrap are only valid for 24 hours, so make sure you complete these steps within 24 hours.

Starting situation

  • You already have a Project Syn Tenant and its Git repository

  • You have a CCSP Red Hat login and are logged into the Red Hat OpenShift Cluster Manager

    Don’t use your personal account to log in to the cluster manager for installation.
  • You want to register a new cluster in Lieutenant and are about to install OpenShift 4 on OpenStack

Prerequisites

  • jq

  • yq YAML processor (version 4 or higher; use the Go version by mikefarah, not the jq wrapper by kislyuk)

  • openshift-install (direct download: linux, macOS)

  • oc (direct download: linux, macOS)

  • kubectl

  • vault (Vault CLI)

  • curl

  • gzip

  • docker

  • unzip

  • openstack CLI

    The OpenStack CLI is available as a Python package.

    Ubuntu/Debian
    sudo apt install python3-openstackclient
    Arch
    yay -S python-openstackclient
    macOS
    brew install openstackclient

    Optionally, you can also install additional CLIs for object storage (swift) and images (glance).
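
    For example, on Ubuntu/Debian (assuming the distribution packages python3-swiftclient and python3-glanceclient):

    sudo apt install python3-swiftclient python3-glanceclient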

Cluster Installation

Register the new OpenShift 4 cluster in Lieutenant.

Lieutenant API endpoint

Use the following endpoint for Lieutenant:

Set cluster facts

For customer clusters, set the following cluster facts in Lieutenant:

  • access_policy: Access-Policy of the cluster, such as regular or swissonly

  • service_level: Name of the service level agreement for this cluster, such as guaranteed-availability

  • sales_order: Name of the sales order to which the cluster is billed, such as S10000

  • release_channel: Name of the syn component release channel to use, such as stable
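
For illustration, the resulting facts might look like this on the cluster’s Lieutenant object (example values only, adjust for your cluster):

facts:
  access_policy: regular
  service_level: guaranteed-availability
  sales_order: S10000
  release_channel: stable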

Set up LDAP service

  1. Create an LDAP service

    Use control.vshn.net/vshn/services/_create to create a service. The name must contain the customer and the cluster name. Then put the LDAP service ID and password in the following variables:

    export LDAP_ID="Your_LDAP_ID_here"
    export LDAP_PASSWORD="Your_LDAP_pw_here"

Use the same casing as the underlying LDAP service. You can find the service ID in the hover text in the VSHN Control Panel.

LDAP Service hover text

Configure input

OpenStack API
export OS_AUTH_URL=<openstack authentication URL> (1)
1 Provide the URL with the leading https://
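
For example, a hypothetical Keystone v3 endpoint:

export OS_AUTH_URL=https://keystone.cloud.example.com:5000/v3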
OpenStack credentials
export OS_USERNAME=<username>
export OS_PASSWORD=<password>
OpenStack project, region and domain details
export OS_PROJECT_NAME=<project name>
export OS_PROJECT_DOMAIN_NAME=<project domain name>
export OS_USER_DOMAIN_NAME=<user domain name>
export OS_REGION_NAME=<region name>
export OS_PROJECT_ID=$(openstack project show $OS_PROJECT_NAME -f json | jq -r .id) (1)
1 TBD if really needed
Cluster machine network
export MACHINE_NETWORK_CIDR=<machine network cidr>
export EXTERNAL_NETWORK_NAME=<external network name> (1)
1 The instructions create floating IPs for the API and ingress in the specified network.
VM flavors
export CONTROL_PLANE_FLAVOR=<flavor name> (1)
export INFRA_FLAVOR=<flavor name> (1)
export APP_FLAVOR=<flavor name> (1)
1 Check openstack flavor list for available options.
Access to VSHN Lieutenant
# For example: https://api.syn.vshn.net
# IMPORTANT: do NOT add a trailing `/`. Commands below will fail.
export COMMODORE_API_URL=<lieutenant-api-endpoint>

# Set Project Syn cluster and tenant ID
export CLUSTER_ID=<lieutenant-cluster-id> # Looks like: c-<something>
export TENANT_ID=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${COMMODORE_API_URL}/clusters/${CLUSTER_ID} | jq -r .tenant)
OpenShift configuration
export BASE_DOMAIN=<your-base-domain> # customer-provided base domain without cluster name, e.g. "zrh.customer.vshnmanaged.net"
export PULL_SECRET='<redhat-pull-secret>' # As copied from https://cloud.redhat.com/openshift/install/pull-secret ("Copy pull secret"). The value must be inside quotes.

For BASE_DOMAIN explanation, see DNS Scheme.

Set secrets in Vault

Connect with Vault
export VAULT_ADDR=https://vault-prod.syn.vshn.net
vault login -method=oidc
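
If you’re unsure whether your Vault session is still valid, you can check it with:

vault token lookup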
Store various secrets in Vault
# Store OpenStack credentials
vault kv put  clusters/kv/${TENANT_ID}/${CLUSTER_ID}/openstack/credentials \
  username=${OS_USERNAME} \
  password=${OS_PASSWORD}

# Generate an HTTP secret for the registry
vault kv put clusters/kv/${TENANT_ID}/${CLUSTER_ID}/registry \
  httpSecret=$(LC_ALL=C tr -cd "A-Za-z0-9" </dev/urandom | head -c 128)

# Set the LDAP password
vault kv put clusters/kv/${TENANT_ID}/${CLUSTER_ID}/vshn-ldap \
  bindPassword=${LDAP_PASSWORD}

# Generate a master password for K8up backups
vault kv put clusters/kv/${TENANT_ID}/${CLUSTER_ID}/global-backup \
  password=$(LC_ALL=C tr -cd "A-Za-z0-9" </dev/urandom | head -c 32)

# Generate a password for the cluster object backups
vault kv put clusters/kv/${TENANT_ID}/${CLUSTER_ID}/cluster-backup \
  password=$(LC_ALL=C tr -cd "A-Za-z0-9" </dev/urandom | head -c 32)

# Copy the VSHN acme-dns registration password
vault kv get -format=json "clusters/kv/template/cert-manager" | jq '.data.data' \
  | vault kv put -cas=0 "clusters/kv/${TENANT_ID}/${CLUSTER_ID}/cert-manager" -

Setup floating IPs and DNS records for the API and ingress

  1. Create floating IPs in the OpenStack API

    export API_VIP=$(openstack floating ip create \
      --description "API ${CLUSTER_ID}.${BASE_DOMAIN}" "${EXTERNAL_NETWORK_NAME}" \
      -f json | jq -r .floating_ip_address)
    export INGRESS_VIP=$(openstack floating ip create \
      --description "Ingress ${CLUSTER_ID}.${BASE_DOMAIN}" "${EXTERNAL_NETWORK_NAME}" \
      -f json | jq -r .floating_ip_address)
  2. Create the initial DNS zone for the cluster

    cat <<EOF
    \$ORIGIN ${CLUSTER_ID}.${BASE_DOMAIN}.
    
    api       IN A     ${API_VIP}
    ingress   IN A     ${INGRESS_VIP}
    
    *.apps    IN CNAME ingress.${CLUSTER_ID}.${BASE_DOMAIN}.
    EOF

    This step assumes that DNS for the cluster is managed by VSHN. See the VSHN zonefiles repo for details.

Create security group for Cilium

  1. Create a security group

    CILIUM_SECURITY_GROUP_ID=$(openstack security group create ${CLUSTER_ID}-cilium \
      --description "Cilium CNI security group rules for ${CLUSTER_ID}" -f json | \
      jq -r .id)
  2. Create rules for Cilium traffic

    openstack security group rule create --protocol tcp --remote-ip "$MACHINE_NETWORK_CIDR" \
      --dst-port 4240 --description "Cilium health checks" "$CILIUM_SECURITY_GROUP_ID"
    openstack security group rule create --protocol tcp --remote-ip "$MACHINE_NETWORK_CIDR" \
      --dst-port 4244 --description "Cilium Hubble server" "$CILIUM_SECURITY_GROUP_ID"
    openstack security group rule create --protocol tcp --remote-ip "$MACHINE_NETWORK_CIDR" \
      --dst-port 4245 --description "Cilium Hubble relay" "$CILIUM_SECURITY_GROUP_ID"
    openstack security group rule create --protocol tcp --remote-ip "$MACHINE_NETWORK_CIDR" \
      --dst-port 6942 --description "Cilium operator metrics" "$CILIUM_SECURITY_GROUP_ID"
    openstack security group rule create --protocol tcp --remote-ip "$MACHINE_NETWORK_CIDR" \
      --dst-port 2112 --description "Cilium Hubble enterprise metrics" "$CILIUM_SECURITY_GROUP_ID"
    openstack security group rule create --protocol udp --remote-ip "$MACHINE_NETWORK_CIDR" \
      --dst-port 8472 --description "Cilium VXLAN" "$CILIUM_SECURITY_GROUP_ID"
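
    To review the rules you just created, list them for the security group:

    openstack security group rule list "$CILIUM_SECURITY_GROUP_ID"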

Prepare Cluster Repository

Starting with this section, we recommend that you change into a clean directory (for example, a new directory in your home directory).

Check Running Commodore for details on how to run commodore.

  1. Prepare Commodore inventory.

    mkdir -p inventory/classes/
    git clone $(curl -sH"Authorization: Bearer $(commodore fetch-token)" "${COMMODORE_API_URL}/tenants/${TENANT_ID}" | jq -r '.gitRepo.url') inventory/classes/${TENANT_ID}
  2. Add Cilium to cluster configuration

    pushd "inventory/classes/${TENANT_ID}/"
    
    yq eval -i '.applications += ["cilium"]' ${CLUSTER_ID}.yml
    
    yq eval -i '.parameters.networkpolicy.networkPlugin = "cilium"' ${CLUSTER_ID}.yml
    yq eval -i '.parameters.networkpolicy.ignoredNamespaces = ["openshift-oauth-apiserver"]' ${CLUSTER_ID}.yml
    
    yq eval -i '.parameters.openshift4_monitoring.upstreamRules.networkPlugin = "cilium"' ${CLUSTER_ID}.yml
    
    yq eval -i '.parameters.openshift.infraID = "TO_BE_DEFINED"' ${CLUSTER_ID}.yml
    yq eval -i '.parameters.openshift.clusterID = "TO_BE_DEFINED"' ${CLUSTER_ID}.yml
    
    git commit -a -m "Add Cilium addon to ${CLUSTER_ID}"
    
    git push
    popd
  3. Compile catalog

    commodore catalog compile ${CLUSTER_ID} --push -i \
      --dynamic-fact kubernetesVersion.major=$(echo "1.26" | awk -F. '{print $1}') \
      --dynamic-fact kubernetesVersion.minor=$(echo "1.26" | awk -F. '{print $2}') \
      --dynamic-fact openshiftVersion.Major=$(echo "4.13" | awk -F. '{print $1}') \
      --dynamic-fact openshiftVersion.Minor=$(echo "4.13" | awk -F. '{print $2}')
    This commodore call requires Commodore v1.5.0 or newer. Please make sure to update your local installation.

Configure the OpenShift Installer

  1. Generate SSH key

    We generate a unique SSH key pair for the cluster as this gives us troubleshooting access.

    SSH_PRIVATE_KEY="$(pwd)/ssh_$CLUSTER_ID"
    export SSH_PUBLIC_KEY="${SSH_PRIVATE_KEY}.pub"
    
    ssh-keygen -C "vault@$CLUSTER_ID" -t ed25519 -f $SSH_PRIVATE_KEY -N ''
    
    BASE64_NO_WRAP='base64'
    if [[ "$OSTYPE" == "linux"* ]]; then
      BASE64_NO_WRAP='base64 --wrap 0'
    fi
    
    vault kv put clusters/kv/${TENANT_ID}/${CLUSTER_ID}/openstack/ssh \
      private_key=$(cat $SSH_PRIVATE_KEY | eval "$BASE64_NO_WRAP")
    
    ssh-add $SSH_PRIVATE_KEY
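
    If you later need the key for troubleshooting, you can retrieve it from Vault again (a sketch; use base64 -D instead of --decode on older macOS):

    vault kv get -field=private_key \
      "clusters/kv/${TENANT_ID}/${CLUSTER_ID}/openstack/ssh" | base64 --decode > "ssh_${CLUSTER_ID}"
    chmod 600 "ssh_${CLUSTER_ID}"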
  2. Prepare install-config.yaml

    You can add more options to the install-config.yaml file. Have a look at the installation configuration parameters for more information.

    export INSTALLER_DIR="$(pwd)/target"
    mkdir -p "${INSTALLER_DIR}"
    
    cat > "${INSTALLER_DIR}/clouds.yaml" <<EOF
    clouds:
      shiftstack:
        auth:
          auth_url: ${OS_AUTH_URL}
          project_name: ${OS_PROJECT_NAME}
          username: ${OS_USERNAME}
          password: ${OS_PASSWORD}
          user_domain_name: ${OS_USER_DOMAIN_NAME}
          project_domain_name: ${OS_PROJECT_DOMAIN_NAME}
    EOF
    
    cat > "${INSTALLER_DIR}/install-config.yaml" <<EOF
    apiVersion: v1
    metadata:
      name: ${CLUSTER_ID}
    baseDomain: ${BASE_DOMAIN}
    compute: (1)
      - architecture: amd64
        hyperthreading: Enabled
        name: worker
        replicas: 3
        platform:
          openstack:
            type: ${APP_FLAVOR}
            rootVolume:
              size: 100
              type: __DEFAULT__ # TODO: is this generally applicable?
            additionalSecurityGroupIDs: (2)
              - ${CILIUM_SECURITY_GROUP_ID}
    controlPlane:
      architecture: amd64
      hyperthreading: Enabled
      name: master
      replicas: 3
      platform:
        openstack:
          type: ${CONTROL_PLANE_FLAVOR}
          rootVolume:
            size: 100
            type: __DEFAULT__ # TODO: is this generally applicable?
          additionalSecurityGroupIDs: (2)
            - ${CILIUM_SECURITY_GROUP_ID}
    platform:
      openstack:
        cloud: shiftstack (3)
        externalNetwork: ${EXTERNAL_NETWORK_NAME}
        apiFloatingIP: ${API_VIP}
        ingressFloatingIP: ${INGRESS_VIP}
    networking:
      networkType: Cilium
      machineNetwork:
        - cidr: ${MACHINE_NETWORK_CIDR}
    pullSecret: |
      ${PULL_SECRET}
    sshKey: "$(cat $SSH_PUBLIC_KEY)"
    EOF
    1 We only provision a single compute machine set. The final machine sets will be configured through Project Syn.
    2 We attach the Cilium security group to both the control plane and the worker nodes. This ensures that there are no issues with Cilium traffic during bootstrapping.
    3 This field must match the entry in clouds in the clouds.yaml file. If you’re following this guide, you shouldn’t need to adjust this.

    If you set a custom CIDR for the OpenShift networking, update the corresponding values in your Commodore cluster definitions. See Cilium Component Defaults and Parameter Reference. Verify the rendered configuration with less catalog/manifests/cilium/olm/*ciliumconfig.yaml.

Prepare the OpenShift Installer

The steps in this section aren’t idempotent and have to be completed uninterrupted in one go. If you have to recreate the install config or any of the generated manifests, you need to rerun all of the subsequent steps.
  1. Render install manifests (this will consume the install-config.yaml)

    openshift-install --dir "${INSTALLER_DIR}" \
      create manifests
    1. If you want to change the default "apps" domain for the cluster:

      yq eval -i '.spec.domain = "apps.example.com"' \
        "${INSTALLER_DIR}/manifests/cluster-ingress-02-config.yml"
  2. Copy pre-rendered Cilium manifests

    cp catalog/manifests/cilium/olm/* "${INSTALLER_DIR}/manifests/"
  3. Extract the cluster domain from the generated manifests

    export CLUSTER_DOMAIN=$(yq e '.spec.baseDomain' \
      "${INSTALLER_DIR}/manifests/cluster-dns-02-config.yml")
  4. Prepare install manifests and ignition config

    openshift-install --dir "${INSTALLER_DIR}" \
      create ignition-configs

Update Project Syn cluster config

  1. Switch to the tenant repo

    pushd "inventory/classes/${TENANT_ID}/"
  2. Include openshift4.yml if it exists

    if ls openshift4.y*ml 1>/dev/null 2>&1; then
        yq eval -i '.classes += ".openshift4"' ${CLUSTER_ID}.yml;
    fi
  3. Update cluster config

    yq eval -i ".parameters.openshift.baseDomain = \"${CLUSTER_DOMAIN}\"" \
      ${CLUSTER_ID}.yml
    
    yq eval -i ".parameters.openshift.infraID = \"$(jq -r .infraID "${INSTALLER_DIR}/metadata.json")\"" \
      ${CLUSTER_ID}.yml
    
    yq eval -i ".parameters.openshift.clusterID = \"$(jq -r .clusterID "${INSTALLER_DIR}/metadata.json")\"" \
      ${CLUSTER_ID}.yml
    
    yq eval -i ".parameters.openshift.ssh_key = \"$(cat ${SSH_PUBLIC_KEY})\"" \
      ${CLUSTER_ID}.yml
    
    yq eval -i ".parameters.vshnLdap.serviceId = \"${LDAP_ID}\"" \
      ${CLUSTER_ID}.yml

    If you use a custom "apps" domain, make sure to set parameters.openshift.appsDomain accordingly.

    APPS_DOMAIN=your.custom.apps.domain
    yq eval -i ".parameters.openshift.appsDomain = \"${APPS_DOMAIN}\"" \
      ${CLUSTER_ID}.yml

    By default, the cluster’s update channel is derived from the cluster’s reported OpenShift version. If you want to use a custom update channel, make sure to set parameters.openshift4_version.spec.channel accordingly.

    # Configure the OpenShift update channel as `fast` (adjust 4.13 to your cluster's minor version)
    yq eval -i ".parameters.openshift4_version.spec.channel = \"fast-4.13\"" \
      ${CLUSTER_ID}.yml
  4. Configure OpenStack parameters

    yq eval -i ".parameters.openshift.openstack.app_flavor = \"${APP_FLAVOR}\"" \
      ${CLUSTER_ID}.yml
    
    yq eval -i ".parameters.openshift.openstack.infra_flavor = \"${INFRA_FLAVOR}\"" \
      ${CLUSTER_ID}.yml

Commit changes and compile cluster catalog

  1. Review changes. Have a look at the file ${CLUSTER_ID}.yml. Override default parameters or add more component configurations as required for your cluster.

  2. Commit changes

    git commit -a -m "Setup cluster ${CLUSTER_ID}"
    git push
    
    popd
  3. Compile and push cluster catalog

    commodore catalog compile ${CLUSTER_ID} --push -i \
      --dynamic-fact kubernetesVersion.major=$(echo "1.26" | awk -F. '{print $1}') \
      --dynamic-fact kubernetesVersion.minor=$(echo "1.26" | awk -F. '{print $2}') \
      --dynamic-fact openshiftVersion.Major=$(echo "4.13" | awk -F. '{print $1}') \
      --dynamic-fact openshiftVersion.Minor=$(echo "4.13" | awk -F. '{print $2}')
    This commodore call requires Commodore v1.5.0 or newer. Please make sure to update your local installation.

Provision the cluster

The steps in this section must be run on a host which can reach the OpenStack API. If you can’t reach the OpenStack API directly, but an SSH jumphost is available, you can set up a SOCKS5 proxy with the following commands:

export JUMPHOST_FQDN=<jumphost fqdn or alias from your SSH config> (1)
ssh -D 12000 -q -f -N ${JUMPHOST_FQDN} (2)
export https_proxy=socks5://localhost:12000 (3)
export CURL_OPTS="-xsocks5h://localhost:12000"
1 The FQDN or SSH alias of the host which can reach the OpenStack API
2 This command expects that your SSH config is set up so that ssh ${JUMPHOST_FQDN} works without further configuration
3 The openshift-install tool respects the https_proxy environment variable
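
To verify that the proxy works, you can, for example, query the OpenStack API through it, reusing the CURL_OPTS variable set above:

curl ${CURL_OPTS} -s "${OS_AUTH_URL}" | jq .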
  1. Run the OpenShift installer

    openshift-install --dir "${INSTALLER_DIR}" \
      create cluster --log-level=debug

Access cluster API

  1. Export kubeconfig

    export KUBECONFIG="${INSTALLER_DIR}/auth/kubeconfig"
  2. Verify API access

    kubectl cluster-info

If the cluster API is only reachable with a SOCKS5 proxy, run the following commands instead:

cp ${INSTALLER_DIR}/auth/kubeconfig ${INSTALLER_DIR}/auth/kubeconfig-socks5
yq eval -i '.clusters[0].cluster.proxy-url="socks5://localhost:12000"' \
    ${INSTALLER_DIR}/auth/kubeconfig-socks5
export KUBECONFIG="${INSTALLER_DIR}/auth/kubeconfig-socks5"

Create a server group for the infra nodes

  1. Create the server group

    openstack server group create "$(jq -r .infraID "${INSTALLER_DIR}/metadata.json")-infra" \
      --policy soft-anti-affinity
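
    To double-check, you can list the existing server groups:

    openstack server group list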

Configure registry S3 credentials

  1. Create secret with S3 credentials for the registry

    oc create secret generic image-registry-private-configuration-user \
      --namespace openshift-image-registry \
      --from-literal=REGISTRY_STORAGE_S3_ACCESSKEY=<TBD> \
      --from-literal=REGISTRY_STORAGE_S3_SECRETKEY=<TBD>

    If the registry S3 credentials are created too long after the initial cluster setup, it’s possible that the openshift-samples operator has disabled itself because it couldn’t find a working in-cluster registry.

    If the samples operator is disabled, no templates and builder images will be available on the cluster.

    You can check the samples-operator’s state with the following command:

    kubectl get config.samples cluster -ojsonpath='{.spec.managementState}'

    If the command returns Removed, verify that the in-cluster registry pods are now running, and enable the samples operator again:

    kubectl patch config.samples cluster -p '{"spec":{"managementState":"Managed"}}'

    See the upstream documentation for more details on the samples operator.

Setup acme-dns CNAME records for the cluster

You can skip this section if you’re not using Let’s Encrypt for the cluster’s API and default wildcard certificates.
  1. Extract the acme-dns subdomain for the cluster after cert-manager has been deployed via Project Syn.

    fulldomain=$(kubectl -n syn-cert-manager \
      get secret acme-dns-client \
      -o jsonpath='{.data.acmedns\.json}' | \
      base64 -d  | \
      jq -r '[.[]][0].fulldomain')
    echo "$fulldomain"
  2. Add the following CNAME records to the cluster’s DNS zone

    The _acme-challenge records must be created in the same zone as the cluster’s api and apps records respectively.

    $ORIGIN <cluster-zone> (2)
    _acme-challenge.api  IN CNAME <fulldomain>. (1)
    $ORIGIN <apps-base-domain> (3)
    _acme-challenge.apps IN CNAME <fulldomain>. (1)
    1 Replace <fulldomain> with the output of the previous step.
    2 The _acme-challenge.api record must be created in the same origin as the api record.
    3 The _acme-challenge.apps record must be created in the same origin as the apps record.

Store the cluster’s admin credentials in the password manager

  1. Once the cluster’s production API certificate has been deployed, edit the cluster’s admin kubeconfig file to remove the initial API certificate CA.

    You may see the error Unable to connect to the server: x509: certificate signed by unknown authority when executing kubectl or oc commands after the cluster’s production API certificate has been deployed by Project Syn.

    This error can be addressed by removing the initial CA certificate data from the admin kubeconfig as shown in this step.

    yq e -i 'del(.clusters[0].cluster.certificate-authority-data)' \
      "${INSTALLER_DIR}/auth/kubeconfig"
  2. Save the admin credentials in the password manager. You can find the password in the file target/auth/kubeadmin-password and the kubeconfig in target/auth/kubeconfig.

    ls -l ${INSTALLER_DIR}/auth/

Finalize installation

  1. Verify that the Project Syn-managed machine sets have been provisioned

    kubectl -n openshift-machine-api get machineset -l argocd.argoproj.io/instance

    The command should show something like

    NAME    DESIRED   CURRENT   READY   AVAILABLE   AGE
    app     3         3         3       3           4d5h (1)
    infra   4         4         4       4           4d5h (1)
    1 The values for DESIRED and AVAILABLE should match.

    If there are discrepancies between the desired and available counts of the machine sets, you can list the machine objects which aren’t in phase "Running":

    kubectl -n openshift-machine-api get machine | grep -v Running

    You can see errors by looking at an individual machine object with kubectl describe.
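
    For example, replace the machine name with one from the listing above:

    kubectl -n openshift-machine-api describe machine <machine-name>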

  2. If the Project Syn-managed machine sets are healthy, scale down the initial worker machine set

    If the Project Syn-managed machine sets aren’t healthy, this step may reduce the cluster capacity to the point where infrastructure components can’t run. Make sure you have sufficient cluster capacity before continuing.

    INFRA_ID=$(jq -r .infraID "${INSTALLER_DIR}/metadata.json")
    kubectl -n openshift-machine-api patch machineset ${INFRA_ID}-worker \
      -p '{"spec": {"replicas": 0}}' --type merge
  3. Once the initial machine set is scaled down, verify that all pods are still running. The command below should produce no output.

    kubectl get pods -A | grep -vw -e Running -e Completed
  4. If all pods are still running, delete the initial machine set

    kubectl -n openshift-machine-api delete machineset ${INFRA_ID}-worker
  5. Delete local config files

    rm -r ${INSTALLER_DIR}/

Post tasks

VSHN

  1. Enable automated upgrades

  2. Add the cluster to the maintenance template, if necessary

Generic