Install OpenShift 4 on OpenStack
Steps to install an OpenShift 4 cluster on Red Hat OpenStack.
These steps follow the Installing a cluster on OpenStack docs to set up an installer provisioned installation (IPI).
This how-to guide is an early draft. So far, we’ve set up only one cluster using the instructions in this guide.
The certificates created during bootstrap are only valid for 24 hours. Make sure you complete these steps within 24 hours.
Starting situation
-
You already have a Project Syn Tenant and its Git repository
-
You have a CCSP Red Hat login and are logged into Red Hat OpenShift Cluster Manager
Don’t use your personal account to log in to the cluster manager for installation.
-
You want to register a new cluster in Lieutenant and are about to install OpenShift 4 on OpenStack
Prerequisites
-
jq
-
yq
yq YAML processor (version 4 or higher; use the Go version by mikefarah, not the jq wrapper by kislyuk)
-
vault
Vault CLI
-
curl
-
emergency-credentials-receive
Install instructions
-
gzip
-
docker
-
unzip
-
openstack CLI
The OpenStack CLI is available as a Python package.
Ubuntu/Debian: sudo apt install python3-openstackclient
Arch: yay -S python-openstackclient
macOS: brew install openstackclient
Optionally, you can also install additional CLIs for object storage (swift) and images (glance).
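Before starting, it can help to confirm that the tools listed above are actually on your PATH. The following is a minimal sketch, not part of the official procedure; the function name check_tools is a hypothetical helper introduced here for illustration.

```shell
#!/bin/sh
# check_tools is a hypothetical helper (not part of any of the CLIs above).
# It reports every given tool that isn't on PATH.
# Returns 0 if all tools were found, 1 otherwise.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_tools jq yq vault curl emergency-credentials-receive gzip docker unzip openstack prints one line per missing prerequisite. The check only confirms that a command exists; it doesn't verify versions, so the yq version requirement still has to be checked by hand.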
Cluster Installation
Register the new OpenShift 4 cluster in Lieutenant.
Set cluster facts
For customer clusters, set the following cluster facts in Lieutenant:
-
access_policy: Access policy of the cluster, such as regular or swissonly
-
service_level: Name of the service level agreement for this cluster, such as guaranteed-availability
-
sales_order: Name of the sales order to which the cluster is billed, such as S10000
-
release_channel: Name of the syn component release channel to use, such as stable
-
cilium_addons: Comma-separated list of cilium addons the customer gets billed for, such as advanced_networking or tetragon. Set to NONE if no addons should be billed.
Set up Keycloak service
-
Create a Keycloak service
Use control.vshn.net/vshn/services/_create to create a service. The name and ID must be the cluster’s name. For the optional URL, use the OpenShift console URL.
Configure input
export OS_AUTH_URL=<openstack authentication URL> (1)
1 Provide the URL with the leading https://
export OS_USERNAME=<username>
export OS_PASSWORD=<password>
export OS_PROJECT_NAME=<project name>
export OS_PROJECT_DOMAIN_NAME=<project domain name>
export OS_USER_DOMAIN_NAME=<user domain name>
export OS_REGION_NAME=<region name>
export OS_PROJECT_ID=$(openstack project show $OS_PROJECT_NAME -f json | jq -r .id) (1)
1 TBD if really needed
export MACHINE_NETWORK_CIDR=<machine network cidr>
export EXTERNAL_NETWORK_NAME=<external network name> (1)
1 The instructions create floating IPs for the API and ingress in the specified network.
export CONTROL_PLANE_FLAVOR=<flavor name> (1)
export INFRA_FLAVOR=<flavor name> (1)
export APP_FLAVOR=<flavor name> (1)
1 Check openstack flavor list for available options.
# For example: https://api.syn.vshn.net
# IMPORTANT: do NOT add a trailing `/`. Commands below will fail.
export COMMODORE_API_URL=<lieutenant-api-endpoint>
# Set Project Syn cluster and tenant ID
export CLUSTER_ID=<lieutenant-cluster-id> # Looks like: c-<something>
export TENANT_ID=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${COMMODORE_API_URL}/clusters/${CLUSTER_ID} | jq -r .tenant)
export BASE_DOMAIN=<your-base-domain> # customer-provided base domain without cluster name, e.g. "zrh.customer.vshnmanaged.net"
export PULL_SECRET='<redhat-pull-secret>' # As copied from https://cloud.redhat.com/openshift/install/pull-secret "Copy pull secret". value must be inside quotes.
For an explanation of BASE_DOMAIN, see DNS Scheme.
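Several of the inputs above have format constraints that are easy to get wrong: no trailing slash on the API URL, a cluster ID that looks like c-<something>, and an authentication URL with a leading https://. The sketch below checks those three constraints before you continue. The function name validate_input is a hypothetical helper, and the checks only cover what this guide states explicitly.

```shell
#!/bin/sh
# validate_input is a hypothetical helper for illustration.
#   $1: Lieutenant API URL, $2: cluster ID, $3: OpenStack auth URL
# Returns 0 if all three values pass the checks, 1 otherwise.
validate_input() {
  api_url=$1
  cluster_id=$2
  auth_url=$3
  rc=0
  case "$api_url" in
    */) echo "COMMODORE_API_URL must not end with '/'" >&2; rc=1 ;;
  esac
  case "$cluster_id" in
    c-?*) ;;
    *) echo "CLUSTER_ID should look like c-<something>" >&2; rc=1 ;;
  esac
  case "$auth_url" in
    https://?*) ;;
    *) echo "OS_AUTH_URL must start with https://" >&2; rc=1 ;;
  esac
  return "$rc"
}
```

A typical call would be validate_input "$COMMODORE_API_URL" "$CLUSTER_ID" "$OS_AUTH_URL". Taking the values as arguments rather than reading the environment directly keeps the function easy to try out with sample values.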
Set secrets in Vault
export VAULT_ADDR=https://vault-prod.syn.vshn.net
vault login -method=oidc
# Store OpenStack credentials
vault kv put clusters/kv/${TENANT_ID}/${CLUSTER_ID}/openstack/credentials \
  username="${OS_USERNAME}" \
  password="${OS_PASSWORD}"
# Generate an HTTP secret for the registry
vault kv put clusters/kv/${TENANT_ID}/${CLUSTER_ID}/registry \
httpSecret=$(LC_ALL=C tr -cd "A-Za-z0-9" </dev/urandom | head -c 128)
# Generate a master password for K8up backups
vault kv put clusters/kv/${TENANT_ID}/${CLUSTER_ID}/global-backup \
password=$(LC_ALL=C tr -cd "A-Za-z0-9" </dev/urandom | head -c 32)
# Generate a password for the cluster object backups
vault kv put clusters/kv/${TENANT_ID}/${CLUSTER_ID}/cluster-backup \
password=$(LC_ALL=C tr -cd "A-Za-z0-9" </dev/urandom | head -c 32)
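All secrets in this section share one path layout: clusters/kv/<tenant>/<cluster>/<name>. The sketch below makes that layout explicit, which can be handy when reading the secrets back to verify them. Both function names are hypothetical helpers, and the list of secret names is taken only from the commands in this section.

```shell
#!/bin/sh
# Hypothetical helper: build the Vault path for one cluster secret.
#   $1: tenant ID, $2: cluster ID, $3: secret name
cluster_secret_path() {
  printf 'clusters/kv/%s/%s/%s\n' "$1" "$2" "$3"
}

# Hypothetical helper: list the paths written in this section.
#   $1: tenant ID, $2: cluster ID
list_cluster_secret_paths() {
  for name in openstack/credentials registry global-backup cluster-backup; do
    cluster_secret_path "$1" "$2" "$name"
  done
}
```

For example, vault kv get "$(cluster_secret_path "$TENANT_ID" "$CLUSTER_ID" registry)" reads back the registry secret after it has been stored.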
Set up floating IPs and DNS records for the API and ingress
-
Create floating IPs in the OpenStack API
export API_VIP=$(openstack floating ip create \
  --description "API ${CLUSTER_ID}.${BASE_DOMAIN}" "${EXTERNAL_NETWORK_NAME}" \
  -f json | jq -r .floating_ip_address)
export INGRESS_VIP=$(openstack floating ip create \
  --description "Ingress ${CLUSTER_ID}.${BASE_DOMAIN}" "${EXTERNAL_NETWORK_NAME}" \
  -f json | jq -r .floating_ip_address)
-
Create the initial DNS zone for the cluster
cat <<EOF
\$ORIGIN ${CLUSTER_ID}.${BASE_DOMAIN}.
api IN A ${API_VIP}
ingress IN A ${INGRESS_VIP}
*.apps IN CNAME ingress.${CLUSTER_ID}.${BASE_DOMAIN}.
EOF
This step assumes that DNS for the cluster is managed by VSHN. See the VSHN zonefiles repo for details.
Create security group for Cilium
-
Create a security group
CILIUM_SECURITY_GROUP_ID=$(openstack security group create ${CLUSTER_ID}-cilium \
  --description "Cilium CNI security group rules for ${CLUSTER_ID}" -f json | \
  jq -r .id)
-
Create rules for Cilium traffic
openstack security group rule create --protocol tcp --remote-ip "$MACHINE_NETWORK_CIDR" \
  --dst-port 4240 --description "Cilium health checks" "$CILIUM_SECURITY_GROUP_ID"
openstack security group rule create --protocol tcp --remote-ip "$MACHINE_NETWORK_CIDR" \
  --dst-port 4244 --description "Cilium Hubble server" "$CILIUM_SECURITY_GROUP_ID"
openstack security group rule create --protocol tcp --remote-ip "$MACHINE_NETWORK_CIDR" \
  --dst-port 4245 --description "Cilium Hubble relay" "$CILIUM_SECURITY_GROUP_ID"
openstack security group rule create --protocol tcp --remote-ip "$MACHINE_NETWORK_CIDR" \
  --dst-port 6942 --description "Cilium operator metrics" "$CILIUM_SECURITY_GROUP_ID"
openstack security group rule create --protocol tcp --remote-ip "$MACHINE_NETWORK_CIDR" \
  --dst-port 2112 --description "Cilium Hubble enterprise metrics" "$CILIUM_SECURITY_GROUP_ID"
openstack security group rule create --protocol udp --remote-ip "$MACHINE_NETWORK_CIDR" \
  --dst-port 8472 --description "Cilium VXLAN" "$CILIUM_SECURITY_GROUP_ID"
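The six rules above differ only in protocol, port, and description, so they can also be expressed as a small table plus a loop. The sketch below prints the resulting commands instead of running them, so you can review them first. The function names are hypothetical helpers; the rule data is copied from the commands above.

```shell
#!/bin/sh
# One rule per line: protocol, destination port, description.
cilium_rules() {
  cat <<'EOF'
tcp 4240 Cilium health checks
tcp 4244 Cilium Hubble server
tcp 4245 Cilium Hubble relay
tcp 6942 Cilium operator metrics
tcp 2112 Cilium Hubble enterprise metrics
udp 8472 Cilium VXLAN
EOF
}

# Print (don't run) one `openstack security group rule create` command per rule.
#   $1: machine network CIDR, $2: security group ID
print_cilium_rule_commands() {
  cidr=$1
  group_id=$2
  cilium_rules | while read -r proto port description; do
    printf 'openstack security group rule create --protocol %s --remote-ip "%s" --dst-port %s --description "%s" "%s"\n' \
      "$proto" "$cidr" "$port" "$description" "$group_id"
  done
}
```

Calling print_cilium_rule_commands "$MACHINE_NETWORK_CIDR" "$CILIUM_SECURITY_GROUP_ID" prints six commands. Keeping the rules in one table makes it harder to miss a port when the list changes.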
Prepare Cluster Repository
Starting with this section, we recommend that you change into a clean directory (for example a directory in your home).
Check Running Commodore for details on how to run commodore.
-
Prepare Commodore inventory.
mkdir -p inventory/classes/
git clone $(curl -sH"Authorization: Bearer $(commodore fetch-token)" "${COMMODORE_API_URL}/tenants/${TENANT_ID}" | jq -r '.gitRepo.url') inventory/classes/${TENANT_ID}
-
Configure the cluster’s domain in Project Syn
export CLUSTER_DOMAIN="${CLUSTER_ID}.${BASE_DOMAIN}" (1)
1 Adjust this as necessary if you’re using a non-standard cluster domain. The cluster domain configured here must be correct. The value is used to configure how Cilium connects to the cluster’s K8s API.
pushd "inventory/classes/${TENANT_ID}/"
yq eval -i ".parameters.openshift.baseDomain = \"${CLUSTER_DOMAIN}\"" \
  ${CLUSTER_ID}.yml
git commit -a -m "Configure cluster domain for ${CLUSTER_ID}"
-
Include openshift4.yml in the cluster’s config if it exists
For some tenants, this may already configure some of the settings shown in this how-to.
if ls openshift4.y*ml 1>/dev/null 2>&1; then
  yq eval -i '.classes += ".openshift4"' ${CLUSTER_ID}.yml;
  git commit -a -m "Include openshift4 class for ${CLUSTER_ID}"
fi
-
Add Cilium to cluster configuration
These instructions assume that Cilium is configured to use api-int.${CLUSTER_DOMAIN}:6443 to connect to the cluster’s K8s API. To ensure that that’s the case, add the configuration shown below somewhere in the Project Syn config hierarchy.
parameters:
  cilium:
    cilium_helm_values:
      k8sServiceHost: api-int.${openshift:baseDomain}
      k8sServicePort: "6443"
For VSHN, this configuration is set in the Commodore global defaults (internal).
yq eval -i '.applications += ["cilium"]' ${CLUSTER_ID}.yml
yq eval -i '.parameters.networkpolicy.networkPlugin = "cilium"' ${CLUSTER_ID}.yml
yq eval -i '.parameters.networkpolicy.ignoredNamespaces = ["openshift-oauth-apiserver"]' ${CLUSTER_ID}.yml
yq eval -i '.parameters.openshift4_monitoring.upstreamRules.networkPlugin = "cilium"' ${CLUSTER_ID}.yml
yq eval -i '.parameters.openshift.infraID = "TO_BE_DEFINED"' ${CLUSTER_ID}.yml
yq eval -i '.parameters.openshift.clusterID = "TO_BE_DEFINED"' ${CLUSTER_ID}.yml
git commit -a -m "Add Cilium addon to ${CLUSTER_ID}"
git push
popd
-
Compile catalog
commodore catalog compile ${CLUSTER_ID} --push -i \
  --dynamic-fact kubernetesVersion.major=$(echo "1.29" | awk -F. '{print $1}') \
  --dynamic-fact kubernetesVersion.minor=$(echo "1.29" | awk -F. '{print $2}') \
  --dynamic-fact openshiftVersion.Major=$(echo "4.16" | awk -F. '{print $1}') \
  --dynamic-fact openshiftVersion.Minor=$(echo "4.16" | awk -F. '{print $2}')
This commodore call requires Commodore v1.5.0 or newer. Please make sure to update your local installation.
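The compile command above splits each version string into major and minor parts with awk. The same split can be done with plain shell parameter expansion, as sketched below. The function name dynamic_fact_args is a hypothetical helper; the fact names are the ones used in the command above.

```shell
#!/bin/sh
# Hypothetical helper: print the --dynamic-fact flags for a Kubernetes and an
# OpenShift version, each given as "<major>.<minor>" (a patch part is ignored).
dynamic_fact_args() {
  k8s_version=$1
  ocp_version=$2
  k8s_minor=${k8s_version#*.}   # drop "<major>."
  k8s_minor=${k8s_minor%%.*}    # drop ".<patch>" if present
  ocp_minor=${ocp_version#*.}
  ocp_minor=${ocp_minor%%.*}
  printf '%s\n' \
    "--dynamic-fact kubernetesVersion.major=${k8s_version%%.*}" \
    "--dynamic-fact kubernetesVersion.minor=${k8s_minor}" \
    "--dynamic-fact openshiftVersion.Major=${ocp_version%%.*}" \
    "--dynamic-fact openshiftVersion.Minor=${ocp_minor}"
}
```

In bash or sh, the output can be passed on unquoted so that it is split into separate arguments, for example commodore catalog compile ${CLUSTER_ID} --push -i $(dynamic_fact_args 1.29 4.16). Shells that don't word-split unquoted expansions by default, such as zsh, need a different invocation.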
Configure the OpenShift Installer
-
Generate SSH key
We generate a unique SSH key pair for the cluster as this gives us troubleshooting access.
SSH_PRIVATE_KEY="$(pwd)/ssh_$CLUSTER_ID"
export SSH_PUBLIC_KEY="${SSH_PRIVATE_KEY}.pub"
ssh-keygen -C "vault@$CLUSTER_ID" -t ed25519 -f $SSH_PRIVATE_KEY -N ''
BASE64_NO_WRAP='base64'
if [[ "$OSTYPE" == "linux"* ]]; then
  BASE64_NO_WRAP='base64 --wrap 0'
fi
vault kv put clusters/kv/${TENANT_ID}/${CLUSTER_ID}/openstack/ssh \
  private_key=$(cat $SSH_PRIVATE_KEY | eval "$BASE64_NO_WRAP")
ssh-add $SSH_PRIVATE_KEY
-
Prepare install-config.yaml
You can add more options to the install-config.yaml file. Have a look at the installation configuration parameters for more information.
export INSTALLER_DIR="$(pwd)/target"
mkdir -p "${INSTALLER_DIR}"
cat > "${INSTALLER_DIR}/clouds.yaml" <<EOF
clouds:
  shiftstack:
    auth:
      auth_url: ${OS_AUTH_URL}
      project_name: ${OS_PROJECT_NAME}
      username: ${OS_USERNAME}
      password: ${OS_PASSWORD}
      user_domain_name: ${OS_USER_DOMAIN_NAME}
      project_domain_name: ${OS_PROJECT_DOMAIN_NAME}
EOF
cat > "${INSTALLER_DIR}/install-config.yaml" <<EOF
apiVersion: v1
metadata:
  name: ${CLUSTER_ID} # (1)
baseDomain: ${BASE_DOMAIN} # (1)
compute: # (2)
  - architecture: amd64
    hyperthreading: Enabled
    name: worker
    replicas: 3
    platform:
      openstack:
        type: ${APP_FLAVOR}
        rootVolume:
          size: 100
          type: __DEFAULT__ # TODO: is this generally applicable?
        additionalSecurityGroupIDs: # (3)
          - ${CILIUM_SECURITY_GROUP_ID}
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  replicas: 3
  platform:
    openstack:
      type: ${CONTROL_PLANE_FLAVOR}
      rootVolume:
        size: 100
        type: __DEFAULT__ # TODO: is this generally applicable?
      additionalSecurityGroupIDs: # (3)
        - ${CILIUM_SECURITY_GROUP_ID}
platform:
  openstack:
    cloud: shiftstack # (4)
    externalNetwork: ${EXTERNAL_NETWORK_NAME}
    apiFloatingIP: ${API_VIP}
    ingressFloatingIP: ${INGRESS_VIP}
networking:
  networkType: Cilium
  machineNetwork:
    - cidr: ${MACHINE_NETWORK_CIDR}
pullSecret: |
  ${PULL_SECRET}
sshKey: "$(cat $SSH_PUBLIC_KEY)"
EOF
1 Make sure that the values here match the value of $CLUSTER_DOMAIN when combined as <metadata.name>.<baseDomain>. Otherwise, the installation will most likely fail.
2 We only provision a single compute machine set. The final machine sets will be configured through Project Syn.
3 We attach the Cilium security group to both the control plane and the worker nodes. This ensures that there are no issues with Cilium traffic during bootstrapping.
4 This field must match the entry in clouds in the clouds.yaml file. If you’re following this guide, you shouldn’t need to adjust this.
If you set a custom CIDR for the OpenShift networking, update the corresponding values in your Commodore cluster definitions. See Cilium Component Defaults and Parameter Reference. Verify with less catalog/manifests/cilium/olm/*ciliumconfig.yaml.
Prepare the OpenShift Installer
The steps in this section aren’t idempotent and have to be completed uninterrupted in one go. If you have to recreate the install config or any of the generated manifests, you need to rerun all of the subsequent steps.
-
Render install manifests (this will consume the install-config.yaml)
openshift-install --dir "${INSTALLER_DIR}" \
  create manifests
-
If you want to change the default "apps" domain for the cluster:
yq eval -i '.spec.domain = "apps.example.com"' \
  "${INSTALLER_DIR}/manifests/cluster-ingress-02-config.yml"
-
Copy pre-rendered extra machine configs
machineconfigs=catalog/manifests/openshift4-nodes/10_machineconfigs.yaml
if [ -f $machineconfigs ]; then
  yq --no-doc -s \
    "\"${INSTALLER_DIR}/openshift/99x_openshift-machineconfig_\" + .metadata.name" \
    $machineconfigs
fi
-
Copy pre-rendered Cilium manifests
cp catalog/manifests/cilium/olm/* ${INSTALLER_DIR}/manifests/
-
Verify that the generated cluster domain matches the desired cluster domain
GEN_CLUSTER_DOMAIN=$(yq e '.spec.baseDomain' \
  "${INSTALLER_DIR}/manifests/cluster-dns-02-config.yml")
if [ "$GEN_CLUSTER_DOMAIN" != "$CLUSTER_DOMAIN" ]; then
  echo -e "\033[0;31mGenerated cluster domain doesn't match expected cluster domain: Got '$GEN_CLUSTER_DOMAIN', want '$CLUSTER_DOMAIN'\033[0;0m"
else
  echo -e "\033[0;32mGenerated cluster domain matches expected cluster domain.\033[0;0m"
fi
-
Prepare install manifests and ignition config
openshift-install --dir "${INSTALLER_DIR}" \
  create ignition-configs
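Because the bootstrap certificates are only valid for 24 hours, it can be useful to keep track of how much of that window is left. The sketch below assumes that the window starts when the ignition configs are generated; this guide doesn't state the exact starting point, so treat that as an assumption. The function name seconds_left is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: seconds left in a 24-hour window.
#   $1: start of the window, $2: current time (both Unix timestamps)
# A negative result means the window has passed.
seconds_left() {
  start=$1
  now=$2
  echo $(( start + 24 * 60 * 60 - now ))
}
```

For example, record IGNITION_CREATED=$(date +%s) right after generating the ignition configs, and later run seconds_left "$IGNITION_CREATED" "$(date +%s)" to see how many seconds remain.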
Update Project Syn cluster config
-
Switch to the tenant repo
pushd "inventory/classes/${TENANT_ID}/"
-
Include no-opsgenie class to prevent monitoring noise during cluster setup
yq eval -i '.classes += "global.distribution.openshift4.no-opsgenie"' ${CLUSTER_ID}.yml;
-
Update cluster config
yq eval -i ".parameters.openshift.infraID = \"$(jq -r .infraID "${INSTALLER_DIR}/metadata.json")\"" \
  ${CLUSTER_ID}.yml
yq eval -i ".parameters.openshift.clusterID = \"$(jq -r .clusterID "${INSTALLER_DIR}/metadata.json")\"" \
  ${CLUSTER_ID}.yml
yq eval -i ".parameters.openshift.ssh_key = \"$(cat ${SSH_PUBLIC_KEY})\"" \
  ${CLUSTER_ID}.yml
If you use a custom "apps" domain, make sure to set parameters.openshift.appsDomain accordingly.
APPS_DOMAIN=your.custom.apps.domain
yq eval -i ".parameters.openshift.appsDomain = \"${APPS_DOMAIN}\"" \
  ${CLUSTER_ID}.yml
By default, the cluster’s update channel is derived from the cluster’s reported OpenShift version. If you want to use a custom update channel, make sure to set parameters.openshift4_version.spec.channel accordingly.
# Configure the OpenShift update channel as `fast`
yq eval -i ".parameters.openshift4_version.spec.channel = \"fast-{ocp-minor-version}\"" \
  ${CLUSTER_ID}.yml
-
Configure OpenStack parameters
yq eval -i ".parameters.openshift.openstack.app_flavor = \"${APP_FLAVOR}\"" \
  ${CLUSTER_ID}.yml
yq eval -i ".parameters.openshift.openstack.infra_flavor = \"${INFRA_FLAVOR}\"" \
  ${CLUSTER_ID}.yml
Commit changes and compile cluster catalog
-
Review changes. Have a look at the file ${CLUSTER_ID}.yml. Override default parameters or add more component configurations as required for your cluster.
-
Commit changes
git commit -a -m "Setup cluster ${CLUSTER_ID}"
git push
popd
-
Compile and push cluster catalog
commodore catalog compile ${CLUSTER_ID} --push -i \
  --dynamic-fact kubernetesVersion.major=$(echo "1.29" | awk -F. '{print $1}') \
  --dynamic-fact kubernetesVersion.minor=$(echo "1.29" | awk -F. '{print $2}') \
  --dynamic-fact openshiftVersion.Major=$(echo "4.16" | awk -F. '{print $1}') \
  --dynamic-fact openshiftVersion.Minor=$(echo "4.16" | awk -F. '{print $2}')
This commodore call requires Commodore v1.5.0 or newer. Please make sure to update your local installation.
Provision the cluster
The steps in this section must be run on a host which can reach the OpenStack API. If you can’t reach the OpenStack API directly, but an SSH jumphost is available, you can set up a SOCKS5 proxy with the following commands:
|
-
Run the OpenShift installer
openshift-install --dir "${INSTALLER_DIR}" \
  create cluster --log-level=debug
Access cluster API
-
Export kubeconfig
export KUBECONFIG="${INSTALLER_DIR}/auth/kubeconfig"
-
Verify API access
kubectl cluster-info
If the cluster API is only reachable with a SOCKS5 proxy, run the following commands instead:
|
Create a server group for the infra nodes
-
Create the server group
openstack server group create $(jq -r .infraID "${INSTALLER_DIR}/metadata.json")-infra \
  --policy soft-anti-affinity
Configure registry S3 credentials
-
Create secret with S3 credentials for the registry
oc create secret generic image-registry-private-configuration-user \
  --namespace openshift-image-registry \
  --from-literal=REGISTRY_STORAGE_S3_ACCESSKEY=<TBD> \
  --from-literal=REGISTRY_STORAGE_S3_SECRETKEY=<TBD>
If the registry S3 credentials are created too long after the initial cluster setup, it’s possible that the openshift-samples operator has disabled itself because it couldn’t find a working in-cluster registry.
If the samples operator is disabled, no templates and builder images will be available on the cluster.
You can check the samples-operator’s state with the following command:
kubectl get config.samples cluster -ojsonpath='{.spec.managementState}'
If the command returns Removed, verify that the in-cluster registry pods are now running, and enable the samples operator again:
kubectl patch config.samples cluster --type=merge -p '{"spec":{"managementState":"Managed"}}'
See the upstream documentation for more details on the samples operator.
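The check-then-patch sequence for the samples operator can be combined into one function, as sketched below. The function name ensure_samples_managed is a hypothetical helper. The sketch passes --type=merge to kubectl patch on the assumption that a merge patch is what's needed for this custom resource, and it doesn't verify that the in-cluster registry pods are running; that check remains a manual step.

```shell
#!/bin/sh
# Hypothetical helper: re-enable the samples operator if it reports
# managementState "Removed". Requires kubectl and a valid KUBECONFIG.
ensure_samples_managed() {
  state=$(kubectl get config.samples cluster -ojsonpath='{.spec.managementState}')
  if [ "$state" = "Removed" ]; then
    kubectl patch config.samples cluster --type=merge \
      -p '{"spec":{"managementState":"Managed"}}'
  else
    echo "samples operator managementState is '$state', nothing to do"
  fi
}
```

Running ensure_samples_managed is safe to repeat: it only patches the resource when the state is Removed.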
Set up acme-dns CNAME records for the cluster
You can skip this section if you’re not using Let’s Encrypt for the cluster’s API and default wildcard certificates.
-
Extract the acme-dns subdomain for the cluster after cert-manager has been deployed via Project Syn.
fulldomain=$(kubectl -n syn-cert-manager \
  get secret acme-dns-client \
  -o jsonpath='{.data.acmedns\.json}' | \
  base64 -d | \
  jq -r '[.[]][0].fulldomain')
echo "$fulldomain"
-
Add the following CNAME records to the cluster’s DNS zone
The _acme-challenge records must be created in the same zone as the cluster’s api and apps records respectively.
$ORIGIN <cluster-zone> (2)
_acme-challenge.api IN CNAME <fulldomain>. (1)
$ORIGIN <apps-base-domain> (3)
_acme-challenge.apps IN CNAME <fulldomain>. (1)
1 Replace <fulldomain> with the output of the previous step.
2 The _acme-challenge.api record must be created in the same origin as the api record.
3 The _acme-challenge.apps record must be created in the same origin as the apps record.
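The two CNAME records can be rendered from the extracted fulldomain with a small function, which avoids copy-paste mistakes in the zone file. This is a sketch only: the function name render_acme_cnames is a hypothetical helper, and it assumes that both origins are written as fully qualified names, so it appends a trailing dot to each value.

```shell
#!/bin/sh
# Hypothetical helper: print the two _acme-challenge CNAME records.
#   $1: acme-dns fulldomain
#   $2: zone that holds the api record
#   $3: zone that holds the apps record
# A trailing dot on any argument is tolerated and normalized.
render_acme_cnames() {
  fulldomain=${1%.}
  cluster_zone=${2%.}
  apps_base_domain=${3%.}
  cat <<EOF
\$ORIGIN ${cluster_zone}.
_acme-challenge.api IN CNAME ${fulldomain}.
\$ORIGIN ${apps_base_domain}.
_acme-challenge.apps IN CNAME ${fulldomain}.
EOF
}
```

With the default DNS scheme from this guide, a call could look like render_acme_cnames "$fulldomain" "${CLUSTER_ID}.${BASE_DOMAIN}" "${CLUSTER_ID}.${BASE_DOMAIN}". Pass a different third argument if the apps records live in another zone.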
Ensure emergency admin access to the cluster
-
Check that emergency credentials were uploaded and are accessible:
emergency-credentials-receive "${CLUSTER_ID}"
# Follow the instructions to use the downloaded kubeconfig file
You need to be in the passbolt group VSHN On-Call.
If the command fails, check if the controller is already deployed, running, and if the credentials are uploaded:
kubectl -n appuio-emergency-credentials-controller get emergencyaccounts.cluster.appuio.io -o=jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.lastTokenCreationTimestamp}{"\n"}{end}'
-
Follow the instructions from emergency-credentials-receive to use the downloaded kubeconfig file.
export KUBECONFIG="em-${CLUSTER_ID}"
kubectl get nodes
oc whoami # should output system:serviceaccount:appuio-emergency-credentials-controller:*
-
Invalidate the 10 year admin kubeconfig.
kubectl -n openshift-config patch cm admin-kubeconfig-client-ca --type=merge -p '{"data": {"ca-bundle.crt": ""}}'
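Since invalidating the admin kubeconfig removes your original way into the cluster, it's worth confirming first that the current session really uses the emergency credentials. The sketch below checks the identity string against the prefix shown in the oc whoami comment of this section. The function name is_emergency_identity is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given identity is a service account
# in the appuio-emergency-credentials-controller namespace.
#   $1: identity string, for example the output of `oc whoami`
is_emergency_identity() {
  case "$1" in
    system:serviceaccount:appuio-emergency-credentials-controller:?*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, is_emergency_identity "$(oc whoami)" || echo "not using emergency credentials" can be run immediately before the invalidation command.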
Enable Opsgenie alerting
-
Create the standard silence for alerts that don’t have the syn label
oc --as cluster-admin -n openshift-monitoring create job --from=cronjob/silence silence-manual
oc wait -n openshift-monitoring --for=condition=complete job/silence-manual
oc --as cluster-admin -n openshift-monitoring delete job/silence-manual
-
Check the remaining active alerts and address them where necessary
kubectl --as=cluster-admin -n openshift-monitoring exec sts/alertmanager-main -- \
  amtool --alertmanager.url=http://localhost:9093 alert --active
-
Remove the "no-opsgenie" class from the cluster’s configuration
pushd "inventory/classes/${TENANT_ID}/"
yq eval -i 'del(.classes[] | select(. == "*.no-opsgenie"))' ${CLUSTER_ID}.yml
git commit -a -m "Enable opsgenie alerting on cluster ${CLUSTER_ID}"
git push
popd
Finalize installation
-
Verify that the Project Syn-managed machine sets have been provisioned
kubectl -n openshift-machine-api get machineset -l argocd.argoproj.io/instance
The command should show something like
NAME    DESIRED   CURRENT   READY   AVAILABLE   AGE
app     3         3         3       3           4d5h (1)
infra   4         4         4       4           4d5h (1)
1 The values for DESIRED and AVAILABLE should match.
If there are discrepancies between the desired and available counts of the machine sets, you can list the machine objects which aren’t in phase "Running":
kubectl -n openshift-machine-api get machine | grep -v Running
You can see errors by looking at an individual machine object with kubectl describe.
-
If the Project Syn-managed machine sets are healthy, scale down the initial worker machine set
If the Project Syn-managed machine sets aren’t healthy, this step may reduce the cluster capacity to the point where infrastructure components can’t run. Make sure you have sufficient cluster capacity before continuing.
INFRA_ID=$(jq -r .infraID "${INSTALLER_DIR}/metadata.json")
kubectl -n openshift-machine-api patch machineset ${INFRA_ID}-worker-0 \
  -p '{"spec": {"replicas": 0}}' --type merge
-
Once the initial machine set is scaled down, verify that all pods are still running. The command below should produce no output.
kubectl get pods -A --no-headers | grep -vw -e Running -e Completed
-
If all pods are still running, delete the initial machine set
kubectl -n openshift-machine-api delete machineset ${INFRA_ID}-worker-0
-
Delete local config files
rm -r "${INSTALLER_DIR:?}/"
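The machine set check at the start of this section compares the DESIRED and AVAILABLE columns by eye. As a sketch, the comparison can also be done with awk on the table output. The function name unhealthy_machinesets is a hypothetical helper, and it assumes the column order NAME, DESIRED, CURRENT, READY, AVAILABLE, AGE shown in the example output.

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl get machineset` table output on stdin and
# print the name of every machine set whose DESIRED column (2) differs from
# its AVAILABLE column (5). The header line is skipped.
unhealthy_machinesets() {
  awk 'NR > 1 && $2 != $5 { print $1 }'
}
```

For example, kubectl -n openshift-machine-api get machineset -l argocd.argoproj.io/instance | unhealthy_machinesets prints nothing when all machine sets are healthy. If a column is left blank in the table output, the fields shift and the machine set is reported as well, which errs on the side of caution.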
Post tasks
VSHN
-
Add the cluster to the maintenance template, if necessary
Generic
-
Do a first maintenance