Install OpenShift 4 on vSphere
Steps to install an OpenShift 4 cluster on VMware vSphere.
These steps follow the Installing a cluster on vSphere docs to set up an installer-provisioned infrastructure (IPI) installation.
This how-to guide is an early draft. So far, we’ve set up only one cluster using the instructions in this guide.
The certificates created during bootstrap are only valid for 24 hours, so make sure you complete these steps within 24 hours.
Starting situation
-
You already have a Project Syn Tenant and its Git repository
-
You have a CCSP Red Hat login and are logged into Red Hat OpenShift Cluster Manager
Don’t use your personal account to log in to the cluster manager for the installation.
-
You have credentials for the target vSphere cluster with the permissions described in the upstream documentation.
-
You want to register a new cluster in Lieutenant and are about to install OpenShift 4 on vSphere
Cluster Installation
Register the new OpenShift 4 cluster in Lieutenant.
Set up LDAP service
-
Create an LDAP service
Use control.vshn.net/vshn/services/_create to create a service. The name must contain the customer and the cluster name. Then put the LDAP service ID and password in the following variables:
export LDAP_ID="Your_LDAP_ID_here"
export LDAP_PASSWORD="Your_LDAP_pw_here"
Configure input
export VCENTER_HOSTNAME=<vcenter hostname> (1)
1 The vCenter hostname must be provided without the leading https://.
export VSPHERE_USERNAME=<username>
export VSPHERE_PASSWORD=<password>
export VSPHERE_CLUSTER=<cluster name>
export VSPHERE_DATACENTER=<datacenter name>
export VSPHERE_DATASTORE=<datastore name>
export VSPHERE_NETWORK=<network name>
export MACHINE_NETWORK_CIDR=<machine network cidr>
export API_VIP=<api vip>
export INGRESS_VIP=<ingress vip>
# For example: https://api.syn.vshn.net
# IMPORTANT: do NOT add a trailing `/`. Commands below will fail.
export COMMODORE_API_URL=<lieutenant-api-endpoint>
# Set Project Syn cluster and tenant ID
export CLUSTER_ID=<lieutenant-cluster-id> # Looks like: c-<something>
export TENANT_ID=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${COMMODORE_API_URL}/clusters/${CLUSTER_ID} | jq -r .tenant)
export BASE_DOMAIN=<your-base-domain> # customer-provided base domain without cluster name, e.g. "zrh.customer.vshnmanaged.net"
export PULL_SECRET='<redhat-pull-secret>' # As copied from https://cloud.redhat.com/openshift/install/pull-secret "Copy pull secret". The value must be inside quotes.
For an explanation of BASE_DOMAIN, see DNS Scheme.
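As a quick illustration of how BASE_DOMAIN is used (hypothetical values; OpenShift composes the cluster domain from the cluster name and the base domain):
# Hypothetical example: CLUSTER_ID=c-mycluster, BASE_DOMAIN=zrh.customer.vshnmanaged.net
echo "API:  api.${CLUSTER_ID}.${BASE_DOMAIN}"      # api.c-mycluster.zrh.customer.vshnmanaged.net
echo "Apps: *.apps.${CLUSTER_ID}.${BASE_DOMAIN}"   # *.apps.c-mycluster.zrh.customer.vshnmanaged.net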
Set secrets in Vault
export VAULT_ADDR=https://vault-prod.syn.vshn.net
vault login -method=oidc
# Store vSphere credentials
vault kv put clusters/kv/${TENANT_ID}/${CLUSTER_ID}/vsphere/credentials \
username=${VSPHERE_USERNAME} \
password=${VSPHERE_PASSWORD}
# Generate an HTTP secret for the registry
vault kv put clusters/kv/${TENANT_ID}/${CLUSTER_ID}/registry \
httpSecret=$(LC_ALL=C tr -cd "A-Za-z0-9" </dev/urandom | head -c 128)
# Set the LDAP password
vault kv put clusters/kv/${TENANT_ID}/${CLUSTER_ID}/vshn-ldap \
bindPassword=${LDAP_PASSWORD}
# Generate a master password for K8up backups
vault kv put clusters/kv/${TENANT_ID}/${CLUSTER_ID}/global-backup \
password=$(LC_ALL=C tr -cd "A-Za-z0-9" </dev/urandom | head -c 32)
# Generate a password for the cluster object backups
vault kv put clusters/kv/${TENANT_ID}/${CLUSTER_ID}/cluster-backup \
password=$(LC_ALL=C tr -cd "A-Za-z0-9" </dev/urandom | head -c 32)
# Copy the VSHN acme-dns registration password
vault kv get -format=json "clusters/kv/template/cert-manager" | jq '.data.data' \
| vault kv put -cas=0 "clusters/kv/${TENANT_ID}/${CLUSTER_ID}/cert-manager" -
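Optionally, you can read one of the secrets back to confirm it was written as expected; this check isn’t part of the guide itself:
# Optional check: the copied cert-manager secret should now exist at the cluster's path
vault kv get clusters/kv/${TENANT_ID}/${CLUSTER_ID}/cert-manager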
Prepare Cluster Repository
Starting with this section, we recommend that you change into a clean directory (for example a directory in your home).
Check Running Commodore for details on how to run commodore.
-
Prepare Commodore inventory.
mkdir -p inventory/classes/
git clone $(curl -sH "Authorization: Bearer $(commodore fetch-token)" "${COMMODORE_API_URL}/tenants/${TENANT_ID}" | jq -r '.gitRepo.url') inventory/classes/${TENANT_ID}
-
Add Cilium to cluster configuration
pushd "inventory/classes/${TENANT_ID}/" yq eval -i '.applications += ["cilium"]' ${CLUSTER_ID}.yml yq eval -i '.parameters.networkpolicy.networkPlugin = "cilium"' ${CLUSTER_ID}.yml yq eval -i '.parameters.networkpolicy.ignoredNamespaces = ["openshift-oauth-apiserver"]' ${CLUSTER_ID}.yml yq eval -i '.parameters.openshift4_monitoring.upstreamRules.networkPlugin = "cilium"' ${CLUSTER_ID}.yml yq eval -i '.parameters.openshift.infraID = "TO_BE_DEFINED"' ${CLUSTER_ID}.yml yq eval -i '.parameters.openshift.clusterID = "TO_BE_DEFINED"' ${CLUSTER_ID}.yml git commit -a -m "Add Cilium addon to ${CLUSTER_ID}" git push popd
-
Compile catalog
commodore catalog compile ${CLUSTER_ID} --push -i \
  --dynamic-fact kubernetesVersion.major=$(echo "1.26" | awk -F. '{print $1}') \
  --dynamic-fact kubernetesVersion.minor=$(echo "1.26" | awk -F. '{print $2}') \
  --dynamic-fact openshiftVersion.Major=$(echo "4.13" | awk -F. '{print $1}') \
  --dynamic-fact openshiftVersion.Minor=$(echo "4.13" | awk -F. '{print $2}')
This commodore
call requires Commodore v1.5.0 or newer. Please make sure to update your local installation.
Configure the OpenShift Installer
-
Generate SSH key
We generate a unique SSH key pair for the cluster as this gives us troubleshooting access.
SSH_PRIVATE_KEY="$(pwd)/ssh_$CLUSTER_ID"
export SSH_PUBLIC_KEY="${SSH_PRIVATE_KEY}.pub"

ssh-keygen -C "vault@$CLUSTER_ID" -t ed25519 -f $SSH_PRIVATE_KEY -N ''

BASE64_NO_WRAP='base64'
if [[ "$OSTYPE" == "linux"* ]]; then
  BASE64_NO_WRAP='base64 --wrap 0'
fi

vault kv put clusters/kv/${TENANT_ID}/${CLUSTER_ID}/vsphere/ssh \
  private_key=$(cat $SSH_PRIVATE_KEY | eval "$BASE64_NO_WRAP")

ssh-add $SSH_PRIVATE_KEY
-
Prepare
install-config.yaml
You can add more options to the
install-config.yaml
file. Have a look at the config example for more information. For example, you could change the SDN from the default to something a customer requests due to specific network requirements.
export INSTALLER_DIR="$(pwd)/target"
mkdir -p "${INSTALLER_DIR}"

cat > "${INSTALLER_DIR}/install-config.yaml" <<EOF
apiVersion: v1
metadata:
  name: ${CLUSTER_ID}
baseDomain: ${BASE_DOMAIN}
compute: (1)
  - architecture: amd64
    hyperthreading: Enabled
    name: worker
    replicas: 3
    platform:
      vsphere:
        cpus: 4
        coresPerSocket: 4
        memoryMB: 16384
        osDisk:
          diskSizeGB: 100
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  replicas: 3
  platform:
    vsphere:
      cpus: 4
      coresPerSocket: 4
      memoryMB: 16384
      osDisk:
        diskSizeGB: 100
platform:
  vsphere:
    apiVIPs:
      - ${API_VIP}
    cluster: ${VSPHERE_CLUSTER}
    datacenter: ${VSPHERE_DATACENTER}
    defaultDatastore: ${VSPHERE_DATASTORE}
    ingressVIPs:
      - ${INGRESS_VIP}
    network: ${VSPHERE_NETWORK}
    username: ${VSPHERE_USERNAME}
    password: ${VSPHERE_PASSWORD}
    vCenter: ${VCENTER_HOSTNAME}
networking:
  networkType: Cilium
  machineNetwork:
    - cidr: ${MACHINE_NETWORK_CIDR}
pullSecret: |
  ${PULL_SECRET}
sshKey: "$(cat $SSH_PUBLIC_KEY)"
EOF
1 We only provision a single compute machine set. The final machine sets will be configured through Project Syn.

If you set a custom CIDR for the OpenShift networking, update the corresponding values in your Commodore cluster definition. See Cilium Component Defaults and Parameter Reference. Verify with less catalog/manifests/cilium/olm/*ciliumconfig.yaml.
Prepare the OpenShift Installer
The steps in this section aren’t idempotent and have to be completed uninterrupted in one go. If you have to recreate the install config or any of the generated manifests, you need to rerun all of the subsequent steps.
-
Render install manifests (this will consume the install-config.yaml)

openshift-install --dir "${INSTALLER_DIR}" \
  create manifests
-
If you want to change the default "apps" domain for the cluster:
yq eval -i '.spec.domain = "apps.example.com"' \
  "${INSTALLER_DIR}/manifests/cluster-ingress-02-config.yml"
-
Copy pre-rendered Cilium manifests
cp catalog/manifests/cilium/olm/* target/manifests/
-
Extract the cluster domain from the generated manifests
export CLUSTER_DOMAIN=$(yq e '.spec.baseDomain' \
  "${INSTALLER_DIR}/manifests/cluster-dns-02-config.yml")
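As a quick sanity check (the expected value follows from metadata.name and baseDomain in the install-config above):
# Should print <cluster id>.<base domain>, e.g. c-mycluster.zrh.customer.vshnmanaged.net
echo "${CLUSTER_DOMAIN}"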
-
Prepare install manifests and ignition config
openshift-install --dir "${INSTALLER_DIR}" \
  create ignition-configs
Update Project Syn cluster config
-
Switch to the tenant repo
pushd "inventory/classes/${TENANT_ID}/"
-
Include openshift4.yml if it exists
if ls openshift4.y*ml 1>/dev/null 2>&1; then
  yq eval -i '.classes += ".openshift4"' ${CLUSTER_ID}.yml
fi
-
Update cluster config
yq eval -i ".parameters.openshift.baseDomain = \"${CLUSTER_DOMAIN}\"" \ ${CLUSTER_ID}.yml yq eval -i ".parameters.openshift.infraID = \"$(jq -r .infraID "${INSTALLER_DIR}/metadata.json")\"" \ ${CLUSTER_ID}.yml yq eval -i ".parameters.openshift.clusterID = \"$(jq -r .clusterID "${INSTALLER_DIR}/metadata.json")\"" \ ${CLUSTER_ID}.yml yq eval -i ".parameters.openshift.ssh_key = \"$(cat ${SSH_PUBLIC_KEY})\"" \ ${CLUSTER_ID}.yml yq eval -i ".parameters.vshnLdap.serviceId = \"${LDAP_ID}\"" \ ${CLUSTER_ID}.yml
If you use a custom "apps" domain, make sure to set
parameters.openshift.appsDomain
accordingly.

APPS_DOMAIN=your.custom.apps.domain
yq eval -i ".parameters.openshift.appsDomain = \"${APPS_DOMAIN}\"" \
  ${CLUSTER_ID}.yml
By default, the cluster’s update channel is derived from the cluster’s reported OpenShift version. If you want to use a custom update channel, make sure to set
parameters.openshift4_version.spec.channel
accordingly.

# Configure the OpenShift update channel as `fast`
yq eval -i ".parameters.openshift4_version.spec.channel = \"fast-{ocp-minor-version}\"" \
  ${CLUSTER_ID}.yml
-
Configure vSphere parameters
yq eval -i ".parameters.openshift.vsphere.network_name = \"${VSPHERE_NETWORK}\"" \ ${CLUSTER_ID}.yml yq eval -i ".parameters.openshift.vsphere.datacenter = \"${VSPHERE_DATACENTER}\"" \ ${CLUSTER_ID}.yml yq eval -i ".parameters.openshift.vsphere.datastore = \"${VSPHERE_DATASTORE}\"" \ ${CLUSTER_ID}.yml yq eval -i ".parameters.openshift.vsphere.server = \"${VCENTER_HOSTNAME}\"" \ ${CLUSTER_ID}.yml
Commit changes and compile cluster catalog
-
Review changes. Have a look at the file ${CLUSTER_ID}.yml. Override default parameters or add more component configurations as required for your cluster.
-
Commit changes
git commit -a -m "Setup cluster ${CLUSTER_ID}" git push popd
-
Compile and push cluster catalog
commodore catalog compile ${CLUSTER_ID} --push -i \
  --dynamic-fact kubernetesVersion.major=$(echo "1.26" | awk -F. '{print $1}') \
  --dynamic-fact kubernetesVersion.minor=$(echo "1.26" | awk -F. '{print $2}') \
  --dynamic-fact openshiftVersion.Major=$(echo "4.13" | awk -F. '{print $1}') \
  --dynamic-fact openshiftVersion.Minor=$(echo "4.13" | awk -F. '{print $2}')
This commodore
call requires Commodore v1.5.0 or newer. Please make sure to update your local installation.
Provision the cluster
The steps in this section must be run on a host which can reach the vSphere API. If you can’t reach the vSphere API directly, you can set up a SOCKS5 proxy, for example as sketched below.
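A minimal sketch of such a proxy setup, assuming you can SSH to a jump host that reaches the vSphere API; the host name jumphost.example.com and port 12345 are placeholders, not part of the original guide:
# Open a SOCKS5 proxy on local port 12345 via a jump host (placeholder host name)
ssh -fN -D 12345 user@jumphost.example.com

# Let curl (used below to download the vSphere CA) use the proxy
export CURL_OPTS="--proxy socks5h://localhost:12345"

# Let the OpenShift installer and other tools route their traffic through the proxy as well
export HTTPS_PROXY=socks5://localhost:12345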
-
Trust the vSphere CA certificate
Ubuntu 22.04

curl ${CURL_OPTS:-} -kLo vsphere-ca.zip https://"${VCENTER_HOSTNAME}"/certs/download.zip
unzip vsphere-ca.zip
for cert in certs/lin/*.0; do
  sudo cp $cert /usr/local/share/ca-certificates/$(basename ${cert}.crt)
done
rm vsphere-ca.zip
sudo update-ca-certificates
-
Run the OpenShift installer
openshift-install --dir "${INSTALLER_DIR}" \
  create cluster --log-level=debug
Access cluster API
-
Export kubeconfig
export KUBECONFIG="${INSTALLER_DIR}/auth/kubeconfig"
-
Verify API access
kubectl cluster-info
If the cluster API is only reachable with a SOCKS5 proxy, run commands along the lines of the sketch below instead.
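A minimal sketch, assuming the SOCKS5 proxy from the provisioning section is still running on local port 12345 (a placeholder); kubectl honors a SOCKS5 proxy set via HTTPS_PROXY:
export KUBECONFIG="${INSTALLER_DIR}/auth/kubeconfig"

# Route kubectl traffic through the SOCKS5 proxy (port 12345 is a placeholder)
export HTTPS_PROXY=socks5://localhost:12345

kubectl cluster-info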
Configure registry S3 credentials
-
Create secret with S3 credentials for the registry
oc create secret generic image-registry-private-configuration-user \
  --namespace openshift-image-registry \
  --from-literal=REGISTRY_STORAGE_S3_ACCESSKEY=<TBD> \
  --from-literal=REGISTRY_STORAGE_S3_SECRETKEY=<TBD>
If the registry S3 credentials are created too long after the initial cluster setup, it’s possible that the
openshift-samples
operator has disabled itself because it couldn’t find a working in-cluster registry.

If the samples operator is disabled, no templates or builder images will be available on the cluster.
You can check the samples-operator’s state with the following command:
kubectl get config.samples cluster -ojsonpath='{.spec.managementState}'
If the command returns
Removed
, verify that the in-cluster registry pods are now running, and enable the samples operator again:

kubectl patch config.samples cluster -p '{"spec":{"managementState":"Managed"}}'
See the upstream documentation for more details on the samples operator.
Set default storage class
-
Set storage class
thin-csi
as default

kubectl annotate storageclass thin storageclass.kubernetes.io/is-default-class-
kubectl annotate storageclass thin-csi storageclass.kubernetes.io/is-default-class=true
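Optionally, verify the change; the thin-csi storage class should now be reported as the default (this check isn’t spelled out in the guide itself):
# thin-csi should be marked "(default)" in the output
kubectl get storageclass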
Setup acme-dns CNAME records for the cluster
You can skip this section if you’re not using Let’s Encrypt for the cluster’s API and default wildcard certificates.
-
Extract the acme-dns subdomain for the cluster after
cert-manager
has been deployed via Project Syn.

fulldomain=$(kubectl -n syn-cert-manager \
  get secret acme-dns-client \
  -o jsonpath='{.data.acmedns\.json}' | \
  base64 -d | \
  jq -r '[.[]][0].fulldomain')
echo "$fulldomain"
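With that value, the CNAME records mentioned in the section title are typically created in the cluster’s parent DNS zone following the usual acme-dns pattern. A sketch with hypothetical values; the record names assume the default API and apps domains and aren’t copied from the original guide:
# Sketch (hypothetical values): in the cluster's parent DNS zone, point the
# _acme-challenge names for the API and apps domains at the value of $fulldomain, e.g.
#   _acme-challenge.api.c-mycluster.zrh.customer.vshnmanaged.net   CNAME  <fulldomain>
#   _acme-challenge.apps.c-mycluster.zrh.customer.vshnmanaged.net  CNAME  <fulldomain>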
Store the cluster’s admin credentials in the password manager
-
Once the cluster’s production API certificate has been deployed, edit the cluster’s admin kubeconfig file to remove the initial API certificate CA.
You may see the error
Unable to connect to the server: x509: certificate signed by unknown authority
when executingkubectl
oroc
commands after the cluster’s production API certificate has been deployed by Project Syn.

This error can be addressed by removing the initial CA certificate data from the admin kubeconfig, as shown in this step.
yq e -i 'del(.clusters[0].cluster.certificate-authority-data)' \
  "${INSTALLER_DIR}/auth/kubeconfig"
-
Save the admin credentials in the password manager. You can find the password in the file
target/auth/kubeadmin-password
and the kubeconfig in target/auth/kubeconfig
ls -l ${INSTALLER_DIR}/auth/
Finalize installation
-
Verify that the Project Syn-managed machine sets have been provisioned
kubectl -n openshift-machine-api get machineset -l argocd.argoproj.io/instance
The command should show something like
NAME    DESIRED   CURRENT   READY   AVAILABLE   AGE
app     3         3         3       3           4d5h   (1)
infra   4         4         4       4           4d5h   (1)
1 The values for DESIRED and AVAILABLE should match.

If there are discrepancies between the desired and available counts of the machine sets, you can list the machine objects which aren’t in phase "Running":
kubectl -n openshift-machine-api get machine | grep -v Running
You can see errors by looking at an individual machine object with
kubectl describe
.
-
If the Project Syn-managed machine sets are healthy, scale down the initial worker machine set
If the Project Syn-managed machine sets aren’t healthy, this step may reduce the cluster capacity to the point where infrastructure components can’t run. Make sure you have sufficient cluster capacity before continuing.
INFRA_ID=$(jq -r .infraID "${INSTALLER_DIR}/metadata.json")

kubectl -n openshift-machine-api patch machineset ${INFRA_ID}-worker \
  -p '{"spec": {"replicas": 0}}' --type merge
-
Once the initial machine set is scaled down, verify that all pods are still running. The command below should produce no output.
kubectl get pods -A | grep -vw -e Running -e Completed
-
If all pods are still running, delete the initial machine set
kubectl -n openshift-machine-api delete machineset ${INFRA_ID}-worker
-
Clean up the vSphere CA certificate
for cert in certs/lin/*.0; do
  sudo rm /usr/local/share/ca-certificates/$(basename ${cert}.crt)
done
sudo update-ca-certificates
-
Delete local config files
rm -r ${INSTALLER_DIR}/