Names and name length on Google Cloud Platform

Abstract
Node names are limited to a length of 63 characters [1]. If the GCP project and region names are too long, provisioning of nodes won’t work and the installer will fail.

When installing a new OpenShift cluster, the installer automatically creates a lot of names. Some parts of those names are generated by the installer, while others are derived from the underlying cloud.

Let us have a look at how the name of a node is built.

<node/hostname> ::= <machine name>.<region>-<zone>.c.<project name>.internal
<machine name> ::= <machine set name>-<random>
<machine set name> ::= <cluster id>-w-<zone>

<cluster id> and <random> are generated by the installer; <zone>, <region> and <project name> are derived from GCP. As node names are limited to 63 characters [1], this can become an issue.
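As a sketch, the rule above can be turned into a small helper that assembles a node name from its parts and reports its length. The component values used here are taken from the example cluster further down; the function itself is purely illustrative.

# Sketch: assemble a node name according to the grammar above and check its length.
# Component values are taken from the example cluster below; the helper is illustrative only.
def node_name(cluster_id, zone, random_suffix, region, project):
    machine_set_name = f"{cluster_id}-w-{zone}"            # <machine set name>
    machine_name = f"{machine_set_name}-{random_suffix}"   # <machine name>
    return f"{machine_name}.{region}-{zone}.c.{project}.internal"

name = node_name("opensh-5g8lh", "a", "xbtr4", "europe-west6", "openshift-4-poc-1")
print(len(name), name)  # 66 characters, three more than the 63-character limit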

The <cluster id> has a length of twelve characters, <zone> is just one character, and <random> has a length of five. Summing up all the characters that are static or generated by the installer, we end up at 37 (see example below). This leaves 26 characters to be distributed between the project name and the region.

The length of GCP region names varies between eight and 23 characters. In the best case, the project name can be 18 (63 - 37 - 8) characters long. In the worst case, only three (63 - 37 - 23) characters are available.

For Zürich (europe-west6), the project name must not exceed 14 (63 - 37 - 12) characters.
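The same arithmetic can be written as a small budget calculation; a minimal sketch, assuming the 37 fixed characters derived above:

# Sketch: maximum project-name length for a given region, assuming the
# 37 static/installer-generated characters derived above.
NODE_NAME_LIMIT = 63
FIXED = 37

def max_project_length(region):
    return NODE_NAME_LIMIT - FIXED - len(region)

print(max_project_length("europe-west6"))  # 14, matching the Zürich example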

Example from an actual cluster that exceeded the maximum:
opensh-5g8lh-w-a-xbtr4.europe-west6-c.c.openshift-4-poc-1.internal
opensh-5g8lh-w-a-xbtr4.<region>-c.c.<project>.internal
opensh-5g8lh-w-a-xbtr4.-c.c..internal
  1. What happens when the node name exceeds 63 characters?

    The status of the Machine object will be Provisioned but no Node object will show up. The CertificateSigningRequest won’t get approved (remains in Pending) and a new one will be created every few seconds.

    When SSHing into the affected VM, one can observe that there is no /etc/hostname file and that the hostname is reported as localhost.

    The kubelet log will contain something like the following:

    May 13 13:38:18 opensh-5g8lh-w-a-xbtr4.europe-west6-a.c.openshift-4-poc-1.intern hyperkube[5777]: E0513 13:38:18.478461    5777 kubelet_node_status.go:92] Unable to register node "opensh-5g8lh-w-a-xbtr4.europe-west6-a.c.openshift-4-poc-1.intern" with API server: Node "opensh-5g8lh-w-a-xbtr4.europe-west6-a.c.openshift-4-poc-1.intern" is invalid: metadata.labels: Invalid value: "opensh-5g8lh-w-a-xbtr4.europe-west6-a.c.openshift-4-poc-1.intern": must be no more than 63

    When installing a new cluster, the installer log will look something like the following:

    […]
    INFO Waiting up to 30m0s for the cluster at https://api.openshift-4-poc-1.openshift-4-poc-1.appuio-beta.ch:6443 to initialize...
    ERROR Cluster operator authentication Degraded is True with IngressStateEndpoints_MissingSubsets::RouteStatus_FailedHost: IngressStateEndpointsDegraded: No subsets found for the endpoints of oauth-server
    RouteStatusDegraded: route is not available at canonical host oauth-openshift.apps.openshift-4-poc-1.openshift-4-poc-1.appuio-beta.ch: []
    INFO Cluster operator authentication Progressing is Unknown with NoData:
    INFO Cluster operator authentication Available is Unknown with NoData:
    INFO Cluster operator console Progressing is True with RouteSync_FailedHost: RouteSyncProgressing: route is not available at canonical host []
    INFO Cluster operator console Available is Unknown with NoData:
    INFO Cluster operator image-registry Available is False with NoReplicasAvailable: The deployment doesn't have available replicas
    INFO Cluster operator image-registry Progressing is True with DeploymentNotCompleted: The deployment has not completed
    ERROR Cluster operator ingress Degraded is True with IngressControllersDegraded: Some ingresscontrollers are degraded: default
    INFO Cluster operator ingress Progressing is True with Reconciling: Not all ingress controllers are available.
    Moving to release version "4.4.3".
    Moving to ingress-controller image version "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:437b6cd81ccb4fc44d77904d1c2cbc3fceb95c96e0338b4f63b5577035f84a64".
    INFO Cluster operator ingress Available is False with IngressUnavailable: Not all ingress controllers are available.
    INFO Cluster operator insights Disabled is False with :
    INFO Cluster operator kube-storage-version-migrator Available is False with _NoMigratorPod: Available: deployment/migrator.openshift-kube-storage-version-migrator: no replicas are available
    ERROR Cluster operator monitoring Degraded is True with UpdatingAlertmanagerFailed: Failed to rollout the stack. Error: running task Updating Alertmanager failed: waiting for Alertmanager Route to become ready failed: waiting for RouteReady of alertmanager-main: no status available for alertmanager-main
    INFO Cluster operator monitoring Available is False with :
    INFO Cluster operator monitoring Progressing is True with RollOutInProgress: Rolling out the stack.
    FATAL failed to initialize the cluster: Some cluster operators are still updating: authentication, console, csi-snapshot-controller, image-registry, ingress, kube-storage-version-migrator, monitoring
  2. What to do if the length will be exceeded and the project name cannot be shortened?

    Do the cluster setup as normal. The API will come up successfully, but the installer will fail. Once this has happened, export the MachineSet objects created by the installer, delete them, and apply them again from the export with a shorter name; the sketch below shows how short the new name has to be.
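    To see how short the new MachineSet name has to be, the budget from above can be inverted: everything except the MachineSet name adds up to 21 fixed characters plus the region and project names. A minimal sketch, assuming the name grammar above and the five-character random suffix stay unchanged:

    # Sketch: maximum length of the (shortened) MachineSet name when the
    # project name cannot be changed. The 21 covers the random suffix,
    # the separators, the single zone letter, ".c." and ".internal".
    def max_machineset_name_length(region, project):
        return 63 - 21 - len(region) - len(project)

    # Example cluster: 63 - 21 - 12 - 17 = 13, so the original 16-character
    # name "opensh-5g8lh-w-a" is three characters too long.
    print(max_machineset_name_length("europe-west6", "openshift-4-poc-1"))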