Admin kubeconfig management

Problem

For emergency access to clusters, we currently store the kubeconfig for the system:admin user in Passbolt. This kubeconfig is generated by the openshift-install program, and the client certificate it contains has a lifetime of 10 years. Unfortunately, Kubernetes doesn’t support revoking client certificates (see GitHub issue kubernetes/kubernetes#18982).

We would like to have another form of emergency access to OpenShift 4 clusters. The main reason is that credentials which are valid for 10 years and can’t be revoked remain usable for that entire period if they ever leak.

Goals

  • Define a method to manage emergency access credentials for OpenShift 4 clusters

  • The credentials should be relatively short-lived and it must be possible to rotate them

Non-Goals

  • Replace regular authentication

Proposals

While writing out the proposals, we identified that any solution to manage admin credentials is composed of two largely independent choices. First, there are multiple credential types (client certificates or service account tokens) which can be used for the admin credentials. Second, there are multiple possible implementations for managing the admin credentials on each cluster.

Credential type

In this section, we briefly outline the possible credential types that we can use for the admin credentials.

Issue short-lived certificates with cluster-admin privileges

The first approach is that we issue client certificates with cluster-admin privileges. This can be done either through Kubernetes' CertificateSigningRequest (CSR) resources, or by manually issuing certificates against a self-signed CA certificate which is installed as a client CA certificate in the cluster.

One point to consider is that Kubernetes doesn’t support issuing client certificates for the group system:masters through the CSR signer kubernetes.io/kube-apiserver-client. However, the group system:cluster-admins is allowed, and is functionally equivalent on OpenShift 4.
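As a rough illustration of the CSR-based variant, the following Python sketch builds a CertificateSigningRequest manifest for the kubernetes.io/kube-apiserver-client signer. The object name emergency-admin, the one-day expiration, and the placeholder PEM are assumptions made for this example. Generating the private key and the actual PKCS#10 request (with the group system:cluster-admins as the subject's organization) is left out, and whether the signer honors the requested expiration depends on the cluster's configuration.

```python
import base64
import json


def build_csr_manifest(name, csr_pem, expiration_seconds=86400):
    """Build a certificates.k8s.io/v1 CertificateSigningRequest manifest.

    csr_pem is expected to be a PEM-encoded PKCS#10 request whose subject
    carries O=system:cluster-admins (system:masters can't be requested
    through this signer).
    """
    return {
        "apiVersion": "certificates.k8s.io/v1",
        "kind": "CertificateSigningRequest",
        "metadata": {"name": name},
        "spec": {
            # The API expects the PEM document base64-encoded.
            "request": base64.b64encode(csr_pem.encode()).decode(),
            "signerName": "kubernetes.io/kube-apiserver-client",
            # Requested lifetime; the signer may issue a shorter one.
            "expirationSeconds": expiration_seconds,
            "usages": ["client auth"],
        },
    }


# Placeholder only (assumption): a real request would be generated with a
# tool such as openssl.
PLACEHOLDER_CSR = (
    "-----BEGIN CERTIFICATE REQUEST-----\n...\n-----END CERTIFICATE REQUEST-----\n"
)

manifest = build_csr_manifest("emergency-admin", PLACEHOLDER_CSR)
print(json.dumps(manifest, indent=2))
```

The resulting object still has to be approved before the signer issues a certificate.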

Note that we can’t revoke certificates issued through CSRs or through a self-managed CA certificate. However, if we use a self-managed CA certificate, we can invalidate any existing certificates by rotating the CA and issuing a new certificate from the new CA.

Use service account tokens with cluster-admin privileges

The second approach is that we set up a Kubernetes service account which is granted cluster-admin privileges through a ClusterRoleBinding and issue service account tokens for that service account.
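To make the two objects involved concrete, here is a minimal Python sketch that renders a service account and a matching ClusterRoleBinding as a JSON list. The names emergency-admin and syn-emergency are hypothetical and only serve as an illustration.

```python
import json


def build_admin_objects(name, namespace):
    """Return a ServiceAccount and a ClusterRoleBinding granting cluster-admin."""
    service_account = {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {"name": name, "namespace": namespace},
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRoleBinding",
        "metadata": {"name": name},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": "cluster-admin",
        },
        "subjects": [
            {"kind": "ServiceAccount", "name": name, "namespace": namespace}
        ],
    }
    return [service_account, binding]


# Hypothetical names, chosen for this example only.
objects = build_admin_objects("emergency-admin", "syn-emergency")
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": objects}, indent=2))
```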

We’ve got two options to generate tokens for service accounts:

  1. The TokenRequest API allows us to generate service account tokens which expire after a defined amount of time. However, tokens which are manually created through the TokenRequest API (for example with kubectl create token) can’t be invalidated before they expire.

  2. Non-expiring API tokens are created by defining a secret of type kubernetes.io/service-account-token. As the name suggests, these tokens don’t expire.
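For the first option, the following Python sketch shows what a TokenRequest looks like at the API level: a TokenRequest object is posted to the token subresource of the service account. The helper name and the example values are assumptions. The sketch only builds the path and body and doesn't talk to a cluster.

```python
import json

# The API rejects requested lifetimes below 10 minutes.
MIN_EXPIRATION_SECONDS = 600


def build_token_request(namespace, service_account, expiration_seconds):
    """Return the API path and request body for a time-bound token."""
    if expiration_seconds < MIN_EXPIRATION_SECONDS:
        raise ValueError("expirationSeconds must be at least 600")
    path = (
        f"/api/v1/namespaces/{namespace}"
        f"/serviceaccounts/{service_account}/token"
    )
    body = {
        "apiVersion": "authentication.k8s.io/v1",
        "kind": "TokenRequest",
        "spec": {"expirationSeconds": expiration_seconds},
    }
    return path, body


# Hypothetical names; request a token which is valid for three days.
path, body = build_token_request("syn-emergency", "emergency-admin", 3 * 24 * 3600)
print("POST", path)
print(json.dumps(body, indent=2))
```

The response carries the token in status.token and the granted expiry in status.expirationTimestamp. The granted lifetime can differ from the requested one, so a tool should rely on the returned timestamp.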

The only way to permanently invalidate service account tokens (both non-expiring and time-bound) is to delete the service account. Creating a new service account with the same name in the same namespace doesn’t reactivate tokens associated with a previous service account, since the tokens contain the service account’s Kubernetes resource UID.
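The UID binding can be illustrated by inspecting a token's claims. The Python sketch below decodes the payload of a token without verifying its signature (for inspection only) and compares the embedded service account UID against the UID of the current service account. The token is a locally built fake, and the UIDs and names are made up for the example.

```python
import base64
import json


def decode_claims(token):
    """Decode the JWT payload WITHOUT verifying the signature."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def token_service_account_uid(claims):
    # Time-bound tokens nest the UID under the "kubernetes.io" claim,
    # secret-based tokens use a flat claim name.
    bound = claims.get("kubernetes.io", {}).get("serviceaccount", {}).get("uid")
    return bound or claims.get("kubernetes.io/serviceaccount/service-account.uid")


def fake_token(claims):
    """Build an unsigned token for demonstration purposes only."""
    def enc(data):
        raw = json.dumps(data).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()
    return f"{enc({'alg': 'none'})}.{enc(claims)}.signature"


old_token = fake_token({
    "kubernetes.io": {
        "namespace": "syn-emergency",
        "serviceaccount": {"name": "emergency-admin", "uid": "uid-old"},
    }
})
current_uid = "uid-new"  # UID of the recreated service account (made up)

still_valid = token_service_account_uid(decode_claims(old_token)) == current_uid
print(still_valid)  # False: the old token refers to the deleted service account
```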

The proposed approach for using service account tokens is to create short-lived API tokens through the TokenRequest API, so that the admin credentials expire by default. Additionally, we introduce a mechanism to force the tool to recreate the service account, which invalidates any old tokens that might have leaked. That mechanism might be as simple as having the tool reconcile the service account and recreate it if it gets deleted.

Credential management

In this section, we outline some possible approaches for managing the admin credentials on each cluster.

Extend Steward to manage credentials and write them to Vault

We can extend Steward to manage and renew the credentials and store them in Vault.

This allows us to issue relatively short-lived credentials (on the order of days), which limits the window in which admin credentials accessed by engineers in emergency situations can be misused.

Optionally, we can also extend Steward to render a full kubeconfig file based on the managed credentials and store that file in Vault in addition to the raw credentials. If we store a full kubeconfig file in Vault, we can document a single vault CLI command which fetches the emergency kubeconfig for a cluster.
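Rendering a kubeconfig from a token is mostly a matter of filling in a fixed structure. The following Python sketch shows one possible shape. The function name, the cluster and user names, and all values are placeholders. The result is emitted as JSON, which is also valid YAML, so it should be usable as a kubeconfig file.

```python
import json


def render_kubeconfig(cluster_name, api_url, ca_data_b64, token,
                      user_name="emergency-admin"):
    """Render a token-based kubeconfig as a plain dictionary."""
    context_name = f"{user_name}@{cluster_name}"
    return {
        "apiVersion": "v1",
        "kind": "Config",
        "clusters": [{
            "name": cluster_name,
            "cluster": {
                "server": api_url,
                # Base64-encoded PEM bundle of the API server's CA.
                "certificate-authority-data": ca_data_b64,
            },
        }],
        "users": [{"name": user_name, "user": {"token": token}}],
        "contexts": [{
            "name": context_name,
            "context": {"cluster": cluster_name, "user": user_name},
        }],
        "current-context": context_name,
    }


# Placeholder values (assumptions), not real credentials.
kubeconfig = render_kubeconfig(
    "example-cluster",
    "https://api.example-cluster.example.com:6443",
    "PLACEHOLDER_CA_DATA",
    "PLACEHOLDER_TOKEN",
)
print(json.dumps(kubeconfig, indent=2))
```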

Create a new custom controller which manages the credentials on the cluster and writes them to an external secrets store

Instead of extending Steward, we could also create a new controller which manages admin credentials and writes them to an external secrets store. This would provide some level of separation of concerns, since managing admin credentials isn’t necessarily part of the Project Syn bootstrap process. Additionally, having a separate tool allows us to have releases independent of the fairly complex Steward release process. Finally, this gives us some freedom, as we’re more decoupled from Project Syn and don’t necessarily need to write the credentials to the Project Syn Vault.

Manage credentials by hand

Another approach is to manage and renew the admin credentials by hand.

Decision

Use service account tokens generated through the TokenRequest API and implement a custom controller to manage the service account, cluster role binding and tokens.
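The core of such a controller could be a small decision function which is evaluated on every reconcile. The Python sketch below is one possible shape under stated assumptions: the action names, the function signature, and the 12-hour renewal threshold are all hypothetical, and the actual API calls are omitted.

```python
from datetime import datetime, timedelta, timezone


def plan_actions(sa_exists, binding_exists, token_expires_at, now,
                 renew_before=timedelta(hours=12)):
    """Decide which steps a reconcile run needs to perform."""
    actions = []
    if not sa_exists:
        actions.append("create_service_account")
    if not binding_exists:
        actions.append("create_cluster_role_binding")
    needs_token = (
        not sa_exists  # tokens of a deleted service account are invalid
        or token_expires_at is None
        or token_expires_at - now <= renew_before
    )
    if needs_token:
        actions.append("issue_and_store_token")
    return actions


now = datetime(2024, 1, 1, tzinfo=timezone.utc)

# Everything in place and the token is valid for two more days.
print(plan_actions(True, True, now + timedelta(days=2), now))
# → []

# Service account was deleted to revoke leaked tokens.
print(plan_actions(False, True, now + timedelta(days=2), now))
# → ['create_service_account', 'issue_and_store_token']
```

With this shape, deleting the service account is enough to trigger a full rotation: the next reconcile recreates it and issues a fresh token.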

Rationale

We’ve decided to use service account tokens generated through the TokenRequest API, since that’s the approach which needs the least amount of custom work. By using service account tokens, we’ve got a simple mechanism to revoke old access credentials (delete the service account). Additionally, we don’t need to manage a custom CA with this approach.

We’ve decided to implement a custom controller over extending Steward’s functionality for multiple reasons:

  • By implementing a separate controller, we aren’t bound to Steward’s release process to implement and improve the admin credential management.

  • A separate controller can be tested and developed in isolation without having to worry about a locally executed Steward breaking a cluster’s Project Syn setup.

  • A tool like this may be useful outside Project Syn.

  • By not closely coupling this tool with Steward, we keep our options open with regard to where we save the credentials (Vault or Passbolt). If this were integrated with Steward, it would be almost mandatory to save the credentials in the Project Syn Vault.

Finally, by storing the credentials in an external service (such as Passbolt), we ensure that the emergency access credentials for the cluster which hosts the Project Syn Vault aren’t stored on that cluster itself.