Espejote: An in-cluster templating controller
Problem Statement
Not every cluster configuration is fully static and manageable through GitOps.
Some configurations depend on the cluster state, or are managed directly by upstream projects, end-users, or third-party products.
We have an extensive array of tools to manage these configurations, from bash "reconcilers" to cron jobs, policy engines, and fully fledged custom controllers. These tools are all specialized, and none of them covers all use-cases.
A maintainable, fully instrumented, and extensible controller that can manage arbitrary configurations in a cluster is missing. We decided to try our hand at creating one.
High Level Goals
- We can manage arbitrary configurations in a cluster
- The configuration may depend on state managed by other controllers inside the cluster
- We can apply partial configurations to externally managed resources
- The amount of requests and caching is tuneable
- We can monitor failures and successes and have a tight feedback loop
Non-Goals
- Centralized configuration management or GitOps
- Policy enforcement
- Security or access control
Name
"Espejote" for "big mirror" in Spanish. It’s a play on the earlier project "Espejo" which means "mirror." "Espejo" is a tool to mirror resources to multiple namespaces. "Espejote" is a more general tool to manage arbitrary configurations in a cluster.
Implementation
The controller is based on controller-runtime.
It’s managed through a Custom Resource Definition (CRD) called (Cluster)ManagedResource.
The controller uses Jsonnet to render manifests.
Any Kubernetes manifest can be added to the templating context.
Reconcile triggers are separated from the context resources to allow for more fine-grained control over the reconciliation.
Reconcile triggers
Reconcile triggers are used to re-render the template. They’re separated from the context resources to allow for more fine-grained control.
triggers:
- interval: 10s
- watchResource:
    apiVersion: v1
    kind: ConfigMap
    labelSelector:
      matchExpressions:
      - key: espejote.io/created-by
        operator: DoesNotExist
  minTimeBetweenTriggers: 10s
- watchResource:
    apiVersion: v1
    kind: ConfigMap
    name: my-configmap
The trigger that fires is exposed to the Jsonnet as the trigger variable.
local trigger = std.extVar('trigger');
Currently, we support two types of triggers:

interval
The template is re-rendered at the configured fixed interval.

watchResource
The template is re-rendered whenever the watched resource is created, changed, or deleted.
The minTimeBetweenTriggers field can be used to limit the rate of re-renders.
This is useful to prevent a high rate of re-renders when a resource is changed multiple times in a short period.
If a list of resources is watched, minTimeBetweenTriggers is applied to each resource individually.
The watchResource trigger contains the full Kubernetes resource definition of the watched resource.
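A minimal Jsonnet sketch of reacting to the trigger; the exact shape of the trigger variable is an assumption, based on the resource field used in the ClusterManagedResource example below:
local trigger = std.extVar('trigger');

// An interval trigger carries no resource; a watchResource trigger
// carries the full manifest of the resource that fired (assumed shape).
local watched = if std.objectHas(trigger, 'resource') then trigger.resource else null;

if watched == null then
  // Periodic re-render: nothing resource-specific to react to.
  []
else
  [{
    apiVersion: 'v1',
    kind: 'ConfigMap',
    metadata: { name: 'reacted-to-' + watched.metadata.name },
    data: {},
  }]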
Context
The context is a list of resources that are available in the Jsonnet template. They’re not watched. If a watch is required, it should be added as a trigger.
context:
- def: configs
  resource:
    apiVersion: v1
    kind: ConfigMap
    labelSelector:
      matchExpressions:
      - key: espejote.io/created-by
        operator: DoesNotExist
  cache: true
- def: config
  resource:
    apiVersion: v1
    kind: ConfigMap
    name: my-configmap
The context is exposed to the Jsonnet as the context variable.
local configs = std.extVar('context').configs;
local config = std.extVar('context').config;
Currently, we support one type of context resource:

resource
A Kubernetes resource that is available in the Jsonnet template.
The resource can be selected by name or labelSelector.
matchNames and ignoreNames can be used to filter the resources after the labelSelector is applied: matchNames is a list of names that should be matched, ignoreNames is a list of names that should be ignored.
Label selectors should be preferred over matchNames and ignoreNames because the resources can already be filtered on the API server.
A context variable is always a list with zero or more items. This is true even if a single resource is selected using the name field.
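A sketch of defensively consuming the config entry defined above; even though it selects a single ConfigMap by name, the variable is a list and may be empty:
local configs = std.extVar('context').config;

if std.length(configs) == 0 then
  // The named ConfigMap doesn't exist (yet): render nothing.
  []
else
  [configs[0] { metadata+: { labels+: { 'espejote.io/created-by': 'example' } } }]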
The cache field can be used to set up a controller-runtime cache for the resource.
This is an in-memory cache backed by a watch on the resource.
It’s enabled by default and can be disabled to trade additional API requests for lower memory usage.
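For example, a context entry for a large or rarely re-rendered resource could opt out of the cache (illustrative values):
context:
- def: config
  resource:
    apiVersion: v1
    kind: ConfigMap
    name: my-configmap
  cache: false # read through to the API server instead of keeping a watch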
Template
The template is a Jsonnet template that returns a list of Kubernetes resources.
It can pull in external resources through the context and triggers.
It can return a list of resources or a single resource. The resources are applied in the order they’re returned.
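A minimal template returning a single resource, which is treated the same as a one-element list:
{
  apiVersion: 'v1',
  kind: 'ConfigMap',
  metadata: { name: 'rendered-by-espejote' },
  data: { key: 'value' },
}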
Template libraries
The template can use libraries to share code between templates.
Libraries are stored as JsonnetLibrary resources.
The name of the resource is used as the library name; each key in its data map is an importable file.
// local myLibrary = import 'lib/RESOURCE_NAME/KEY';
local myLibrary = import 'lib/my-library/my-function.libsonnet';
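Using the my-library manifest from the Manifests section below, a template could call both libraries like this:
local myLibrary = import 'lib/my-library/my-function.libsonnet';
local hello = import 'lib/my-library/hello-world.libsonnet';

{
  apiVersion: 'v1',
  kind: 'ConfigMap',
  metadata: { name: 'greetings' },
  data: {
    answer: std.toString(myLibrary.myFunction()),  // "42"
    greeting: hello.helloWorld('Espejote'),        // "Hello, Espejote!"
  },
}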
Deletion
Deletion can be achieved by returning a resource with a special deletion marker.
{
  '$DELETE': true,
  apiVersion: 'v1',
  kind: 'ConfigMap',
  metadata: {
    namespace: 'my-namespace',
    name: 'my-configmap',
  },
}
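A sketch combining the marker with template logic, assuming the config context entry from the Context section: the managed copy is removed once its source disappears.
local configs = std.extVar('context').config;

if std.length(configs) == 0 then
  // The source is gone: clean up the copy this template manages.
  [{
    '$DELETE': true,
    apiVersion: 'v1',
    kind: 'ConfigMap',
    metadata: { namespace: 'my-namespace', name: 'copy-of-my-configmap' },
  }]
else
  [configs[0] { metadata+: { name: 'copy-of-' + configs[0].metadata.name } }]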
Instrumentation
Prometheus metrics
The controller is instrumented with Prometheus metrics.
The metrics are exposed on the /metrics endpoint.
The metrics should contain the following:

- Number of re-renders per trigger
- Number of re-renders per resource
- Number of errors per re-render
- Number of resources applied per re-render

The metrics should be labeled with the ManagedResource name and namespace.
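A hypothetical exposition-format sketch of what this could look like; the metric names are illustrative and not part of this design:
# HELP espejote_renders_total Number of re-renders, by triggering resource.
espejote_renders_total{managedresource="copy-configmap",namespace="my-namespace",trigger="watchResource"} 42
# HELP espejote_render_errors_total Number of failed re-renders.
espejote_render_errors_total{managedresource="copy-configmap",namespace="my-namespace"} 1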
Apply options
- applyOptions.forceConflicts can be used to resolve field conflicts. The ManagedResource will become the sole owner of the field.
- applyOptions.createOnly can be used to only create the resource. If the resource already exists, the ManagedResource won’t update it.
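For example, a ManagedResource that seeds a resource for end-users to edit afterwards could set (a sketch):
spec:
  applyOptions:
    createOnly: true # create once, never overwrite user changes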
Permissions and RBAC
ManagedResources should run with the least permissions possible.
To start with, ManagedResources will run in the context of the controller:
Cluster scoped ManagedResources will have access to the whole cluster.
Namespace scoped ManagedResources will have access to the namespace they’re in.
The namespace isolation is only enforced by the controller, not by Kubernetes.
Running ManagedResources in the context of a service account
The end goal is to run ManagedResources in the context of a service account defined in the ManagedResource.
This will allow for more fine-grained access control.
The Red Hat patch operator is a good example of how this can be implemented.
Testing
The controller should have an integrated testing utility that renders the template with a given context and triggers. The rendered manifests could then be compared to a golden file, applied to a cluster, or have parts matched using further testing frameworks.
For each trigger, a rendered YAML file is created with the output of the template.
The utility checks that the triggers and contexts in the test file match the triggers and contexts in the ManagedResource.
triggers: (1)
- interval: {}
- watchResource:
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: my-configmap
    data:
      key: value
context:
- def: configs
  resources:
  - apiVersion: v1
    kind: ConfigMap
    metadata:
      name: my-configmap
    data:
      key: value
1 | The template is rendered for every trigger. If multiple different context variations are required, a new file should be created for each combination. |
espejote render --managed-resource my-template.yaml --inputs my-input-1.yaml my-input-2.yaml (1)
1 | The command can be used with multiple input files to test different context variations. |
Manifests
JsonnetLibrary
apiVersion: espejote.io/v1alpha1
kind: JsonnetLibrary
metadata:
  name: my-library
  namespace: controller-namespace
data:
  my-function.libsonnet: |
    {
      myFunction: function() 42,
    }
  hello-world.libsonnet: |
    {
      helloWorld: function(name) 'Hello, %s!' % [name],
    }
ManagedResource
apiVersion: espejote.io/v1alpha1
kind: ManagedResource
metadata:
  name: copy-configmap
  namespace: my-namespace (1)
spec:
  applyOptions: (2)
    forceConflicts: true
  triggers:
  - interval: 10s (3)
  - watchResource: (4)
      apiVersion: v1
      kind: ConfigMap
      # name: my-configmap (5)
      labelSelector: (6)
        matchExpressions:
        - key: espejote.io/created-by
          operator: DoesNotExist
      # matchNames: [] (7)
      # ignoreNames: [] (8)
    minTimeBetweenTriggers: 10s (9)
  context: (10)
  - def: cm (11)
    resource: (12)
      apiVersion: v1
      kind: ConfigMap
      # name: my-configmap
      # matchNames: []
      # ignoreNames: []
      labelSelector:
        matchExpressions:
        - key: espejote.io/created-by
          operator: DoesNotExist
    cache: true (13)
  template: |
    local trigger = std.extVar('trigger'); (14)
    local cm = std.extVar('context').cm; (15)

    [ (16)
      c {
        metadata+: {
          name: 'copy-of-' + c.metadata.name,
          // namespace: c.metadata.namespace, (17)
          labels+: {
            'espejote.io/created-by': 'copy-configmap',
          },
        },
      }
      for c in cm (18)
    ]
  serviceAccountName: my-service-account (19)
1 | ManagedResources are always namespace-scoped and can’t access any resources outside of their namespace. |
2 | applyOptions can be used to resolve field conflicts. |
3 | interval triggers the template every 10 seconds. |
4 | watchResource triggers the template whenever the watched resource is created, changed, or deleted. |
5 | name is optional and can be used to select a specific resource. |
6 | labelSelector is optional and can be used to select a specific set of resources. |
7 | matchNames is optional and can be used to filter the resources after the labelSelector is applied. |
8 | ignoreNames is optional and can be used to filter the resources after the labelSelector is applied. |
9 | minTimeBetweenTriggers is optional and can be used to limit the rate of re-renders. The rate limit is applied per unique resource. |
10 | context is a list of resources that are available in the Jsonnet template. They’re not watched. |
11 | def defines the variable name under which the resources are exposed in the Jsonnet template. |
12 | resource is a Kubernetes resource that’s available in the Jsonnet template. |
13 | cache is optional and can be used to set up a controller-runtime cache for the resource. The default is true. |
14 | std.extVar is used to access the trigger data. The full manifest is available when using the watchResource trigger. |
15 | std.extVar is used to access the context. It returns an object with all defined resources as keys. |
16 | The template can return a list of resources or a single resource. |
17 | The namespace of the resource will always be overwritten to the namespace of the ManagedResource. |
18 | Resource variables are always a list with zero or more items. |
19 | serviceAccountName is optional and can be used to run the ManagedResource in the context of a service account. This will be implemented in the future, but not in the initial versions. |
ClusterManagedResource
apiVersion: espejote.io/v1alpha1
kind: ClusterManagedResource
metadata:
  name: inject-configmaps
spec:
  triggers:
  - watchResource:
      apiVersion: v1
      kind: Namespace (1)
      labelSelector:
        matchExpressions:
        - key: inject-cm.syn.tools
          operator: Exists
  context:
  - def: base
    resource:
      apiVersion: v1
      kind: ConfigMap
      name: cm-to-inject
  template: |
    local ns = std.extVar('trigger').resource; (2)
    local base = std.extVar('context').base[0];
    [
      {
        apiVersion: 'v1',
        kind: 'ConfigMap',
        metadata: {
          name: 'injected-cm',
          namespace: ns.metadata.name, (3)
        },
        data: base.data,
      },
    ]
1 | Cluster scoped ClusterManagedResources can access all resources in the cluster. This includes cluster scoped resources such as Namespace. |
2 | The full manifest of the triggering resource is available. |
3 | A cluster managed resource can create resources in any namespace and thus needs to set the namespace. |
Sample use-cases
Sync upgrade notifications with an ArgoCD sync hook and bash script
We want to show a notification in the OpenShift Console when a minor upgrade is scheduled. To implement this, we added a job triggered by an ArgoCD sync hook. The job reads another resource and creates a console notification manifest from it.
Sync OpenShift Console TLS Secret with a bash "reconciler"
Cert-manager only allows creating the secret holding the certificate data in the same namespace as the Certificate resource.
The OpenShift Console route is in the openshift-console namespace, but the secret needs to be applied in the openshift-config namespace.
However, we must create the Certificate resource in openshift-console, since otherwise the OpenShift ingress doesn’t admit the HTTP challenge ingress.
We use a bash reconciler to create the secret in the correct namespace.
source_namespace="openshift-console"
target_namespace="openshift-config"
# # Wait for the secret to be created before trying to get it.
# # TODO: --for=create is included with OCP 4.17
# kubectl -n "${source_namespace}" wait secret "${SECRET_NAME}" --for=create --timeout=30m
echo "Waiting for secret ${SECRET_NAME} to be created"
while test -z "$(kubectl -n "${source_namespace}" get secret "${SECRET_NAME}" --ignore-not-found -oname)" ; do
  printf "."
  sleep 1
done
printf "\n"

# When using the -w flag, kubectl returns the secret once on startup and then again whenever it changes.
kubectl -n "${source_namespace}" get secret "${SECRET_NAME}" -ojson -w | jq -c --unbuffered | while read -r secret ; do
  echo "Syncing secret: $(printf "%s" "$secret" | jq -r '.metadata.name')"
  kubectl -n "$target_namespace" apply --server-side -f <(printf "%s" "$secret" | jq '{"apiVersion": .apiVersion, "kind": .kind, "metadata": {"name": .metadata.name}, "type": .type, "data": .data}')
done
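For comparison, the same synchronization expressed with Espejote. This is a hedged sketch: the secret name console-tls and the namespace field on watchResource are assumptions, not confirmed by the design above.
apiVersion: espejote.io/v1alpha1
kind: ClusterManagedResource
metadata:
  name: sync-console-tls-secret
spec:
  triggers:
  - watchResource:
      apiVersion: v1
      kind: Secret
      name: console-tls            # assumed name of the cert-manager secret
      namespace: openshift-console # assumed field for cluster-scoped watches
  template: |
    local secret = std.extVar('trigger').resource;
    [
      {
        apiVersion: 'v1',
        kind: 'Secret',
        metadata: {
          name: secret.metadata.name,
          namespace: 'openshift-config',
        },
        type: secret.type,
        data: secret.data,
      },
    ]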
Default network policies for namespaces with espejo
We apply default network policies to all namespaces using espejo.
apiVersion: sync.appuio.ch/v1alpha1
kind: SyncConfig
[...]
spec:
  namespaceSelector:
    ignoreNames:
    - my-ignored-namespace
    labelSelector:
      matchExpressions:
      - key: network-policies.syn.tools/no-defaults
        operator: DoesNotExist
  syncItems:
  - apiVersion: networking.k8s.io/v1
    [...]