Manage Operator Managed PrometheusRules
Problem
OpenShift operators manage their own PrometheusRules (and therefore their alerts), so we can’t alter the rule definitions.
The current solution is to find the upstream rules in the source repositories, copy those rules, label the alerts we see as useful with syn=true, and silence alerts without that label.
This is a manual process, as the source location may change with every change in the upstream repository.
Some rules exist only embedded in Go code.
Rollout of new PrometheusRules must be coordinated with the corresponding change in the operator.
Proposals
Option 1: Use a policy tool
We could evaluate a policy tool that helps us meet our requirements. Such a tool could also help with other tasks we may want to automate.
The policy tools we’ve evaluated in the past, like Kyverno, have a lot of features that we don’t need. Those features make the tool more complex to use and run than necessary.
Option 2: Create our own dedicated controller
We can create our own dedicated operator that watches for changes in PrometheusRules managed by OpenShift operators and dynamically copies, updates, and labels these alerts.
Implementing a dedicated operator for managing these PrometheusRules would be straightforward. We have already implemented other controllers and operators in situations where we ran into limitations of existing tools.
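The core transformation such an operator would apply can be sketched in a few lines of Go. This is a minimal illustration only, not a proposed implementation: the Rule and RuleGroup types are simplified, hypothetical stand-ins for the real PrometheusRule schema, the CopyAndLabel function and the alert names are invented for the example, and the watch/reconcile loop is left out entirely. It assumes the set of useful alerts is given as a plain allow-list.

```go
package main

import "fmt"

// Rule is a simplified, hypothetical stand-in for one entry of a
// PrometheusRule group. Alert is empty for recording rules.
type Rule struct {
	Alert  string
	Expr   string
	Labels map[string]string
}

// RuleGroup is a simplified, hypothetical stand-in for a rule group.
type RuleGroup struct {
	Name  string
	Rules []Rule
}

// CopyAndLabel returns a copy of the given groups in which every alert
// whose name is in useful carries the label syn=true. The input groups
// are not modified, since the upstream object stays owned by its operator.
func CopyAndLabel(groups []RuleGroup, useful map[string]bool) []RuleGroup {
	out := make([]RuleGroup, 0, len(groups))
	for _, g := range groups {
		ng := RuleGroup{Name: g.Name, Rules: make([]Rule, 0, len(g.Rules))}
		for _, r := range g.Rules {
			// Copy the label map so the upstream rule is never mutated.
			labels := make(map[string]string, len(r.Labels)+1)
			for k, v := range r.Labels {
				labels[k] = v
			}
			if r.Alert != "" && useful[r.Alert] {
				labels["syn"] = "true"
			}
			ng.Rules = append(ng.Rules, Rule{Alert: r.Alert, Expr: r.Expr, Labels: labels})
		}
		out = append(out, ng)
	}
	return out
}

func main() {
	// Hypothetical upstream rules; names and expressions are made up.
	upstream := []RuleGroup{{
		Name: "example.rules",
		Rules: []Rule{
			{Alert: "ExampleTargetDown", Expr: "up == 0", Labels: map[string]string{"severity": "critical"}},
			{Alert: "ExampleNoisyAlert", Expr: "vector(1)", Labels: map[string]string{"severity": "info"}},
		},
	}}
	useful := map[string]bool{"ExampleTargetDown": true}

	for _, g := range CopyAndLabel(upstream, useful) {
		for _, r := range g.Rules {
			fmt.Printf("%s/%s labels=%v\n", g.Name, r.Alert, r.Labels)
		}
	}
}
```

A real controller would run this kind of function whenever a watched PrometheusRule changes and write the result to a separate object that we own, which is what removes the manual copy step described in the problem statement.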
Option 3: Create a more generalized copy/patch operator
We have quite a few other edge cases where we need to copy or patch resources based on other resources. We use a mix of custom scripts, cron jobs, controllers, and other tools to solve those problems. We could implement a more generalized copy/patch operator that could be used for other resources as well.
This would allow us to replace multiple tools and lower the operational overhead of tracking and rolling out upstream changes to those tools.
By using Jsonnet as a templating engine, we can create a powerful and flexible tool that can be used for many different use cases.
Option 4: Use Crossplane Compositions
Compositions are a template for creating multiple managed resources as a single object.
We could use Crossplane Compositions to create a template for creating PrometheusRules.
Composition functions allow Go code to be executed to generate resources.
This would allow us to write templates in a fully fledged programming language.
Crossplane was primarily designed to manage external resources.
It’s a CNCF project and moved to Incubating status in 2021.
Using Crossplane comes with a huge overhead in both learning and operational costs.
We would need to learn a complex new framework and tooling.
Since functions need to be compiled and deployed, the iteration cycle is much slower and issues are harder to debug.
Composition functions don’t seem to always be enough and provider-kubernetes is also required.
We’re not sure how well Crossplane handles resources primarily managed by an external party and how well server-side apply works.
While we’d use Go for functions, there’s still a fair amount of YAML that needs to be written. This undermines the main benefit of having the full Go testing and linting toolchain available.
VSHN’s flagship project, Servala, also uses Crossplane behind the scenes. Servala is installed on almost every cluster, and we’d most likely need to solve interdependency issues between the two projects.
Crossplane constantly fights with performance issues and the complexity of the project. See Crossplane issue 316 for an example.
Internal reviews of Crossplane also note the complexity of compositions, the steep learning curve, and the issues with debugging. It’s a footgun that’s loaded and with the safety off.
Rationale
By implementing our own generalized copy/patch operator we can adapt better to changes in the upstream PrometheusRules.
Creating or patching resources based on other resources is an issue we encounter constantly. We already have tools in place to solve those problems, but each of them addresses a special case, and these cases could be unified in a more general approach. This would allow us to replace multiple tools and lower the operational overhead of tracking and rolling out upstream changes to those tools.