VSHN Managed PostgreSQL
Problem
Creating our own operator to deploy and manage PostgreSQL proved to be a gargantuan effort. To stay in line with the newly specified "more buy than make" approach, we’re evaluating whether existing operators are viable for converged services.
- Why build our own when there are already plenty of viable alternatives around?
- The amount of maintenance needed to keep our own solution up to date is unpredictable.
- What do we gain by building our own compared to integrating an existing solution into our systems?
Evaluated Operators
Requirements | Percona | Stackgres | CloudnativePG | Zalando | Crunchydata | Bitnami Helmchart
Superuser Access | ✅ | ✅ | ✅ | ✅ | ✅ | ✅
Database Management [1] | ❌ | ❌ | ❌ | ✅ | ✅ | ❌
User Management [1] | ❌ | ❌ | ❌ | ✅ | ✅ | ✅
Service Metrics | ❌ | ✅ | ✅ | ❌ | ✅ | ✅
Maintenance Schedule | ✅ | ✅ | ❌ | ❌ | ❌ | ❌
Backup Schedule | ✅ | ✅ | ✅ | ✅ | ✅ | ❌
Self Service Backup Restore | ✅ | ✅ | ✅ | ✅ | ✅ | ❌
Encryption at rest [2] | ✅ | ✅ | ✅ | ✅ | ✅ | ✅
Connection limit of 25/GB Memory [3] | ✅ | ✅ | ✅ | ❌ | ✅ | ✅
Extension Management [1] | ❌ | ✅ | ❌ | ✅ | ❌ | ✅
Custom PostgreSQL Settings | ✅ | ✅ | ✅ | ❌ | ✅ | ✅
In-Place Major Upgrade | ✅ | ✅ | ❌ | ❌ | ❌ | ❌
License | Apache License 2.0 | AGPL-3.0 | Apache License 2.0 | MIT | Apache License 2.0 | Apache License 2.0
Upstream Support | Some support available, not clear if for operators | ✅ | ✅ (via EDB) | ❌ | ✅ | ❌
See Operators Compared for a more detailed comparison.
For the POC on which this decision is based, we opted to test Stackgres. Its feature set and general feel were good during the evaluation phase.
Pros
- Brings cluster functionality
- Backups out of the box, incl. Point-In-Time
- Automatic deployments of Prometheus service monitors
- Brings DbOps CRs that handle some operations in a k8s native way
- Security update schedules can be implemented
- The operator is maintained by a third party
- Commercial support available
Cons
- We can’t support PostgreSQL versions or extensions that the operator doesn’t provide
- We have to deal with some opinionated decisions by the operator developers and may need to work around them or accept them as given
- The operator is maintained by a third party, but it is open source and we could contribute if needed
- Upgrading the operator could introduce breaking changes
Risks
Operator Unmaintained
- Risk: The operator becomes unmaintained
- Option 1: We or the community carry the project forward
- Option 2: Migration to another operator
Decision
The team had no objections to going forward with Stackgres and Crossplane to develop the VSHN Managed PostgreSQL service.
Operators Compared
Percona
Install
- Operator
helm install my-operator percona/pg-operator
Seems to run a k8s job that runs Ansible, which in turn deploys the operator.
- Instance/Cluster
helm install my-db percona/pg-db
Prometheus
They have their own monitoring solution, Percona Monitoring and Management (PMM). It’s based on VictoriaMetrics and thus Prometheus-compatible.
However, we’d probably have to bring our own exporters if we want to integrate with Platform Monitoring. docs.percona.com/percona-monitoring-and-management/details/architecture.html#pmm-server
Postgres Config
Supports global and instance scoped configs. www.percona.com/doc/kubernetes-operator-for-postgresql/options.html
Backup
By default it creates a backup to a local PVC. S3 backup can be configured. www.percona.com/doc/kubernetes-operator-for-postgresql/backups.html
User/Database Management
Simple password management for the created users. Doesn’t seem to have the ability to create new users. www.percona.com/doc/kubernetes-operator-for-postgresql/users.html
Updates
- Operator: The clusters have to be stopped in order for the operator to be updated.
- Instances: Provides fully automatic and schedulable instance/cluster upgrades: www.percona.com/doc/kubernetes-operator-for-postgresql/update.html#automatic-upgrade
Stackgres
Install
- Operator
kubectl apply -f 'https://sgres.io/install'
A Helm chart is also available.
helm install --namespace stackgres stackgres-operator https://stackgres.io/downloads/stackgres-k8s/stackgres/latest/helm/stackgres-operator.tgz
- Instance/Cluster
cat << 'EOF' | kubectl create -f -
apiVersion: stackgres.io/v1
kind: SGCluster
metadata:
  name: simple
spec:
  instances: 2
  postgres:
    version: 'latest'
  pods:
    persistentVolume:
      size: '5Gi'
EOF
Prometheus
Has Prometheus integration. Can also be integrated into existing Prometheus operator and Grafana. Brings some dashboards. stackgres.io/doc/latest/install/prerequisites/monitoring/
Postgres Config
Global and instance specific. stackgres.io/doc/latest/administration/custom/postgres/config/
Some configs are not adjustable to guarantee a working cluster: stackgres.io/doc/latest/reference/crd/sgpgconfig/
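As a rough sketch of how the connection limit requirement (25 connections per GB of memory) could be expressed through a custom Stackgres configuration, the snippet below derives `max_connections` for an assumed 4 GiB instance and writes an `SGPostgresConfig` manifest. The helper `max_connections_for`, the resource name `connection-limit`, the file name, and the PostgreSQL version are illustrative assumptions. It also assumes that `max_connections` is not one of the locked parameters, which should be checked against the CRD reference linked above.

```shell
# Hypothetical helper: 25 connections per GiB of instance memory.
max_connections_for() {
  memory_gib="$1"
  echo $((memory_gib * 25))
}

# Assumed instance size of 4 GiB.
MAX_CONNECTIONS="$(max_connections_for 4)"

# Write the manifest to a file (unquoted EOF so the variable expands).
cat > sgpgconfig.yaml << EOF
apiVersion: stackgres.io/v1
kind: SGPostgresConfig
metadata:
  name: connection-limit
spec:
  postgresVersion: "14"
  postgresql.conf:
    max_connections: '${MAX_CONNECTIONS}'
EOF

# To apply it on a cluster with the Stackgres operator installed:
# kubectl create -f sgpgconfig.yaml
```

An `SGCluster` would then have to reference this configuration by name. Writing the manifest to a file first keeps the calculation reviewable before anything is applied.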
Backup
Backup to S3 and other object storage. No backup to PVC available. stackgres.io/doc/latest/install/prerequisites/backups/
User/Database Management
No user or database management at all. Gives you the postgres user by default.
If we want that then we’d have to use crossplane-provider-sql.
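For illustration, user and database management through crossplane-provider-sql might look roughly like the manifests below. This is a sketch based on the provider's `postgresql.sql.crossplane.io/v1alpha1` API as we understand it. The names `appuser`, `appdb`, `simple-cluster` and the secret name are hypothetical, and a matching `ProviderConfig` pointing at the instance's superuser credentials is assumed to exist already.

```shell
# Write a Role and a Database managed resource to a file.
cat > provider-sql-example.yaml << 'EOF'
apiVersion: postgresql.sql.crossplane.io/v1alpha1
kind: Role
metadata:
  name: appuser                    # hypothetical role name
spec:
  forProvider:
    privileges:
      login: true
  providerConfigRef:
    name: simple-cluster           # assumed, pre-existing ProviderConfig
  writeConnectionSecretToRef:
    name: appuser-credentials      # hypothetical secret for the generated password
    namespace: default
---
apiVersion: postgresql.sql.crossplane.io/v1alpha1
kind: Database
metadata:
  name: appdb                      # hypothetical database name
spec:
  forProvider:
    owner: appuser
  providerConfigRef:
    name: simple-cluster
EOF

# To apply it on a cluster with Crossplane and provider-sql installed:
# kubectl create -f provider-sql-example.yaml
```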
Misc
- GUI
- Runbooks: stackgres.io/doc/latest/runbooks/
- Extensions: stackgres.io/doc/latest/administration/extensions/
- timescaledb, see extensions
- REST API
- TLS (needs to be enabled per cluster), certificates have to be generated
Updates
After upgrading the operator, all new clusters are provisioned with updated versions. Existing clusters need to be restarted in order to get the updates. stackgres.io/doc/latest/install/helm/upgrade/
For the clusters themselves, the operator brings an ops CRD. This CRD can trigger various operations, such as minor, security, and major version updates, and these operations can be scheduled.
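A scheduled security upgrade through the ops CRD could look roughly like the following. This is a sketch, assuming the `SGDbOps` resource with its `op`, `runAt` and `securityUpgrade.method` fields as documented by Stackgres. The resource name, file name and timestamp are made up, and `simple` refers to the example cluster created above.

```shell
# Write an SGDbOps manifest that schedules a security upgrade.
cat > sgdbops-security-upgrade.yaml << 'EOF'
apiVersion: stackgres.io/v1
kind: SGDbOps
metadata:
  name: security-upgrade           # hypothetical name
spec:
  sgCluster: simple                # the SGCluster to operate on
  op: securityUpgrade
  runAt: "2022-10-01T02:00:00Z"    # arbitrary example time for the maintenance window
  securityUpgrade:
    method: InPlace
EOF

# To apply it on a cluster with the Stackgres operator installed:
# kubectl create -f sgdbops-security-upgrade.yaml
```

A maintenance schedule could then be built by creating such resources ahead of each maintenance window.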
CloudnativePG
Install
- Operator
kubectl apply -f \
https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/main/releases/cnpg-1.15.1.yaml
- Instance/Cluster
cat << 'EOF' | kubectl create -f -
# Example of PostgreSQL cluster
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: cluster-example
spec:
  instances: 3
  # Example of rolling update strategy:
  # - unsupervised: automated update of the primary once all
  #                 replicas have been upgraded (default)
  # - supervised: requires manual supervision to perform
  #               the switchover of the primary
  primaryUpdateStrategy: unsupervised
  # Require 1Gi of space
  storage:
    size: 1Gi
EOF
Prometheus
Brings exporters for each instance. A unique feature of this exporter is that any given SQL query can be transformed into a metric. However, it doesn’t provide many metrics out of the box.
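To illustrate the query-to-metric feature, the sketch below defines one custom query in a ConfigMap and references it from the cluster’s `monitoring` section. It assumes the `customQueriesConfigMap` field and the exporter’s query format as described in the CloudnativePG monitoring docs. The ConfigMap name, key, file name and the query itself are illustrative.

```shell
# Write a ConfigMap with one custom query and a Cluster that references it.
cat > cnpg-custom-metrics.yaml << 'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: example-monitoring         # hypothetical name
data:
  custom-queries: |
    pg_database_size:
      query: "SELECT datname, pg_database_size(datname) AS size_bytes FROM pg_database"
      metrics:
        - datname:
            usage: "LABEL"
            description: "Name of the database"
        - size_bytes:
            usage: "GAUGE"
            description: "Size of the database in bytes"
---
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: cluster-example
spec:
  instances: 3
  storage:
    size: 1Gi
  monitoring:
    customQueriesConfigMap:
      - name: example-monitoring
        key: custom-queries
EOF

# To apply it on a cluster with the CloudnativePG operator installed:
# kubectl create -f cnpg-custom-metrics.yaml
```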
User/Database Management
No user or database management at all. Gives you the postgres user by default.
Misc
- Uses custom clustering technology: cloudnative-pg.io/documentation/1.16/operator_capability_levels/#self-contained-instance-manager
- Docs not well structured
Zalando
Install
- Operator
helm repo add postgres-operator-charts https://opensource.zalando.com/postgres-operator/charts/postgres-operator
helm install postgres-operator postgres-operator-charts/postgres-operator
- Instance/Cluster
cat << 'EOF' | kubectl create -f -
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-minimal-cluster
  namespace: default
spec:
  teamId: "acid"
  volume:
    size: 1Gi
  numberOfInstances: 2
  users:
    zalando:  # database owner
    - superuser
    - createdb
    foo_user: []  # role for application foo
  databases:
    foo: zalando  # dbname: owner
  preparedDatabases:
    bar: {}
  postgresql:
    version: "14"
EOF
Backup
The operator can do physical backups as well as logical backups. The logical backups lack some features: they can only back up all databases at once and have no retention management. No backup to PVC, only to object storage.
User/Database Management
The operator provides full user and database management, see installation of cluster.
Crunchydata
Install
- Operator
Clone github.com/CrunchyData/postgres-operator-examples locally, then run:
cd postgres-operator-examples
kubectl apply -k kustomize/install/namespace
kubectl apply --server-side -k kustomize/install/default
- Instance/Cluster
kubectl apply -k kustomize/postgres
Prometheus
Exporters can easily be enabled. They provide prometheus configs and Grafana dashboards in the example repository.
Updates
- Operator: They provide good release notes and upgrade instructions: access.crunchydata.com/documentation/postgres-operator/5.1.2/upgrade/kustomize/
- Instances: The operator supports minor and bugfix updates of the instances/clusters.
Bitnami Helmchart
While technically not an operator, it’s included for completeness’ sake.
Install
- Instance/Cluster
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install my-release bitnami/postgresql-ha
Updates
Bitnami keeps track of breaking changes in the chart: github.com/bitnami/charts/tree/master/bitnami/postgresql-ha/#upgrading
It looks like the chart-provided images only support minor updates. There is no mention of major upgrade support.