VSHN Managed PostgreSQL

Problem

Creating our own operator to deploy and manage PostgreSQL proved to be a gargantuan effort. In order to keep in line with the newly specified "more buy than make" approach, we’re evaluating the viability of using existing operators for the use of converged services.

  • Why build on our own when there are already plenty of viable alternatives around?

  • Unpredictable amount of maintenance needed to keep our own solution up-to-date and maintained

  • What do we gain when building our own vs. using an existing solution and integrating it into our systems

Evaluated Operators

Requirements

Percona

Stackgres

CloudnativePG

Zalando

Crunchydata

Bitnami Helmchart

Superuser Access

Database Management [1]

User Management [1]

Service Metrics

Maintenance Schedule

Backup Schedule

Self Service Backup Restore

Encryption at rest [2]

Connection limit of 25/Gb Memory [3]

Extension Management [1]

Custom PostgreSQL Settings

In-Place Major Upgrade

License

Apache License 2.0

AGPL-3.0

Apache License 2.0

MIT

Apache License 2.0

Apache License 2.0

Upstream Support

Some support available not clear if for operators

✅ (via EDB)

See Operators Compared for a more detailed comparison.

For the POC on which this decision is based on, we opted to test out Stackgres. It’s feature set and general feel were good during the evaluation phase.

Pros

  • Brings cluster functionality

  • Backups out of the box, incl. Point-In-Time

  • Automatic deployments of Prometheus service monitors

  • Brings DbOps CRs that handle some operations in a k8s native way

  • Security update schedules can be implemented

  • The operator is maintained by a third party

  • Commercial support available

Cons

  • We can’t support PostgreSQL version or extensions that the operator doesn’t provide

  • Crossplane’s limitations, as discussed here

  • We have to deal with some opinionated decision by the operator developers and maybe need to work around them, or accept them as given

  • The operator is maintained by a third party, but is open source and we could contribute if needed

  • Upgrading the operator could introduce breaking changes

Risks

Operator Unmaintained

Risk

The Operator becomes unmaintained

Option 1

We or the community carries the project further

Option 2

Migration to another operator

CRD Versioning

Risk

CRDs could contain breaking changes that require a manual upgrade path

Option 1

We can handle the changes in the compositions, transparent for the user

Option 2

We inform the customers they have to do an action after the upgrade

License Changes

Risk

The licensing of the operator could change to a hostile license

Option 1

Check availability of forks

Option 2

Migrate to another operator

Decision

The team had no objections by going forward with Stackgres and Crossplane to develop the VSHN Managed PostgreSQL service.

Operators Compared

Percona

Install

Operator
helm install my-operator percona/pg-operator

Seems to run a k8s job that runs Ansible, which in turn deploys the operator.

Instance/Cluster
helm install my-db percona/pg-db

Prometheus

They have their own so called Percona Monitoring and Management (PMM). Based on Vicoria Metrics and thus Prometheus conform.

However we’d probably have to bring our own exporters if we want to integrate with Platform Monitoring. docs.percona.com/percona-monitoring-and-management/details/architecture.html#pmm-server

Postgres Config

Supports global and instance scoped configs. www.percona.com/doc/kubernetes-operator-for-postgresql/options.html

Backup

By default it creates a backup to a local PVC. S3 backup can be configured. www.percona.com/doc/kubernetes-operator-for-postgresql/backups.html

User/Database Management

Simple password management for the created users. Doesn’t seem to have the ability to create new users. www.percona.com/doc/kubernetes-operator-for-postgresql/users.html

Misc

  • Has TLS by default.

  • The project does not seem that open. No Github issues for example.

Updates

Operator

The clusters have to be stopped in order for the operator to be updated.

Instances

Provides fully automatic and schedulable instance/cluster upgrades: www.percona.com/doc/kubernetes-operator-for-postgresql/update.html#automatic-upgrade

Stackgres

Install

Operator
kubectl apply -f 'https://sgres.io/install'

Also helm available.

helm install --namespace stackgres stackgres-operator https://stackgres.io/downloads/stackgres-k8s/stackgres/latest/helm/stackgres-operator.tgz
Instance/Cluster
cat << 'EOF' | kubectl create -f -
apiVersion: stackgres.io/v1
kind: SGCluster
metadata:
  name: simple
spec:
  instances: 2
  postgres:
    version: 'latest'
  pods:
    persistentVolume:
      size: '5Gi'
EOF

Prometheus

Has Prometheus integration. Can also be integrated into existing Prometheus operator and Grafana. Brings some dashboards. stackgres.io/doc/latest/install/prerequisites/monitoring/

Postgres Config

Some configs are not adjustable to guarantee a working cluster: stackgres.io/doc/latest/reference/crd/sgpgconfig/

Backup

Backup to S3 and other object storage. No backup to PVC available. stackgres.io/doc/latest/install/prerequisites/backups/

User/Database Management

No user or database management at all. Gives you the postgres user by default.

If we want that then we’d have to use crossplane-provider-sql.

Misc

Updates

After upgrading the operator, all new clusters are provisioned with updated versions. Existing clusters need to be restarted in order to get the updates. stackgres.io/doc/latest/install/helm/upgrade/

For the clusters themselves the operator brings an ops CRD. With this CRD various operations can be triggered, like minor and security updates. As well as major updates. These can be scheduled.

CloudnativePG

Install

Operator
kubectl apply -f \
  https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/main/releases/cnpg-1.15.1.yaml
Instance/Cluster
cat << 'EOF' | kubectl create -f -
# Example of PostgreSQL cluster
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: cluster-example
spec:
  instances: 3

  # Example of rolling update strategy:
  # - unsupervised: automated update of the primary once all
  #                 replicas have been upgraded (default)
  # - supervised: requires manual supervision to perform
  #               the switchover of the primary
  primaryUpdateStrategy: unsupervised

  # Require 1Gi of space
  storage:
    size: 1Gi
EOF

Prometheus

Brings exporters for each instance. A unique feature of this exporter is, that any given SQL query can be transformed into a metric. But it doesn’t provide that many metrics out of the box.

Postgres Config

Supports global and instance specific settings.

Backup

It uses barman as a backup solution. By default no backup to PVC provided.

User/Database Management

No user or database management at all. Gives you the postgres user by default.

Misc

Updates

Operator

This is a two step operation, in addition to the operator update it also needs adjustments for every PostgreSQL pod. However, as of a newer version this can be done without PostgreSQL downtime.

Instances

Only supports minor updates.

Zalando

Install

Operator
helm repo add postgres-operator-charts https://opensource.zalando.com/postgres-operator/charts/postgres-operator
helm install postgres-operator postgres-operator-charts/postgres-operator
Instance/Cluster
cat << 'EOF' | kubectl create -f -
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-minimal-cluster
  namespace: default
spec:
  teamId: "acid"
  volume:
    size: 1Gi
  numberOfInstances: 2
  users:
    zalando:  # database owner
    - superuser
    - createdb
    foo_user: []  # role for application foo
  databases:
    foo: zalando  # dbname: owner
  preparedDatabases:
    bar: {}
  postgresql:
    version: "14"
EOF

Prometheus

No Prometheus exporters included. But provides the ability to implement sidecar pods.

Postgres Config

I was not able to find any information about this in the docs.

Backup

The operator can do physical backups as well as logical backups. The logical backups lack some features. It can only backup all databases and it has no retention management. No backup to PVC, only to object storages.

User/Database Management

The operator provides full user and database management, see installation of cluster.

Misc

  • docs are not very accessible

Updates

Operator

Not much information about the impact of an operator upgrade.

Instances

Supports in-place major upgrades. But they need manual execution of scripts within the instance pods. Major upgrades can also be disabled.

Crunchydata

Install

cd postgres-operator-examples
kubectl apply -k kustomize/install/namespace
kubectl apply --server-side -k kustomize/install/default
Instance/Cluster
kubectl apply -k kustomize/postgres

Prometheus

Exporters can easily be enabled. They provide prometheus configs and Grafana dashboards in the example repository.

Postgres Config

PostgreSQL settings can be injected per instance/cluster.

User/Database Management

The operator provides the ability to manage users and databases

Misc

  • v5 images only via their registry

  • tls by default

Updates

Operator

They provide good release notes and upgrade instructions: access.crunchydata.com/documentation/postgres-operator/5.1.2/upgrade/kustomize/

Instances

The oprerator supports minor and bugfix updates of the instances/clusters.

Bitnami Helmchart

While technically not an operator, it’s included for completeness sake.

Install

Instance/Cluster
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install my-release bitnami/postgresql-ha

Prometheus

The chart provides abilities to enable exporters.

Postgres Config

Custom postgresql configs can be passed to the chart.

Backup

There’s no included backup with the chart. Has to be engineered manually.

User/Database Management

The chart accepts a secret with users to be provisioned.

Misc

  • tls (certs need to be generated)

Updates

Bitnami keeps track of breaking changes in the chart: github.com/bitnami/charts/tree/master/bitnami/postgresql-ha/#upgrading

It looks like the chart provided images only support minor updates. No mention of major upgrade support.


1. Can be added via crossplane-provider-sql
2. This is a platform feature, APPUiO Cloud provides this.
3. May not be relevant as all solutions have connection pooling. Should probably be scraped.