Restore PostgreSQL by VSHN Database (CNPG)
During emergencies, we need a reliable and precise solution to restore PostgreSQL databases with CloudNativePG (CNPG). This guide provides step-by-step instructions for restoring a CNPG-based PostgreSQL database from backup, helping to minimize downtime and service disruption for customers.
Step 1 - Prepare the Original Cluster for Restore
Before restoring the database, the original cluster must be paused and hibernated to prevent conflicts during the restore operation.
Step 1.1 - Pause the Release
First, get the release name of the database you want to restore and annotate it to stop reconciliation. This prevents Crossplane from making changes during the restore process.
kubectl annotate release <release-name> crossplane.io/paused="true" --as=system:admin
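If you are unsure which release belongs to the instance, listing the Release resources is one way to find the name (adjust the impersonation flag to your access):
kubectl get release --as=system:admin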
Step 1.2 - Hibernate the CNPG Cluster
Hibernate the CNPG cluster in the instance namespace to stop all database activity. This ensures no writes occur during the restore process.
kubectl annotate clusters.postgresql.cnpg.io postgresql cnpg.io/hibernation="on" --overwrite -n <instance-namespace>
| Hibernation will shut down all PostgreSQL pods in the cluster. Ensure the customer is aware of the downtime. |
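One way to confirm that hibernation has taken effect is to watch the PostgreSQL pods terminate (the cnpg.io/cluster label selects the pods belonging to the cluster):
kubectl get pods -l cnpg.io/cluster=postgresql -n <instance-namespace> --watch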
Step 2 - Create and Bootstrap the Restore Cluster
Create a temporary restore cluster from the backup to retrieve the data.
Step 2.1 - Create the Restore Cluster
Create a new CNPG cluster that bootstraps from the existing backup. Apply the following manifest in the instance namespace:
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgresql-restored
  namespace: <instance-namespace> (1)
spec:
  instances: 1
  # Use PostgreSQL version to match original cluster
  imageCatalogRef:
    apiGroup: postgresql.cnpg.io
    kind: ImageCatalog
    major: 16 (2)
    name: postgresql
  # Bootstrap from backup using recovery
  bootstrap:
    recovery:
      source: postgresql-backup
      # Optional: Point-in-time recovery (PITR)
      # recoveryTarget:
      #   targetTime: "2023-08-11 11:14:21.00000+02" (3)
  # Reference to the original cluster backup
  externalClusters:
    - name: postgresql-backup
      plugin:
        name: barman-cloud.cloudnative-pg.io
        parameters:
          barmanObjectName: postgresql-object-store
          serverName: postgresql
  # Storage configuration - match original cluster
  storage:
    size: 20Gi (4)
  walStorage:
    size: 1Gi
  # Match original cluster settings
  postgresql:
    parameters:
      timezone: Europe/Zurich
      max_connections: "100"
  # Resources matching original
  resources:
    limits:
      cpu: 400m
      memory: 1936Mi
    requests:
      cpu: 400m
      memory: 1936Mi
  # Enable superuser access for manual operations
  enableSuperuserAccess: true
  logLevel: info
| 1 | The instance namespace where the original cluster is running |
| 2 | PostgreSQL major version - must match the original cluster version |
| 3 | Optional: Specify a timestamp for point-in-time recovery to restore to a specific moment |
| 4 | Storage size should match or exceed the original cluster’s storage |
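To fill in the values marked above, it helps to compare against the original cluster before applying. A possible workflow, assuming the manifest was saved as restore-cluster.yaml (a filename chosen here for illustration):
# Inspect the original cluster to copy the major version, storage sizes and parameters
kubectl get clusters.postgresql.cnpg.io postgresql -n <instance-namespace> -o yaml
# Apply the restore cluster manifest
kubectl apply -f restore-cluster.yaml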
Step 2.2 - Wait for Restore Cluster to be Ready
Monitor the restore cluster until it’s fully operational:
kubectl get clusters.postgresql.cnpg.io postgresql-restored -n <instance-namespace> --watch
Check the logs of the restored instance to verify that the recovery completed successfully:
kubectl logs postgresql-restored-1 -n <instance-namespace> -f
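If the cnpg kubectl plugin is installed on your workstation, its status view gives a more detailed summary of the recovery (optional):
kubectl cnpg status postgresql-restored -n <instance-namespace>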
Step 3 - Synchronize Data to Original Cluster
Use rsync jobs to copy the restored data and WAL files to the original cluster’s persistent volumes.
Step 3.1 - Identify the Primary Instance (HA Clusters)
If the original cluster has multiple replicas (HA setup), you must identify the primary instance to ensure data is synced to the correct PVCs.
kubectl get clusters.postgresql.cnpg.io -n <instance-namespace>
Example output:
NAME         AGE   INSTANCES   READY   STATUS                     PRIMARY
postgresql   24m   3           3       Cluster in healthy state   postgresql-1
The PRIMARY column shows which instance is the primary (in this example, postgresql-1).
| For HA clusters, sync data only to the primary instance’s PVCs. In the rsync jobs below, use the primary instance name in the target PVC names (for example, postgresql-1 and postgresql-1-wal). |
Step 3.2 - Sync WAL Storage
Create a job to synchronize the Write-Ahead Log (WAL) storage from the restored cluster to the original cluster:
apiVersion: batch/v1
kind: Job
metadata:
  name: rsync-pgwal
  namespace: <instance-namespace>
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: rsync
          image: alpine:latest
          command:
            - sh
            - -c
            - |
              set -e
              echo "Installing rsync..."
              apk add --no-cache rsync
              echo "Source WAL PVC contents:"
              ls -la /source/
              echo "Target WAL PVC contents before rsync:"
              ls -la /target/
              echo "Starting rsync from postgresql-restored-1-wal to postgresql-1-wal..."
              rsync -av --delete /source/ /target/
              echo "Rsync completed successfully!"
              echo "Target WAL PVC contents after rsync:"
              ls -la /target/
          volumeMounts:
            - name: source
              mountPath: /source
            - name: target
              mountPath: /target
      volumes:
        - name: source
          persistentVolumeClaim:
            claimName: postgresql-restored-1-wal (1)
        - name: target
          persistentVolumeClaim:
            claimName: postgresql-1-wal (2)
| 1 | PVC name for the restored cluster’s WAL storage |
| 2 | PVC name for the primary instance’s WAL storage (for example, postgresql-1-wal if postgresql-1 is the primary) |
Monitor the job completion:
kubectl logs job/rsync-pgwal -n <instance-namespace> -f
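Alternatively, block until the job finishes; choose a timeout that matches the amount of data being copied (30m here is only an example):
kubectl wait --for=condition=complete job/rsync-pgwal -n <instance-namespace> --timeout=30m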
Step 3.3 - Sync Data Storage
Create a job to synchronize the main database data from the restored cluster to the original cluster:
apiVersion: batch/v1
kind: Job
metadata:
  name: rsync-pgdata
  namespace: <instance-namespace>
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: rsync
          image: alpine:latest
          command:
            - sh
            - -c
            - |
              set -e
              echo "Installing rsync..."
              apk add --no-cache rsync
              echo "Source PVC contents:"
              ls -la /source/
              echo "Target PVC contents before rsync:"
              ls -la /target/
              echo "Starting rsync from postgresql-restored-1 to postgresql-1..."
              rsync -av --delete /source/ /target/
              echo "Rsync completed successfully!"
              echo "Target PVC contents after rsync:"
              ls -la /target/
          volumeMounts:
            - name: source
              mountPath: /source
            - name: target
              mountPath: /target
      volumes:
        - name: source
          persistentVolumeClaim:
            claimName: postgresql-restored-1 (1)
        - name: target
          persistentVolumeClaim:
            claimName: postgresql-1 (2)
| 1 | PVC name for the restored cluster’s data storage |
| 2 | PVC name for the primary instance’s data storage (for example, postgresql-1 if postgresql-1 is the primary) |
Monitor the job completion:
kubectl logs job/rsync-pgdata -n <instance-namespace> -f
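Before moving on, verify that both jobs completed successfully (rsync-pgwal and rsync-pgdata should show COMPLETIONS 1/1):
kubectl get jobs -n <instance-namespace>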
Step 4 - Clean Up and Restore Service
After the data has been synchronized, clean up the temporary restore cluster and bring the original cluster back online.
Step 4.1 - Delete the Restore Cluster
Remove the temporary restore cluster as it’s no longer needed:
kubectl delete clusters.postgresql.cnpg.io postgresql-restored -n <instance-namespace>
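It’s worth checking that the restore cluster’s PVCs were cleaned up along with it, so that only the original cluster’s volumes remain:
kubectl get pvc -n <instance-namespace>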
Step 4.2 - Unhibernate the Original Cluster
Remove the hibernation annotation to restart the original cluster with the restored data:
kubectl annotate clusters.postgresql.cnpg.io postgresql cnpg.io/hibernation- -n <instance-namespace>
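Watch the original cluster come back up with the restored data:
kubectl get clusters.postgresql.cnpg.io postgresql -n <instance-namespace> --watch
Once the cluster reports a healthy state, don’t forget to resume reconciliation of the release that was paused in Step 1.1, for example by removing the pause annotation:
kubectl annotate release <release-name> crossplane.io/paused- --as=system:admin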