Alert rule: CNPGClusterZoneSpreadWarning

Overview

This alert fires when PostgreSQL pods are not spread across distinct availability zones, that is, when the number of pods exceeds the number of distinct zones hosting them. At least one zone then contains multiple instances, which reduces availability zone redundancy.

This alert can fire at the same time as CNPGClusterInstancesOnSameNode. Zone co-location is a broader condition: pods may be on different nodes but still in the same zone.
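The condition described above (more pods than distinct zones) can be illustrated with a small shell sketch. This is not the alert's actual rule expression, only a local illustration. The function name `zone_spread_violated` and the "pod zone" input format are assumptions made for this example.

```shell
# Hypothetical helper: reads "pod zone" lines on stdin and prints both counts.
# Exit status 0 means the condition holds (pods > distinct zones).
zone_spread_violated() {
  awk '{ pods++; if (!($2 in seen)) { seen[$2] = 1; zones++ } }
       END { printf "pods=%d zones=%d\n", pods, zones; exit !(pods > zones) }'
}
```

For example, three pods spread over two zones satisfy the condition, while two pods in two different zones do not.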

Steps for Debugging

Step one

Identify the affected namespace from the alert. Set it as a variable.

INSTANCE_NAMESPACE='<instance-namespace>'

Step two

List all PostgreSQL pods and the node they are running on.

kubectl get pods -n "$INSTANCE_NAMESPACE" -l cnpg.io/cluster=postgresql -o wide

Step three

Check which availability zone each node belongs to.

kubectl get nodes -L topology.kubernetes.io/zone

Step four

Cross-reference the pod nodes from step two with the zone output from step three to identify which zone has multiple instances.
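The cross-reference can also be done mechanically. The sketch below is one possible approach, not part of the original procedure. The helper names `pods_by_zone` and `crowded_zones` and the file names `pods.txt` and `nodes.txt` are hypothetical. It assumes the pod and node listings have first been reduced to two-column text ("pod node" and "node zone").

```shell
# Hypothetical helper: join "pod node" lines ($1) with "node zone" lines ($2)
# and print "zone pod node", sorted by zone.
pods_by_zone() {
  awk 'FILENAME == ARGV[1] { zone[$1] = $2; next }
       { z = ($2 in zone) ? zone[$2] : "unknown"; print z, $1, $2 }' "$2" "$1" |
    LC_ALL=C sort
}

# Hypothetical helper: read "zone pod node" lines on stdin and print each
# zone that hosts more than one instance, followed by its instance count.
crowded_zones() {
  awk '{ count[$1]++ }
       END { for (z in count) if (count[z] > 1) print z, count[z] }' |
    LC_ALL=C sort
}

# Possible usage against a live cluster (assumes the zone label is set on every node):
#   kubectl get pods -n "$INSTANCE_NAMESPACE" -l cnpg.io/cluster=postgresql \
#     -o custom-columns=POD:.metadata.name,NODE:.spec.nodeName --no-headers > pods.txt
#   kubectl get nodes -L topology.kubernetes.io/zone --no-headers |
#     awk '{print $1, $NF}' > nodes.txt
#   pods_by_zone pods.txt nodes.txt | crowded_zones
```

Pods whose node does not appear in the node listing are reported under the zone `unknown`, so they are not silently dropped.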

Step five

Check how many nodes are available per zone.

kubectl get nodes -L topology.kubernetes.io/zone --no-headers | awk '{print $NF}' | sort | uniq -c

Step six

If a zone has only one node and the cluster has more instances than zones, the scheduler cannot spread pods evenly. Check the number of instances configured.

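kubectl get cluster postgresql -n "$INSTANCE_NAMESPACE" -o jsonpath='{.spec.instances}'

CNPG uses a soft topology spread constraint by default, so the scheduler places an instance in an already-occupied zone rather than leaving it unschedulable. Uneven spread therefore occurs when zone capacity is insufficient to host one instance per zone.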
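To judge whether co-location is unavoidable, compare the instance count with the number of zones. The sketch below applies the pigeonhole bound: with more instances than zones, the busiest zone must host at least the ceiling of instances divided by zones. The function name `min_busiest_zone` is hypothetical, and the calculation assumes every zone has enough node capacity, so the real figure can be higher.

```shell
# Hypothetical helper: lower bound on the number of instances in the busiest
# zone when $1 instances are placed across $2 zones (ceiling of $1 / $2).
min_busiest_zone() {
  instances=$1
  zones=$2
  if [ "$zones" -le 0 ]; then
    echo "zones must be positive" >&2
    return 1
  fi
  echo $(( (instances + zones - 1) / zones ))
}

# Possible usage against a live cluster (assumes the zone label is set on every node):
#   instances="$(kubectl get cluster postgresql -n "$INSTANCE_NAMESPACE" -o jsonpath='{.spec.instances}')"
#   zones="$(kubectl get nodes -L topology.kubernetes.io/zone --no-headers | awk '{print $NF}' | sort -u | wc -l)"
#   min_busiest_zone "$instances" "$zones"
```

A result of 1 means one instance per zone is possible in principle. Any higher value means the alert condition cannot be cleared without adding zones or reducing the instance count.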