Alert rule: CNPGPostgreSQLNoStreamingReplicas
Overview
This alert triggers when the primary has zero streaming replicas connected, while replicas are expected to exist. HA is completely broken - if the primary fails now, there is no replica to promote.
Steps for Debugging
- Step one
-
Identify the affected namespace from the alert. Set it as a variable.
INSTANCE_NAMESPACE='<instance-namespace>' - Step two
-
Check which pods are running and their roles.
kubectl get pods -n $INSTANCE_NAMESPACE -l cnpg.io/cluster=postgresql -L cnpg.io/instanceRole - Step three
-
Resolve the primary pod name and check replication status.
PRIMARY=$(kubectl get cluster postgresql -n $INSTANCE_NAMESPACE -o jsonpath='{.status.currentPrimary}') kubectl exec -n $INSTANCE_NAMESPACE $PRIMARY -- psql -U postgres -c "SELECT application_name, state, sync_state FROM pg_stat_replication;" - Step four
-
Check logs for all replica pods for connection errors.
kubectl logs -n $INSTANCE_NAMESPACE -l cnpg.io/cluster=postgresql,cnpg.io/instanceRole=replica --prefix | grep -i "error\|fatal\|replication\|receiver" - Step five
-
Check the overall cluster status for conditions or events.
kubectl describe cluster postgresql -n $INSTANCE_NAMESPACE kubectl get events -n $INSTANCE_NAMESPACE --sort-by='.lastTimestamp'