ADR 0034 - Redis High Availability / Clustering
Author | Nicolas Bigler
Owner | Schedar
Reviewers |
Date Created | 2025-06-04
Date Updated | 2025-06-04
Status | draft
Tags | redis, ha
Summary
This ADR evaluates different Redis HA deployment options on Kubernetes to support a production-grade, fault-tolerant Redis setup. The goal is to ensure high availability, resilience to node failures, and ease of management and operations.
Context
Requirements for the Redis HA architecture:
- Minimizes downtime during failover
- Supports Kubernetes-native patterns
- Is operationally manageable
- Is compatible with our existing infrastructure and client usage patterns
Considered Options
- Option 1: Redis Sentinel + Redis Master/Replica
  - image::adr-0034-redis-sentinel.png[Redis Master/Replica & Sentinel Example]
  - Description: Traditional Redis HA using Sentinel to monitor the master and promote a replica when the master fails.
  - Pros:
    - Well-established model
    - Supported by Redis directly
    - Easy to deploy and manage with the Bitnami Helm chart
    - Supports Kubernetes-native patterns
  - Cons:
    - Requires a client that supports Sentinel (see the client sketch after this option)
    - Can't be used when exposing Redis outside of Kubernetes (for example via a LoadBalancer service) and therefore can't be used by Servala
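To make the client-side requirement concrete, here is a minimal sketch of a Sentinel-aware client using redis-py. The host names and the master set name "mymaster" are assumptions and would need to match our Helm values.

[source,python]
----
# Minimal sketch of a Sentinel-aware client (redis-py). Host names,
# ports, and the master set name "mymaster" are assumptions.
from redis.sentinel import Sentinel

sentinel = Sentinel(
    [("redis-node-0.redis-headless", 26379),
     ("redis-node-1.redis-headless", 26379),
     ("redis-node-2.redis-headless", 26379)],
    socket_timeout=0.5,
)

# The client asks Sentinel for the current master/replica on each
# connection, so a failover is transparent to the application code.
master = sentinel.master_for("mymaster", socket_timeout=0.5)
replica = sentinel.slave_for("mymaster", socket_timeout=0.5)

master.set("greeting", "hello")
print(replica.get("greeting"))
----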
- Option 2: Redis Cluster Mode
  - Description: Redis Cluster offers built-in sharding and high availability with multiple masters and replicas.
  - Pros:
    - Native Redis HA and scalability
    - No need for Sentinel
  - Cons:
    - Requires client-side cluster awareness (see the sketch after this option)
    - Higher operational complexity: multiple masters and replicas need to be managed
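For comparison, this is what the required cluster awareness looks like on the client side; a minimal sketch using redis-py (4.1 or newer), where the host name is an assumption and any reachable cluster node can serve as the startup node.

[source,python]
----
# Minimal sketch of a cluster-aware client (redis-py >= 4.1).
# The host name is an assumption.
from redis.cluster import RedisCluster

rc = RedisCluster(host="redis-cluster", port=6379)

# The client discovers the slot-to-node mapping itself and routes
# each key to the master that owns its hash slot.
rc.set("greeting", "hello")
print(rc.get("greeting"))
----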
- Option 3: Redis Master/Replica with a TCP Proxy in Front (for example HAProxy or Envoy)
  - image::adr-0034-redis-sentinel-haproxy.png[Redis Master/Replica & Sentinel & HAProxy Example]
  - Description: Deploy Redis in master/replica mode and expose the master behind a TCP load balancer or proxy that detects failures and routes traffic to the promoted master.
  - Pros:
    - No need for Sentinel support in the client
    - Client remains unaware of master failover; the proxy handles redirection
    - Greater flexibility in failover and health checks
    - Can be exposed outside of Kubernetes (for example via a LoadBalancer service) and therefore be used by Servala
  - Cons:
    - Requires custom failover scripts or integration with Sentinel to update routes
    - Slight increase in latency due to the proxy hop
    - Operational complexity: the proxy config must track Redis role changes (see the health-check sketch after this option)
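The role tracking the proxy has to perform can be illustrated with a small sketch: a health probe passes only for the node whose INFO replication output reports role:master, so the proxy sends traffic exclusively to the promoted master. This is a hedged Python illustration of the check, not an actual HAProxy or Envoy configuration; host names are assumptions.

[source,python]
----
# Hedged sketch of the role check a proxy health probe could perform.
# Host names are assumptions.
import redis

def is_master(host: str, port: int = 6379) -> bool:
    try:
        info = redis.Redis(host=host, port=port, socket_timeout=0.5).info("replication")
        return info.get("role") == "master"
    except redis.RedisError:
        # Unreachable or failing nodes must never receive traffic.
        return False

for node in ("redis-node-0", "redis-node-1", "redis-node-2"):
    print(node, "master" if is_master(node) else "replica/down")
----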
- Option 4: Redis Master/Replica with a label selector on the Service
  - image::adr-0034-redis-sentinel-controller.png[Redis Master/Replica & Sentinel & Custom Controller Example]
  - Description: Deploy Redis in master/replica mode and expose the master through a Kubernetes Service that selects pods by a label indicating the master role. A custom deployment or sidecar watches Redis Sentinel to identify the current master and updates the pod labels accordingly.
  - Pros:
    - Kubernetes-native solution using the native Service abstraction
    - No additional proxy layer; fully transparent for clients
    - Stable DNS name that always points to the master node
    - Simplifies failover for clients that are not Sentinel-aware
    - The Bitnami Helm chart already supports adding sidecars, so we can easily run the watcher as a sidecar instead of a separate deployment
    - Can be exposed outside of Kubernetes (for example via a LoadBalancer service) and therefore be used by Servala
  - Cons:
    - Requires development and maintenance of a custom controller or operator (see the sidecar sketch after this option)
    - The deployment/sidecar needs robust error handling to keep the labels correct during Sentinel failover events
    - Slight failover lag depending on Sentinel detection plus label update speed
    - Complexity shifts from proxy configuration to controller logic
    - We don't have much experience with this approach yet
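A hedged sketch of the label-updating sidecar described above: it periodically asks Sentinel for the current master address and patches the role label on the matching pod, so that a Service with selector redis-role=master always targets the master. The master set name, namespace, and label key are assumptions, and the sketch assumes Sentinel announces pod IPs; a production version needs the robust error handling noted in the cons.

[source,python]
----
# Hedged sketch of a label-updating sidecar. The master set name
# "mymaster", namespace, label key, and Sentinel address are
# assumptions, as is Sentinel announcing pod IPs.
import time

from kubernetes import client, config
from redis.sentinel import Sentinel

SENTINEL = Sentinel([("redis-sentinel", 26379)], socket_timeout=0.5)
NAMESPACE = "redis"
LABEL_KEY = "redis-role"

def reconcile(core: client.CoreV1Api) -> None:
    # Ask Sentinel which node is currently the master.
    master_ip, _port = SENTINEL.discover_master("mymaster")
    pods = core.list_namespaced_pod(NAMESPACE, label_selector="app=redis")
    for pod in pods.items:
        desired = "master" if pod.status.pod_ip == master_ip else "replica"
        if (pod.metadata.labels or {}).get(LABEL_KEY) != desired:
            # Strategic merge patch: only the role label is changed.
            core.patch_namespaced_pod(
                pod.metadata.name,
                NAMESPACE,
                {"metadata": {"labels": {LABEL_KEY: desired}}},
            )

def main() -> None:
    config.load_incluster_config()  # running as a sidecar in the cluster
    core = client.CoreV1Api()
    while True:
        try:
            reconcile(core)
        except Exception as exc:
            # Keep the loop alive across transient Sentinel/API errors.
            print(f"reconcile failed: {exc}")
        time.sleep(2)

if __name__ == "__main__":
    main()
----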
Decision
Options 1 and 4: Option 1 is the easiest solution to implement and doesn't require much engineering effort. However, we still need a solution for all the clients that don't have Sentinel support; for those we will go with the Kubernetes-native approach of leveraging label selectors on the Service. Option 4 is slightly more work to set up, as we don't have any experience with it yet, but we should leverage Kubernetes-native solutions where they are available and make sense.
We can even use both options at the same time: a flag controls whether Redis is additionally exposed through a separate Service that uses the label selector.
This gives us the flexibility to support users that have Sentinel support in their application as well as users that don't. Furthermore, Option 4 ensures that the service is also usable with Servala, where the service is exposed publicly.
Last but not least, Options 1 and 4 also make it easy to migrate from HA to non-HA and vice versa. With Option 2 this is not easily doable, as the data is spread across different masters.
Option 2: Redis Cluster mode is too complex for our use case at the moment, as it requires multiple masters and replicas. It's also not straightforward to switch between non-clustered and clustered mode.
Consequences
- We need an additional flag to enable the additional master Service and the deployment/sidecar that labels the pods.
- We need to develop and test software for master/replica detection and pod labeling in a sidecar or deployment. We can get some inspiration from this existing (but archived) project: github.com/riptl/redis-k8s-election