ADR 0040 - TCP Service Access
| Author | Marco De Luca |
|---|---|
| Owner | Schedar |
| Reviewers | Schedar |
| Date Created | 2025-10-01 |
| Date Updated | 2025-10-27 |
| Status | accepted |
| Tags | framework, networking, tcp |
Summary
This ADR evaluates methods to enable cluster-external TCP-based service access for AppCat/Servala instances and compares architectures for multiplexing TCP traffic across hundreds of tenants. This applies to services such as Codey (Git over SSH), PostgreSQL, Redis, MySQL, and other database or TCP-based workloads.
Context
AppCat/Servala users need direct TCP access from outside a cluster to various services running in their tenant namespaces, including Codey (Git over SSH), PostgreSQL, Redis, MySQL, and other database or TCP-based workloads.
Unlike HTTP, TCP protocols typically lack SNI (Server Name Indication), which means hostname-based routing on a single port is not possible. To multiplex multiple tenant services through a single IP address, each service instance must be assigned a unique TCP port.
Cloud provider load balancers impose hard limits on the number of listeners (ports) per IP address. For example (as of October 2025):
- Cloudscale: up to 100 listeners per LoadBalancer
- Exoscale: up to 10 listeners per LoadBalancer
Considered Options
Option 1: Per-tenant LoadBalancer (distinct IP per instance)
Each tenant service instance would expose its Service directly as type: LoadBalancer, letting the cloud provider allocate a public IP for that instance.
This is the most straightforward design and uses Kubernetes native service abstraction without any shared proxying layer.
Network traffic from the Internet terminates directly on each tenant’s service pod; no intermediate routing or multiplexing is required.
Operationally, this model scales linearly: every new tenant service gets one LoadBalancer and one public IP.
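For illustration, a per-tenant Service in this model might look like the following minimal sketch; the name, namespace, and selector are placeholders rather than values from an existing deployment:

```yaml
# Hypothetical per-tenant LoadBalancer Service: the cloud provider
# allocates a dedicated public IP for each such Service.
apiVersion: v1
kind: Service
metadata:
  name: postgresql-external   # placeholder name
  namespace: tenant-a         # placeholder tenant namespace
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: postgresql
  ports:
    - name: postgresql
      protocol: TCP
      port: 5432
      targetPort: 5432
```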
- Pros
  - Simple datapath
  - Simpler to operate
  - No lateral movement risk
- Cons
  - Potentially hundreds of IPs
  - Linear cost
  - Violates consolidation goal
  - Cloud providers limit the number of LoadBalancers per tenant
Option 2: One LB + one IP + file-based TCP proxy (graceful reloads)
A single shared LoadBalancer IP would forward all inbound SSH traffic to a TCP proxy Deployment (for example HAProxy).
The proxy configuration file would define one bind :PORT and server <tenant> pair per customer instance.
A custom controller would maintain that configuration file (for example a ConfigMap) and trigger a graceful reload whenever tenants are added or removed.
During reloads, HAProxy can keep existing sessions alive but still incurs short pauses for new connections.
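As a sketch of what the controller would render (tenant names, ports, and backend addresses below are illustrative assumptions), the ConfigMap could carry an HAProxy configuration along these lines:

```yaml
# Illustrative sketch only: the controller would regenerate this ConfigMap
# and trigger a graceful HAProxy reload whenever tenants are added or removed.
apiVersion: v1
kind: ConfigMap
metadata:
  name: tcp-proxy-config   # placeholder name
  namespace: tcp-proxy     # placeholder namespace
data:
  haproxy.cfg: |
    defaults
      mode tcp
      timeout connect 5s
      timeout client  1h
      timeout server  1h

    # One frontend/backend pair per tenant instance.
    frontend tenant_a_ssh
      bind :2201
      default_backend tenant_a_ssh

    backend tenant_a_ssh
      server tenant_a forgejo-ssh.tenant-a.svc.cluster.local:2222 check

    frontend tenant_b_pg
      bind :2202
      default_backend tenant_b_pg

    backend tenant_b_pg
      server tenant_b postgresql.tenant-b.svc.cluster.local:5432 check
```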
- Pros
  - Simple architecture
  - Portable
- Cons
  - Reloads can interrupt active sessions, although sessions are short-lived for Git
  - Config management complexity
Option 3: Dynamic proxy with runtime API (HAProxy Data Plane API / Envoy xDS)
Instead of editing files, the controller would communicate with the proxy at runtime:
- For HAProxy, through its REST-based Data Plane API to add or remove frontends/backends dynamically.
- For Envoy, through an xDS control-plane that updates listeners and clusters live.
The proxy stays running continuously. New tenants are added or removed atomically without restarting the process.
The controller becomes a lightweight "routing brain" maintaining the mapping of (tenant → port → backend service) and pushing those changes to the proxy’s API.
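To make the data-plane side concrete, the static equivalent of what an xDS control plane would push for a single tenant might look roughly like this; the tenant name, external port, and backend address are assumptions for illustration:

```yaml
# Illustrative Envoy listener/cluster pair for one tenant service. With xDS,
# the controller would push equivalent resources at runtime instead of
# writing static configuration.
static_resources:
  listeners:
    - name: tenant_a_pg
      address:
        socket_address: { address: 0.0.0.0, port_value: 5001 }
      filter_chains:
        - filters:
            - name: envoy.filters.network.tcp_proxy
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
                stat_prefix: tenant_a_pg
                cluster: tenant_a_pg
  clusters:
    - name: tenant_a_pg
      type: STRICT_DNS
      connect_timeout: 5s
      load_assignment:
        cluster_name: tenant_a_pg
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: postgresql.tenant-a.svc.cluster.local
                      port_value: 5432
```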
- Pros
  - Atomic updates
  - No dropped connections
  - Mature APIs
- Cons
  - Needs custom control-plane component
  - Higher operational burden
Option 4: Kubernetes Gateway API (TCPRoute per tenant)
In this model, the Kubernetes Gateway API serves as the control plane for multiplexing TCP traffic, independent of any specific proxy implementation. A conformant Gateway controller such as Envoy Gateway, Cilium Gateway, or KGateway manages a shared data plane that acts as a TCP multiplexer for all tenant service connections.
A Kubernetes Gateway resource represents the public entry point (one Service type=LoadBalancer with a single external IP).
Each tenant service receives a dedicated TCP listener bound to a unique port.
For larger deployments, additional ports can be defined through XListenerSet (an experimental feature as the X implies) resources, which extend a Gateway with extra listener definitions beyond its native limit.
A custom controller acts as a lightweight orchestrator: it allocates a free port, creates a TCPRoute pointing to the tenant’s internal Service, and adds a ReferenceGrant in the tenant’s namespace to authorize the cross-namespace reference.
The selected Gateway implementation (Envoy, Cilium, etc.) then automatically reconciles these CRDs into live configuration, handling listener creation, routing, and backend resolution without reloads or direct API calls.
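A rough sketch of the resources the controller would create for one tenant is shown below; the namespaces, names, GatewayClass, and the external port 5001 are assumptions for illustration, not values defined by this ADR:

```yaml
# Shared entry point: one Gateway per shard, exposed by the Gateway
# implementation through a single Service of type LoadBalancer.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: tcp-shard-1            # placeholder shard name
  namespace: tcp-gateway       # placeholder infrastructure namespace
spec:
  gatewayClassName: envoy-gateway   # depends on the chosen implementation
  listeners:
    - name: tenant-a-pg
      protocol: TCP
      port: 5001               # externally assigned port for this tenant
      allowedRoutes:
        namespaces:
          from: Same
---
# Route binding the listener to the tenant's internal Service.
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: TCPRoute
metadata:
  name: tenant-a-pg
  namespace: tcp-gateway
spec:
  parentRefs:
    - name: tcp-shard-1
      sectionName: tenant-a-pg
  rules:
    - backendRefs:
        - name: postgresql     # tenant's internal Service
          namespace: tenant-a
          port: 5432
---
# Authorizes the cross-namespace reference from the TCPRoute to the
# Service in the tenant namespace.
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-tcp-gateway
  namespace: tenant-a
spec:
  from:
    - group: gateway.networking.k8s.io
      kind: TCPRoute
      namespace: tcp-gateway
  to:
    - group: ""
      kind: Service
```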
- Pros
  - Fully Kubernetes-native and declarative, integrates cleanly with OpenShift and future Gateway API implementations.
  - Zero-downtime dynamic updates: Gateway controllers apply config atomically without restarts.
  - Simpler controller logic: only needs to manage CRDs, not interact with proxy runtime APIs.
  - Strong observability and status reporting via Gateway API metrics and proxy metrics.
- Cons
  - Each `Gateway` supports ≤ 64 listeners; additional listeners require `XListenerSet`, which can add management overhead.
  - Still constrained by cloud LoadBalancer listener caps (10-100), necessitating sharding at scale.
  - Slightly heavier footprint than a plain HAProxy setup.
Option 5: BGP-advertised Floating IPs with MetalLB or Cilium LB
This approach uses BGP to advertise floating IP addresses directly to the cluster, bypassing cloud provider LoadBalancer-as-a-Service (LBaaS) offerings.
Tools such as MetalLB or Cilium LoadBalancer mode manage IP allocation and BGP advertisements, allowing Kubernetes Service type=LoadBalancer resources to receive IPs from a provider-configured floating IP range.
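For illustration, a MetalLB-based setup would roughly consist of resources like the following; the address range, ASNs, and peer address are placeholder assumptions:

```yaml
# Pool of floating IPs that MetalLB may assign to LoadBalancer Services.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: floating-ips
  namespace: metallb-system
spec:
  addresses:
    - 192.0.2.16/28        # placeholder provider-assigned floating IP range
---
# Advertise addresses from the pool to upstream routers via BGP.
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
  name: floating-ips
  namespace: metallb-system
spec:
  ipAddressPools:
    - floating-ips
---
# BGP session with the provider's router (all values are placeholders).
apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: provider-router
  namespace: metallb-system
spec:
  myASN: 64512
  peerASN: 64513
  peerAddress: 192.0.2.1
```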
- Pros
  - Independence from cloud provider LBaaS APIs and their limitations
  - Potentially lower cost at scale
- Cons
  - Each Kubernetes `Service type=LoadBalancer` still requires a unique IP address
  - Kubernetes Services cannot route traffic to endpoints across multiple namespaces
  - Does not solve the listener port multiplexing problem; a reverse proxy is still required
  - Requires BGP peering configuration with the cloud provider
  - May require external load balancer infrastructure or additional network management
  - Increases operational overhead compared to managed LBaaS
  - Feasibility depends on cloud provider support for BGP and floating IP ranges
  - Not verified on all target platforms (for example Exoscale)
Decision
We will adopt Option 4, a Gateway API-based architecture for multiplexing TCP traffic across AppCat/Servala tenant services.
This design leverages the Kubernetes Gateway API as the declarative control plane for dynamic TCP routing, while allowing flexibility in choosing the underlying implementation, such as Envoy Gateway, Cilium Gateway API, or KGateway.
The custom controller will manage Gateway API resources (Gateway, TCPRoute, ReferenceGrant, and optionally XListenerSet) to dynamically assign ports and routes for each tenant service.
The selected Gateway implementation (Envoy, Cilium, etc.) will reconcile these CRDs into live data-plane configuration without requiring reloads or manual synchronization.
- Each shard = one `Gateway` object behind its own `Service type=LoadBalancer` (one IP).
- Shard capacity is provider-specific (Cloudscale = 100 listeners/IP, Exoscale = 10 listeners/IP).
- The controller allocates ports from reserved per-shard ranges (sticky per tenant to avoid collisions) and updates each tenant service’s configuration with the assigned external port and hostname.
- DNS uses deterministic service-specific hostnames per shard: for example, `ssh1.codey.ch`, `ssh2.codey.ch` for Codey Git SSH; `db1.example.ch`, `db2.example.ch` for databases.
- Gateway API is the next-generation Kubernetes Ingress, ensuring long-term compatibility and portability.
- Multiple conformant implementations exist, providing choice and ecosystem flexibility.
- Dynamic listener and route updates occur natively through the Gateway controller: no reloads, no direct API integration required.
- The controller logic remains simple: declarative CRD management and port allocation rather than low-level proxy configuration.
- LoadBalancer sharding remains necessary due to provider listener caps (10-100 listeners/IP).
- This complexity is isolated to the controller layer (shard lifecycle, DNS updates), not the data plane.
- The architecture remains portable across providers and Gateway implementations.
Implementation Notes
- Each service instance listens internally on a service-specific port (for example, Forgejo on `2222`, PostgreSQL on `5432`, Redis on `6379`).
- The controller maintains a CRD tracking shard capacity, port ranges, and per-service assignments.
- When a shard reaches capacity, the controller provisions a new `Gateway` and assigns the next DNS alias.
- NetworkPolicies allow traffic from the Gateway namespace to tenant namespaces on the service-specific internal ports (see the sketch after this list).
- Health, metrics, and reconciliation loops ensure the Gateway state matches the controller’s record.
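A minimal sketch of such a NetworkPolicy is shown below; the namespaces, labels, and port are illustrative assumptions:

```yaml
# Allow the shared Gateway data plane to reach a tenant's PostgreSQL port;
# names, labels, and ports are placeholder assumptions.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-tcp-gateway
  namespace: tenant-a
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: postgresql
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: tcp-gateway
      ports:
        - protocol: TCP
          port: 5432
```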
Consequences
- Cloudscale: one IP supports ~100 tenant services (listeners).
- Exoscale: one IP supports 10 tenant services (listeners).
- Uniform controller logic across providers and service types; only shard capacities differ.
- LoadBalancer costs grow linearly with shard count.
- Each service type (Codey, PostgreSQL, Redis, etc.) can use dedicated or shared Gateway shards depending on scale and isolation requirements.
Risks and Mitigations
- Hard provider caps: unavoidable → automatic sharding.
- Gateway listener scaling: if using Gateway API, use `XListenerSet`; enforce per-shard and per-Gateway limits.
- Port collisions: non-overlapping port ranges per shard, lease-based allocator.
- DNS complexity: deterministic shard hostnames (`sshN.codey.ch`) and automation.
- Operational drift: periodic reconcile verifies proxy state against controller CRDs.