ADR 0051 - Standardized Status Fields for AppCat Services

Author: Nicolas Bigler
Owner: Schedar
Reviewers:
Date Created: 2026-04-20
Date Updated: 2026-04-20
Status: draft
Tags: framework, servala, service

Summary

Introduce a shared VSHNServiceStatus struct embedded in all AppCat service status types using a hybrid approach. It exposes a phase enum (Provisioning, Ready, Scaling, Maintenance, Degraded, Failed, Deleting, Unknown) for quick machine-readable summary, plus standard metav1.Condition[] for rich transition history and detail. This aligns with the Crossplane way of working: consumers can read phase for a simple lifecycle view while operators and automation can inspect conditions for full history and reasoning.

Context

With AppCat/Servala we manage multiple services. Each service exposes different status fields with no consistent schema. Consumers (for example, the Servala portal) cannot reliably determine the service lifecycle state without service-specific knowledge. This makes it harder to create a unified dashboard.

The current state:

  • No unified "phase" or "state" field across services

  • Transition states (provisioning, scaling, maintenance) are not exposed

  • Some services expose internal implementation details

  • Inconsistent depth (PostgreSQL has 11 conditions, Garage has none)

Solution Ideas

Option 1: Single phase field (enum string)

Add a top-level phase field (for example Provisioning, Ready, Scaling, Maintenance, Degraded, Failed, Deleting, Unknown) to a new VSHNServiceStatus struct shared across all services.

| Pros | Cons |
|------|------|
| Single field. Trivial to read for consumers and UIs | Mutually exclusive. Can only represent one state at a time |
| Machine-readable enum, easy to act on | Priority rules required when multiple states apply simultaneously |
| No condition-evaluation logic in consumers | New states require enum extension (CRD update + re-deploy) |
| Easy to default via mutating webhook | Loses detail: no reason/message per state without extra fields |

Option 2: Standard Kubernetes conditions only (metav1.Condition[])

Use standard metav1.Condition with well-known type values: Ready, Provisioning, Maintenance, Degraded, Scaling.

| Pros | Cons |
|------|------|
| Follows Kubernetes convention. Familiar to operators | No single "current state". Consumers must evaluate N conditions |
| Append-only, additive. No breaking changes when adding new condition types | status: True/False/Unknown per condition is confusing for states like Maintenance |
| Rich per-condition reason/message/lastTransitionTime | UI/portal must implement condition-evaluation logic per consumer |
| Already partially in place via xpv1.ResourceStatus | Verbose. Harder to display in a dashboard without transformation |

Option 3: Hybrid approach with phase field + conditions

Combine both: top-level phase enum for quick state summary, plus metav1.Condition[] for detail and reason tracking per transition.

| Pros | Cons |
|------|------|
| Best of both: simple summary + rich detail | More fields to maintain and keep consistent |
| Phase set by controller; conditions carry the why | Phase and conditions can diverge if not updated atomically |
| Consumers choose abstraction level (phase for UI, conditions for automation) | Slightly more complex webhook defaulting |
| Extensible: new conditions don’t break existing phase consumers | Requires clear ownership rules (who sets what) |

Option 4: Follow Crossplane’s Ready/Synced pattern only

No new fields. Rely entirely on existing xpv1.ResourceStatus conditions (Ready=True/False, Synced=True/False) that Crossplane already populates.

| Pros | Cons |
|------|------|
| Zero implementation cost | Only two states expressible: ready or not |
| Already works for Crossplane tooling | No transition states (provisioning, maintenance, scaling). Servala requirement unmet |
| No migration needed | Consumers must understand Crossplane internals |
| Consistent with upstream conventions | Synced=False covers too many distinct failure modes |

Decision

We use Option 3 (hybrid approach), because it gives consumers the best of both worlds: a single phase enum for a quick machine-readable summary without condition-evaluation logic, plus metav1.Condition[] for rich transition history and detail that operators and automation can inspect. New conditions don’t break existing phase consumers, making the approach extensible without CRD-breaking changes. The shared VSHNServiceStatus struct is embedded directly in each service status type, without an intermediate wrapper struct.

Standard Fields (all services)

type VSHNServiceStatus struct {
    // Phase summarizes the lifecycle state of the service.
    // +kubebuilder:validation:Enum=Provisioning;Ready;Scaling;Maintenance;Degraded;Failed;Deleting;Unknown
    // +kubebuilder:default=Unknown
    Phase VSHNServicePhase `json:"phase,omitempty"`

    // Reason is a short CamelCase reason for the current phase.
    // +optional
    Reason string `json:"reason,omitempty"`

    // Message is a human-readable explanation of the current phase.
    // +optional
    Message string `json:"message,omitempty"`

    // LastTransitionTime is when the phase last changed.
    // +optional
    LastTransitionTime *metav1.Time `json:"lastTransitionTime,omitempty"`

    // Conditions holds the detailed transition history and per-state reasons.
    // Follows the Crossplane convention: each phase transition is recorded as a condition.
    // +optional
    // +listType=map
    // +listMapKey=type
    Conditions []metav1.Condition `json:"conditions,omitempty"`
}

type VSHNServicePhase string

const (
    VSHNServicePhaseProvisioning VSHNServicePhase = "Provisioning"
    VSHNServicePhaseReady        VSHNServicePhase = "Ready"
    VSHNServicePhaseScaling      VSHNServicePhase = "Scaling"
    VSHNServicePhaseMaintenance  VSHNServicePhase = "Maintenance"
    VSHNServicePhaseDegraded     VSHNServicePhase = "Degraded"
    VSHNServicePhaseFailed       VSHNServicePhase = "Failed"
    VSHNServicePhaseDeleting     VSHNServicePhase = "Deleting"
    VSHNServicePhaseUnknown      VSHNServicePhase = "Unknown"
)

Embed in every service status:

type VSHNPostgreSQLStatus struct {
    VSHNServiceStatus `json:",inline"`
    // ... existing fields ...
}

Phase Transition Rules

| Phase | Condition |
|-------|-----------|
| Provisioning | instanceNamespace not yet set OR initial resource creation |
| Ready | Crossplane Ready=True, Synced=True, all service pods in instanceNamespace are Ready, not in maintenance. See Pod Readiness below. |
| Maintenance | Maintenance window active (tracked by maintenance controller) |
| Scaling | Composition function detects a scaling operation (for example replica count change), sets phase=Scaling and marks the service unready. Returns to Ready only once the scaling completes and pod readiness is re-confirmed. |
| Degraded | Ready=True but health check failing, OR partial replica availability |
| Failed | Ready=False for > threshold OR unrecoverable error |
| Deleting | deletionTimestamp set |
| Unknown | Insufficient information to determine state |

Priority when multiple conditions active:

Deleting > Failed > Provisioning > Maintenance > Scaling > Degraded > Ready > Unknown
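The priority order can be applied mechanically once the set of currently applicable phases is known. The following is a minimal sketch under that assumption: resolvePhase and the way candidate phases are collected are hypothetical, and only the ordering itself comes from this ADR.

```go
package main

import "fmt"

type Phase string

const (
	PhaseProvisioning Phase = "Provisioning"
	PhaseReady        Phase = "Ready"
	PhaseScaling      Phase = "Scaling"
	PhaseMaintenance  Phase = "Maintenance"
	PhaseDegraded     Phase = "Degraded"
	PhaseFailed       Phase = "Failed"
	PhaseDeleting     Phase = "Deleting"
	PhaseUnknown      Phase = "Unknown"
)

// phasePriority lists phases from highest to lowest precedence.
var phasePriority = []Phase{
	PhaseDeleting, PhaseFailed, PhaseProvisioning, PhaseMaintenance,
	PhaseScaling, PhaseDegraded, PhaseReady, PhaseUnknown,
}

// resolvePhase returns the highest-priority phase among the candidates.
// With no candidates it falls back to Unknown.
func resolvePhase(candidates ...Phase) Phase {
	active := make(map[Phase]bool, len(candidates))
	for _, c := range candidates {
		active[c] = true
	}
	for _, p := range phasePriority {
		if active[p] {
			return p
		}
	}
	return PhaseUnknown
}

func main() {
	fmt.Println(resolvePhase(PhaseScaling, PhaseMaintenance)) // Maintenance
	fmt.Println(resolvePhase(PhaseDegraded, PhaseDeleting))   // Deleting
	fmt.Println(resolvePhase())                               // Unknown
}
```

Keeping the order in a single slice means a change to the priority rules touches one place only.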

Pod Readiness

All AppCat services are deployed via Helm. Crossplane marks a service Ready=True and Synced=True as soon as the Helm release is successfully applied. It does not track pod lifecycle. This means a service can report Ready while its pods are still initializing, terminating from a previous revision, or in CrashLoopBackOff.

To guard against a premature Ready phase, the composition function MUST perform an additional pod readiness check:

  1. After Crossplane conditions are Ready=True and Synced=True, query all pods in instanceNamespace.

  2. If any service pod is not in Running phase with all containers Ready, remain in Provisioning (initial deploy) or transition to Degraded (post-deploy regression) rather than setting phase=Ready.

  3. Only set phase=Ready when Crossplane conditions AND pod readiness are both satisfied.

This check is implemented inside the composition function (function-appcat). It sets a custom condition PodReady (type PodReady, status True/False) on the composite resource alongside the standard Crossplane conditions, giving operators visibility into which layer is not yet healthy:

| Scenario | Crossplane Ready | PodReady condition | phase |
|----------|------------------|--------------------|-------|
| Initial provisioning, pods pending | False | False | Provisioning |
| Helm release applied, pods not yet ready | True | False | Provisioning |
| All pods running and ready | True | True | Ready |
| Post-update pod regression (was ready before) | True | False | Degraded |
| Helm release failed | False | False | Failed |
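The scenarios above can be expressed as a small decision function. This is a sketch only: the Observation struct, its field names, and derivePhase are hypothetical, and how function-appcat actually detects a failed Helm release or remembers that a service was ready before is not specified here. The sketch covers just these five scenarios, not Deleting, Maintenance, or Scaling.

```go
package main

import "fmt"

// Observation bundles the inputs of the pod readiness check.
// All field names are assumptions made for this sketch.
type Observation struct {
	CrossplaneReady bool // Crossplane reports Ready=True and Synced=True
	PodsReady       bool // all service pods are Running with all containers Ready
	WasReady        bool // the service reached phase=Ready at least once before
	ReleaseFailed   bool // the Helm release reported a failure
}

// derivePhase maps an observation to a phase following the scenario table.
func derivePhase(o Observation) string {
	switch {
	case o.ReleaseFailed:
		return "Failed"
	case !o.CrossplaneReady:
		return "Provisioning"
	case o.PodsReady:
		return "Ready"
	case o.WasReady:
		// Crossplane is satisfied but pods regressed after a previous Ready.
		return "Degraded"
	default:
		// Helm release applied, pods still coming up for the first time.
		return "Provisioning"
	}
}

func main() {
	fmt.Println(derivePhase(Observation{}))                                      // Provisioning
	fmt.Println(derivePhase(Observation{CrossplaneReady: true}))                 // Provisioning
	fmt.Println(derivePhase(Observation{CrossplaneReady: true, PodsReady: true})) // Ready
	fmt.Println(derivePhase(Observation{CrossplaneReady: true, WasReady: true}))  // Degraded
	fmt.Println(derivePhase(Observation{ReleaseFailed: true}))                    // Failed
}
```

Each case in the switch corresponds to one row of the table, so the function can be unit-tested row by row.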

Internal Conditions Cleanup

Internal implementation-detail conditions (pgclusterConditions, namespaceConditions, etc.) should be:

  • Kept but marked // +optional with clear doc that these are internal

  • Not surfaced in API docs aimed at consumers

  • Evaluated for removal in a follow-up (separate ADR)

Service-Specific Fields

Keep existing service-specific fields (currentVersion, currentReleaseTag, sshPort, etc.), but standardize the naming by using one common field name wherever services expose the same information:

  • Use currentVersion for version strings (not currentReleaseTag / mariadbVersion)

Extensibility

New status fields are added as optional (omitempty). A mutating webhook sets defaults for new fields on existing objects (backfill). This causes no breaking changes for consumers reading existing fields.

Example: a billing controller adds a VSHNBillingStatus sub-object to expose cost and usage information to the Servala dashboard without touching existing fields. Service-specific extensions are added directly to the service status type, not to VSHNServiceStatus:

type VSHNBillingStatus struct {
    // CurrentMonthlyPrice is the current estimated monthly price in CHF.
    // Set by the billing controller.
    // +optional
    CurrentMonthlyPrice *resource.Quantity `json:"currentMonthlyPrice,omitempty"`

    // Unit is the unit of measurement for consumption (e.g. "GiB", "minutes").
    // +optional
    Unit string `json:"unit,omitempty"`

    // QuantityConsumed is the amount consumed in the current billing period, in Unit.
    // +optional
    QuantityConsumed *resource.Quantity `json:"quantityConsumed,omitempty"`
}

type VSHNPostgreSQLStatus struct {
    VSHNServiceStatus `json:",inline"`

    // Billing exposes cost and usage information, populated by the billing controller.
    // +optional
    Billing *VSHNBillingStatus `json:"billing,omitempty"`

    // ... existing fields ...
}

Consumers that don’t read billing are unaffected. The billing controller owns this field; the mutating webhook defaults it to nil (absent) on existing objects.

Consequences

Positive

  • Servala gets machine-readable lifecycle state without service-specific logic

  • Operators can write generic tooling against any AppCat service

  • Future services are required to set phase from day one

Negative

  • Composition functions must set phase correctly, which is an ongoing maintenance burden

  • Phase transitions need to be tested

  • Scaling transitions must be managed by the composition function: it marks the service unready when scaling starts and re-checks pod readiness on completion

Open Questions

  1. Servala specifics: Do we need additional states beyond the 8 proposed?

  2. Condition cleanup: Remove internal condition slices now or in follow-up?

  3. Degraded threshold: What defines degraded vs. failed?