ADR 0047 - Service Maintenance and Upgrades (Framework 2.0)
| Author | Gabriel Saratura |
|---|---|
| Owner | Schedar |
| Reviewers | Schedar |
| Date Created | |
| Date Updated | |
| Status | draft |
| Tags | framework,framework2,maintenance,upgrades,crossplane,workflows,cronoperations |
Summary
The decision selects Kubernetes-native CronJobs to orchestrate maintenance and upgrades. Argo Workflows or Crossplane Operations may be adopted later if complexity grows.
Problem
Need automated maintenance workflows for:
-
Service instance upgrades
-
Backup and restore orchestration
-
Version checks and notifications
-
Maintenance window enforcement
-
Maintenance suspension
Current State
-
CronJob-based maintenance
-
VersionHandler tracks claim/composite/instance relationships
-
Manual chart version bumps in class/defaults.yml
-
Composition revisions for versioning
-
Revision policy: manual vs automatic
-
Maintenance CronJob switches revision during maintenance window
-
Configurable delayed production rollout
-
Hotfix job for urgent updates
-
Custom rollback scripts
Solutions
Option A: CronJob Pattern
Description:
Kubernetes CronJobs execute scheduled maintenance tasks at specified intervals. Jobs run containers that perform version checks, apply composition revision updates, and trigger service upgrades during maintenance windows.
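As an illustration of this pattern, a maintenance CronJob might look roughly like the sketch below. The resource name, namespace, service account, image, arguments, and schedule are assumptions made for illustration and are not taken from component-appcat. Only the CronJob fields themselves (schedule, timeZone, suspend, concurrencyPolicy, history limits) are standard Kubernetes.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: redis-maintenance              # hypothetical name
  namespace: vshn-redis-example        # hypothetical instance namespace
spec:
  schedule: "0 2 * * 2"                # maintenance window: Tuesdays at 02:00
  timeZone: "Europe/Zurich"            # needs a Kubernetes version that supports spec.timeZone
  suspend: false                       # set to true to suspend maintenance
  concurrencyPolicy: Forbid            # never run two maintenance jobs at once
  successfulJobsHistoryLimit: 3        # finished jobs (and their logs) kept for inspection
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      backoffLimit: 1                  # retry a failed maintenance pod once
      template:
        spec:
          serviceAccountName: maintenance   # hypothetical; needs RBAC to update the claim
          restartPolicy: Never
          containers:
            - name: maintenance
              # Placeholder image and arguments; the real maintenance binary may differ.
              image: registry.example.com/appcat/maintenance:latest
              args: ["maintenance", "--service", "redis"]
```

In this sketch, the schedule expresses the maintenance window and the suspend flag covers maintenance suspension, two of the requirements listed under Problem.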
Advantages:
-
Proven Pattern: Used in component-appcat, works in production
-
Simple: Standard Kubernetes CronJob, easy to understand
-
No Additional Dependencies: Just a CronJob controller (built into K8s)
-
Easy Debugging: Job logs are visible via kubectl logs
Disadvantages:
-
Limited Observability: Job logs are lost once finished Job pods are cleaned up (need log aggregation)
-
No Complex Workflows: Difficult to express multi-step workflows (backup → upgrade → release → verify)
-
Sequential Only: Can’t easily parallelize tasks
Option B: Argo Workflows
Description:
Argo Workflows is a Kubernetes-native workflow engine that orchestrates multi-step operations as directed acyclic graphs (DAGs). Workflows define sequential or parallel steps with dependencies, conditional execution, and retry logic. Each workflow runs as pods executing container-based tasks.
Example Workflow:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: redis-upgrade-workflow
spec:
  entrypoint: upgrade-with-backup
  templates:
    # The referenced templates (k8up-backup, update-revision,
    # rollout-revision, smoke-tests) are assumed to be defined elsewhere.
    - name: upgrade-with-backup
      # Each outer list item is one sequential step group.
      steps:
        # Step 1: Backup
        - - name: create-backup
            template: k8up-backup
        # Step 2: Perform upgrade
        - - name: upgrade-instance
            template: update-revision
        # Step 3: Perform release
        - - name: release-instance
            template: rollout-revision
        # Step 4: Post-upgrade validation
        - - name: run-smoke-tests
            template: smoke-tests
Advantages:
-
Complex Workflows: Multi-step workflows with dependencies (backup before upgrade)
-
Excellent Observability: Workflow UI shows step progress, logs, errors
-
Retry Logic: Built-in retry with exponential backoff
-
Conditional Steps: Skip steps based on conditions (for example, skip backup if recent backup exists)
-
DAG Support: Parallel execution of independent tasks
-
Audit Trail: Workflow history persisted, easy to review what happened
-
Composition Function Integration: Workflows can be part of each instance
-
Generic WorkflowTemplates: WorkflowTemplates can be defined cluster-wide in KCL and easily referenced in Workflows
-
Convenient UI: ServiceOperators can handle maintenance directly in the Argo Workflows UI
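The retry, conditional, and DAG features listed above could be combined as in the following sketch. It is an assumption-laden illustration, not a proposed implementation: the workflow parameter recent-backup-exists, the run-step template, and the alpine image standing in for the real maintenance tooling are all hypothetical.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: redis-maintenance-     # hypothetical name prefix
spec:
  entrypoint: maintenance
  arguments:
    parameters:
      - name: recent-backup-exists     # hypothetical input, set by the caller
        value: "false"
  templates:
    - name: maintenance
      dag:
        tasks:
          # Conditional step: skipped when a recent backup already exists.
          - name: create-backup
            template: run-step
            when: "{{workflow.parameters.recent-backup-exists}} == false"
            arguments:
              parameters:
                - name: action
                  value: backup
          # Independent of the backup, so it runs in parallel with it.
          - name: check-version
            template: run-step
            arguments:
              parameters:
                - name: action
                  value: version-check
          # Runs once the backup has succeeded or was skipped, and the check passed.
          - name: upgrade-instance
            template: run-step
            depends: "(create-backup.Succeeded || create-backup.Skipped) && check-version.Succeeded"
            arguments:
              parameters:
                - name: action
                  value: upgrade
    - name: run-step
      inputs:
        parameters:
          - name: action
      # Built-in retry with exponential backoff: 30s, 60s, 120s.
      retryStrategy:
        limit: "3"
        retryPolicy: OnFailure
        backoff:
          duration: "30s"
          factor: "2"
          maxDuration: "10m"
      container:
        image: alpine:3.20             # placeholder for the real maintenance image
        command: [sh, -c]
        args: ["echo running {{inputs.parameters.action}}"]
```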
Disadvantages:
-
Additional Dependency: Requires Argo Workflows installation and management
-
Learning Curve: Schedar team members need to learn workflow DSL (YAML-based)
-
Operational Overhead: Another operator to monitor, upgrade, secure
-
Complexity: Might be overkill for simple version checks
Option C: Crossplane WatchOperation / CronOperation Functions
Description:
Crossplane operation functions (CronOperation, WatchOperation) extend composition pipelines with scheduled or event-driven execution. Operations trigger composition function steps at specified intervals or in response to resource changes, executing maintenance logic within the Crossplane reconciliation loop.
Example:
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: vshnredis.vshn.appcat.vshn.io
spec:
  mode: Pipeline
  pipeline:
    # Normal rendering
    - step: redis
      functionRef:
        name: function-appcat
    # Maintenance operation (runs periodically)
    - step: maintenance-check
      functionRef:
        name: function-appcat-maintenance
      operationRef:
        name: maintenance-cron
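For comparison, the scheduled part could also be sketched as a standalone CronOperation resource. This assumes the alpha ops.crossplane.io/v1alpha1 API. Because the feature is alpha, the field names below should be checked against the installed Crossplane version. The resource name and schedule are hypothetical, and function-appcat-maintenance is reused from the example above.

```yaml
apiVersion: ops.crossplane.io/v1alpha1   # alpha API; subject to change
kind: CronOperation
metadata:
  name: maintenance-cron
spec:
  schedule: "0 2 * * 2"                  # hypothetical window: Tuesdays at 02:00
  concurrencyPolicy: Forbid              # do not start a run while one is active
  operationTemplate:
    spec:
      mode: Pipeline
      pipeline:
        # Same function model as a composition pipeline step.
        - step: maintenance-check
          functionRef:
            name: function-appcat-maintenance
```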
Advantages:
-
No Additional Dependencies: Built into Crossplane
-
Tightly Integrated: Direct access to composed resources and claim state
-
Declarative: Same function model as composition
Disadvantages:
-
Experimental: CronOperation/WatchOperation still alpha in Crossplane
-
Complex Logic in Go: Maintenance logic embedded in composition function code
-
No Multi-Step Workflows: Can’t express backup → upgrade → verify easily
Decision
Use the CronJob pattern (Option A). In Phase 2, reevaluate Crossplane Operations (Option C) versus Argo Workflows (Option B) for complex orchestration such as release and rollback.
Rationale
-
Proven Pattern: CronJob-based maintenance works in component-appcat production.
-
Simplicity: No additional dependencies, standard Kubernetes pattern.
-
Composition Revisions: Leverage ADR0030 pattern (revision selector + automatic policy) for upgrades during maintenance windows.
-
Good Enough: Version checks and revision updates don’t require complex multi-step workflows yet.
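The revision selector plus automatic policy mentioned above could look like the following on a claim. This is a sketch assuming the Crossplane v1 claim fields compositionUpdatePolicy and compositionRevisionSelector. The VSHNRedis API version, the claim name and namespace, and the release label are assumptions for illustration and are not taken from ADR0030.

```yaml
apiVersion: vshn.appcat.vshn.io/v1     # assumed API group/version for VSHNRedis
kind: VSHNRedis
metadata:
  name: my-redis                       # hypothetical claim
  namespace: my-app                    # hypothetical namespace
spec:
  # Follow the newest composition revision that matches the selector.
  compositionUpdatePolicy: Automatic
  compositionRevisionSelector:
    matchLabels:
      release: stable                  # hypothetical label on the desired revisions
```

Under these assumptions, the maintenance CronJob would only need to change the selector label during the maintenance window, and Crossplane would then pick up the matching revision.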
Phase 2 Deferred Decision:
When CronJob maintenance becomes insufficient for complex multi-step workflows (backup → upgrade → verify → rollback), choose between:
-
Option B (Argo Workflows): Production-ready workflow engine with DAG orchestration and dedicated UI
-
Option C (Crossplane Operations): Native Crossplane integration, but currently in alpha
Reevaluate based on Crossplane Operations maturity, actual workflow complexity needs, and operational burden.