ADR 0043 - Managed OpenBao Service Implementation

Author: Yannik Dällenbach
Owner: Schedar/bespinian
Reviewers: Schedar
Date Created: 2025-01-13
Date Updated: 2025-01-13
Status: draft
Tags: service,openbao,secret-management

Summary

This ADR outlines the implementation of a managed OpenBao service on the AppCat platform to provide secret management capabilities to customers. It builds upon the suggestion in ADR 0024 - Product Choice for Secret Management to use OpenBao as the secret and PKI management solution.

Context

Following the suggestion in ADR 0024 - Product Choice for Secret Management to use OpenBao for secret management, we need to implement it as a managed service within the AppCat ecosystem. OpenBao provides:

  • Secret storage with REST API

  • Vault API compatibility

  • Open-source license with Linux Foundation backing

  • Self-hostable deployment model

The service must integrate with the existing AppCat patterns including:

  • Crossplane-based provisioning

  • Managed namespace deployment model

  • User-workload monitoring integration

  • Backup and maintenance automation

  • SLA monitoring and reporting

Requirements

Functional Requirements

  • Secret Management: Store, retrieve, and manage secrets via REST API

  • API Compatibility: Maintain Vault API compatibility for existing tooling

  • High Availability: Support clustered deployment for production workloads

  • Authentication: Integration with OIDC

Operational Requirements

  • Backup & Recovery: Automated backup of secret data

  • Monitoring: SLA metrics, capacity alerts, and operational dashboards

  • Maintenance: Automated security updates and version upgrades

  • Scaling: Horizontal scaling capabilities for high-throughput scenarios

  • Security: Encryption at rest, TLS in transit, audit logging

Proposals

Proposal 1: Helm Chart with External Storage

Deploy OpenBao using the official Helm chart with external storage backends (PostgreSQL).

Implementation
  • Use provider-helm to deploy OpenBao Helm chart

  • PostgreSQL backend for secret storage (leveraging existing VSHNPostgreSQL)

  • Initialization through composition functions

Advantages
  • Leverages existing PostgreSQL infrastructure

  • Official Helm chart provides production-ready deployment

  • Separation of compute and storage for better scaling

  • Familiar AppCat deployment patterns

Disadvantages
  • Additional complexity in managing external dependencies

  • Potential performance overhead with external storage

Proposal 2: Helm Chart with Internal Storage

Deploy OpenBao using the official Helm chart with integrated storage using Raft consensus.

Implementation
  • Use provider-helm to deploy OpenBao Helm chart

  • Raft storage backend for simplicity

  • Built-in clustering for high availability

Advantages
  • Simplified deployment with fewer external dependencies

  • Built-in consensus and replication

Disadvantages
  • Raft cluster management overhead

Proposal 3: Operator-Based Deployment

Develop or adopt an OpenBao operator for Kubernetes-native management.

Implementation
  • Custom operator following AppCat patterns

  • CRDs for vault configuration and policies

  • Automated lifecycle management

  • Native Kubernetes integration

Advantages
  • Full Kubernetes-native experience

  • Automated day-2 operations

  • Extensible for future features

Disadvantages
  • High development effort

  • Additional operational complexity

  • Maintenance burden for custom operator

Decision

Proposal 2: Helm Chart with Internal Storage

We choose to implement OpenBao using the official Helm chart with integrated Raft storage.

Implementation Details

Storage Backend
  • Primary: Raft consensus storage for built-in clustering

  • Leverage existing AppCat Backup mechanisms (K8up)

  • Self-contained storage eliminates external dependencies
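As a rough illustration of the chosen approach, the composition could render a provider-helm Release similar to the sketch below. The release name, target namespace, and providerConfigRef are hypothetical, the chart version is deliberately left unpinned, and the values keys assume that the OpenBao chart keeps the server.ha.raft layout of the upstream Vault chart it was forked from. Verify them against the chart's values file before use.

```yaml
# Sketch only, not the actual AppCat composition output.
apiVersion: helm.crossplane.io/v1beta1
kind: Release
metadata:
  name: my-openbao                      # hypothetical release name
spec:
  forProvider:
    chart:
      name: openbao
      repository: https://openbao.github.io/openbao-helm
      # version: pin to a chart version that has been tested
    namespace: vshn-openbao-my-openbao  # hypothetical managed namespace
    values:
      server:
        ha:
          enabled: true
          replicas: 3                   # mirrors spec.parameters.instances
          raft:
            enabled: true               # integrated Raft storage
        dataStorage:
          enabled: true
          size: 20Gi                    # mirrors spec.parameters.size.disk
  providerConfigRef:
    name: helm                          # hypothetical provider config
```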

API Specification:

The VSHNOpenBao CRD follows AppCat conventions (ADR 0016 - Service API Design) with parameter groups for service configuration, sizing, backup, monitoring, and maintenance.

apiVersion: vshn.appcat.vshn.io/v1
kind: VSHNOpenBao
metadata:
  name: my-openbao
  namespace: my-namespace
spec:
  parameters:
    # Service configuration
    service:
      version: "2.1.0"  # OpenBao version (enum of supported versions)
      fqdn: "openbao.example.com"  # Fully qualified domain name
      serviceLevel: guaranteed  # besteffort or guaranteed
      openBaoSettings:
        # Auto-unseal configuration (optional)
        # Enables automatic unsealing using external key management systems
        # Only one provider should be configured at a time
        autoUnseal:
          awsKmsSecretRef: ""  # Reference to secret containing AWS KMS credentials and configuration
          azureKeyVaultSecretRef: ""  # Reference to secret containing Azure Key Vault credentials and configuration
          gcpKmsSecretRef: ""  # Reference to secret containing GCP Cloud KMS credentials and configuration
          transitSecretRef: ""  # Reference to secret containing connection details to another Vault/OpenBao instance

    # Number of OpenBao instances
    # For guaranteed serviceLevel: must be 3
    # For besteffort serviceLevel: can be 1 or 3
    instances: 3

    # Sizing
    size:
      plan: standard-4  # Resource plan: standard-2, standard-4, standard-8
      requests:
        cpu: "2"
        memory: "4Gi"
      disk: 20Gi  # Raft storage volume size per replica
      storageClass: ""  # Optional storage class override

    # Backup and restore configuration (using K8up)
    backup:
      enabled: true
      schedule: "0 2 * * *"  # Cron schedule for Raft snapshots
      retention:
        keepLast: 2
        keepHourly: 2
        keepDaily: 7
        keepWeekly: 4
        keepMonthly: 3
    restore:
      claimName: ""
      backupName: ""

    # Maintenance window
    maintenance:
      dayOfWeek: Tuesday  # enum: Monday-Sunday
      timeOfDay: "22:00"  # HH:MM format in UTC

    # Monitoring
    monitoring:
      alertmanagerConfigRef: ""
      alertmanagerConfigSecretRef: {}
      alertmanagerConfigTemplate: {}
      email: ""

  # Unseal keys and root token secret reference
  # This secret will contain the unseal keys and root token generated during initialization
  writeConnectionSecretToRef:
    name: openbao-unseal-keys
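A restore could be requested by creating a new claim that points at an existing instance and one of its backups. The sketch below assumes that restore.claimName and restore.backupName behave as they do in other AppCat services (source claim name and backup identifier). The instance names are hypothetical and the backup identifier is a placeholder.

```yaml
# Sketch only: restore semantics are assumed, not confirmed by this ADR.
apiVersion: vshn.appcat.vshn.io/v1
kind: VSHNOpenBao
metadata:
  name: my-openbao-restored        # hypothetical new instance
  namespace: my-namespace
spec:
  parameters:
    service:
      version: "2.1.0"
    restore:
      claimName: my-openbao        # instance whose backup is restored
      backupName: <backup-name>    # placeholder for the backup to restore
  writeConnectionSecretToRef:
    name: openbao-restored-unseal-keys
```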

Unseal Keys Secret:

The writeConnectionSecretToRef secret contains the unseal keys and root token:

apiVersion: v1
kind: Secret
metadata:
  name: openbao-unseal-keys
data:
  UNSEAL_KEY_1: <base64-encoded-key>
  UNSEAL_KEY_2: <base64-encoded-key>
  UNSEAL_KEY_3: <base64-encoded-key>
  UNSEAL_KEY_4: <base64-encoded-key>
  UNSEAL_KEY_5: <base64-encoded-key>
  ROOT_TOKEN: <base64-encoded-root-token>

Auto-unseal

Auto-unseal allows OpenBao to unseal automatically, without manual intervention, by using an external key management system. This is crucial for automated recovery and reduces the operational burden.

By default, OpenBao instances are configured to auto-unseal against a central, internal, VSHN-managed Vault or OpenBao instance.

If a customer configures their own auto-unseal provider, only the "besteffort" service level is available.

Supported auto-unseal providers:

AWS KMS

Configure using awsKmsSecretRef pointing to a secret containing AWS credentials and KMS key configuration

Azure Key Vault

Configure using azureKeyVaultSecretRef pointing to a secret containing Azure credentials and Key Vault details

GCP Cloud KMS

Configure using gcpKmsSecretRef pointing to a secret containing GCP credentials and Cloud KMS configuration

Transit (Vault/OpenBao)

Configure using transitSecretRef pointing to a secret containing connection details to another Vault/OpenBao instance

Each secret reference should contain the necessary credentials and configuration for the respective provider. When auto-unseal is configured, OpenBao will automatically unseal after restarts without requiring the unseal keys from writeConnectionSecretToRef.

Without a working auto-unseal configuration, manual unsealing using the unseal keys is required after each pod restart.

Example AWS KMS auto-unseal secret:

apiVersion: v1
kind: Secret
metadata:
  name: openbao-awskms-config
type: Opaque
stringData:
  region: "us-east-1"
  access_key: "AKIAIOSFODNN7EXAMPLE"
  secret_key: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
  kms_key_id: "19ec80b0-dfdd-4d97-8164-c6examplekey"
  endpoint: "https://vpce-0e1bb1852241f8cc6-pzi0do8n.kms.us-east-1.vpce.amazonaws.com"
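For comparison, a transit-based setup might look like the sketch below: a secret with the connection details, followed by the claim fragment that references it. The secret key names are an assumption. They mirror the parameters of the transit seal stanza (address, token, key_name, mount_path) in the same way the AWS example mirrors the awskms stanza. The host name, key name, and secret name are hypothetical.

```yaml
# Sketch only: key names assumed to match the transit seal parameters.
apiVersion: v1
kind: Secret
metadata:
  name: openbao-transit-config     # hypothetical secret name
type: Opaque
stringData:
  address: "https://openbao-central.example.com:8200"
  token: "<token allowed to use the transit key>"
  key_name: "autounseal"           # hypothetical transit key
  mount_path: "transit/"
---
# Claim fragment referencing the secret above.
apiVersion: vshn.appcat.vshn.io/v1
kind: VSHNOpenBao
metadata:
  name: my-openbao
  namespace: my-namespace
spec:
  parameters:
    service:
      serviceLevel: besteffort     # customer-configured auto-unseal
      openBaoSettings:
        autoUnseal:
          transitSecretRef: "openbao-transit-config"
```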

Service Levels:

besteffort
  • 1 or 3 instances

  • Standard resource guarantees

  • Best-effort availability

guaranteed
  • Requires 3 instances (HA deployment)

  • Resource guarantees with pod anti-affinity

  • Higher availability SLA
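To make the difference concrete, a minimal single-instance claim for a development setup could look like the following sketch. It only uses fields from the API specification in this ADR. The instance and secret names are hypothetical. Switching serviceLevel to guaranteed would require instances: 3.

```yaml
# Sketch of a besteffort claim with a single instance.
apiVersion: vshn.appcat.vshn.io/v1
kind: VSHNOpenBao
metadata:
  name: dev-openbao                # hypothetical instance name
  namespace: my-namespace
spec:
  parameters:
    service:
      version: "2.1.0"
      serviceLevel: besteffort
    instances: 1                   # 1 or 3 allowed for besteffort
    size:
      plan: standard-2
  writeConnectionSecretToRef:
    name: dev-openbao-unseal-keys
```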

Plans:

By default, the following plans are available on every cluster:

Plan        CPU   Memory  Disk
standard-2  500m  2Gi     16Gi
standard-4  1     4Gi     16Gi
standard-8  2     8Gi     16Gi

Key Components
  1. OpenBao Cluster: 3-node HA deployment with Raft consensus

  2. Raft Storage: Built-in distributed storage backend

  3. Backup Storage: ObjectBucket for Raft snapshots using K8up

  4. Monitoring: Custom SLI exporter and Prometheus integration
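The backup component could translate the claim's backup parameters into a K8up Schedule along the lines of the sketch below. The bucket, endpoint, and secret names are hypothetical, and the prune schedule is an assumption because the claim only exposes the backup schedule. How the Raft snapshot is produced and handed to K8up is not covered here.

```yaml
# Sketch only: resource names and the S3 endpoint are placeholders.
apiVersion: k8up.io/v1
kind: Schedule
metadata:
  name: openbao-backup
spec:
  backend:
    repoPasswordSecretRef:
      name: backup-repo            # hypothetical secret
      key: password
    s3:
      endpoint: https://s3.example.com
      bucket: openbao-backups      # hypothetical bucket
      accessKeyIDSecretRef:
        name: backup-credentials
        key: AWS_ACCESS_KEY_ID
      secretAccessKeySecretRef:
        name: backup-credentials
        key: AWS_SECRET_ACCESS_KEY
  backup:
    schedule: "0 2 * * *"          # spec.parameters.backup.schedule
  prune:
    schedule: "0 4 * * *"          # assumed, not part of the claim
    retention:                     # spec.parameters.backup.retention
      keepLast: 2
      keepHourly: 2
      keepDaily: 7
      keepWeekly: 4
      keepMonthly: 3
```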

Security Model
  • TLS encryption for all communications

  • RBAC policies managed through OpenBao

  • Audit logging to persistent storage

  • Auto-unseal configuration for OpenBao bootstrap

Consequences

Positive
  • Simplified deployment with fewer external dependencies

  • Built-in consensus and replication reduces operational complexity

  • Self-contained backup mechanisms using Raft snapshots

  • Leverages official OpenBao Helm chart for production readiness

  • Eliminates external storage dependency management

Negative
  • Raft cluster management requires specialized knowledge

  • Limited to OpenBao’s built-in storage capabilities

  • Potential storage scaling limitations compared to external databases

  • No feature parity with HashiCorp Vault Enterprise

Operational Impact
  • Simplified service deployment with reduced external dependencies

  • Raft snapshot management and restoration procedures

  • Need for OpenBao and Raft consensus expertise in operations team

  • Integration testing with existing AppCat services

  • TLS certificate lifecycle management (renewal, rotation)

  • Auto-unseal configuration and cluster bootstrap management

  • Raft cluster health monitoring and node management

  • Audit log management and compliance reporting

  • ServiceMonitor configuration for Prometheus integration

  • Snapshot-based backup validation and testing
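For the Prometheus integration mentioned above, a ServiceMonitor might look roughly like this sketch. It assumes that metrics are scraped from the /v1/sys/metrics endpoint in Prometheus format and that a token with read access to that path is stored in a secret. The label selector, port name, and secret names are hypothetical and depend on how the Helm chart is configured.

```yaml
# Sketch only: selector, port name, and secrets are assumptions.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: openbao
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: openbao   # assumed chart label
  endpoints:
    - port: https                       # assumed service port name
      scheme: https
      path: /v1/sys/metrics
      params:
        format: ["prometheus"]
      interval: 30s
      authorization:
        credentials:
          name: openbao-metrics-token   # hypothetical secret
          key: token
      tlsConfig:
        ca:
          secret:
            name: openbao-tls           # hypothetical CA secret
            key: ca.crt
```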

Customer Benefits
  • Self-hosted alternative to cloud secret management services

  • Vault API compatibility for existing applications and tooling

  • Compliance with data sovereignty requirements