Converged Service Provisioning Implementation

Problem

Crossplane brings a lot of lifecycle features for XRDs and XRs. While developing the service prototype it became apparent that some functionality is missing in Crossplane out-of-the-box.

Also, using Crossplane to deploy converged services may provide many lifecycle boilerplate, but also adds significant overhead complexity. It’s being questioned whether this complexity is worth it.

The scope of this problem and decision here is explicitly converged services that certain cloud providers don’t offer as a managed cloud instance. How cloud instances are provisioned is a different problem with a different decision.

Missing Features In Crossplane 1.7

  1. Generate random strings

    See discussions in #1895 and #2308. Currently the Secret generator is used but this should be regarded as a workaround.

  2. Webhook support for validations and custom business logic

    With release v1.7.0 Crossplane offers methods to build webhooks, but apparently only for dedicated Providers. It may be possible to come off without custom webhooks for simple validations, but if custom logic or complex validation is required, no existing policy engine is going to be a sensible choice.

  3. Delay reconciliations to specific times when using CompositionRevisions feature

    This would be interesting to enable maintenance windows.

  4. Cross-resource references within Composition

    This could eliminate a lot of patch duplications. See discussions in #1770 and #2099.

  5. More methods for dynamic value computation

    It’s currently impossible to apply certain types of patching:

    • Transformation of string to boolean with an expression evaluator, for example if spec.sla == "guaranteed" then true else false

    • Omit deploying resources, for example if spec.backups.enabled is set to false in the XR, it’s impossible to skip generating a resource based on this conditional.

    • Basic math operations like addition, subtraction division or floating-point multiplication.

    • The map patch type only supports string-to-string mapping, not string-to-object which would help patching whole specs.

This list doesn’t mean that they should be part of Crossplane, only that they are missing from the PoV of AppCat. Please see also #2524, where the community discusses 'Custom Compositions'. In there the problems with Compositions are summarized pretty well and also AppCat is affected by the limitations of Compositions.

Deployment Requirements

There are several additional resources required along the usual deployment resources.

  1. Base service runtime

    • Helm release

  2. Backup

    • S3 Buckets (through Crossplane)

    • K8up Schedule

  3. Service Monitoring

    • Prometheus ServiceMonitor

    • Prometheus PrometheusRule

  4. Networking

    • NetworkPolicy to allow traffic to the service.

  5. RBAC

    • RoleBinding to grant read access for requesting customer to instance logs.

  6. Customer association

    Each instance needs to be associated to the requesting customer for billing the services. This might require additional business logic.

Some proposals are going to mention custom Crossplane providers, though the impact of versioning mechanics of a provider is currently unknown.

Proposals

Proposal 1: Umbrella Charts

The primary deployment strategy will be a Helm chart that contains most, if not all deployment artifacts. This includes cloud instances provisioned by Crossplane itself if required as dependencies.

Additional logic is implemented through custom webhooks and controllers.

appcat proposal 1.drawio
Figure 1. Service provisioning with an umbrella Helm chart

The "umbrella" chart contain all the feature gates and dependent charts required to provision services on the supported cloud infrastructure. The chart takes input parameters such as

  • major version

  • backup enabled

  • SLA (alerting enabled)

and then deploys the resources as required.

In addition to the composition, there are webhooks to perform validations and assignment to customer. Also, a dedicated version update controller delays changing of CompositionRevisions to the newest version until a specific time is met.

Advantages
  • Umbrella chart is able to deploy arbitrary resources

  • Umbrella chart is integration testable

Disadvantages
  • Separate version controller required for automated updates.

  • Webhook controller required (possibly per service).

  • Logic and feature gates in umbrella chart is limited by Helm’s templating engine.

  • Maintaining services longterm using just YAML might become questionable.

Proposal 2: Side Charts

The main difference to proposal 1 is that the umbrella chart is split up into parts where just the service is deployed and another where the additional resources are deployed.

It is assumed that the additional resources are roughly the same for all services, thus they are packaged into one side chart.

appcat proposal 2.drawio
Figure 2. Service provisioning with a common "side" Helm chart

In addition to the composition, there are webhooks to perform validations and assignment to customer. Also, a dedicated version update controller delays changing of CompositionRevisions to the newest version until a specific time is met.

Advantages
  • Easier onboarding of new services.

  • Side chart is able to deploy arbitrary resources.

Disadvantages
  • It assumes that every service requires roughly the same common resources that can be packed into one chart.

  • Webhook controller required (possibly per service).

  • Logic and feature gates in side chart is limited by Helm’s templating engine.

  • Side chart and service definition are loosely coupled.

  • Maintaining services longterm using just YAML might become questionable.

Proposal 3: Dedicated Provider

This proposal uses a dedicated Crossplane provider to deploy and configure all the resources that the service needs. A Crossplane provider uses code thus provides the most flexibility.

appcat proposal 3.drawio
Figure 3. Service provisioning with a dedicated Crossplane provider
Advantages
  • Flexible deployment using code and Kubernetes API.

  • Webhook controller can be built-in.

  • Built-in version update controller.

  • Strong coupling of the resources.

  • Generating the resources is unit testable.

Disadvantages
  • There may be a lot of code repetition between services that deploy the same set of common resources (though this can be counteracted with a code library).

Proposal 4: Sub Providers

This proposal is similar to proposal 3, however it uses dedicated Crossplane providers for each sub component. A Crossplane provider uses code thus provides the most flexibility.

appcat proposal 4.drawio
Figure 4. Service provisioning with multiple Crossplane providers
Advantages
  • Flexible deployment using code and Kubernetes API.

  • Webhook controller can be built-in.

  • Built-in version update controller.

  • Generating the resources is unit testable.

Disadvantages
  • Loose coupling of the resources.

  • Each provider might need their own feature gates and share similar API only to be patched repeatedly in compositions.

  • A lot of repetition in the compositions.

Proposal 5: Dedicated Provider With Side Chart

This proposal combines the ideas of proposal 3 with proposal 2. A Crossplane provider uses code to provision the service, whereas the additional resources are deployed using a Helm chart.

appcat proposal 5.drawio
Figure 5. Service provisioning with a dedicated Crossplane provider and side chart
Advantages
  • Flexible deployment using code and Kubernetes API.

  • Webhook controller can be built-in.

  • Built-in version update controller.

  • Generating the resources is unit testable.

  • Common resources are sharable between services using the chart.

Disadvantages
  • Loose coupling between service and additional resources.

  • Logic and feature gates in side chart is limited by Helm’s templating engine.

Proposal 6: Free Choice

This proposal does not impose a certain strategy how services are to be provisioned. Each service can choose how to best provision the required resources in a Composition.

Custom webhooks and version update controller would still be required.

Advantages
  • Flexible deployment strategy

Disadvantages
  • No common ground between services makes maintenance and day-2 operations difficult.

Proposal 7: Dedicated Operator

This proposal completely removes Crossplane as the manager of resources related to the service. Instead, a dedicated Kubernetes Operator that comes with its own CRD will be built.

appcat proposal 7.drawio
Figure 6. Service provisioning with a dedicated Operator

The operator might still use Crossplane resources as a means to provision those resources.

Also, there’s nothing that stops anyone to integrate Operator-managed resources into another Crossplane Composition later on. The following graph depicts how this proposal leaves a possible integration with Compositions open:

appcat proposal 7 cp.drawio
Figure 7. Service provisioning with a dedicated Operator, managed by Crossplane Composition
Advantages
  • Overall reduced complexity of the stack compared to Crossplane.

  • Flexible deployment using code and Kubernetes API.

  • Webhook controller can be built-in.

  • Built-in version update controller.

  • Generating the resources is unit testable.

Disadvantages
  • There may be a lot of code repetition between services that deploy the same set of common resources (though this can be counteracted with a code library).

  • More engineering effort needed in resource lifecycle management compared to Crossplane.

Proposal 8: Dedicated Provider With Common Provider

This is similar to proposal 5, but instead of a side chart a side provider deploys the common resources. It eliminates the disadvantages of Helm charts while increasing testability. Compared to proposal 3, it also moves the common resources from a simple shared code library to an actual provider.

appcat proposal 8.drawio
Figure 8. Service provisioning with a dedicated provider and a side provider

It’s assumed that a lot of services are going to be domain-specific, yet some common boilerplate is needed to fully make a service managed. These common resources could be deployed by a dedicated provider that can be reused from multiple services.

Advantages
  • Flexible deployment using code and Kubernetes API.

  • Webhook controller can be built-in.

  • Built-in version update controller.

  • Generating the resources is unit testable.

  • The dedicated provider is free of VSHN-specific resources, making the provider possibly sharable with Crossplane community.

Disadvantages
  • Loose coupling between service and additional resources (though Kubernetes API lookups may help out to a degree).

Decision

Proposal 7: Dedicated Operator

Rationale

Crossplane Composition API is limited and it alone can’t solve the problem mentioned at the top. While umbrella and side charts sound easy at first, they lack flexibility and testability compared to code running in some form of controller. Proposal 4 is risking a too lose coupling between the resources. A totally undefined foundation as in proposal 6 puts operational aspects into silos, requiring a learning curve for each individual service. Through method of elimination the remaining proposals are 3, 7 and 8.

Compared to proposal 3, proposal 7 is favored for the following reasons:

  • Compared to classic Kubernetes Operators, Crossplane brings a lot of added complexity that is technically not required. The cost of higher complexity is not worth the reduced engineering effort in resource lifecycle management that Crossplane brings.

  • If cloud instances were implemented with Crossplane Compositions, one could argue they would be aligned and comparable. However, this means that the converged service provisioning is artificially more complex than it needs to be. By now it’s already clear enough that converged services and cloud services are fundamentally different, both in implementation as in usage. Thus this alignment doesn’t bring much value and is not considered justified.

Compared to proposal 8, proposal 7 is favored for the following reasons:

  • While some experience could be gained by developing an AppCat service prototype, the actual details, dependencies and relations between common resource and service-specific resources are currently still rather uncertain. For this reason, it makes sense to start the first productive service with strong coupling by having the common artifacts in the same operator and internal code.

  • Once the dependencies and relations could be made out, it may be sensible to refactor and extract the common artifacts into their own provider. This would become similar to proposal 8.

Going for the most simplest and flexible option first (a dedicated Operator) still allows platform engineers to build Crossplane Compositions that make use of the service operator.