ADR 0049 - Migrate MinIO to Rclone

Author

Mike Ditton

Owner

Schedar

Reviewers

Schedar

Date Created

2025-12-29

Date Updated

2026-01-06

Status

rejected

Tags

minio,rclone,objectstorage

Summary

Migrate from MinIO to Rclone for backup storage to address MinIO’s shift to maintenance-only mode. Evaluate per-instance vs. cluster-wide deployment architectures.

Context

Recently, the open-source version of MinIO entered a maintenance-only state and will no longer receive updates. The application catalog relies on MinIO to store backups on cloud service providers (CSPs) that don’t provide an external S3 solution.

In ADR 0042 - Backup Encryption for MariaDB Galera and PostgreSQL, we decided to use Rclone as an intermediate S3 proxy for encrypting backups for operators that do not support encryption. We can further extend this by using Rclone as a backend for all backup targets.

Rclone

Rclone can serve an S3-compatible API using rclone serve s3. We can use this API to write backups into a PVC. For more information, see the full evaluation of Rclone in ADR 0042.
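
For illustration, a minimal sketch of such a deployment, assuming the upstream rclone/rclone image; the resource names, paths, and flags are placeholders rather than the final layout:

# Hypothetical sketch: rclone serving an S3-compatible API backed by a PVC.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rclone-backup                       # placeholder name
spec:
  replicas: 1                               # single replica only, see the VFS caching limitation below
  selector:
    matchLabels:
      app: rclone-backup
  template:
    metadata:
      labels:
        app: rclone-backup
    spec:
      containers:
        - name: rclone
          image: rclone/rclone              # assumed upstream image
          args: ["serve", "s3", "/data", "--addr", ":8080"]   # flags elided, see Configuration below
          ports:
            - containerPort: 8080
          volumeMounts:
            - name: backup-storage
              mountPath: /data              # backups land on the PVC
      volumes:
        - name: backup-storage
          persistentVolumeClaim:
            claimName: rclone-backup-storage   # placeholder PVC name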

Rclone instance per AppCat instance

In this approach, each AppCat instance would have its own dedicated Rclone deployment. This architecture provides several advantages:

  • Isolation: Each instance has complete independence from other instances, preventing cross-instance interference

  • Security: Credentials are scoped to individual instances with no shared access

  • Reliability: Failures are isolated to individual instances rather than affecting all backup operations cluster-wide

The main trade-off is increased resource consumption, as each instance requires its own Rclone deployment. However, the improved isolation and elimination of single points of failure outweigh this cost.

Proof of Concept

A proof of concept was conducted to evaluate the viability of using rclone serve s3 as a backup backend with multiple backup mechanisms. The PoC tested both K8up-based backups (VSHNForgejo) and operator-native backups (VSHNPostgreSQL with StackGres). The PoC implementation can be found in AppCat PR #564.

Implementation Overview

The PoC deployed a complete backup infrastructure for each AppCat instance, consisting of:

  • Dedicated Rclone deployment (not a sidecar)

  • ClusterIP service exposing the S3 API on port 8080

  • 10Gi PVC for backup storage

  • Autogenerated S3 credentials using secure random generation

  • Backup system configured to use the Rclone endpoint (K8up Schedule for Forgejo, SGObjectStorage for PostgreSQL)

Rclone’s compatibility with the MariaDB and Cloud Native PostgreSQL operators was previously validated in ADR 0042 - Backup Encryption for MariaDB Galera and PostgreSQL.
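
As an illustration of the K8up side, a Schedule pointing at the Rclone endpoint could look roughly like this; the service name, secret keys, and schedule are assumptions, not the exact PoC manifests:

apiVersion: k8up.io/v1
kind: Schedule
metadata:
  name: forgejo-backup                      # placeholder name
spec:
  backend:
    s3:
      endpoint: http://rclone-backup.appcat-backup.svc.cluster.local:8080   # assumed Rclone service
      bucket: forgejo-backup                                                # assumed bucket name
      accessKeyIDSecretRef:
        name: rclone-backup-credentials     # autogenerated S3 credentials
        key: AWS_ACCESS_KEY_ID
      secretAccessKeySecretRef:
        name: rclone-backup-credentials
        key: AWS_SECRET_ACCESS_KEY
  backup:
    schedule: "0 1 * * *"                   # illustrative daily backup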

Successful Functionality

The PoC demonstrated that Rclone can successfully serve as an S3-compatible backend for multiple backup mechanisms:

  • K8up/Restic (VSHNForgejo): Test backups completed successfully with data correctly stored in the Rclone PVC

  • StackGres/WAL-G (VSHNPostgreSQL): Backup and restore operations worked as expected, validating compatibility with operator-native backup systems

All Kubernetes resources deployed without issues, and the automatic credential generation worked as expected for both implementations.

Configuration

The following Rclone flags were essential for reliable backup operations:

--vfs-cache-mode full            # Cache both reads and writes
--vfs-write-back 0s              # Write immediately on close
--dir-cache-time 10s             # Faster directory cache refresh
--no-checksum                    # Disable MD5 for compatibility
--no-modtime                     # Disable modification time tracking
--force-path-style=true          # Use path-style S3 URLs

The --vfs-write-back 0s flag is important, as the default 5-second write delay caused Restic to fail when attempting to read immediately after writing.

PostgreSQL/StackGres-Specific Considerations

StackGres/WAL-G expects S3 buckets to already exist before writing backups. Since rclone serve s3 represents buckets as filesystem directories, the bucket directory must be pre-created on startup:

mkdir -p /data/postgres-backup && rclone serve s3 /data ...
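
In a Kubernetes container spec this amounts to wrapping the serve command in a shell; a sketch, with the image name assumed and flags abbreviated:

containers:
  - name: rclone
    image: rclone/rclone                    # assumed upstream image
    command: ["/bin/sh", "-c"]
    # Pre-create the bucket directory so WAL-G finds an existing bucket,
    # then serve the S3-compatible API from the same path.
    args:
      - >-
        mkdir -p /data/postgres-backup &&
        rclone serve s3 /data --addr :8080 --force-path-style=true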

Limitations and Considerations

The PoC revealed some limitations:

Experimental Status and Multipart Uploads

The rclone serve s3 feature is officially marked as experimental with limited S3 API compatibility. A known issue is that multipart uploads hold all parts in memory, which could cause OOM errors. However, this limitation is not relevant under the default K8up configuration: Restic’s default 16 MiB pack size stays at or below the minio-go library’s multipart threshold (16 MiB), resulting in single PUT operations rather than multipart uploads. The PoC’s 512Mi memory limit is adequate for this default behavior. Additionally, server-side copy operations for files >5GB fail in rclone serve s3, though this does not affect backup upload workflows.

Single Rclone instance per cluster

This approach would replicate the current MinIO architecture, with a single centralized S3-compatible storage service per cluster. While this is resource-efficient, it introduces some authentication challenges.

Rclone’s S3 server supports two authentication methods:

  • Single credential set shared across all instances

  • htpasswd file containing multiple credential pairs

The single credential approach is not viable from a security perspective, as it eliminates isolation between instances. The htpasswd approach requires dynamically updating the authentication file for each new instance, introducing additional complexity and potential points of failure.
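
For reference, the shared-credential setup would amount to something like the following (assuming rclone's --auth-key flag, with access key and secret key separated by a comma), which illustrates why it offers no per-instance isolation; the htpasswd variant is not sketched here as it was not validated:

# Hypothetical: one credential pair shared by every instance in the cluster.
args:
  - serve
  - s3
  - /data
  - --addr=:8080
  - --auth-key=SHARED_ACCESS_KEY,SHARED_SECRET_KEY   # same pair everywhere, no per-instance isolation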

VFS Caching and Single Replica Limitation

Rclone can only run with a single replica due to VFS caching limitations. The --vfs-cache-mode full flag is required for K8up/Restic compatibility because Restic reads files immediately after writing to verify pack files and check locks. Without full caching mode, files cannot be opened for both read and write simultaneously, and seeking in write-mode files is not supported.

Rclone offers four VFS cache modes (off, minimal, writes, full) with increasing compatibility and disk usage. The full mode buffers all reads and writes to a local disk cache, enabling normal file system operations and automatic retry of failed uploads. However, this cache is local to each pod and cannot be shared across replicas—running multiple replicas would cause cache inconsistencies and potential data corruption.

Implementation

Bucket Provisioning Architecture

The implementation follows the existing bucket provisioning pattern used for CloudScale, Exoscale, and MinIO:

  1. New Rclone Bucket Function

    Create pkg/comp-functions/functions/buckets/rclonebucket/ following the existing bucket function structure. Services request buckets via backup.CreateObjectBucket(), which creates an XObjectBucket composite resource processed by provider-specific functions (a sketch of such a composite follows this list).

  2. Composition Selection

    Each S3 backend has its own Crossplane Composition definition (deployed from component-appcat):

    • objectbucket.rclone.appcat.vshn.io → rclonebucket

    • objectbucket.cloudscale.appcat.vshn.io → cloudscalebucket

    • objectbucket.exoscale.appcat.vshn.io → exoscalebucket

      Services select the appropriate composition when creating the XObjectBucket. The composition then passes the serviceName to the function via ConfigMap input:

      # Inside the composition definition
      input:
        apiVersion: v1
        kind: ConfigMap
        data:
          serviceName: rclonebucket
  3. Resource Deployment

    A wrapper Helm chart deploys rclone resources via provider-helm. The chart exposes only necessary values for configuration.

    Each instance gets dedicated rclone resources in the appcat-backup namespace:

    • Deployment: rclone-<composite-name>

    • Service: rclone-<composite-name> (ClusterIP on port 8080)

    • PVC: rclone-<composite-name>-storage

    • Secret: rclone-<composite-name>-credentials (generated S3 credentials)

      The rclone command pre-creates the bucket directory: mkdir -p /data/postgres-backup && rclone serve s3 /data ...

  4. Connection Details

    The function returns standard S3 connection details consumed by all backup systems:

    ENDPOINT_URL: http://rclone-<composite-name>.appcat-backup.svc.cluster.local:8080
    BUCKET_NAME: postgres-backup
    AWS_REGION: local
    AWS_ACCESS_KEY_ID: <generated>
    AWS_SECRET_ACCESS_KEY: <generated>
  5. Bucket Pre-creation

    For operators requiring pre-existing buckets (StackGres/WAL-G, Cloud Native PostgreSQL), the rclone container command creates the bucket directory on startup before serving the S3 API.
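
To make the flow concrete, here is a hedged sketch of the composite a service would create; the API group, kind, and field layout follow the existing XObjectBucket pattern but are illustrative rather than the exact schema:

# Hypothetical XObjectBucket requesting an Rclone-backed bucket.
apiVersion: appcat.vshn.io/v1               # assumed API group/version
kind: XObjectBucket
metadata:
  name: my-postgres-backup                  # placeholder
spec:
  compositionRef:
    name: objectbucket.rclone.appcat.vshn.io   # selects the rclonebucket function
  parameters:                                # assumed field layout
    bucketName: postgres-backup
    region: local
  writeConnectionSecretToRef:                # standard Crossplane connection secret
    name: my-postgres-backup-creds
    namespace: appcat-backup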

Decision

We will not migrate to Rclone for backup storage. While Rclone proved technically viable in the proof of concept, the solution is not production-ready, given its experimental S3 API status, memory limitations with multipart uploads, and VFS caching constraints.

A follow-up ADR will be created to evaluate alternative solutions for replacing MinIO as our backup storage backend.

Consequences

With the decision to not proceed with the Rclone approach:

  • A new ADR will evaluate alternative S3-compatible storage backends that are production-ready