Multi-Tenancy
This document explains how multi-tenancy is implemented in the centralized logging and monitoring stack.
Basics: X-Scope-OrgID
Multi-tenancy in Loki and Mimir is implemented by setting the X-Scope-OrgID header in all HTTP requests when writing data to and reading data from Loki or Mimir.
The value of the header is set to the organization’s unique Keycloak group name, a human-readable, all-lowercase, single-word identifier (not to be confused with the Keycloak group id, which is a database row identifier, or the organization’s displayName, which contains the proper name of the organization).
Making sure that this header is set to the correct value for all use cases brings a few challenges which are explained below.
Note that sometimes the X-Scope-OrgID
header is used to isolate logs or metrics not by organization, but by use case (e.g. billing).
This document ignores those cases.
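As a minimal sketch of what "setting the header" means for a client, the following Python snippet builds the URL and headers for a Loki query on behalf of one organization. The base URL and the helper names (`tenant_headers`, `loki_query_request`) are assumptions made for illustration only; the validation merely mirrors the "all-lowercase, single-word" description above and is not taken from any actual implementation.

```python
from __future__ import annotations

from urllib.parse import urlencode

TENANT_HEADER = "X-Scope-OrgID"


def tenant_headers(org_name: str) -> dict[str, str]:
    """Return the HTTP headers that scope a request to one organization."""
    # The organization name is described as an all-lowercase, single-word identifier.
    if not org_name or org_name != org_name.lower() or any(c.isspace() for c in org_name):
        raise ValueError(f"not a valid organization name: {org_name!r}")
    return {TENANT_HEADER: org_name}


def loki_query_request(base_url: str, org_name: str, logql: str) -> tuple[str, dict[str, str]]:
    """Build URL and headers for a Loki instant query scoped to one organization.

    Nothing is sent; the caller passes the result to the HTTP client of its choice.
    """
    url = f"{base_url.rstrip('/')}/loki/api/v1/query?{urlencode({'query': logql})}"
    return url, tenant_headers(org_name)


if __name__ == "__main__":
    # "loki.example.com" is a placeholder host, not a real endpoint.
    url, headers = loki_query_request("https://loki.example.com", "vshn", '{namespace="demo"}')
    print(url)
    print(headers)
```

The same header applies to write requests; only the path and payload differ.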
OpenShift LogForwarding
OpenShift’s LogForwarding can be configured to set the X-Scope-OrgID
header when sending data to Loki.
This configuration may look something like this:
openshift4_logging:
  clusterLogForwarder:
    inputs:
      my-logs:
        application: [...]
    outputs:
      vshn-central-logs:
        type: loki
        loki:
          tenantKey: openshift.labels.tenant_id
          labelKeys:
            - openshift.labels.tenant_id
This configuration works well in situations where a LogForwarder instance is responsible for exactly one customer, as is the case, for example, on a VSHN Managed OpenShift cluster.
Multiple Organizations
There are situations in which a LogForwarder instance is responsible for multiple organizations. A typical example is the LogForwarder in an APPUiO cluster. This LogForwarder instance handles logs from many namespaces which can belong to different organizations.
The LogForwarder can set the tenantKey based on namespace labels; see the OpenShift documentation.
Prometheus Remote Writes
Prometheus can be configured to set the X-Scope-OrgID
header when sending data to Mimir.
This configuration may look something like this:
prometheus:
  _remoteWrite:
    mymetrics:
      url: [...]
      headers:
        "X-Scope-OrgID": somecustomername
      writeRelabelConfigs:
        [...]
This configuration works well in situations where a single Prometheus instance is responsible for exactly one customer, as is the case, for example, on an APPUiO Managed cluster.
Multiple Organizations
There are situations in which a single Prometheus instance is responsible for multiple organizations.
A typical example is the openshift-user-workload-monitoring
Prometheus instance in an APPUiO Cloud cluster.
This Prometheus instance handles metrics from many namespaces which can belong to different organizations.
Prometheus itself is not able to set the X-Scope-OrgID
header dynamically, hence a proxy server is needed to fix the HTTP requests.
The cortex-tenant-ns-label proxy does this job.
It is a fork of the cortex-tenant proxy, which does a similar but not identical job.
The cortex-tenant-ns-label proxy reads all namespaces and their annotations from the Kubernetes API and converts them into a lookup table (namespace → organization name), which is refreshed regularly.
The proxy receives remote writes from Prometheus, analyzes them, separates the time series by their namespace label, and forwards them to Mimir with the correct X-Scope-OrgID header from the lookup table.
A single Prometheus remote write HTTP request likely needs to be split up into multiple HTTP requests to Mimir, because the former can have metrics from different organizations mixed together while the latter can’t.
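The lookup and splitting steps described above can be sketched roughly as follows. This is not the proxy's actual code: the in-memory data shapes, the annotation key `example.com/organization`, and the decision to collect series from unknown namespaces in a separate list are all assumptions made for illustration. Real remote writes are compressed protobuf messages, which this sketch replaces with plain Python tuples.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A time series is modelled here as (labels, samples); this shape is hypothetical.
Series = Tuple[Dict[str, str], List[Tuple[int, float]]]


def build_lookup(namespaces: List[dict], org_key: str) -> Dict[str, str]:
    """Build the namespace -> organization name table from namespace metadata.

    `org_key` is the (assumed) annotation key that carries the organization name.
    Namespaces without that annotation are left out of the table.
    """
    lookup = {}
    for ns in namespaces:
        org = ns.get("annotations", {}).get(org_key)
        if org:
            lookup[ns["name"]] = org
    return lookup


def split_by_tenant(
    series: List[Series], lookup: Dict[str, str]
) -> Tuple[Dict[str, List[Series]], List[Series]]:
    """Group the series of one remote write by organization.

    Returns one batch per organization (each batch would become its own HTTP
    request with its own X-Scope-OrgID header) plus the series whose namespace
    is missing or unknown. How such series are handled is an assumption here.
    """
    batches = defaultdict(list)
    unassigned = []
    for labels, samples in series:
        org = lookup.get(labels.get("namespace", ""))
        if org is None:
            unassigned.append((labels, samples))
        else:
            batches[org].append((labels, samples))
    return dict(batches), unassigned


if __name__ == "__main__":
    namespaces = [
        {"name": "shop-prod", "annotations": {"example.com/organization": "acme"}},
        {"name": "blog", "annotations": {"example.com/organization": "vshn"}},
    ]
    lookup = build_lookup(namespaces, "example.com/organization")
    incoming = [
        ({"__name__": "up", "namespace": "shop-prod"}, [(1, 1.0)]),
        ({"__name__": "up", "namespace": "blog"}, [(1, 1.0)]),
    ]
    batches, unassigned = split_by_tenant(incoming, lookup)
    for org, batch in batches.items():
        print(org, len(batch))
```

One incoming request with series from two organizations yields two batches, which matches the observation that a single remote write usually has to be split into several requests to Mimir.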
There is one cortex-tenant-ns-label proxy per APPUiO Cloud cluster, installed via SYN component component-cortex-tenant-ns-label.
Prometheus remote writes are configured as usual, without the X-Scope-OrgID header, but with the destination URL set to the cortex-tenant-ns-label proxy.
Grafana
Grafana is partially multi-tenant capable through its support for multiple organizations, but there are several details to consider.
Grafana cannot filter data it receives from Mimir or Loki, hence we have to make sure that Mimir or Loki only returns data for the organization that is currently selected in Grafana.
This requires setting the correct X-Scope-OrgID
header when querying Mimir or Loki.
This can be achieved by having a dedicated data source per organization with this header set to the organization name.
As a result, we can’t give any user admin privileges, even in their own Grafana organization, because that would allow them to change the X-Scope-OrgID header value and see other organizations’ data.
Hence organization users must remain restricted to the editor and viewer roles, and this restriction must be enforced.
editor permissions are sufficient to create and change dashboards, so this should not be a problem in practice.
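To illustrate the per-organization data source described above, here is a rough sketch that builds a data source definition carrying the tenant header. The `jsonData.httpHeaderName1` / `secureJsonData.httpHeaderValue1` pair is Grafana's documented mechanism for custom HTTP headers on data sources; the function name, the data source naming scheme, and the URL are assumptions for illustration and do not describe how the actual operator works.

```python
import json


def tenant_datasource(org_name: str, ds_type: str, url: str) -> dict:
    """Build a Grafana data source definition that pins X-Scope-OrgID to one organization.

    `ds_type` is the Grafana data source type, e.g. "loki" for Loki or
    "prometheus" for Mimir. The name scheme below is hypothetical.
    """
    return {
        "name": f"{ds_type}-{org_name}",
        "type": ds_type,
        "access": "proxy",  # Grafana's backend sends the request, so it adds the header
        "url": url,
        "jsonData": {"httpHeaderName1": "X-Scope-OrgID"},
        # The header value is stored as a secure field, but an organization admin
        # could still overwrite it, which is why admin must not be granted.
        "secureJsonData": {"httpHeaderValue1": org_name},
    }


if __name__ == "__main__":
    # "mimir.example.com" is a placeholder, not a real endpoint.
    ds = tenant_datasource("vshn", "prometheus", "https://mimir.example.com/prometheus")
    print(json.dumps(ds, indent=2))
```

Each Grafana organization would get its own set of such data sources, each pinned to that organization's name.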
Grafana does not offer a way to store the Keycloak organization name, which is a problem because we need to reliably link Grafana organizations to Keycloak groups (which represent organizations).
We solve this by using a particular scheme for the Grafana organization name: [keycloak organization name] - [organization display name], e.g. vshn - VSHN AG.
Since the Keycloak organization name cannot contain spaces, this scheme is unambiguous: everything before the first " - " is the Keycloak organization name.
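A short sketch of how this naming scheme can be produced and parsed is shown below. The function names are hypothetical; the only rule taken from the text is that the Keycloak organization name contains no spaces, so the first " - " always marks the boundary.

```python
from typing import Tuple

SEPARATOR = " - "


def grafana_org_name(keycloak_name: str, display_name: str) -> str:
    """Compose the Grafana organization name from the Keycloak name and display name."""
    if not keycloak_name or any(c.isspace() for c in keycloak_name):
        raise ValueError(f"Keycloak organization name must be a single word: {keycloak_name!r}")
    return f"{keycloak_name}{SEPARATOR}{display_name}"


def parse_grafana_org_name(name: str) -> Tuple[str, str]:
    """Split a Grafana organization name back into (keycloak name, display name).

    The split happens at the first separator, so display names that themselves
    contain " - " are preserved intact.
    """
    keycloak_name, sep, display_name = name.partition(SEPARATOR)
    if not sep or not keycloak_name or any(c.isspace() for c in keycloak_name):
        raise ValueError(f"not a managed organization name: {name!r}")
    return keycloak_name, display_name


if __name__ == "__main__":
    print(grafana_org_name("vshn", "VSHN AG"))
    print(parse_grafana_org_name("vshn - VSHN AG"))
```

Names that do not follow the scheme are rejected rather than guessed at, which lets a sync process tell managed organizations apart from manually created ones.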
If we want to provide some standard dashboards to all organizations, these need to be set up in every organization separately. There are also situations in which Grafana modifies a dashboard, and without further measures any sync process that sets up these dashboards would revert the changes.
This complex setup needs to be maintained automatically. This is the job of the grafana-organizations-operator (installed via component-grafana-organizations-operator), whose documentation provides more details on how this works.