Integrating Alertmanager with OpsGenie
For OpenShift4 clusters managed by team Aldebaran the manual steps documented here are no longer necessary. Everything is configured via the |
This document describes how to configure alert forwarding from Alertmanager to OpsGenie at VSHN. The how-to assumes that you’re familiar with managing schedules and escalations in OpsGenie and that alerts should be delivered to OpsGenie teams.
Prerequisites
To set up integrations and heartbeats in OpsGenie, you’ll need the "admin" role in your OpsGenie team(s). To deploy the changes to the Commodore component, you need to be able to compile catalogs.
Enabling the integrations in OpsGenie (once per team)
The Prometheus and REST API integrations can be used to forward alerts from multiple clusters to OpsGenie.
In OpsGenie, navigate to
and enable the Prometheus and REST API integrations. You can use the same integrations to receive alerts from multiple Alertmanagers. In the REST API integration, enable "Read Access" and "Create and Update Access". The REST API integration is required for heartbeat alerts, such as the "Watchdog" alert. Save the API keys generated when enabling the integrations in Vault; we’ll reference them from the Commodore hierarchy.
The snippets in this how-to assume that you’re using the default Commodore Vault hierarchy, and expect the OpsGenie API keys to be in the following locations:
- Prometheus Integration: clusters/kv/<tenant-id>/<cluster-id>/opsgenie/api-key
- REST API Integration: clusters/kv/<tenant-id>/<cluster-id>/opsgenie/heartbeat-password
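The path layout above can be sketched as a small helper, for example to check that both secrets exist before compiling a catalog. This is only an illustration: the function names and the `VAULT_MOUNT` constant are hypothetical, and the example IDs are placeholders.

```python
# Hypothetical helper: builds the Vault paths listed above for one cluster.
# "clusters/kv" is the mount prefix used by the default Commodore Vault hierarchy.
VAULT_MOUNT = "clusters/kv"


def vault_path(tenant_id: str, cluster_id: str, key: str) -> str:
    """Return the full Vault path of one OpsGenie secret for a cluster."""
    return f"{VAULT_MOUNT}/{tenant_id}/{cluster_id}/opsgenie/{key}"


def opsgenie_secret_paths(tenant_id: str, cluster_id: str) -> dict:
    """Map each OpsGenie integration to the Vault path this how-to expects."""
    return {
        "Prometheus Integration": vault_path(tenant_id, cluster_id, "api-key"),
        "REST API Integration": vault_path(tenant_id, cluster_id, "heartbeat-password"),
    }


if __name__ == "__main__":
    # Placeholder IDs, not real tenants or clusters.
    for integration, path in opsgenie_secret_paths("t-example", "c-example").items():
        print(f"{integration}: {path}")
```

Note that the Commodore references used later in this how-to (`?{vaultkv:...}`) omit the `clusters/kv/` prefix and start at the tenant.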
Configuring the cluster heartbeat in OpsGenie (for each cluster)
We have heartbeat auto-creation enabled in OpsGenie. OpsGenie will automatically create a heartbeat when it receives the first ping request. Manual creation is no longer necessary.
Configure a heartbeat in OpsGenie so that Watchdog alerts from the cluster are received as heartbeats.
Create a new heartbeat. The name can be whatever you’d like, but we suggest using the Commodore <cluster-id> of the cluster as the heartbeat name.
Set the heartbeat interval to two minutes.
Optionally, you can add the cluster’s display name (or any other descriptive name for the cluster) in the heartbeat’s description field.
Configuring Alertmanager to send alerts to OpsGenie
To configure Alertmanager to send alerts to OpsGenie, we need to configure an Alertmanager receiver using Alertmanager’s opsgenie integration.
First, we need to configure the opsgenie_api_key in the global section of the Alertmanager config.
For the global API key you’ll want to reference the Vault secret holding the API key for the Prometheus integration.
Depending on the Kubernetes distribution for which you’re configuring the integration, you’ll have to place the Alertmanager config into the corresponding component’s parameter. This how-to uses the |
openshift4_monitoring:
  alertManagerConfig:
    global:
      opsgenie_api_key: ?{vaultkv:${cluster:tenant}/${cluster:name}/opsgenie/api-key}
Next, we need to configure the OpsGenie receiver. First off, we want to set our team as the responder for all the alerts received from Alertmanager. We recommend that you use the team’s UUID for configuring the responders, as the UUID remains stable even if the team name is changed. We also configure the OpsGenie receiver as the default receiver.
The team’s UUID can be found in the URL of the team dashboard.
Alternatively, you can find the Team’s UUID by querying the OpsGenie API using the
|
openshift4_monitoring:
  alertManagerConfig:
    receivers:
      - name: opsgenie
        opsgenie_configs:
          - responders:
              - id: <team-uuid>
                type: team
    route:
      receiver: opsgenie
Additionally, we make use of Project Syn and kube-prometheus conventions to improve the presentation of the alerts in OpsGenie.
One such convention is that the alert criticality is present as label severity.
To ensure the configuration snippets which use fields in .GroupLabels work correctly, alerts must be grouped by at least alertname, namespace, and severity.
We don’t need group alerts by the |
openshift4_monitoring:
  alertManagerConfig:
    route:
      group_by:
        - alertname
        - namespace
        - severity
        - cluster_id
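To illustrate why the grouping matters, the following sketch mimics how grouping by a fixed set of labels yields one group per distinct combination of label values. Only the labels listed in group_by end up in a group's key, which is what templates see as .GroupLabels. This is a simplified model, not Alertmanager's implementation; the function name and the sample alerts are made up for the example.

```python
from collections import defaultdict

# Labels used for grouping, matching the group_by list above.
GROUP_BY = ("alertname", "namespace", "severity", "cluster_id")


def group_alerts(alerts, group_by=GROUP_BY):
    """Group alert label sets by the values of the group_by labels.

    Each key is a tuple of (label, value) pairs and plays the role of
    .GroupLabels. A missing label is modelled as an empty string.
    """
    groups = defaultdict(list)
    for labels in alerts:
        key = tuple((name, labels.get(name, "")) for name in group_by)
        groups[key].append(labels)
    return dict(groups)


if __name__ == "__main__":
    sample = [
        {"alertname": "PodCrashLooping", "namespace": "app", "severity": "warning",
         "cluster_id": "c-example", "pod": "web-1"},
        {"alertname": "PodCrashLooping", "namespace": "app", "severity": "warning",
         "cluster_id": "c-example", "pod": "web-2"},
        {"alertname": "PodCrashLooping", "namespace": "app", "severity": "critical",
         "cluster_id": "c-example", "pod": "web-3"},
    ]
    for key, members in group_alerts(sample).items():
        print(dict(key), "->", len(members), "alert(s)")
```

In this model the two warning alerts share a group because they differ only in pod, which isn't a grouping label, while the critical alert forms its own group.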
First, we want to map the alert group’s severity label to an OpsGenie priority.
OpsGenie priorities range from P1 to P5 in descending order of urgency; this how-to uses P1 to P4.
We want to map critical severity to P1, warning to P2, info to P3 and everything else (this includes alerts which don’t have a severity label) to P4.
To achieve this mapping, we add the following configuration in the OpsGenie receiver:
openshift4_monitoring:
  alertManagerConfig:
    receivers:
      - name: opsgenie
        opsgenie_configs:
          - priority: '{{ if eq .GroupLabels.severity "critical" }}P1{{ else if eq .GroupLabels.severity "warning" }}P2{{ else if eq .GroupLabels.severity "info" }}P3{{ else }}P4{{ end }}'
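The template's branching can be easier to verify when written out as plain code. The following sketch mirrors the same mapping; the function name is hypothetical and the code isn't part of the component configuration.

```python
# Mirrors the priority template above: known severities map to P1-P3,
# everything else (including a missing severity label) falls back to P4.
SEVERITY_TO_PRIORITY = {
    "critical": "P1",
    "warning": "P2",
    "info": "P3",
}


def opsgenie_priority(group_labels: dict) -> str:
    """Return the OpsGenie priority for an alert group's labels."""
    return SEVERITY_TO_PRIORITY.get(group_labels.get("severity"), "P4")


if __name__ == "__main__":
    for labels in ({"severity": "critical"}, {"severity": "debug"}, {}):
        print(labels, "->", opsgenie_priority(labels))
```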
Next, we want to have a title for the OpsGenie alerts which gives some Project Syn information at a glance (tenant and cluster):
openshift4_monitoring:
  alertManagerConfig:
    receivers:
      - name: opsgenie
        opsgenie_configs:
          - message: '[{{ .CommonLabels.tenant_id }}/{{ .CommonLabels.cluster_id }}] {{ .GroupLabels.alertname }} in {{ .GroupLabels.namespace }}'
Because the default Alertmanager template for OpsGenie alert descriptions doesn’t fully match our use case, we deploy a custom template for the alert description.
openshift4_monitoring:
  alertManagerConfig:
    receivers:
      - name: opsgenie
        opsgenie_configs:
          - description: |-
              {{ if gt (len .Alerts.Firing) 0 -}}
              Alerts Firing:
              {{ range .Alerts.Firing }}
              - Message: {{ .Annotations.message }}
              Labels:
              {{ range .Labels.SortedPairs }} - {{ .Name }} = {{ .Value }}
              {{ end }} Annotations:
              {{ range .Annotations.SortedPairs }} - {{ .Name }} = {{ .Value }}
              {{ end }} Source: {{ .GeneratorURL }}
              {{ end }}
              {{- end }}
              {{ if gt (len .Alerts.Resolved) 0 -}}
              Alerts Resolved:
              {{ range .Alerts.Resolved }}
              - Message: {{ .Annotations.message }}
              Labels:
              {{ range .Labels.SortedPairs }} - {{ .Name }} = {{ .Value }}
              {{ end }} Annotations:
              {{ range .Annotations.SortedPairs }} - {{ .Name }} = {{ .Value }}
              {{ end }} Source: {{ .GeneratorURL }}
              {{ end }}
              {{- end }}
To make alerts filterable, we add a number of key-value pairs as details and a number of values as tags.
OpsGenie allows filtering alerts both by tag and by details.key and details.value.
Note that tags must be provided as a single comma-separated string to Alertmanager.
Alertmanager upstream has merged a PR (prometheus/alertmanager#2276) which will automatically add all common labels as details to the OpsGenie alert. As of 2021-04-07, there’s no Alertmanager release which contains this change.
openshift4_monitoring:
  alertManagerConfig:
    receivers:
      - name: opsgenie
        opsgenie_configs:
          - details:
              namespace: '{{- if .CommonLabels.exported_namespace -}}{{- .CommonLabels.exported_namespace -}}{{- else if .CommonLabels.namespace -}}{{- .CommonLabels.namespace -}}{{- end -}}'
              pod: '{{- if .CommonLabels.pod -}}{{- .CommonLabels.pod -}}{{- end -}}'
              deployment: '{{- if .CommonLabels.deployment -}}{{- .CommonLabels.deployment -}}{{- end -}}'
              alertname: '{{ .GroupLabels.alertname }}'
              cluster_id: '{{ .CommonLabels.cluster_id }}'
              tenant_id: '{{ .CommonLabels.tenant_id }}'
              severity: '{{ .GroupLabels.severity }}'
            tags: '{{ .CommonLabels.tenant_id }},
              {{ .CommonLabels.cluster_id }},
              {{ .GroupLabels.severity }},
              {{ .GroupLabels.alertname }},
              {{ .GroupLabels.namespace }},
              {{- if .CommonLabels.exported_namespace -}}{{ .CommonLabels.exported_namespace }},{{- end -}}'
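The fallback logic for the namespace detail and the assembly of the comma-separated tags string can be sketched in plain code as follows. The function names are hypothetical. Unlike the template, this sketch drops empty values and emits no trailing comma, so treat it as an illustration of the intent rather than a byte-exact rendering of the template output.

```python
def namespace_detail(common_labels: dict) -> str:
    """Prefer exported_namespace, fall back to namespace, else empty string."""
    return common_labels.get("exported_namespace") or common_labels.get("namespace") or ""


def build_tags(common_labels: dict, group_labels: dict) -> str:
    """Build the single comma-separated tags string expected by Alertmanager."""
    tags = [
        common_labels.get("tenant_id", ""),
        common_labels.get("cluster_id", ""),
        group_labels.get("severity", ""),
        group_labels.get("alertname", ""),
        group_labels.get("namespace", ""),
    ]
    # exported_namespace is only appended when the label is present.
    if common_labels.get("exported_namespace"):
        tags.append(common_labels["exported_namespace"])
    # Simplification of this sketch: skip empty values.
    return ",".join(tag for tag in tags if tag)


if __name__ == "__main__":
    common = {"tenant_id": "t-example", "cluster_id": "c-example", "namespace": "monitoring"}
    group = {"alertname": "Watchdog", "namespace": "monitoring", "severity": "none"}
    print(namespace_detail(common))
    print(build_tags(common, group))
```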
Finally, we need to make sure that the Watchdog alert is sent to OpsGenie as a heartbeat instead of a regular alert.
To this end, we configure an additional receiver which sends alerts to the OpsGenie REST API integration.
In particular, this receiver sends alerts to the heartbeat ping endpoint for the heartbeat we’ve configured.
If you followed our suggestion and used the Commodore cluster-id as the name for the heartbeat, the snippet below will work out of the box.
For this receiver you need to provide the API key of the REST API integration, which should be stored in Vault.
In addition to the receiver, we also add a routing configuration to match alerts which are called Watchdog and ensure they’re sent to the heartbeat receiver with a repeat interval of one minute (60 seconds).
openshift4_monitoring:
  alertManagerConfig:
    receivers:
      - name: heartbeat
        webhook_configs:
          - send_resolved: false
            url: https://api.opsgenie.com/v2/heartbeats/${cluster:name}/ping
            http_config:
              basic_auth:
                password: ?{vaultkv:${cluster:tenant}/${cluster:name}/opsgenie/heartbeat-password}
    route:
      routes:
        - match:
            alertname: Watchdog
          repeat_interval: 60s
          receiver: heartbeat
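For debugging, it can help to see roughly what request the heartbeat receiver produces. The sketch below builds, but doesn't send, a comparable request with Python's standard library. It assumes that basic auth with only a password set results in an empty username, and that the heartbeat is named after the cluster ID; the function name and the placeholder values are hypothetical.

```python
import base64
import urllib.request


def heartbeat_ping_request(heartbeat_name: str, api_key: str) -> urllib.request.Request:
    """Build a POST request to the OpsGenie heartbeat ping endpoint.

    Assumption: the API key is passed as the basic auth password with an
    empty username, mirroring the webhook receiver configured above.
    """
    url = f"https://api.opsgenie.com/v2/heartbeats/{heartbeat_name}/ping"
    credentials = base64.b64encode(f":{api_key}".encode()).decode()
    return urllib.request.Request(
        url,
        data=b"{}",  # the webhook sends a JSON body; an empty object stands in for it here
        method="POST",
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    request = heartbeat_ping_request("c-example", "not-a-real-key")
    print(request.get_method(), request.full_url)
    # To actually send the ping, pass the request to urllib.request.urlopen().
```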
Full component configuration
Since we’ve discussed and shown individual elements of the Alertmanager configuration in the previous section, here’s the full, copy-pasteable configuration for component openshift4-monitoring.
Parts of the configurations shown in this section will be moved into the VSHN Commodore defaults repo at some point in the future.
To use this configuration for component rancher-monitoring, simply move the contents of parameters.openshift4_monitoring.alertManagerConfig to parameters.rancher_monitoring.alertmanagerConfig.
parameters:
  openshift4_monitoring:
    alertManagerConfig:
      global:
        opsgenie_api_key: ?{vaultkv:${cluster:tenant}/${cluster:name}/opsgenie/api-key}
      receivers:
        - name: opsgenie
          opsgenie_configs:
            - priority: '{{ if eq .GroupLabels.severity "critical" }}P1{{ else if eq .GroupLabels.severity "warning" }}P2{{ else if eq .GroupLabels.severity "info" }}P3{{ else }}P4{{ end }}'
              message: '[{{ .CommonLabels.tenant_id }}/{{ .CommonLabels.cluster_id }}] {{ .GroupLabels.alertname }} in {{ .GroupLabels.namespace }}'
              description: |-
                {{ if gt (len .Alerts.Firing) 0 -}}
                Alerts Firing:
                {{ range .Alerts.Firing }}
                - Message: {{ .Annotations.message }}
                Labels:
                {{ range .Labels.SortedPairs }} - {{ .Name }} = {{ .Value }}
                {{ end }} Annotations:
                {{ range .Annotations.SortedPairs }} - {{ .Name }} = {{ .Value }}
                {{ end }} Source: {{ .GeneratorURL }}
                {{ end }}
                {{- end }}
                {{ if gt (len .Alerts.Resolved) 0 -}}
                Alerts Resolved:
                {{ range .Alerts.Resolved }}
                - Message: {{ .Annotations.message }}
                Labels:
                {{ range .Labels.SortedPairs }} - {{ .Name }} = {{ .Value }}
                {{ end }} Annotations:
                {{ range .Annotations.SortedPairs }} - {{ .Name }} = {{ .Value }}
                {{ end }} Source: {{ .GeneratorURL }}
                {{ end }}
                {{- end }}
              details:
                namespace: '{{- if .CommonLabels.exported_namespace -}}{{- .CommonLabels.exported_namespace -}}{{- else if .CommonLabels.namespace -}}{{- .CommonLabels.namespace -}}{{- end -}}'
                pod: '{{- if .CommonLabels.pod -}}{{- .CommonLabels.pod -}}{{- end -}}'
                deployment: '{{- if .CommonLabels.deployment -}}{{- .CommonLabels.deployment -}}{{- end -}}'
                alertname: '{{ .GroupLabels.alertname }}'
                cluster_id: '{{ .CommonLabels.cluster_id }}'
                tenant_id: '{{ .CommonLabels.tenant_id }}'
                severity: '{{ .GroupLabels.severity }}'
              tags: '{{ .CommonLabels.tenant_id }},
                {{ .CommonLabels.cluster_id }},
                {{ .GroupLabels.severity }},
                {{ .GroupLabels.alertname }},
                {{ .GroupLabels.namespace }},
                {{- if .CommonLabels.exported_namespace -}}{{ .CommonLabels.exported_namespace }},{{- end -}}'
              responders:
                - id: <team-uuid>
                  type: team
        - name: heartbeat
          webhook_configs:
            - send_resolved: false
              url: https://api.opsgenie.com/v2/heartbeats/${cluster:name}/ping
              http_config:
                basic_auth:
                  password: ?{vaultkv:${cluster:tenant}/${cluster:name}/opsgenie/heartbeat-password}
      route:
        group_by:
          - alertname
          - namespace
          - severity
          - cluster_id
        receiver: opsgenie
        routes:
          - match:
              alertname: Watchdog
            repeat_interval: 60s
            receiver: heartbeat