Operation Tools

This chapter explains the main tools we use to manage OCP3: Ansible and Puppet. For day-two operations we might use other tools as well, but this will get you started! Make sure you have at least a basic understanding of both tools!

Why Puppet and Ansible?

When we started with OpenShift 3, it was a sensible choice: we already had a working Puppet setup and used it to manage all our systems. Our monitoring system required servers to be managed by Puppet in order to monitor them. Plus, we had already implemented a LOT of the basics we needed anyway in Puppet (see chapter Puppet). However, Ansible was required to install and manage OpenShift 3, so it made sense to use Ansible as well.

See our internal education recording "Ansible Playbooks for managing OpenShift" (held on 2020/07/09) if you prefer a video format.


At VSHN, we used Puppet from the get-go to manage all of our servers. On OCP3, it’s used to ensure some baseline configuration of all systems. Some examples include:

  • User accounts

  • Configuration of system services (for example SSH)

  • Firewall rules

  • sysctl settings

  • Monitoring

  • Backups

Go to the Open Source Puppet docs for an introduction to Puppet.


For OCP3 clusters, profile_openshift3 does a lot of the preparation work. Its main jobs are:

Generate Ansible inventory

Since OCP3 installation is performed using Ansible playbooks, most aspects of an OpenShift 3 cluster are configured via the Ansible inventory. profile_openshift3 applies many VSHN defaults and derives most of the configuration from a minimal set of inputs.
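To give an idea of what such an inventory looks like, here is a minimal, hypothetical openshift-ansible inventory in Ansible's YAML format. Hostnames and values are purely illustrative; the inventory generated by profile_openshift3 contains far more settings:

```yaml
# Hypothetical, trimmed-down example — not output of profile_openshift3.
all:
  children:
    OSEv3:
      children:
        masters:
          hosts:
            master1.example.com:
        etcd:
          hosts:
            master1.example.com:
        nodes:
          hosts:
            master1.example.com:
              openshift_node_group_name: node-config-master-infra
            node1.example.com:
              openshift_node_group_name: node-config-compute
      vars:
        ansible_user: ansible
        openshift_deployment_type: openshift-enterprise
        openshift_release: "3.11"
```

An inventory of this shape is then consumed by the openshift-ansible playbooks, for example with `ansible-playbook -i <inventory> playbooks/deploy_cluster.yml`.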

Install required Ansible modules

Ensure openshift-ansible, Mungg, and all required roles are installed on the Infra host (see Introduction).

Prepare systems for management using Ansible

Create an Ansible user account and distribute public keys.
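The real implementation is Puppet code in profile_openshift3, but conceptually it boils down to two steps, sketched here as hypothetical Ansible tasks (user name and key path are illustrative):

```yaml
# Hypothetical sketch — the actual implementation is Puppet, not Ansible.
- name: Create the Ansible service user
  ansible.builtin.user:
    name: ansible
    state: present

- name: Authorize the Ansible master's public key for that user
  ansible.posix.authorized_key:
    user: ansible
    key: "{{ lookup('file', 'files/ansible.pub') }}"  # illustrative path
    state: present
```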

Configure monitoring

Our current monitoring solution collects its monitoring targets from PuppetDB, which is why servers need to be managed by Puppet in order to be monitored.

We’re in the process of moving most OpenShift monitoring to Prometheus/Alertmanager. To that end, we’ve developed Signalilo, which forwards Alertmanager alerts to Icinga2.
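On the Alertmanager side, Signalilo acts as a plain webhook receiver. A hypothetical Alertmanager configuration snippet could look like the following; URL, port and routing are illustrative and depend on the actual Signalilo deployment:

```yaml
# Hypothetical Alertmanager snippet — URL and routing are illustrative.
route:
  receiver: signalilo
receivers:
  - name: signalilo
    webhook_configs:
      - url: http://signalilo.example.com:8888/webhook
        send_resolved: true
```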


Ansible is the official way to install and configure OpenShift 3 (using Red Hat’s openshift-ansible project, see below), so it made sense to implement our custom cluster configuration in Ansible as well.

All Ansible playbooks and roles are installed on the Infra host (see Introduction), sometimes also referred to as the Ansible master.

Go to the Ansible User Guide for an introduction to Ansible!


Red Hat’s official way of installing, configuring and extending OpenShift 3 clusters is openshift-ansible, a (huge!) bundle of Ansible playbooks and roles.

We use our own fork at vshn/openshift-ansible, which contains a few fixes that haven’t yet been merged upstream (getting a PR merged by Red Hat can take months, sometimes years).

Some of the main uses:

  • Deploy OpenShift 3

  • Configure control plane

  • Configure nodes

  • Add new nodes

  • Configure network plugin

  • Deploy and configure additional components:

    • Logging

    • Metrics

    • Docker Registry

    • Router, ipfailover


In order to configure, customize and maintain our OCP3 clusters, we developed a set of Ansible playbooks and roles under the project name "Mungg"[1].

Over time, a lot of features were added:

  • Perform operating system maintenance

  • Verify cluster and node health

  • Install and configure various additional components like APPUiO pruner and Espejo

  • Partition systems using our Larawan image

  • Apply additional VSHN-specific configuration to OpenShift 3 components (which, of course, wouldn’t make sense to implement in openshift-ansible)

Mungg consists of two parts:

  1. mungg-overlord, the repository containing most of the playbooks. All the required Ansible roles are pulled in via its requirements.yml.

  2. A long list of Ansible roles
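As an illustration of how the roles are pulled in: a requirements.yml of this kind typically looks like the following. The role names and sources here are made up, not the real Mungg roles:

```yaml
# Hypothetical excerpt — the actual mungg-overlord requirements.yml
# references the real Mungg role repositories.
- src: https://git.example.com/mungg/ansible-role-cluster-health.git
  name: mungg.cluster_health
  version: v1.2.0
- src: https://git.example.com/mungg/ansible-role-appuio-pruner.git
  name: mungg.appuio_pruner
  version: v0.4.1
```

The roles are then installed with `ansible-galaxy install -r requirements.yml`.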


Capra is another set of Ansible roles that were extracted from Mungg. They facilitate package upgrades and system reboots, including the option to hard-reboot a server on supported infrastructure providers.
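To give an idea of what such roles do, here is a hypothetical sketch of upgrade-and-reboot tasks for a RHEL/CentOS host. The real Capra roles are structured differently and cover more edge cases:

```yaml
# Hypothetical sketch — not the actual Capra role code.
- name: Upgrade all packages
  ansible.builtin.yum:
    name: '*'
    state: latest

- name: Check whether a reboot is required (needs-restarting from yum-utils)
  ansible.builtin.command: needs-restarting -r
  register: needs_reboot
  changed_when: false
  failed_when: needs_reboot.rc not in [0, 1]  # rc 1 means "reboot required"

- name: Reboot the server when required
  ansible.builtin.reboot:
    reboot_timeout: 600
  when: needs_reboot.rc == 1
```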

1. Which means Marmot in Swiss German!