This page provides information about APPUiO / OpenShift dedicated nodes.
|This document explains how dedicated nodes work in OpenShift 3.x. The upcoming OpenShift 4.0 will require a separate strategy and documentation.|
In Kubernetes and OpenShift, a dedicated node is a node like any other, with one major difference: a different set of labels is applied to it.
When a customer buys one or more dedicated nodes, the portal behaves differently: newly created projects get a different default node selector, one that targets the customer’s dedicated nodes.
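To make the label mechanism concrete, here is a minimal Python sketch of how an equality-based node selector picks nodes. The label keys and values (`pool`, `customer`, `acme`) and the node names are hypothetical, not APPUiO’s actual labels. On a real cluster the matching is done by the OpenShift scheduler, not by code like this.

```python
from typing import Dict, List

Labels = Dict[str, str]


def matches(node_labels: Labels, node_selector: Labels) -> bool:
    """A node is eligible if it carries every key=value pair of the selector."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


def eligible_nodes(nodes: Dict[str, Labels], node_selector: Labels) -> List[str]:
    """Return the names of all nodes matching the selector, sorted for stable output."""
    return sorted(name for name, labels in nodes.items() if matches(labels, node_selector))


# Hypothetical labels: the dedicated node differs from the others only by its labels.
nodes = {
    "node-1": {"pool": "general"},
    "node-2": {"pool": "general"},
    "node-3": {"pool": "dedicated", "customer": "acme"},
}

# Hypothetical default selector for projects of customers without dedicated nodes.
general_selector = {"pool": "general"}
# Hypothetical default selector for new projects once "acme" has bought a dedicated node.
acme_selector = {"pool": "dedicated", "customer": "acme"}

print(eligible_nodes(nodes, general_selector))  # ['node-1', 'node-2']
print(eligible_nodes(nodes, acme_selector))     # ['node-3']
```

In this sketch an empty selector matches every node, because `all()` over no conditions is `True`.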
If a customer already has a project in APPUiO, and then decides to buy a dedicated node, the existing nodes aren’t affected.
|One positive side effect of this is that, when pods are restarted for maintenance, they land on that dedicated node.|
If a customer has more than one dedicated node, say N nodes, the recommended capacity planning is N+1, that is, one spare node so that the workload still fits when any single node is unavailable.
|This capacity planning policy isn’t yet applied throughout all clusters at VSHN.|
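The N+1 rule is simple arithmetic. The following Python sketch assumes a single resource dimension (CPU requests in millicores) and identical node sizes. The function names and figures are illustrative only; real capacity planning also has to consider memory, storage, and per-node overhead.

```python
def nodes_required(workload_m: int, node_capacity_m: int, spare: int = 1) -> int:
    """N+1 planning: N nodes to fit the workload, plus `spare` extra nodes."""
    if workload_m < 0 or node_capacity_m <= 0:
        raise ValueError("workload must be >= 0 and node capacity > 0")
    n = -(-workload_m // node_capacity_m)  # integer ceiling division
    return n + spare


def survives_single_failure(node_count: int, node_capacity_m: int, workload_m: int) -> bool:
    """True if the workload still fits after losing any one (identical) node."""
    return (node_count - 1) * node_capacity_m >= workload_m


# Example: 10 CPUs of requests on nodes with 4 allocatable CPUs each.
total = nodes_required(10_000, 4_000)
print(total)                                          # 4  (N = 3, plus 1 spare)
print(survives_single_failure(total, 4_000, 10_000))  # True
print(survives_single_failure(3, 4_000, 10_000))      # False
```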
Dedicated nodes offer the following advantages:
Higher security as only pods of the same organisation are running on the same kernel (useful in the case of threats like Spectre or Meltdown).
Reserved resources: there are no "noisy neighbors" competing for CPU and other resources.
Possibility to have additional network interfaces, and therefore dedicated connections to some workload outside of the cluster (for example, a database server).
The risk of unused capacity is borne by the customer.
Suitable for separating system environments, for example test and development environments, so that their nodes can be maintained during the day.
|Regarding the last point above, we currently don’t use dedicated nodes for different system environments.|
This approach has some caveats:
Management of additional node labels.
Increased complexity in the maintenance process, as worker nodes are no longer all equal.
Requires dedicated capacity management for those nodes.
Disaster recovery becomes more complex.
It goes against the "Kubernetes philosophy" of treating nodes as cattle, not as pets.
Currently, teams are notified of problems through monitoring and customer complaints. Given the small amount of data available, it isn’t possible to warn customers proactively about possible problems. All things considered, it’s a very fragile, manual process.
Dedicated nodes in APPUiO feature specific settings for each customer. However, in a platform like APPUiO it would be preferable to have nodes that are homogeneous in terms of configuration, though not necessarily in terms of hardware capacity.
With heterogeneous configurations, if a node fails and can’t be recovered in a short period of time, teams can’t migrate its workload quickly, because no other node has that particular configuration.
OpenShift as a platform should be able to reschedule the workload automatically, and it does so when there is enough capacity on suitable nodes; otherwise, it can’t.
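The capacity condition can be illustrated with a simplified model of rescheduling after a node failure. This Python sketch uses first-fit placement on CPU requests only, which is a deliberate simplification of the OpenShift scheduler (the real scheduler filters and scores nodes on many more criteria). All node names, labels, and numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

Labels = Dict[str, str]


@dataclass
class Node:
    name: str
    labels: Labels
    capacity_m: int  # allocatable CPU in millicores
    used_m: int = 0  # CPU already requested by pods on this node

    @property
    def free_m(self) -> int:
        return self.capacity_m - self.used_m


@dataclass
class Pod:
    name: str
    request_m: int
    node_selector: Labels


def reschedule(pods: List[Pod], nodes: List[Node]) -> Dict[str, Optional[str]]:
    """First-fit: place each pod on the first node that matches its selector
    and has enough free CPU. None means the pod would stay Pending."""
    placement: Dict[str, Optional[str]] = {}
    for pod in pods:
        target = next(
            (
                node
                for node in nodes
                if all(node.labels.get(k) == v for k, v in pod.node_selector.items())
                and node.free_m >= pod.request_m
            ),
            None,
        )
        if target is not None:
            target.used_m += pod.request_m  # reserve the capacity on the chosen node
        placement[pod.name] = target.name if target is not None else None
    return placement


# Hypothetical scenario: "acme" had two dedicated nodes; dedicated-1 failed for good.
acme = {"pool": "dedicated", "customer": "acme"}
surviving_nodes = [
    Node("general-1", {"pool": "general"}, capacity_m=4_000),         # idle, but wrong labels
    Node("dedicated-2", dict(acme), capacity_m=4_000, used_m=3_000),  # matching, almost full
]
orphaned_pods = [Pod("app-a", 500, acme), Pod("app-b", 1_000, acme)]

print(reschedule(orphaned_pods, surviving_nodes))
# {'app-a': 'dedicated-2', 'app-b': None}
```

Although `general-1` is idle, `app-b` stays unscheduled: the dedicated-node selector rules out the general node, and the one remaining dedicated node lacks the capacity.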
OpenShift has a number of policies which, when combined with dedicated nodes, can lead to severe outages, for example image garbage collection and pod eviction.
When these policies kick in, a cluster can end up in an unhealthy state. Recovering takes time, and without enough nodes to move workloads to, things get more complicated.
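The following Python sketch shows, in simplified form, how disk-usage thresholds turn into the image cleanup and pod eviction actions mentioned above. The threshold values and the function name are illustrative assumptions, not the settings used on APPUiO clusters; on a real node, both mechanisms are configured in and carried out by the kubelet.

```python
from typing import List


def disk_pressure_actions(
    disk_used_pct: float,
    image_gc_high_pct: float = 85.0,     # assumed: remove unused images above this usage
    evict_below_free_pct: float = 10.0,  # assumed: evict pods when free space drops below this
) -> List[str]:
    """Return the actions a node would take at the given disk usage (simplified model)."""
    actions: List[str] = []
    if disk_used_pct >= image_gc_high_pct:
        actions.append("image-gc")
    if 100.0 - disk_used_pct < evict_below_free_pct:
        actions.append("evict-pods")
    return actions


for used in (50.0, 87.0, 95.0):
    print(used, disk_pressure_actions(used))
# 50.0 []
# 87.0 ['image-gc']
# 95.0 ['image-gc', 'evict-pods']
```

On a shared pool, replacement pods for evicted ones can be scheduled on other nodes; when the node selector matches only one dedicated node, they have nowhere else to go.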
Dedicated nodes are static in nature, which means that the most common maintenance tasks, for example when a node runs out of disk space, usually require a reboot.
The currently sold "dedicated nodes" product implicitly provides this maintenance, which in turn creates false expectations. We can’t use this approach today because our way of managing clusters is too static.
|Some customers have just one dedicated node, which is a de facto single point of failure.|
At a high level, nodes are "disposable." This is already the case today with Ansible: we don’t try to repair nodes, we just create new ones. Capacity evaluation can determine the number of nodes required to host a given workload (that’s precisely the "N+1 planning" mentioned above).
With proper planning, the following communication with customers would be possible:
Dear customer, we have two nodes and would prefer to put your workload there, but if you don’t want that, we could run your workload in the "General Compute" pool.
|The "General Compute" pool would be a "lightweight" dedicated node.|
In such a scenario, if a node needs to be updated, a new one is created and configured like the dedicated node, the workloads are migrated to it, and the old node is then destroyed. This could happen at specific times or on a case-by-case basis.
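The replacement flow described above can be sketched as follows. This Python model works on a hypothetical in-memory representation of the cluster, not on the OpenShift API; in a real cluster, the "stop scheduling" and "migrate" steps roughly correspond to what Kubernetes calls cordoning and draining a node.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    labels: Dict[str, str]
    pods: List[str] = field(default_factory=list)
    schedulable: bool = True


def replace_node(cluster: Dict[str, Node], old: str, new: str) -> None:
    """Swap `old` for a freshly created node `new` with the same configuration."""
    if new in cluster:
        raise ValueError(f"node {new!r} already exists")
    old_node = cluster[old]
    # 1. Create the new node and configure it like the dedicated node it replaces.
    cluster[new] = Node(labels=dict(old_node.labels))
    # 2. Stop scheduling new pods onto the old node (cordon).
    old_node.schedulable = False
    # 3. Migrate the workload (drain). Here pods are simply moved over; in a real
    #    cluster they are evicted and recreated by their controllers.
    cluster[new].pods.extend(old_node.pods)
    old_node.pods.clear()
    # 4. Destroy the old node.
    del cluster[old]


# Hypothetical dedicated node for customer "acme".
cluster = {"dedicated-1": Node({"customer": "acme"}, pods=["app-a", "app-b"])}
replace_node(cluster, "dedicated-1", "dedicated-1b")
print(sorted(cluster))                 # ['dedicated-1b']
print(cluster["dedicated-1b"].labels)  # {'customer': 'acme'}
print(cluster["dedicated-1b"].pods)    # ['app-a', 'app-b']
```

Because the new node copies the old node’s labels, the existing node selectors keep matching and no project configuration has to change.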