Kubernetes Resource Management

Scheduling and resource management is a topic many Kubernetes users seem to struggle with, even though understanding it and configuring your workload correctly is vital to ensure optimal resource usage and application availability. In this article, we’ll explain what scheduling and resource management are, how to configure and use them, and share some best practices.

Target audience

This is a technical article targeting developers deploying applications onto Kubernetes, as well as cluster administrators.

Resource Requests & Limits on Pods & Containers

Requests vs Limits

When creating a Pod in Kubernetes, it’s possible to specify resource requirements for its containers. This is done using two concepts called requests and limits:

Resource requests and limits are defined on a Container level. However, since a Pod is the smallest schedulable unit, we use the term "a Pod’s resources" in this article. A Pod’s resources are simply the sum of its Containers' resources.
Requests

The amount of a resource that a container is guaranteed to have available. When a Pod is running on a Node, those resources are reserved for that Pod.

Limits

As the name implies, a limit on how much of a given resource the container may use, even if only for short periods of time. We’ll explain what happens when a container exceeds this limit later in this article.

Resource Types

The two resource types that can be configured are CPU and Memory[1].

CPU

Resource requests and limits for CPU are measured in "CPU units". One CPU (vCPU/Core on cloud providers, hyper thread on bare metal) is equivalent to 1 CPU unit.

CPU units are always measured as absolute quantities, not relative ones. So "1 CPU unit" is the same amount of CPU on a single-core system as it is on a 256-core machine. However, the single-core system only has a capacity of one CPU unit (we’ll come to that later), while the 256-core machine has a capacity of 256 CPU units.

CPU requests and limits can be expressed in mCPU (milli CPU), often referred to as "millicores". Each CPU can be divided into 1000 mCPU (because, you know, that’s what "milli" means):

  • 500m - half a CPU

  • 1000m == 1 - one CPU

  • 100m - one tenth of a CPU

The smallest allowed precision is 1m.

Memory

Resource requests and limits for Memory are measured in bytes. You can use the following suffixes: k, M, G, T, P, E, Ki, Mi, Gi, Ti, Pi, Ei:

  • 1k == 1000

  • 1Ki == 1024

  • 1M == 1000k == 1'000'000

  • 1Mi == 1024Ki == 1'048'576

  • …​ and so on

Usually the "power of two" suffixes (Ki, Mi, Gi, …​) are used, so if you’re unsure what to use, stick to them.

Configuring Requests & Limits

Configuring resource requests & limits is done by setting the .spec.containers[].resources field on a container spec:

Example Pod
apiVersion: v1
kind: Pod
metadata:
  name: resource-example
spec:
  containers:
    - name: app
      image: app
      resources:
        requests:
          cpu: "100m" (1)
          memory: "128Mi" (2)
        limits:
          cpu: "1" (3)
          memory: "1Gi" (4)
1 CPU requests
2 Memory requests
3 CPU limits
4 Memory limits
Since Pods are usually created by Deployments (or DeploymentConfigs if you are using OpenShift), you would instead set the Deployment’s .spec.template.spec.containers[].resources field, as shown below.
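A sketch of such a Deployment, reusing the container spec from above (names and values are illustrative):

Example Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: resource-example
spec:
  replicas: 2
  selector:
    matchLabels:
      app: resource-example
  template:
    metadata:
      labels:
        app: resource-example
    spec:
      containers:
        - name: app
          image: app
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
            limits:
              cpu: "1"
              memory: "1Gi"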

It is not necessary to set all of the values. For example it’s possible to configure only Memory requests and CPU limits.
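For instance, a container’s resources section could specify only a Memory request and a CPU limit (values are purely illustrative):

resources:
  requests:
    memory: "128Mi"
  limits:
    cpu: "500m"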

On APPUiO Public clusters, we enforce the usage of resource requests and limits using LimitRanges. They define the range of possible values as well as default values that will be applied if you do NOT specify any resource requests or limits.
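As a rough sketch of what such a LimitRange can look like (these values are illustrative, not the actual APPUiO defaults):

Example LimitRange
apiVersion: v1
kind: LimitRange
metadata:
  name: resource-limits
spec:
  limits:
    - type: Container
      defaultRequest:   # requests applied if a container specifies none
        cpu: "100m"
        memory: "128Mi"
      default:          # limits applied if a container specifies none
        cpu: "500m"
        memory: "512Mi"
      min:
        cpu: "10m"
        memory: "16Mi"
      max:
        cpu: "2"
        memory: "4Gi"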

Scheduling

In order to understand resource management properly, we first have to understand how kube-scheduler, the default scheduler for Kubernetes, works.

In Kubernetes, scheduling refers to making sure that Pods are matched to Nodes so that Kubelet can run them.

— Kubernetes documentation

The job of the scheduler is to take new Pods and assign them to a Node in the cluster.

It is possible to implement your own scheduler, however for most use cases the default kube-scheduler is sufficient — especially since it can be customized using scheduling policies.

Word is that CERN implemented its own scheduler to achieve workload packing (that is, avoiding workloads being spread across nodes); however, today this can be achieved using scheduling policies.

kube-scheduler

Whenever kube-scheduler sees a new Pod that is not assigned to a Node (indicated by the fact that the Pod’s .spec.nodeName is not set), it assigns the Pod to a Node in two phases:

Filtering

During this phase, the scheduler determines which Nodes are eligible for the Pod to be scheduled on. In the beginning, all Nodes are candidates. The scheduler then applies various filter plugins, for example: Does the Node match the Pod’s nodeSelector? Does the Node have sufficient resources available? Does the Node have any taints that are not tolerated by the Pod? Is the Node marked as unschedulable? Does the Pod request any special features, for example a GPU?
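To illustrate two of these filters, here is a sketch of a Pod that restricts scheduling with a nodeSelector and tolerates a taint (the labels and the taint key are made up for this example):

Example Pod (illustrative)
apiVersion: v1
kind: Pod
metadata:
  name: filter-example
spec:
  nodeSelector:
    disktype: ssd                    # only Nodes labelled disktype=ssd pass the filter
  tolerations:
    - key: "example.com/dedicated"   # hypothetical taint key
      operator: "Equal"
      value: "special"
      effect: "NoSchedule"
  containers:
    - name: app
      image: app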

If after this step no Nodes are left, the Pod will not be assigned to a Node and stay in "Pending" state. An Event is added to the Pod explaining why scheduling failed.

If a pod stays in "Pending", use kubectl describe pod/<POD> and check the "Events" section to see why it failed.

Scheduling policy predicates can be used to configure the Filtering step of scheduling.

Scoring

In the second phase, the remaining Nodes are ranked. Again, various scoring plugins are used.

The default configuration tries to spread workload as evenly across the cluster as possible, minimizing the impact of a node becoming unavailable.

Once these two steps are completed, the scheduler will assign the Pod to the highest-ranking Node, and the Kubelet on that node will spin up its containers.

Resources and scheduling

As we can see, both the Filtering and Scoring phases of scheduling take "resources" into consideration, so let’s have a look at them next.

The two most important resources are CPU and Memory (RAM). Kubernetes tracks other resources as well (like disk space, available PIDs or network ports), but we’ll focus on these two.

Upon startup, the Kubelet determines how many resources the system it runs on has available. This is called the Node’s capacity. Next, it reserves a certain amount of CPU and Memory for itself and the system. What’s left is called the Node’s allocatable resources. The Kubelet communicates this information back to the control plane.

If you are cluster-admin, you can view a Node’s resources using the kubectl describe node <NODE> command (watch for the Capacity and Allocatable keys) or in the Node object’s .status.capacity and .status.allocatable fields.
For APPUiO clusters, the Kubelet reserves a total of 4Gi Memory and 400mCPU for itself and the system.
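For illustration, the relevant part of a Node object might look something like this (the values are examples only):

Example Node (excerpt)
status:
  capacity:
    cpu: "8"
    memory: "32Gi"
    pods: "110"
  allocatable:
    cpu: "7600m"
    memory: "28Gi"
    pods: "110"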

During scheduling, this information is used to determine whether a Pod would "fit" onto a Node or not by taking a Node’s allocatable resources and subtracting the requests of all Pods already running on the Node. If the remaining resources are at least as large as the requests of the new Pod, it fits.
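As a simple, made-up example of this calculation:

Example: does the Pod fit? (illustrative numbers)
  Node allocatable CPU:                    4000m
  Requests of Pods already on the Node:    3600m
  Remaining:                                400m
  New Pod's CPU request:                    500m  -> does not fit on this Node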

Out of resource handling

Before we look into what happens when a node runs out of a resource, we first have to cover another concept: Quality of Service (QoS) classes.

QoS Classes

Kubernetes knows three QoS classes: "Guaranteed", "Burstable" and "BestEffort".

When a Pod starts, its QoS class is determined based on the resource requests and limits of its containers:

Guaranteed is assigned when

  • every container has both requests and limits set for both CPU and Memory

  • for each container, the requests and limits have the same values set.

The Pod is guaranteed to have the resources it has requested available.

Burstable is assigned when a Pod does not qualify for the "Guaranteed" QoS class, but at least one container has CPU or Memory requests set.

The Pod has its requested resources available, but may use more resources for a short period (aka burst).

BestEffort is assigned to Pods that have no requests or limits set at all.

The Pod may use resources available on a best effort basis.
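For example, a container spec like the following (illustrative values, with requests equal to limits for both CPU and Memory) results in the "Guaranteed" QoS class for a single-container Pod:

resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "250m"
    memory: "256Mi"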

What happens when a Pod exceeds its resource limits

CPU is a so-called "compressible" resource. This means that when a container exceeds its CPU limit, it is simply throttled. A container with a CPU limit of "100m" cannot use more than 0.1 seconds of CPU time each second.

Memory, on the other hand, is not "compressible", so when a container exceeds its memory limit, it will be terminated (and restarted, of course).
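If you suspect this has happened, kubectl describe pod/<POD> shows the reason of the last container termination. In the Pod’s status it looks roughly like this (excerpt, illustrative):

status:
  containerStatuses:
    - name: app
      restartCount: 3
      lastState:
        terminated:
          reason: OOMKilled
          exitCode: 137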

What happens when a node runs out of resources

Again, since CPU is a "compressible" resource, the Kubelet does not act on CPU starvation. Each container will have the CPU resources available that it requested - yes, this means that "BestEffort" Pods really get into a tight spot…​

Out of Memory handling, however, triggers an eviction. While evictions (and how they can be configured) would warrant a whole blog post of their own, they usually end with Pods being terminated and rescheduled onto different nodes. This is where the QoS classes play an important role: they decide who gets killed:

First in line are Pods that exceed their memory requests; they are killed based on their memory usage in relation to their memory requests. Since "BestEffort" Pods do not have any requests at all, they are killed first. However, "Burstable" Pods might also be killed if they exceed their requests.

Since "Guaranteed" pods cannot exceed their requests (because they are equal to their limits), they are never killed because of another pods resource usage.

However, in the rare case that system services on a node (not running in Kubernetes) use more resources than were reserved for them (see the resource reservations in "Resources and scheduling"), even "Burstable" or "Guaranteed" Pods can be killed.

What happens when a cluster runs out of resources

This is the case if your overall resource requests exceed the allocatable resources within your cluster. In this situation, a new Pod with resource requests for the starved resource cannot be scheduled and will remain in the "Pending" status.

Best practices

You should now have a fairly good understanding of how scheduling works on Kubernetes. To conclude, we want to share a few best practices:

  • Use requests and limits extensively - it helps the scheduler to distribute your workload more evenly across your cluster.

  • Use QoS classes to your advantage, for example by making sure all production workloads are assigned a "Guaranteed" QoS class. This means that in case of an out of resource situation, your production environment is not killed by the OOM killer.

For cluster administrators, there are some more points:

  • Plan AT LEAST one node’s worth of "resources" as "leftover". This allows your cluster to tolerate the loss of a node, whether planned (during maintenance) or unplanned (node crashes).


1. For Kubernetes 1.14+ there’s also the "huge pages" resource type, but we won’t go into those in this article.