How Kubernetes QoS Prevents OOMKilled Errors

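Kubernetes uses Quality of Service (QoS) classes to decide which pods are terminated first when a node runs short of memory. Setting requests and limits so that important pods land in a higher class makes those pods less likely to hit Out of Memory (OOMKilled) errors.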

Understanding OOMKilled Errors

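An OOMKilled error occurs when a container uses more memory than its limit allows, or when the node itself runs out of memory. In either case the Linux kernel’s OOM Killer terminates a process to keep the node stable, and Kubernetes reports the container's termination reason as OOMKilled (exit code 137).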

Kubernetes QoS Classes

Kubernetes assigns pods to one of three QoS classes based on their resource requests and limits:

Guaranteed QoS Class

To qualify for the Guaranteed class, every container in the pod must set both memory and CPU requests and limits, and each request must equal its limit. These pods get the highest priority and are the last to be terminated under memory pressure.

Example YAML:

apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-pod
spec:
  containers:
    - name: app
      image: nginx
      resources:
        requests:
          memory: "256Mi"
          cpu: "500m"
        limits:
          memory: "256Mi"
          cpu: "500m"

Burstable QoS Class

A pod is Burstable if it does not meet the Guaranteed criteria but at least one of its containers sets a memory or CPU request or limit. These pods have medium priority: under memory pressure they are terminated after BestEffort pods and before Guaranteed pods.

Example YAML:

apiVersion: v1
kind: Pod
metadata:
  name: burstable-pod
spec:
  containers:
    - name: app
      image: nginx
      resources:
        requests:
          memory: "128Mi"
          cpu: "250m"
        limits:
          memory: "512Mi"
          cpu: "1000m"
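
The single-container example above is Burstable because its requests are lower than its limits. A pod can also be Burstable when one container would qualify for Guaranteed on its own but another container leaves out resources entirely. The manifest below is a minimal sketch of that case; the pod name, the sidecar container, and its image and command are hypothetical and chosen only for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: burstable-sidecar-pod   # hypothetical name
spec:
  containers:
    - name: app
      image: nginx
      resources:
        # Requests equal limits, so this container alone would be Guaranteed
        requests:
          memory: "256Mi"
          cpu: "500m"
        limits:
          memory: "256Mi"
          cpu: "500m"
    - name: sidecar             # hypothetical second container
      image: busybox
      command: ["sleep", "3600"]
      # No resources block: this is enough to make the whole pod Burstable
```

Because the QoS class is assigned per pod rather than per container, adding matching requests and limits to the sidecar would be needed to move this pod into the Guaranteed class.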

BestEffort QoS Class

Pods that have no resource requests or limits defined fall under the BestEffort class. These pods receive the lowest priority and are the first to be terminated under memory pressure.

Example YAML:

apiVersion: v1
kind: Pod
metadata:
  name: besteffort-pod
spec:
  containers:
    - name: app
      image: nginx
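
Whichever of the three classes a pod ends up in, Kubernetes records the result in the pod's status under `status.qosClass`, so the class can be checked after the pod is created. The fragment below is an abridged, illustrative view of what a command such as `kubectl get pod guaranteed-pod -o yaml` would be expected to include for the Guaranteed example; every other field is omitted here.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-pod
status:
  # Set by Kubernetes from the containers' requests and limits;
  # it is not something you write in the manifest yourself
  qosClass: Guaranteed
```

If the value is not the one you expected, the usual cause is a container in the pod whose requests and limits are missing or unequal.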

Preventing OOMKilled Errors

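Configuring resource requests and limits appropriately helps Kubernetes keep critical applications running when memory is tight. Memory limits sized to real usage keep containers from hitting their own ceiling, and a higher QoS class makes a pod a later candidate for termination, which together reduce the occurrence of OOMKilled errors.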
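
One way to avoid accidentally running BestEffort pods is to give a namespace default requests and limits with a LimitRange. The manifest below is a sketch under stated assumptions: the object name, the namespace, and the specific values are hypothetical and should be sized to your own workloads.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-memory-cpu    # hypothetical name
  namespace: example-apps     # hypothetical namespace
spec:
  limits:
    - type: Container
      # Used as the request for containers that do not set one
      defaultRequest:
        memory: "128Mi"
        cpu: "250m"
      # Used as the limit for containers that do not set one
      default:
        memory: "256Mi"
        cpu: "500m"
```

With these defaults, a pod created in that namespace without any resources block becomes Burstable rather than BestEffort, because each container receives a request lower than its limit. Setting `defaultRequest` equal to `default` would instead place such pods in the Guaranteed class.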
