
FailedCreatePodContainer - unable to ensure pod container exists: failed to create container

Revision   Date         Description

1.0        24.07.2024   Initial version

Problem

During pod creation on Kubernetes, a FailedCreatePodContainer warning is emitted. The event message looks like this:

unable to ensure pod container exists: failed to create container for [kubepods burstable podbb4b05d1-1506-49df-9321-cfa434373319] : mkdir /sys/fs/cgroup/memory/kubepods/burstable/podbb4b05d1-1506-49df-9321-cfa434373319: cannot allocate memory

The error blocks pod creation on the affected Node entirely: the kubelet cannot create the pod's memory cgroup under /sys/fs/cgroup, so the problem lies with the kubelet / CRI (container runtime) on the Kubernetes Node rather than with the pod itself.
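You can confirm the symptom directly on the affected Node before draining it: while the Node is in this state, creating any new memory cgroup fails with the same error. This is a minimal sketch assuming cgroup v1 and root access on the Node; the repro-test directory name is arbitrary:

    # On the affected Node: creating a new memory cgroup should fail
    # with "Cannot allocate memory" while the problem persists.
    sudo mkdir /sys/fs/cgroup/memory/repro-test
    # If it unexpectedly succeeds, remove the test cgroup again:
    sudo rmdir /sys/fs/cgroup/memory/repro-test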

Requirements

To fix the problem you will need the following; the Kubernetes permissions can be pre-checked with the commands shown after this list:

  1. Workstation:

    • kubectl installed.

  2. Kubernetes:

    • Read-only access to Events on the cluster.

    • Permission to drain Nodes.

  3. Node:

    • Permission to check running services on the virtual machine.

    • Permission to stop and start the kubelet service.

    • Permission to stop and start the CRI service (docker, containerd, etc.).
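Before starting, you can pre-check the Kubernetes-side permissions from your workstation with kubectl auth can-i. A quick sketch, assuming your kubeconfig context already points at the affected cluster:

    # Read access to Events (needed for step 1 of the solution)
    kubectl auth can-i list events --all-namespaces
    # Eviction permission used by kubectl drain
    kubectl auth can-i create pods/eviction --all-namespaces
    # Cordon / uncordon patches the Node object
    kubectl auth can-i patch nodes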

Solution

To solve the problem:

  1. Get the name of the affected Node:

    kubectl get events -A --field-selector reason=FailedCreatePodContainer -o=custom-columns=KIND:.involvedObject.kind,NAMESPACE:.involvedObject.namespace,NAME:.involvedObject.name,NODE:.source.host,REASON:.reason,MESSAGE:.message
  2. Get all pods running on the Node (remember to set the Node name in the --field-selector option; both selectors must be passed in a single flag, otherwise the second one overrides the first):

    kubectl get pods -A --field-selector spec.nodeName=<node_name>,status.phase=Running -o custom-columns=NODE:.spec.nodeName,NAMESPACE:.metadata.namespace,NAME:.metadata.name
  3. Copy the pod list and post it on Teams together with information about the planned Node drain.

  4. Drain the affected Node (remember to set the Node name in the command):

    kubectl drain --grace-period=-1 --force --ignore-daemonsets --delete-emptydir-data <node_name>
  5. Log into the affected Node:

    ssh <node_name>
  6. Check which CRI service is running on the Node - it will be docker or containerd:

    sudo systemctl list-units --type service
  7. Stop and start the kubelet and CRI services, keeping the command order: stop the kubelet first, start it last. Run only the line that matches your CRI:

    sudo service kubelet stop && sudo service docker stop && sudo service docker start && sudo service kubelet start
    sudo service kubelet stop && sudo service containerd stop && sudo service containerd start && sudo service kubelet start
  8. Uncordon the Node (remember to set the Node name in the command), then verify the fix with the commands shown after this list:

    kubectl uncordon <node_name>
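To verify that the Node has recovered, check that it reports Ready without SchedulingDisabled and that no new FailedCreatePodContainer events appear for it. A short sketch using the same <node_name> placeholder as above:

    # Node should report Ready and no longer show SchedulingDisabled
    kubectl get node <node_name>
    # No new FailedCreatePodContainer events should show up after the restart
    kubectl get events -A --field-selector reason=FailedCreatePodContainer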

Permissions

ClusterRole for Node draining:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:node-drainer
rules:
  # Needed to evict pods
  - apiGroups: [""]
    resources: ["pods/eviction"]
    verbs: ["create"]
  # Needed to list pods by Node
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list"]
  # Needed to cordon Nodes
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "patch"]
  # Needed to determine Pod owners
  - apiGroups: ["apps"]
    resources: ["statefulsets"]
    verbs: ["get", "list"]
  # Needed to determine Pod owners
  - apiGroups: ["extensions"]
    resources: ["daemonsets", "replicasets"]
    verbs: ["get", "list"]

ClusterRole for reading Events:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:event-watcher
rules:
  - apiGroups:
      - ""
    resources:
      - events
    verbs:
      - get
      - list
      - watch
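A ClusterRole grants nothing by itself; it has to be bound to a user or group. A minimal ClusterRoleBinding sketch for the drainer role; the sre-oncall group name is a placeholder and should be replaced with a subject that actually exists in your cluster:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: node-drainer-binding        # placeholder binding name
subjects:
  - kind: Group
    name: sre-oncall                # placeholder group, replace with your own
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: system:node-drainer
  apiGroup: rbac.authorization.k8s.io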