Container Escape and Privilege Escalation via NVIDIA CUDA Driver IOCTL Handling
Overview
A critical privilege escalation vulnerability has been found in the NVIDIA GPU driver for Linux, affecting containerized AI/ML workloads in Kubernetes and Docker environments. The vulnerability resides in the kernel-mode driver's handling of specific IOCTL (Input/Output Control) calls originating from the user-mode CUDA library. An attacker who can execute code inside a container with GPU access can craft a sequence of CUDA API calls that causes a malformed IOCTL request to be sent to the underlying `/dev/nvidia*` device nodes. This request triggers a use-after-free condition in the kernel driver, which gives the attacker an arbitrary kernel memory write primitive.

An exploit can use this primitive to overwrite function pointers in kernel data structures or to disable security features such as seccomp and AppArmor from within the container. Successful exploitation lets the attacker break out of the container sandbox and gain full root privileges on the host operating system. This compromises the entire node, including every other container running on it, and poses a severe threat to multi-tenant GPU clusters used for AI training and inference.
Affected Systems
Testing Guide
1. On a non-production host system, check the installed NVIDIA driver version using `nvidia-smi`.
2. If the version is older than the patched release for its driver branch, treat the system as vulnerable.
3. Run the proof-of-concept exploit code provided by the security researchers inside a GPU-enabled Docker container.
4. If the exploit returns a root shell on the host (`uid=0`), the system is confirmed to be vulnerable.
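The version check in steps 1 and 2 can be automated. The following is a minimal sketch, not an official tool: it assumes `nvidia-smi` is on the `PATH`, and the helper names (`parse_version`, `is_older`, `installed_driver_versions`) are hypothetical. This advisory gives the patched release only as `560.xx`, so the exact patched version string for your driver branch must be taken from the vendor bulletin and passed in by the caller.

```python
import subprocess


def parse_version(text):
    """Turn a dotted version such as '550.54.14' into (550, 54, 14)."""
    return tuple(int(part) for part in text.strip().split("."))


def is_older(installed, patched):
    """Return True if `installed` is strictly older than `patched`."""
    a, b = parse_version(installed), parse_version(patched)
    # Pad the shorter tuple with zeros so '560.28' compares like '560.28.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


def installed_driver_versions():
    """Ask nvidia-smi for the driver version (one output line per GPU)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        check=True,
        capture_output=True,
        text=True,
    )
    # Every GPU on a host normally reports the same driver; deduplicate.
    return sorted({line.strip() for line in result.stdout.splitlines() if line.strip()})


def report(patched):
    """Print a verdict for each driver version found on this host."""
    for version in installed_driver_versions():
        verdict = "older than patched release" if is_older(version, patched) else "at or above patched release"
        print(f"{version}: {verdict}")
```

Calling `report(patched)` with the full patched version string for your branch prints one line per distinct driver version. The comparison is only meaningful within a single driver branch, because the fix was also backported to older long-lived branches.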
Mitigation Steps
1. Update the host machine's NVIDIA drivers to the patched version `560.xx` or newer.
2. In Kubernetes environments, use a policy engine such as Kyverno or OPA Gatekeeper to prevent pods from running with privileged security contexts.
3. Apply gVisor or Kata Containers as an additional sandboxing layer for untrusted ML workloads, which can intercept and mitigate malicious syscalls and IOCTL requests.
4. Regularly scan container images for known vulnerabilities and ensure they do not contain tools that could be used to craft such exploits.
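To illustrate the kind of rule step 2 refers to, here is a hedged Python sketch of the check an admission policy would perform on a pod manifest. It is not Kyverno or Gatekeeper policy syntax. The function name `privileged_containers` is hypothetical, and the manifest is assumed to be already parsed into a dictionary using the standard Kubernetes pod fields (`spec.containers[].securityContext.privileged`).

```python
def privileged_containers(pod):
    """Return the names of containers in a pod manifest that set privileged: true."""
    spec = pod.get("spec") or {}
    flagged = []
    # Init and ephemeral containers can be privileged too, so check all three lists.
    for group in ("initContainers", "containers", "ephemeralContainers"):
        for container in spec.get(group) or []:
            context = container.get("securityContext") or {}
            if context.get("privileged") is True:
                flagged.append(container.get("name", "<unnamed>"))
    return flagged


# Example manifest (illustrative names only).
example_pod = {
    "metadata": {"name": "training-job"},
    "spec": {
        "containers": [
            {"name": "trainer", "securityContext": {"privileged": True}},
            {"name": "sidecar", "securityContext": {"privileged": False}},
        ]
    },
}

violations = privileged_containers(example_pod)
if violations:
    print("reject pod, privileged containers:", ", ".join(violations))
```

A policy engine would deny admission whenever this list is non-empty. Note that blocking privileged pods reduces the attack surface but does not replace the driver update, since the flaw described here is reachable from any container that has GPU access.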
Patch Details
The vulnerability is patched in NVIDIA driver version `560.xx`, and the fix has been backported to several long-lived driver branches.