Out-of-Bounds Write in NVIDIA GPU Driver Kernel Mode Layer Allows Privilege Escalation
Overview
NVIDIA disclosed a high-severity vulnerability in the kernel mode layer of its GPU display driver for Windows. The flaw, identified as CVE-2024-0072, resides in a component responsible for managing GPU memory and processing shader commands. A specially crafted, unprivileged user-mode application could trigger an out-of-bounds write in the kernel driver. Successful exploitation could lead to a denial of service (DoS) in the form of a system crash (Blue Screen of Death). More critically, it could be leveraged for local privilege escalation (LPE) and arbitrary code execution in the context of the NT kernel.
An attacker who has already gained low-privilege access to a system, such as a multi-tenant cloud GPU instance or a shared developer workstation, could exploit this flaw to gain full system control. This type of vulnerability is particularly dangerous in AI/ML environments where large, complex models are processed on GPUs, because a malicious model or workload could break out of its user-space sandbox and compromise the host operating system. The flaw was found through NVIDIA's ongoing product security program and shows why the drivers underneath AI infrastructure need to be kept up to date.
Affected Systems
Windows systems running the NVIDIA GPU display driver from release branches 550, 545, or 535 at a version earlier than the patched release for that branch.
Testing Guide
1. On a Windows system, open the NVIDIA Control Panel, go to 'System Information' and check the 'Driver version'.
2. Alternatively, open a command prompt and run `nvidia-smi` to display the installed driver version.
3. Compare your installed version to the patched versions listed in the NVIDIA security bulletin. If your version is lower than the specified patch level for your branch, you are vulnerable.
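The version comparison in step 3 can be scripted. The following is a minimal sketch, not an official NVIDIA tool. It assumes `nvidia-smi` is on the PATH and uses the R550 value of 551.52 quoted in the mitigation steps as the threshold. The names `PATCHED_R550`, `parse_version`, `is_vulnerable`, and `installed_driver_versions` are illustrative. Drivers from other branches need the threshold for their own branch, taken from the NVIDIA security bulletin.

```python
import subprocess

# Assumption: minimum patched version for the R550 branch, as quoted in the
# mitigation steps. Other branches need their own value from the bulletin.
PATCHED_R550 = "551.52"


def parse_version(text):
    """Turn '551.52' into (551, 52) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def is_vulnerable(installed, patched):
    """Return True if the installed driver is older than the patched one."""
    return parse_version(installed) < parse_version(patched)


def installed_driver_versions():
    """Ask nvidia-smi for the driver version reported for each GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    try:
        versions = installed_driver_versions()
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        # nvidia-smi is missing or failed; fall back to the Control Panel check.
        print(f"Could not query nvidia-smi: {exc}")
    else:
        for version in versions:
            status = "VULNERABLE" if is_vulnerable(version, PATCHED_R550) else "ok"
            print(f"{version}: {status} (compared against {PATCHED_R550})")
```

Comparing parsed integer tuples rather than raw strings avoids ordering mistakes when version components differ in length. The script only reports a result and does not attempt to update the driver.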
Mitigation Steps
1. Update NVIDIA drivers to the latest version from the appropriate release branch (e.g., 551.52 or later for the R550 branch).
2. In multi-tenant environments, use hardware-level GPU partitioning and virtualization (e.g., NVIDIA vGPU, MIG) to isolate workloads.
3. Restrict direct GPU access for untrusted code. Run AI/ML workloads in heavily sandboxed environments (e.g., containers with gVisor or Kata Containers) to limit the impact of a kernel exploit.
4. Implement host-based intrusion detection systems (HIDS) to monitor for anomalous kernel activity.
Patch Details
Patches are available in NVIDIA driver release branches 550, 545, and 535. Users should update to the latest driver in their respective branch.