NVIDIA GPU Driver Kernel Mode Layer Allows Privilege Escalation
Overview
A high-severity vulnerability, tracked as CVE-2025-10773, was found in the kernel mode layer of the NVIDIA GPU driver for Linux. A local attacker with basic user privileges can issue a specially crafted sequence of IOCTL calls to the driver's device files (`/dev/nvidia*`). The sequence triggers a race condition in the memory management unit, which results in a use-after-free. An attacker can exploit this condition to write arbitrary data to kernel memory, overwrite function pointers or critical kernel data structures, and escalate from a standard user account to `root`.

The vulnerability is particularly dangerous in multi-tenant AI and ML environments, such as shared JupyterHub instances or Kubernetes clusters with GPU sharing, where users are not supposed to have administrative access. A malicious user could exploit the flaw to escape their container, gain control of the host node, and access or disrupt the workloads of other users sharing the same physical GPU. The possible outcomes include data theft, model tampering, and complete denial of service for critical AI training jobs.
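Affected Systems
Linux systems running the NVIDIA GPU driver at a version earlier than the patched release for its branch (see Patch Details). For example, 550-series drivers below `550.76` are affected.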
Testing Guide
1. **Identify Driver Version:** On a Linux system with an NVIDIA GPU, run `nvidia-smi` and read the `Driver Version` field in the header of the output.
2. **Check Against Affected Versions:** Compare the installed version with the patched version for its branch. For example, if you are on the 550 series, any version below `550.76` is vulnerable.
3. **Run a Proof-of-Concept (If Available):** In a controlled test environment, use a non-destructive proof-of-concept (PoC) tool from a trusted security researcher to confirm whether the system is vulnerable. Do not run untrusted exploit code.
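Steps 1 and 2 can be scripted for fleets with many GPU nodes. The following is a minimal Python sketch, not an official NVIDIA tool. The function names and the `PATCHED` table are illustrative; the thresholds are copied from the Patch Details section, and branches outside that list are reported as unknown rather than guessed.

```python
import subprocess

# Patched release per driver branch, copied from the Patch Details section.
# Branches not listed there are treated as "unknown" (an assumption of this sketch).
PATCHED = {550: (550, 76), 545: (545, 92), 535: (535, 154)}


def parse_version(text):
    """Turn a string such as '550.54.14' into the tuple (550, 54, 14)."""
    return tuple(int(part) for part in text.strip().split("."))


def is_vulnerable(version_text):
    """Return True or False for a listed branch, or None if the branch is not listed."""
    version = parse_version(version_text)
    patched = PATCHED.get(version[0])
    if patched is None:
        return None
    # Tuple comparison: (550, 54, 14) < (550, 76) is True, (550, 76) < (550, 76) is False.
    return version < patched


def installed_driver_versions():
    """Query nvidia-smi for the driver version reported by each GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    # One line per GPU; deduplicate because all GPUs normally share one driver.
    return sorted({line.strip() for line in result.stdout.splitlines() if line.strip()})
```

On a GPU node, calling `is_vulnerable(v)` for each value returned by `installed_driver_versions()` gives a per-host verdict. A result of `None` means the branch should be checked manually against NVIDIA's security bulletin.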
Mitigation Steps
1. **Update NVIDIA Drivers:** Immediately update all affected systems to the patched driver versions released by NVIDIA.
2. **Restrict GPU Access:** In multi-tenant environments, use security mechanisms such as Kubernetes Pod Security Admission (the replacement for Pod Security Policies, which were removed in Kubernetes 1.25) or other admission controllers to limit which users and pods can access GPU devices.
3. **Use Virtualized GPUs (vGPU):** For stronger isolation, use NVIDIA vGPU technology, which provides a more robust security boundary between tenants sharing a physical GPU.
4. **Monitor Driver-Related Activity:** Implement monitoring and alerting for unusual GPU driver interactions and system calls, such as unexpected processes opening `/dev/nvidia*`.
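As a host-level complement to step 2, it can help to audit which GPU device nodes are open to every local user, since the attack starts with IOCTL calls on `/dev/nvidia*`. The Python sketch below is illustrative only: the function name is hypothetical, and whether tightening these permissions is practical depends on how workloads on the node reach the GPU.

```python
import glob
import os
import stat


def world_accessible_devices(pattern="/dev/nvidia*"):
    """Return (path, mode string) for matching nodes that any local user can read or write."""
    findings = []
    for path in sorted(glob.glob(pattern)):
        mode = os.stat(path).st_mode
        # S_IROTH / S_IWOTH are the read and write bits for "other" users.
        if mode & (stat.S_IROTH | stat.S_IWOTH):
            findings.append((path, stat.filemode(mode)))
    return findings


if __name__ == "__main__":
    for path, mode in world_accessible_devices():
        print(f"{mode} {path}")
```

An empty result means no matching node grants access to "other" users. It does not show that the host is patched, so the driver update in step 1 is still required.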
Patch Details
Patched in NVIDIA GPU Display Driver versions 550.76, 545.92, and 535.154, and in later releases on each branch.