NVIDIA GPU Driver Use-After-Free Vulnerability Allowing Denial of Service and Privilege Escalation
Overview
A high-severity use-after-free vulnerability was identified in the NVIDIA kernel-mode driver for Linux. This vulnerability can be triggered by a local user with access to the GPU device, a common scenario in multi-tenant cloud AI platforms, enterprise Kubernetes clusters with GPU sharing, and local developer workstations.

The flaw exists in the driver's handling of memory objects related to CUDA command submissions. An attacker can run a specially crafted CUDA application that manipulates these objects in a way that causes the kernel driver to reference freed memory. Successful exploitation can reliably lead to a denial of service (DoS) by crashing the entire host system, disrupting all workloads.

Security researchers also demonstrated that under specific memory layout conditions, this vulnerability could be escalated to achieve arbitrary code execution within the context of the kernel. A kernel-level compromise allows an attacker to bypass all system security measures, access all data, and gain complete control over the host machine. This is particularly dangerous in shared GPU environments, as it would allow one user's containerized workload to break out and compromise the underlying host and potentially other tenants' workloads.
Affected Systems
Testing Guide
1. **Check Driver Version**: On a Linux system, run `nvidia-smi` to check the installed driver version.
2. **Compare with Advisory**: Compare the reported version with the 'Affected Versions' listed in the official NVIDIA security bulletin for the corresponding CVE.
3. **Use Vulnerability Scanners**: Employ host-based vulnerability scanners (e.g., Trivy, Qualys) that have plugins or checks for NVIDIA driver vulnerabilities.
4. **Review Container Images**: If using GPU-enabled containers, check the base images and build processes to ensure they are pulling from updated driver repositories.
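The first two steps can be scripted. The following is a minimal sketch, not an official check: it reads the driver version through `nvidia-smi` and compares it with the patched versions listed under Patch Details. The names `PATCHED`, `parse_version`, `is_patched`, and `installed_driver_version` are illustrative, and the sketch assumes that "and later" is evaluated within each driver branch. Branches that the advisory does not list are reported as unknown rather than guessed.

```python
import subprocess

# Patched versions copied from the "Patch Details" section, keyed by branch.
# Assumption: a driver counts as patched only if it is at or above the
# listed version for its own branch.
PATCHED = {550: (550, 40, 7), 545: (545, 29, 6), 535: (535, 154, 5)}


def parse_version(text):
    """Turn a version string such as '535.129.03' into (535, 129, 3)."""
    return tuple(int(part) for part in text.strip().split("."))


def is_patched(version_text):
    """Return True or False for a listed branch, None for an unlisted one."""
    version = parse_version(version_text)
    minimum = PATCHED.get(version[0])
    if minimum is None:
        return None  # Branch not covered above; consult the NVIDIA bulletin.
    return version >= minimum


def installed_driver_version():
    """Ask nvidia-smi for the driver version (first GPU's line is enough)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()[0].strip()


if __name__ == "__main__":
    current = installed_driver_version()
    status = is_patched(current)
    label = {True: "patched", False: "VULNERABLE", None: "unknown branch"}[status]
    print(f"Driver {current}: {label}")
```

An "unknown branch" result should be treated as a prompt to check the official bulletin, not as a clean result.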
Mitigation Steps
1. **Update NVIDIA Drivers**: Immediately update all system and container NVIDIA drivers to a patched version as specified in the NVIDIA security bulletin.
2. **Isolate GPU Workloads**: In multi-tenant environments, use robust isolation technologies. Avoid sharing a single physical GPU between untrusted tenants whenever possible.
3. **Restrict GPU Access**: Limit access to GPU devices to only trusted users and processes.
4. **Regularly Monitor Security Bulletins**: System administrators for GPU-accelerated infrastructure must subscribe to and regularly review security bulletins from NVIDIA and other hardware vendors.
5. **Implement Egress Filtering**: Restrict outbound network traffic from GPU workloads to prevent compromised systems from easily exfiltrating data or connecting to command-and-control servers.
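As a starting point for the "Restrict GPU Access" step, the sketch below lists NVIDIA character device nodes that any local user can open. It assumes the nodes live under the conventional `/dev/nvidia*` paths. The helper names `world_accessible` and `find_open_gpu_nodes` are hypothetical. It inspects file permission bits only; in container platforms, device access is also governed by the runtime's device rules, which this check does not cover.

```python
import glob
import os
import stat


def world_accessible(mode):
    """True if users outside the owner and group may read or write."""
    return bool(mode & (stat.S_IROTH | stat.S_IWOTH))


def find_open_gpu_nodes(pattern="/dev/nvidia*"):
    """Return (path, octal mode) for world-accessible character devices."""
    open_nodes = []
    for path in sorted(glob.glob(pattern)):
        mode = os.stat(path).st_mode
        # Skip directories and regular files; only device nodes matter here.
        if stat.S_ISCHR(mode) and world_accessible(mode):
            open_nodes.append((path, oct(stat.S_IMODE(mode))))
    return open_nodes


if __name__ == "__main__":
    for path, mode in find_open_gpu_nodes():
        print(f"{path} is world-accessible (mode {mode})")
```

Any node reported here is reachable by every local account, so tightening it to a dedicated group is one way to narrow who can reach the driver.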
Patch Details
Patched in NVIDIA driver versions 550.40.07, 545.29.06, 535.154.05 and later.