Use-After-Free in NVIDIA GPU Driver Enables Denial of Service or Privilege Escalation in ML Workloads
Overview
A high-severity use-after-free vulnerability was discovered in the kernel-mode layer of the NVIDIA GPU Display Driver for Windows and Linux. The flaw lies in the driver's handling of memory resources: a specific sequence of API calls from a user-mode application can cause the driver to free a memory object while retaining a dangling pointer to it. An attacker with local, unprivileged access to a system with an affected NVIDIA GPU can exploit this. By carefully manipulating memory, the attacker can cause the freed region to be reused, so that the driver's later access through the dangling pointer reads from or writes to an attacker-controlled memory address.

The most common impact of successful exploitation is a system crash, resulting in a denial of service (DoS). In a multi-tenant cloud environment or an on-premises AI training cluster, a single malicious user can crash the entire host machine and disrupt every other user's jobs.

More advanced exploitation could lead to arbitrary code execution within the kernel, allowing the attacker to escalate privileges to the highest level (SYSTEM on Windows, root on Linux) and fully compromise the host. This poses a significant risk to AI infrastructure, which relies heavily on GPUs and often runs code from multiple users or customers.
Affected Systems
Testing Guide
1. **Check Driver Version**: On Windows, open the NVIDIA Control Panel and open 'System Information' to find the driver version. On Linux, run `nvidia-smi`; the driver version appears in the header of its output.
2. **Compare with Patched Versions**: Cross-reference your installed driver version with the versions listed as 'affected' in the CVE advisory or NVIDIA security bulletin.
3. **Run Vulnerability Scanners**: Use host-based vulnerability scanning tools that include a check for this specific CVE to automate detection across your infrastructure.
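
Steps 1 and 2 can be automated on Linux hosts. The following is a minimal sketch, not an official detection tool: it assumes `nvidia-smi` is on the PATH, and it takes the minimum patched Linux version from the Patch Details section. The function names are hypothetical. A plain numeric comparison is also a simplification, because a security bulletin may list separate patched versions for each driver branch, so treat the result as a first pass and confirm against the bulletin.

```python
import subprocess

# Assumption: minimum patched Linux driver version, as stated under Patch Details.
PATCHED_LINUX = "535.104.05"


def parse_version(text):
    """Turn a version string such as '535.104.05' into (535, 104, 5)."""
    return tuple(int(part) for part in text.strip().split("."))


def is_patched(installed, minimum=PATCHED_LINUX):
    """Return True if the installed version is numerically >= the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def installed_driver_versions():
    """Ask nvidia-smi for the driver version reported for each GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    # One line per GPU; all GPUs on a host normally report the same driver.
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
```

On a GPU node, calling `is_patched(v)` for each `v` returned by `installed_driver_versions()` flags hosts that still need an update. For Windows hosts, pass the Windows minimum as the `minimum` argument instead.
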
Mitigation Steps
1. **Update Drivers**: Immediately update NVIDIA GPU drivers to the patched versions listed in the NVIDIA security bulletin.
2. **Restrict GPU Access**: In multi-tenant environments, use technologies like NVIDIA MIG (Multi-Instance GPU) to partition GPUs and provide stronger isolation between workloads.
3. **Monitor System Logs**: Monitor for unexpected system crashes or kernel panics on GPU-enabled nodes, as these could indicate exploitation attempts.
4. **Apply Principle of Least Privilege**: Ensure that users and services running on GPU nodes have the minimum privileges necessary. Avoid running processes as root or Administrator unless absolutely required.
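
The log monitoring in step 3 can be sketched as a simple filter over Linux kernel log lines. This is an illustration under stated assumptions rather than a detection rule for this specific vulnerability: the pattern list and the function name `find_suspicious_lines` are hypothetical, and the patterns only match generic crash indicators and NVIDIA driver error reports (`NVRM: Xid`), which also occur for benign hardware or software faults.

```python
import re

# Assumption: illustrative patterns only; tune them to your kernel and log format.
SUSPICIOUS_PATTERNS = [
    re.compile(r"NVRM: Xid"),                 # error reports from the NVIDIA kernel driver
    re.compile(r"Kernel panic"),              # fatal kernel error
    re.compile(r"general protection fault"),  # often seen with invalid pointer use
    re.compile(r"KASAN:.*use-after-free"),    # only present on KASAN-enabled kernels
]


def find_suspicious_lines(lines):
    """Return the log lines that match any of the suspicious patterns."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(pattern.search(line) for pattern in SUSPICIOUS_PATTERNS)
    ]
```

The function accepts any iterable of strings, for example an open kernel log file or the captured output of `journalctl -k`. A match is a prompt for investigation, not proof of exploitation.
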
Patch Details
Patches are available in NVIDIA driver version 537.13 and later on Windows, and version 535.104.05 and later on Linux.