NVIDIA GPU Driver Kernel Mode Layer Vulnerability Allows Privilege Escalation
Overview
A high-severity vulnerability was disclosed in the NVIDIA GPU driver for both Windows and Linux. The flaw resided in the kernel-mode driver component, which mediates communication between user-space applications (such as CUDA programs) and the physical GPU hardware. Specifically, missing input validation on data passed from a user-mode process to a kernel-mode handler could lead to an out-of-bounds write.

A local attacker with low-level user privileges could exploit the flaw by sending a crafted API call to the driver. Successful exploitation could corrupt kernel memory, with two primary impacts: a system crash resulting in a Denial of Service (DoS) or, more critically, execution of arbitrary code with kernel-level privileges. The latter would allow a local user to escalate to SYSTEM on Windows or root on Linux.

This type of vulnerability is especially dangerous in multi-tenant cloud environments and on-premises GPU clusters, where multiple users or containerized workloads share the same physical GPU hardware. A compromised container could potentially break out and gain control of the entire host machine, compromising all other workloads.
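To make the bug class concrete, the following is a minimal conceptual sketch of the validation an out-of-bounds write lacks. It is not NVIDIA's code: `checked_write` and `BoundsError` are hypothetical names, and a Python `bytearray` merely stands in for a fixed-size kernel buffer. The point is that both the offset and the payload length supplied by the caller are checked against the destination size before any copy happens.

```python
class BoundsError(ValueError):
    """Raised when a request would write outside the destination buffer."""


def checked_write(dest: bytearray, offset: int, payload: bytes) -> None:
    """Copy payload into dest at offset, rejecting out-of-range requests.

    Hypothetical stand-in for a kernel-mode handler that receives
    caller-controlled offset and data; illustrative only.
    """
    # Reject negative offsets and offsets past the end of the buffer.
    if offset < 0 or offset > len(dest):
        raise BoundsError(f"offset {offset} outside buffer of {len(dest)} bytes")
    # Reject payloads that would run past the end of the buffer.
    if len(payload) > len(dest) - offset:
        raise BoundsError("payload would overrun the destination buffer")
    # Only now is the copy performed; the buffer size never changes.
    dest[offset:offset + len(payload)] = payload
```

A handler that skips either check and trusts the caller's values is the kind of code path the Overview describes.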
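Affected Systems
NVIDIA GPU Display Driver releases earlier than 537.13 on Windows and earlier than 535.104.05 on Linux, going by the patched versions listed under Patch Details.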
Testing Guide
1. **Check Driver Version**: This is the safest and most reliable method. Do not attempt to actively exploit this vulnerability, as doing so can crash your system.
2. **On Windows**: Open the NVIDIA Control Panel and go to 'Help' -> 'System Information'. Check the 'Driver version' and compare it against the patched version (e.g., 537.13).
3. **On Linux**: Run the `nvidia-smi` command in the terminal. The top line displays the installed driver version. Compare it against the patched version (e.g., 535.104.05).
4. If your driver version is lower than the patched version, your system is vulnerable.
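The version comparison in the steps above can be automated. The sketch below is one possible approach, not an official tool: the helper names (`parse_version`, `is_vulnerable`, `installed_driver_version`) are hypothetical, and it assumes `nvidia-smi` is on the `PATH` and that the driver version is a dot-separated sequence of integers. The patched versions are the ones stated in this advisory.

```python
import subprocess

# Patched versions named in this advisory.
PATCHED = {"windows": "537.13", "linux": "535.104.05"}


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '535.104.05' into (535, 104, 5) for numeric comparison."""
    return tuple(int(part) for part in text.strip().split("."))


def is_vulnerable(installed: str, patched: str) -> bool:
    """True when the installed driver is older than the patched release."""
    return parse_version(installed) < parse_version(patched)


def installed_driver_version() -> str:
    """Ask nvidia-smi for the driver version reported for the first GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return out.splitlines()[0].strip()
```

On a Linux host, `is_vulnerable(installed_driver_version(), PATCHED["linux"])` would perform the check. Comparing integer tuples rather than raw strings matters here: a plain string comparison would rank 535.86.10 above 535.104.05.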
Mitigation Steps
1. **Update GPU Drivers**: Immediately install the latest NVIDIA GPU drivers provided by NVIDIA or the cloud service provider. Ensure the installed version is at or above the patched versions.
2. **Apply Principle of Least Privilege**: Run AI/ML workloads with the minimum necessary privileges. Avoid running training or inference jobs as root or Administrator.
3. **Use Secure Container Runtimes**: In containerized environments (e.g., Kubernetes), use security-hardened container runtimes and apply security contexts and policies (like SELinux or AppArmor) to limit the driver interaction surface.
4. **Monitor for Anomalous Activity**: Implement monitoring and logging for GPU-related system calls and kernel-level events to detect potential exploitation attempts.
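As one possible starting point for the monitoring step, the sketch below scans kernel log lines for NVIDIA driver Xid error reports, which the Linux driver writes with an `NVRM: Xid` prefix. Treating these events as a signal worth reviewing is an assumption made here: Xid errors report GPU and driver faults in general and are not specific to this vulnerability. The function name `find_xid_events` is hypothetical, and the regular expression assumes the common `NVRM: Xid (<device>): <code>, ...` layout.

```python
import re

# Matches lines such as "NVRM: Xid (PCI:0000:01:00): 31, pid=1234, ..."
XID_PATTERN = re.compile(r"NVRM: Xid \((?P<device>[^)]*)\): (?P<code>\d+)")


def find_xid_events(log_lines):
    """Return (device, xid_code) pairs for every Xid report in the log lines."""
    events = []
    for line in log_lines:
        match = XID_PATTERN.search(line)
        if match:
            events.append((match.group("device"), int(match.group("code"))))
    return events
```

The input could be any iterable of kernel log lines, for example the output of `dmesg` split on newlines. Repeated driver faults tied to an unprivileged or containerized workload would be a reason to investigate further, alongside the other mitigations listed above.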
Patch Details
Patched in NVIDIA GPU Display Driver versions 537.13 (Windows) and 535.104.05 (Linux) and later releases.