NVIDIA CUDA Driver Kernel Mode Layer Vulnerability Allows for Privilege Escalation
Overview
A high-severity vulnerability was identified in the kernel mode layer of the NVIDIA GPU display driver: `nvlddmkm.sys` on Windows and the equivalent kernel module on Linux. The flaw is an out-of-bounds write caused by improper validation of user-supplied input when the driver processes specific CUDA API calls.

A local attacker with low-level user privileges can exploit the flaw by running a crafted application that interacts with the CUDA driver. Successful exploitation lets the attacker write arbitrary data to kernel memory. This can crash the system, causing a denial of service (DoS), or, more critically, be leveraged to execute arbitrary code with SYSTEM/root privileges, resulting in a full local privilege escalation.

The risk is highest in multi-tenant environments where users share GPU resources, such as university computer labs, on-premise AI training clusters, or cloud-based virtual machines with GPU passthrough. An attacker could use the vulnerability to escape their user context and gain administrative control over the entire host machine, compromising the data and workloads of all other users.

The issue was responsibly disclosed by a security researcher from Project Zero.
Affected Systems
Testing Guide
1. **Check Driver Version (Linux)**: Run the command `nvidia-smi` and check the 'Driver Version' in the output. Compare this with the patched version (e.g., 550.54.14 or newer).
2. **Check Driver Version (Windows)**: Open the 'NVIDIA Control Panel', go to 'Help' -> 'System Information', and check the 'Driver version'. Compare this with the patched version (e.g., 551.61 or newer).
3. **Verify Vulnerability**: If your driver version is older than the patched versions listed in the security bulletin for your GPU series, you are affected.
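
The version comparison in the steps above can be scripted when many hosts need checking. The following is a minimal Python sketch, not an official tool. It assumes `nvidia-smi` is on the `PATH`, and it uses the example versions from this advisory as thresholds. The helper names (`parse_version`, `is_patched`, `query_driver_versions`) are hypothetical. A plain numeric comparison is only meaningful within the same driver branch, so older branches must be checked against the versions the security bulletin lists for them.

```python
import shutil
import subprocess
import sys

# Example thresholds taken from this advisory. Confirm them against the
# NVIDIA security bulletin for your GPU series and driver branch.
PATCHED_LINUX = "550.54.14"
PATCHED_WINDOWS = "551.61"


def parse_version(text):
    """Turn '550.54.14' into (550, 54, 14) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def is_patched(installed, patched):
    """Return True when the installed version is at least the patched one."""
    a, b = parse_version(installed), parse_version(patched)
    # Pad the shorter tuple with zeros so '551.61' compares like '551.61.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def query_driver_versions():
    """Ask nvidia-smi for the driver version of each GPU (one line per GPU)."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, check=False, timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    threshold = PATCHED_WINDOWS if sys.platform == "win32" else PATCHED_LINUX
    versions = query_driver_versions()
    if not versions:
        print("nvidia-smi not available; check the driver version manually.")
    for version in versions:
        try:
            status = "patched" if is_patched(version, threshold) else "AFFECTED"
        except ValueError:
            status = "unrecognized version string"
        print(f"driver {version}: {status} (threshold {threshold})")
```

The script only reports version numbers. It does not probe for the vulnerability itself, so a result of "patched" should still be confirmed against the bulletin.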
Mitigation Steps
1. **Update Drivers**: Update NVIDIA drivers to the latest version provided by NVIDIA or your system vendor. Refer to the NVIDIA security bulletin for the specific patched versions for your GPU model.
2. **Restrict GPU Access**: In multi-tenant environments, use security mechanisms like cgroups and namespaces to isolate GPU access and limit the attack surface for unprivileged users.
3. **Monitor System Logs**: Implement monitoring to detect anomalous driver behavior or system crashes that could indicate attempted exploitation.
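
As one possible starting point for the log monitoring step on Linux, the sketch below scans kernel log lines for NVIDIA Xid messages, which the driver emits with an `NVRM: Xid` prefix when it reports a GPU error. This is an assumption about what is worth watching, not a signature for this vulnerability: Xid events have many benign causes, and an exploit attempt may leave no such trace. The function name `find_xid_events` and the sample log lines are hypothetical and for illustration only.

```python
import re

# Matches kernel log lines such as:
#   NVRM: Xid (PCI:0000:01:00): 79, pid=4242, ...
XID_PATTERN = re.compile(r"NVRM: Xid \((?P<device>[^)]*)\): (?P<code>\d+)")

# Illustrative sample lines, not captured from a real system.
SAMPLE_LOG = [
    "kernel: NVRM: Xid (PCI:0000:01:00): 31, pid=4242, name=demo, Ch 00000008",
    "kernel: usb 1-1: new high-speed USB device number 2",
    "kernel: NVRM: Xid (PCI:0000:01:00): 79, pid=4242, GPU has fallen off the bus.",
]


def find_xid_events(lines):
    """Return (device, xid_code) pairs for every Xid message in the lines."""
    events = []
    for line in lines:
        match = XID_PATTERN.search(line)
        if match:
            events.append((match.group("device"), int(match.group("code"))))
    return events


if __name__ == "__main__":
    for device, code in find_xid_events(SAMPLE_LOG):
        print(f"Xid {code} reported on {device}")
```

In practice the lines would come from the kernel log (for example the output of `dmesg` or `journalctl -k`) rather than a hard-coded list, and any hits would be forwarded to whatever alerting system the site already uses.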
Patch Details
Addressed in NVIDIA driver versions R550 (550.54.14, Linux) and R551 (551.61, Windows), and in subsequent releases.