NVIDIA Driver Improper Access Control Vulnerability Leading to Denial of Service
Overview
A high-severity vulnerability was discovered in the kernel-mode layer of the NVIDIA GPU driver for Linux. The flaw lies in how the driver handles memory mapping operations submitted by user-mode CUDA applications. Because specific memory management IOCTLs lack proper input validation and access control, a local, unprivileged attacker can craft a sequence of CUDA API calls that triggers a race condition or double fetch within the driver. This allows the attacker's process to write to an arbitrary physical memory address, including memory owned by the kernel itself.

Successful exploitation corrupts kernel memory. In most scenarios the corruption immediately causes a kernel panic, resulting in a complete system crash and a Denial of Service (DoS).

The vulnerability is particularly dangerous in multi-tenant environments, such as cloud-based GPU instances or on-premises Kubernetes clusters, where a malicious user or a compromised container could take down the entire underlying host, affecting all other tenants and workloads. Remote code execution was not demonstrated, but the ease of triggering the DoS condition from an unprivileged context makes this a high-priority issue for infrastructure providers that rely on NVIDIA GPUs.
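For readers unfamiliar with the term, the double-fetch bug class mentioned above can be illustrated with a small, deterministic simulation. This is a generic sketch of the pattern, not NVIDIA driver code: `UserBuffer`, `LIMIT`, `vulnerable_handler`, and `safe_handler` are hypothetical names invented for illustration, and the "concurrent writer" is simulated by returning a different value on the second read.

```python
LIMIT = 4096  # hypothetical maximum size the handler is willing to accept


class UserBuffer:
    """Stand-in for user-controlled memory whose contents change between reads."""

    def __init__(self, values):
        self._values = list(values)
        self.reads = 0

    def read(self):
        # Each read returns the next queued value, mimicking another thread
        # that rewrites the field between two fetches by privileged code.
        index = min(self.reads, len(self._values) - 1)
        self.reads += 1
        return self._values[index]


def vulnerable_handler(buf):
    # First fetch: the value is validated.
    if buf.read() > LIMIT:
        raise ValueError("size too large")
    # Second fetch: the value is read again and used, so it may no longer
    # match what was validated.
    return buf.read()


def safe_handler(buf):
    # Single fetch into a private copy; validation and use see the same value.
    size = buf.read()
    if size > LIMIT:
        raise ValueError("size too large")
    return size


if __name__ == "__main__":
    print(vulnerable_handler(UserBuffer([16, 1 << 30])))  # validated 16, used 1 << 30
    print(safe_handler(UserBuffer([16, 1 << 30])))        # validated and used 16
```

The general remedy for this bug class is the one shown in `safe_handler`: copy user-supplied input once, then validate and use only the private copy.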
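Affected Systems
Linux systems running the NVIDIA GPU driver at a version prior to 550.76, as listed in the NVIDIA Security Bulletin.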
Testing Guide
1. Identify the installed NVIDIA driver version on your Linux system with the `nvidia-smi` command.
2. Compare that version against the affected versions listed in the NVIDIA Security Bulletin (versions prior to 550.76).
3. If a proof-of-concept exploit is publicly available, run it only in an isolated, non-production environment that has an affected driver version installed.
4. On a vulnerable system, the exploit causes an immediate kernel panic and system reboot.
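Steps 1 and 2 can be automated. The following is a minimal sketch, assuming that `nvidia-smi` is on the `PATH` and that the single threshold stated in this advisory (550.76) applies to the installed driver. Other driver branches may have their own fixed versions, so the NVIDIA Security Bulletin remains the authoritative source. The helper names `parse_version`, `is_affected`, and `installed_driver_versions` are illustrative.

```python
import re
import subprocess

# First fixed release according to this advisory (assumption: one threshold).
FIXED_VERSION = (550, 76)


def parse_version(text):
    """Convert a dotted version string such as '550.54.14' to a tuple of ints."""
    cleaned = text.strip()
    if not re.fullmatch(r"\d+(?:\.\d+)*", cleaned):
        raise ValueError(f"unrecognized driver version: {text!r}")
    return tuple(int(part) for part in cleaned.split("."))


def is_affected(version_text, fixed=FIXED_VERSION):
    """Return True if the version sorts strictly before the fixed release."""
    return parse_version(version_text) < fixed


def installed_driver_versions():
    """Ask nvidia-smi for the driver version reported for each GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
```

On a host with the driver installed, `any(is_affected(v) for v in installed_driver_versions())` reports whether an update is needed under the stated assumption.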
Mitigation Steps
1. **Update Drivers:** Immediately update all affected NVIDIA drivers to version 550.76 or later by downloading the latest version from NVIDIA's official driver portal.
2. **Restrict GPU Access:** In multi-tenant environments, limit direct GPU device access to trusted workloads only.
3. **Use Virtualization:** Employ hardware-assisted virtualization technologies like NVIDIA vGPU or SR-IOV to provide stronger isolation between tenants sharing a physical GPU.
4. **Monitor System Logs:** Continuously monitor kernel logs for signs of driver instability or unexpected errors, which could indicate attempted exploitation.
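The log monitoring step can be sketched as a simple filter over kernel log lines. The pattern list below is an assumption chosen for illustration: it matches general NVIDIA driver error reports (`NVRM: Xid`) and common kernel fault messages, and is not a signature specific to this vulnerability. `SUSPICIOUS_PATTERNS` and `find_suspicious_lines` are hypothetical names.

```python
import re

# Illustrative patterns only; tune them for your environment.
SUSPICIOUS_PATTERNS = [
    re.compile(r"NVRM: Xid"),                   # NVIDIA driver error report
    re.compile(r"Kernel panic - not syncing"),  # fatal kernel error
    re.compile(r"BUG: unable to handle"),       # kernel fault on a bad access
    re.compile(r"general protection fault"),    # invalid memory access
]


def find_suspicious_lines(lines):
    """Return (line_number, line) pairs that match any suspicious pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in SUSPICIOUS_PATTERNS):
            hits.append((number, line.rstrip("\n")))
    return hits
```

The input can be any iterable of log lines, for example the captured output of `journalctl -k` or `dmesg`. A match is a prompt for investigation rather than proof of exploitation.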
Patch Details
The vulnerability is addressed in NVIDIA GPU Display Driver version 550.76 and all subsequent releases.