NVIDIA GPU Driver Kernel Mode Layer Contains Out-of-Bounds Write Vulnerability
Overview
A high-severity vulnerability was found in the kernel mode layer of the NVIDIA GPU driver for Windows and Linux. The flaw stems from the driver's failure to properly validate input originating from user mode. A specially crafted shader or API call from a user-level process can trigger an out-of-bounds write in a kernel memory buffer.

An attacker with local user access to a system running an affected driver can exploit this vulnerability to crash the system, causing a denial of service (DoS). More critically, the out-of-bounds write could be leveraged to corrupt kernel memory structures, allowing the attacker to execute arbitrary code with kernel-level privileges. That would be a full system compromise, giving the attacker complete control of the machine and the ability to bypass its security controls.

The vulnerability affects a wide range of systems, from individual developer workstations with consumer GPUs to large-scale AI/ML training clusters in data centers running enterprise-grade Tesla and Data Center GPUs, which makes it a significant threat to AI infrastructure security.
Affected Systems
Testing Guide
1. Identify the NVIDIA driver version currently installed on your system. On Linux, run `nvidia-smi`. On Windows, check the NVIDIA Control Panel.
2. Compare the installed version against the patched versions listed under Patch Details.
3. If your driver version is lower than the patched version for its branch, your system is vulnerable.
4. Vulnerability scanners with authenticated scanning capabilities, such as Nessus or Qualys, can also detect outdated and vulnerable driver versions.
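The version check in the steps above can be scripted. The following is a minimal sketch, not an official NVIDIA tool: it assumes `nvidia-smi` is on the PATH, and the helper names (`parse_version`, `is_older`, `installed_driver_versions`) and the `PATCHED` baseline are illustrative choices. The comparison is only meaningful against the patched version of the same driver branch, so adjust `PATCHED` to match your platform and branch.

```python
import subprocess


def parse_version(text):
    """Turn a dotted version string such as '550.54.14' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def is_older(installed, patched):
    """Return True when `installed` sorts strictly before `patched`."""
    a, b = parse_version(installed), parse_version(patched)
    width = max(len(a), len(b))
    # Pad with zeros so that '551.61' and '551.61.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


def installed_driver_versions():
    """Ask nvidia-smi for the driver version reported for each GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return sorted({line.strip() for line in out.splitlines() if line.strip()})


if __name__ == "__main__":
    # Assumption: baseline taken from the Patch Details section; pick the
    # value that belongs to the branch installed on the machine being checked.
    PATCHED = "550.54.14"
    for version in installed_driver_versions():
        status = "older than" if is_older(version, PATCHED) else "at or above"
        print(f"driver {version} is {status} patched version {PATCHED}")
```

Versions are compared as integer tuples rather than strings so that, for example, `550.9` is not ranked above `550.54`.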
Mitigation Steps
1. **Update Drivers:** Immediately update all NVIDIA drivers to the patched versions specified in the NVIDIA security bulletin.
2. **Restrict GPU Access:** On multi-tenant systems, use containerization and features like NVIDIA Multi-Instance GPU (MIG) to isolate workloads and limit the potential blast radius of an exploited vulnerability.
3. **Monitor System Integrity:** Employ host-based intrusion detection systems (HIDS) to monitor for anomalous kernel activity or unexpected system crashes.
4. **Apply Principle of Least Privilege:** Do not grant users unnecessary access to systems with powerful GPUs. Ensure that only trusted code is executed on these machines.
Patch Details
Patches are available starting with NVIDIA driver versions 550.54.14 and 551.61 in their respective branches; later versions also include the fix.