NVIDIA GPU Driver Kernel Mode Layer Improper Input Validation Leading to Denial of Service
Overview
A high-severity vulnerability, identified as CVE-2024-0071, was discovered in the kernel mode layer of the NVIDIA GPU display driver for both Windows and Linux. It arises from improper validation of data passed from user-mode applications to the driver. An unprivileged local attacker can craft a malicious API call or shader that sends specially formed data to the driver. When the kernel-mode driver processes this malformed data, the result can be a null-pointer dereference or an out-of-bounds write, which crashes the system: a Blue Screen of Death (BSOD) on Windows or a kernel panic on Linux. The impact is a complete denial of service (DoS) that requires a system reboot.

This poses a significant risk to multi-tenant cloud GPU environments, high-performance computing (HPC) clusters, and individual AI development workstations, where a single malicious or compromised user process can bring down the entire host machine. A crash interrupts AI training and inference workloads, causing downtime and potential data loss if it occurs in the middle of a critical operation.
Affected Systems
Testing Guide
This vulnerability is difficult and dangerous to test directly because triggering it crashes the kernel. Users should not attempt to trigger it.

1. **Check Driver Version:** The safest method is to check your installed NVIDIA driver version.
   - On Windows, open the NVIDIA Control Panel and go to 'System Information'.
   - On Linux, run the command `nvidia-smi` in the terminal.
2. **Compare with Patched Versions:** Compare your installed version with the patched versions listed under 'Patch Details' or in the official NVIDIA security bulletin. If your version is lower than the patched version for your platform, assume the system is vulnerable.
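The version check above can be automated. The following is a minimal sketch, not an official tool: the function names and the `PATCHED` table are illustrative assumptions, and the table holds only the two versions named in this advisory. Other driver branches may have their own fixed releases, so the NVIDIA security bulletin remains the authoritative source.

```python
import subprocess

# Patched versions as stated in this advisory. Assumption: only these two
# branches are covered; check the NVIDIA bulletin for other branches.
PATCHED = {"windows": "551.61", "linux": "550.54.14"}


def parse_version(text):
    """Turn a dotted version such as '550.54.14' into (550, 54, 14)."""
    return tuple(int(part) for part in text.strip().split("."))


def is_patched(installed, patched):
    """Return True if the installed version is at or above the patched one."""
    return parse_version(installed) >= parse_version(patched)


def installed_driver_version():
    """Ask nvidia-smi for the driver version reported by the first GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()[0].strip()
```

On a Linux host, `is_patched(installed_driver_version(), PATCHED["linux"])` returns `False` when the driver is older than the patched release. Comparing numeric tuples rather than raw strings avoids ordering mistakes such as treating "99.1" as newer than "551.61".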
Mitigation Steps
1. **Update Drivers:** Immediately update NVIDIA drivers to the patched versions specified in the NVIDIA security bulletin (e.g., version 551.61 or newer for Windows).
2. **Restrict GPU Access:** In multi-tenant environments, use containerization and virtualization technologies with proper device passthrough controls to limit direct access to the GPU driver from untrusted workloads.
3. **Limit User Privileges:** Ensure that users and processes running on the system operate with the lowest possible privileges, reducing the attack surface.
4. **Monitor System Logs:** Monitor system and kernel logs for unexpected driver crashes or errors, which could indicate attempts to exploit this vulnerability.
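For the log-monitoring step on Linux, a small filter over kernel log text can surface driver errors and crash markers. This is a rough sketch under stated assumptions: the pattern list and the function name `find_suspicious_lines` are illustrative choices, not signatures published for this CVE. A match indicates a driver or kernel fault worth investigating, not proof of an exploitation attempt.

```python
import re

# Assumed patterns: "NVRM: Xid" is the prefix the NVIDIA Linux kernel module
# uses for Xid error reports; the other two are generic Linux crash markers.
PATTERNS = [
    re.compile(r"NVRM: Xid"),
    re.compile(r"Kernel panic", re.IGNORECASE),
    re.compile(r"BUG: kernel NULL pointer dereference"),
]


def find_suspicious_lines(lines):
    """Return (line_number, text) for each log line matching any pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in PATTERNS):
            hits.append((number, line.rstrip("\n")))
    return hits
```

The function accepts any iterable of text lines, so it can be fed a saved copy of `dmesg` or `journalctl -k` output opened as a file.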
Patch Details
Patches are available in NVIDIA GPU Display Driver versions 551.61 (Windows) and 550.54.14 (Linux) and later.