NVIDIA GPU Driver Kernel Mode Vulnerability Leading to Privilege Escalation
Overview
A high-severity vulnerability, tracked as CVE-2024-0099, was disclosed in the NVIDIA GPU driver for both Windows and Linux. The flaw is in the kernel-mode layer of the driver, which mediates access to the GPU hardware. The driver does not properly validate certain parameters that user-mode applications pass to it through API calls. A local, unprivileged attacker can run a malicious application that sends specially crafted data to the driver, which could trigger a race condition or a buffer overflow in kernel space.
Successful exploitation allows the attacker to execute arbitrary code with the highest system privileges (SYSTEM on Windows, root on Linux) and take complete control of the affected machine. A failed exploit attempt would likely crash the system, resulting in a denial of service (DoS).
This type of vulnerability is especially dangerous in multi-tenant environments where GPU resources are shared, such as cloud AI platforms (e.g., AWS, GCP, Azure), high-performance computing (HPC) clusters, and remote desktop services. A malicious user or a compromised container could use it to escape the sandbox and gain control of the underlying host, potentially accessing or disrupting the workloads of every other tenant on the same physical server.
Affected Systems
Testing Guide
1. **Check Driver Version**: On Windows, open the NVIDIA Control Panel and check the driver version under 'System Information'. On Linux, run `nvidia-smi` to display the installed driver version.
2. **Compare with Advisory**: Compare the installed version with the 'Affected Versions' listed in the official NVIDIA security bulletin for CVE-2024-0099.
3. **Use Vulnerability Scanners**: Run a system vulnerability scanner (e.g., Tenable, Qualys) that includes a check for this specific CVE to confirm whether your systems are affected.
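Steps 1 and 2 can be scripted across a fleet. The following is a minimal Python sketch, not an official NVIDIA tool. It assumes the thresholds given under Patch Details in this advisory (551.61 on Windows, 550.54.14 on Linux) and compares against that single value only. NVIDIA may ship fixes in other driver branches with different version numbers, so the bulletin remains the authoritative source. The function names (`parse_version`, `is_patched`, `installed_driver_version`) are illustrative.

```python
import platform
import subprocess

# Minimum patched versions as stated under "Patch Details" in this advisory.
# Assumption: one threshold per OS; per-branch fixes are not modeled here.
PATCHED = {"windows": "551.61", "linux": "550.54.14"}


def parse_version(text):
    """Turn a dotted version such as '550.54.14' into (550, 54, 14)."""
    return tuple(int(part) for part in text.strip().split("."))


def is_patched(installed, os_name):
    """Return True if `installed` is at or above the threshold for `os_name`."""
    return parse_version(installed) >= parse_version(PATCHED[os_name])


def installed_driver_version():
    """Query nvidia-smi for the driver version; return None if unavailable."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    lines = result.stdout.strip().splitlines()
    # With several GPUs, nvidia-smi prints one line per GPU; all share one driver.
    return lines[0].strip() if lines else None


if __name__ == "__main__":
    os_name = platform.system().lower()
    version = installed_driver_version()
    if version is None or os_name not in PATCHED:
        print("Could not determine driver version or platform threshold.")
    elif is_patched(version, os_name):
        print(f"Driver {version} is at or above the patched version.")
    else:
        print(f"Driver {version} is below {PATCHED[os_name]}; check the bulletin.")
```

Comparing integer tuples rather than raw strings avoids lexical mistakes such as treating "551.9" as newer than "551.61".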
Mitigation Steps
1. **Update NVIDIA Drivers**: Immediately update the drivers on all hosts, and any driver components bundled in container images, to the patched versions specified by NVIDIA in its security bulletin.
2. **Restrict GPU Access**: In multi-tenant environments, use `cgroups` device controls to limit which containers can reach the GPU device nodes, and seccomp filters to limit the system calls that containers can make to the GPU driver.
3. **Regular Patch Management**: Incorporate GPU driver updates into your regular security patch management cycle, since the driver is a critical part of the infrastructure stack.
4. **Monitor for Anomalous Activity**: Use system monitoring tools to detect unusual GPU activity or driver crashes that could indicate an exploitation attempt.
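As one possible starting point for step 4 on Linux, the sketch below scans kernel log lines for NVIDIA driver "Xid" error reports, which the driver writes with an `NVRM: Xid` prefix. The exact message layout is assumed from typical driver output and may differ between driver versions, and the function name `find_xid_events` is illustrative. Xid events usually point to hardware or application faults, so a match is a prompt for investigation, not proof of exploitation.

```python
import re

# Assumed log layout, e.g.:
#   "NVRM: Xid (PCI:0000:3b:00): 79, pid=1234, GPU has fallen off the bus."
XID_PATTERN = re.compile(r"NVRM: Xid \((?P<device>[^)]+)\): (?P<code>\d+)")


def find_xid_events(log_lines):
    """Return a list of (device, xid_code) tuples found in the given log lines."""
    events = []
    for line in log_lines:
        match = XID_PATTERN.search(line)
        if match:
            events.append((match.group("device"), int(match.group("code"))))
    return events


if __name__ == "__main__":
    import sys

    # Example: dmesg | python3 find_xid.py
    for device, code in find_xid_events(sys.stdin):
        print(f"Xid {code} reported for {device}")
```

Feeding the function from `dmesg` or `journalctl -k` output keeps it independent of any particular monitoring stack; the results can then be forwarded to whatever alerting system is already in place.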
Patch Details
The issue is fixed in NVIDIA driver version 551.61 and later on Windows and version 550.54.14 and later on Linux. These releases were published in February/March 2024.