Privilege Escalation Vulnerability in NVIDIA GPU Display Driver for Linux
Overview
A high-severity vulnerability was discovered in the NVIDIA GPU Display Driver for Linux. The flaw is a use-after-free condition in the driver's kernel-mode layer, and any local user with basic execution permissions can trigger it. By running a crafted application that sends specific, malformed inputs to the GPU driver through its API, an attacker can corrupt kernel memory. Successful exploitation lets the attacker crash the system, causing a denial of service, or, more critically, execute arbitrary code with kernel-level privileges. Because the code runs in the kernel, it bypasses standard user-space security controls and results in a full system compromise.

The vulnerability is particularly dangerous in multi-tenant AI/ML environments, such as shared JupyterHub instances or Kubernetes clusters with GPU sharing, where multiple users run code on the same physical host. A malicious user or a compromised container could exploit it to break out of its isolated environment, gain control of the host node, and access or disrupt the workloads of all other tenants.

The flaw was found through fuzzing and reverse engineering of the driver's IOCTL handlers, which revealed improper memory management when handling certain graphics processing objects.
Affected Systems
Linux systems running NVIDIA GPU Display Driver releases older than the fixed versions listed under Patch Details. The official NVIDIA security bulletin for the corresponding CVE gives the full list of affected versions.
Testing Guide
1. **Check Driver Version**: On a Linux system, run the command `nvidia-smi` or `cat /proc/driver/nvidia/version` to check the installed driver version.
2. **Compare with Bulletin**: Compare the installed version with the 'Affected Versions' and 'Fixed Versions' listed in the official NVIDIA security bulletin for the corresponding CVE.
3. **Scan for Vulnerabilities**: Use a host-based vulnerability scanner that has checks for NVIDIA driver vulnerabilities to automate the detection process across a fleet of machines.
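Steps 1 and 2 can be partly automated. The following is a minimal Python sketch, not an official NVIDIA tool. It assumes the first dotted number in `/proc/driver/nvidia/version` is the driver version, and it only knows the two fixed versions named in this advisory. The names `FIXED_BY_BRANCH`, `parse_version`, and `classify` are hypothetical. Branches newer than 545 are treated as patched on the basis of the "subsequent releases" wording under Patch Details; older branches are reported as `unknown` and still need to be checked against the bulletin.

```python
import re
from pathlib import Path

# Fixed versions from the Patch Details section, keyed by driver branch.
# Assumption: other branches are not covered here; check the NVIDIA bulletin.
FIXED_BY_BRANCH = {535: (535, 154, 5), 545: (545, 29, 6)}


def parse_version(text):
    """Return the first dotted driver version in text as a tuple of ints, or None."""
    match = re.search(r"\b(\d{3}(?:\.\d+){1,2})\b", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def classify(version):
    """Classify a version tuple as 'patched', 'vulnerable', or 'unknown'."""
    branch = version[0]
    fixed = FIXED_BY_BRANCH.get(branch)
    if fixed is not None:
        return "patched" if version >= fixed else "vulnerable"
    if branch > max(FIXED_BY_BRANCH):
        # Later branches count as "subsequent releases" of the fixed versions.
        return "patched"
    # Older or unlisted branch: this sketch cannot decide.
    return "unknown"


def main():
    proc = Path("/proc/driver/nvidia/version")
    if not proc.exists():
        print("NVIDIA kernel module not loaded; nothing to check")
        return
    text = proc.read_text()
    version = parse_version(text)
    if version is None:
        print("could not parse a driver version")
        return
    print(f"{classify(version)}: {text.splitlines()[0]}")


if __name__ == "__main__":
    main()
```

A result of `unknown` does not mean the host is safe; it means the installed branch is outside the two listed here and the bulletin has to be consulted directly.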
Mitigation Steps
1. **Update NVIDIA Drivers**: Update the system's NVIDIA drivers to the versions specified in the NVIDIA security bulletin (e.g., version 535.154.05 or newer).
2. **Restrict GPU Access**: In multi-tenant environments, use technologies like NVIDIA MIG (Multi-Instance GPU) to partition GPUs where possible, and strictly control which users or pods can access GPU resources.
3. **Use Kernel Hardening**: Employ Linux kernel hardening features and security modules like SELinux or AppArmor to limit the capabilities of processes, potentially mitigating the impact of an exploit.
4. **Monitor System Logs**: Monitor kernel logs for unusual error messages or crashes related to the NVIDIA driver, which could indicate attempted exploitation.
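The log monitoring in step 4 could be sketched roughly as follows. This is an illustrative Python filter, not a detection rule for this specific vulnerability. The patterns are assumptions: it flags `NVRM: Xid` driver error lines and generic kernel fault messages that also mention `nvidia`. The names `XID`, `FAULT`, and `scan` are hypothetical, and no exploit signature is published in this advisory.

```python
import re

# Assumption: driver error reports appear as "NVRM: Xid (<device>): <code>, ...".
XID = re.compile(r"NVRM: Xid \(.*?\): (\d+)")
# Assumption: generic kernel fault keywords; not specific to this flaw.
FAULT = re.compile(r"general protection fault|kernel BUG|Oops|use-after-free",
                   re.IGNORECASE)


def scan(lines):
    """Return (kind, xid_code, line) tuples for kernel log lines worth reviewing."""
    findings = []
    for line in lines:
        xid = XID.search(line)
        if xid:
            findings.append(("xid", int(xid.group(1)), line.strip()))
        elif FAULT.search(line) and "nvidia" in line.lower():
            findings.append(("fault", None, line.strip()))
    return findings


# Made-up sample lines, for illustration only.
sample = [
    "[   12.3] NVRM: Xid (PCI:0000:01:00): 31, pid=4242, name=python",
    "[   15.0] general protection fault in nvidia_ioctl+0x1f/0x40 [nvidia]",
    "[   16.0] usb 1-1: new high-speed USB device number 2",
]
for kind, code, text in scan(sample):
    print(kind, code, text)
```

In practice the input would come from the kernel log, for example the output of `dmesg` or `journalctl -k`. A match is a prompt for investigation, not proof of exploitation, since hardware faults and ordinary driver bugs produce similar messages.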
Patch Details
Patched in NVIDIA GPU Display Driver versions 535.154.05, 545.29.06, and subsequent releases.