NVIDIA Driver Vulnerability Allows Privilege Escalation in Containerized ML Environments
Overview
NVIDIA has disclosed a high-severity vulnerability in its kernel-mode display driver affecting both Windows and Linux systems. The vulnerability, tracked as CVE-2024-0089, is an out-of-bounds write that can be triggered by a user-mode client. An attacker with local, low-privilege access can craft a malicious request to the driver, causing it to write data outside its intended buffer in kernel memory. This can crash the system (denial of service) or, under certain conditions, allow arbitrary code execution with kernel-level privileges.

The vulnerability poses a significant threat to multi-tenant AI/ML environments that share GPUs, such as university computing clusters, on-premise Kubernetes clusters with GPU nodes, and cloud instances. Because containers on a host share the host's kernel driver, a malicious user could exploit the flaw to escape their container, gain root access on the host machine, and potentially access or disrupt the workloads of other tenants sharing the same physical GPU.

The issue was discovered by a security researcher and reported through NVIDIA's bug bounty program. Administrators of GPU-accelerated infrastructure are urged to patch immediately.
Affected Systems
Testing Guide
1. Check your installed NVIDIA driver version using `nvidia-smi` on Linux or the NVIDIA Control Panel on Windows.
2. Compare your installed version to the 'Affected Versions' list provided in the official NVIDIA Security Bulletin for CVE-2024-0089.
3. If your version is listed as affected, your system is vulnerable and should be patched.
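On Linux hosts, the check above can be scripted. The following is a minimal sketch, not an official tool: it reads the driver version through `nvidia-smi --query-gpu=driver_version --format=csv,noheader` and compares it with the Linux versions listed under Patch Details. The per-branch mapping in `PATCHED_LINUX`, the function names, and the rule that branches newer than the highest listed one count as patched are assumptions of this sketch; the NVIDIA security bulletin remains the authoritative source.

```python
import subprocess

# Minimum patched Linux driver per release branch, copied from Patch Details.
# Keying by major version is an assumption of this sketch, not NVIDIA's rule.
PATCHED_LINUX = {
    550: (550, 54, 14),
    535: (535, 154, 5),
    470: (470, 223, 2),
}


def parse_version(text):
    """Turn a version string such as '535.154.05' into (535, 154, 5)."""
    return tuple(int(part) for part in text.strip().split("."))


def classify(version_text, patched=PATCHED_LINUX):
    """Return 'patched', 'vulnerable', or 'unknown' for a Linux driver version."""
    version = parse_version(version_text)
    # Assumption: branches newer than the highest listed one include the fix.
    if version[0] > max(patched):
        return "patched"
    minimum = patched.get(version[0])
    if minimum is None:
        # Branch not listed here; consult the bulletin instead of guessing.
        return "unknown"
    return "patched" if version >= minimum else "vulnerable"


def installed_versions():
    """Ask nvidia-smi for the driver version (one output line per GPU)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return sorted({line.strip() for line in out.splitlines() if line.strip()})


if __name__ == "__main__":
    for version in installed_versions():
        print(f"{version}: {classify(version)}")
```

An `unknown` result means the branch does not appear in the table above, so the version should be checked by hand against the bulletin's affected-versions list.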
Mitigation Steps
1. **Update Drivers:** Update all NVIDIA drivers on affected systems to the patched versions listed in the NVIDIA security bulletin.
2. **Isolate Workloads:** In multi-tenant environments, use stricter isolation technologies like gVisor or Kata Containers for untrusted workloads, although this may not fully mitigate a kernel-level vulnerability.
3. **Restrict GPU Access:** Limit direct GPU access to trusted users and processes only.
4. **Monitor Systems:** Implement host-based intrusion detection systems (HIDS) to monitor for anomalous kernel activity.
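As a starting point for the "Restrict GPU Access" step on a Linux host, the sketch below lists the NVIDIA character device nodes under `/dev` and flags any that are readable or writable by all users. It is illustrative only: the helper names are hypothetical, and file permissions are just one layer of access control. Container runtimes decide separately which devices a container can see, so a clean result here does not by itself show that GPU access is restricted.

```python
import glob
import os
import stat


def is_world_accessible(mode):
    """True if users outside the owner and group may read or write the node."""
    return bool(mode & (stat.S_IROTH | stat.S_IWOTH))


def audit_gpu_nodes(pattern="/dev/nvidia*"):
    """Return (path, permission string, world-accessible flag) per device node."""
    findings = []
    for path in sorted(glob.glob(pattern)):
        st = os.stat(path)
        # Skip anything that is not a character device (e.g. directories).
        if not stat.S_ISCHR(st.st_mode):
            continue
        findings.append(
            (path, stat.filemode(st.st_mode), is_world_accessible(st.st_mode))
        )
    return findings


if __name__ == "__main__":
    for path, perms, open_to_all in audit_gpu_nodes():
        label = "world-accessible" if open_to_all else "restricted"
        print(f"{path}  {perms}  {label}")
```

Nodes reported as world-accessible are candidates for tighter ownership and group-based access, applied through whatever mechanism the site already uses to manage device permissions.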
Patch Details
The vulnerability is patched in NVIDIA driver version 551.61 and later for Windows, and in versions 550.54.14, 535.154.05, and 470.223.02 and later for Linux.