NVIDIA CUDA Driver Use-After-Free Vulnerability Allows for Code Execution and Privilege Escalation
Overview
A critical use-after-free vulnerability, identified as CVE-2024-0073, was discovered in the kernel mode layer of the NVIDIA GPU display driver for both Windows and Linux. The driver is a core component of the CUDA stack, on which most GPU-accelerated AI/ML workloads depend.

The vulnerability can be triggered by a user-mode application that makes specially crafted API calls to the driver. Successful exploitation lets an attacker read from or write to freed memory, which can lead to a denial of service (system crash) or, more severely, arbitrary code execution with kernel privileges.

This is particularly dangerous in cloud and containerized environments. An attacker who has compromised a single ML container could potentially exploit the vulnerability to escape the container's sandbox and gain full control of the underlying host. That would compromise every other container and workload running on the host, enabling data theft, lateral movement, and complete system takeover.

Because affected NVIDIA GPUs are widely deployed in data centers and cloud provider infrastructure, the vulnerability exposed a large attack surface and required prompt patching across the industry to secure AI training and inference systems.
Affected Systems
Testing Guide
1. **Check Driver Version (Linux):** Run `nvidia-smi` in a terminal. The installed driver version appears in the header of the output, labeled `Driver Version`.
2. **Check Driver Version (Windows):** Open the NVIDIA Control Panel, go to 'Help', and select 'System Information'. The driver version will be listed.
3. **Compare Version:** Compare your installed driver version against the patched versions listed in the NVIDIA security bulletin for CVE-2024-0073. If your version is lower than the patched version for your driver branch, your system is vulnerable.
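The comparison in step 3 can be scripted. The following is a minimal sketch, not an official tool: it assumes the minimum versions given under Patch Details are correct and matches an installed version to a listed minimum by its leading number only. The names `LISTED_MINIMUMS`, `parse_version`, `assess`, and `installed_driver_version` are illustrative. Any version on a branch that the advisory does not list is reported as `unknown`, so the security bulletin should still be consulted for it.

```python
import re
import subprocess

# Minimum patched versions as listed under "Patch Details" in this advisory.
# Assumption: these values are accurate; verify them against the NVIDIA bulletin.
LISTED_MINIMUMS = [(551, 61), (545, 29, 6), (535, 154, 5)]


def parse_version(text):
    """Convert a dotted version string such as '535.154.05' to (535, 154, 5)."""
    text = text.strip()
    if not re.fullmatch(r"\d+(\.\d+)+", text):
        raise ValueError(f"unrecognized driver version: {text!r}")
    return tuple(int(part) for part in text.split("."))


def assess(installed):
    """Return 'patched', 'vulnerable', or 'unknown' for a driver version string."""
    version = parse_version(installed)
    major = version[0]
    for minimum in LISTED_MINIMUMS:
        if minimum[0] == major:
            # Same leading number: tuples compare component by component.
            return "patched" if version >= minimum else "vulnerable"
    if major > max(minimum[0] for minimum in LISTED_MINIMUMS):
        # Covered by "and later" in the Patch Details section.
        return "patched"
    # Branch not listed in this advisory; check the bulletin by hand.
    return "unknown"


def installed_driver_version():
    """Ask nvidia-smi for the driver version (requires the driver to be installed)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    # One line is printed per GPU; all GPUs share the same driver version.
    return result.stdout.splitlines()[0].strip()
```

On a machine with the driver installed, `assess(installed_driver_version())` gives the verdict for the local system. On Windows, the version string read from the NVIDIA Control Panel can be passed to `assess` directly.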
Mitigation Steps
1. **Update NVIDIA Drivers:** The primary mitigation is to update the NVIDIA GPU driver on every affected system to a patched version specified in the NVIDIA security bulletin.
2. **Restrict GPU Access:** In multi-tenant environments, use security mechanisms such as seccomp-bpf profiles to limit the system calls that containers can make to the kernel and GPU driver.
3. **Host-Level Security:** Employ host-based intrusion detection systems (HIDS) to monitor for anomalous kernel activity or privilege escalation attempts.
4. **Run Unprivileged:** Run ML containers as unprivileged users to add an extra layer of defense.
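As a rough illustration of items 2 and 4, the sketch below reports whether the current process runs as root and which NVIDIA device nodes it can see. It is Linux-only and purely informational. It assumes the conventional `/dev/nvidia*` device node naming, and the function names `audit` and `audit_current_process` are hypothetical. It does not detect or prevent exploitation.

```python
import glob
import os


def audit(euid, device_nodes):
    """Summarize privilege level and GPU exposure from the given facts."""
    return {
        # UID 0 means the workload runs as root inside its namespace.
        "runs_as_root": euid == 0,
        # Device nodes visible to the process, sorted for stable output.
        "gpu_device_nodes": sorted(device_nodes),
    }


def audit_current_process():
    """Collect the facts for the calling process (Linux only)."""
    # Assumption: NVIDIA device nodes follow the usual /dev/nvidia* naming.
    return audit(os.geteuid(), glob.glob("/dev/nvidia*"))
```

Running `audit_current_process()` inside a container gives a quick view of whether the workload is running as root and whether it sees more GPU device nodes than it needs.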
Patch Details
Patches are available in NVIDIA driver versions R550 (551.61+), R545 (545.29.06+), R535 (535.154.05+), and later.