Use-After-Free in NVIDIA CUDA Driver Allows Local Privilege Escalation
Overview
A high-severity use-after-free vulnerability was identified in the NVIDIA Linux kernel driver, affecting systems that rely on GPUs for AI/ML workloads. The flaw lies in the driver's handling of Unified Memory Manager (UMM) memory allocations.
A local attacker with basic user permissions can run a specially crafted application that issues a series of CUDA API calls. These calls can trigger a race condition in which a kernel-space memory object is freed while a reference to it is still held and remains reachable from the user-space application via the GPU. The attacker can then use this dangling pointer to write controlled data into the deallocated kernel memory. By carefully grooming the kernel heap, the attacker can overwrite critical kernel data structures, such as function pointers or credential structures (`cred`).
Successful exploitation allows the attacker to execute arbitrary code in kernel context, escalating privileges from a standard user to root. This is a significant threat to multi-tenant cloud and on-premise AI environments where multiple users share physical GPU resources: a single malicious or compromised user account could lead to a full takeover of the host, compromising the data and workloads of all other tenants.
Affected Systems
Testing Guide
1. Identify the installed NVIDIA driver version on the Linux system with the `nvidia-smi` command.
2. Compare the installed version against the patched versions listed in the official NVIDIA security bulletin for CVE-2023-25516.
3. If the installed version is older than the patched version for its driver branch, the system is affected.
4. (For security researchers) Obtain a proof-of-concept exploit for the CVE and run it in a controlled, non-production environment to confirm exploitability. Such a PoC is typically a C/C++ program that makes specific CUDA API calls.
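Steps 1 to 3 above can be automated. The following is a minimal Python sketch, assuming `nvidia-smi` is on the PATH and that the example version 535.129.03 from this advisory is the right threshold for the installed driver branch. The function names and the `PATCHED_VERSION` constant are illustrative, not part of any NVIDIA tooling, and the comparison is only meaningful within a single driver branch, so the per-branch versions in the security bulletin remain authoritative.

```python
import shutil
import subprocess

# Assumption: example threshold taken from this advisory. The correct
# patched version depends on the driver branch; check the bulletin.
PATCHED_VERSION = "535.129.03"


def parse_version(text):
    """Turn '535.129.03' into (535, 129, 3) for numeric comparison."""
    return tuple(int(part) for part in text.strip().split("."))


def is_patched(installed, patched=PATCHED_VERSION):
    """Return True if `installed` is at or above `patched`."""
    return parse_version(installed) >= parse_version(patched)


def installed_driver_versions():
    """Ask nvidia-smi for driver versions; return [] if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    # One line per GPU; deduplicate because all GPUs share one driver.
    return sorted({line.strip() for line in result.stdout.splitlines() if line.strip()})


if __name__ == "__main__":
    versions = installed_driver_versions()
    if not versions:
        print("nvidia-smi not found or returned no driver version")
    for version in versions:
        status = "at or above threshold" if is_patched(version) else "below threshold"
        print(f"driver {version}: {status} ({PATCHED_VERSION})")
```

Versions are compared as integer tuples rather than strings, so that 535.54.03 correctly sorts below 535.129.03.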
Mitigation Steps
1. **Update NVIDIA Drivers:** Immediately update all affected systems to the patched driver version recommended in the NVIDIA security bulletin (e.g., version 535.129.03 or later).
2. **Restrict GPU Access:** In multi-tenant environments, restrict access to GPU resources to only trusted users and workloads.
3. **Use Kernel-Isolating Runtimes:** For containerized workloads, use runtimes like gVisor or Kata Containers that provide an additional layer of isolation between the container and the host kernel, making kernel exploits more difficult.
4. **Monitor System Logs:** Monitor for anomalous system behavior and kernel panics that could indicate exploitation attempts.
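The log monitoring in step 4 could be sketched as a simple pattern scan over kernel log text. This is a rough heuristic rather than a detection rule: the pattern list below is an assumption based on common kernel fault messages and NVIDIA `NVRM: Xid` error lines, not an official set of indicators for this CVE, and a match only signals that a line deserves a closer look. The script name and `find_suspicious_lines` are hypothetical.

```python
import re
import sys

# Heuristic patterns (assumptions, not an official indicator list).
SUSPICIOUS_PATTERNS = [
    re.compile(r"NVRM: Xid"),                # NVIDIA driver error reports
    re.compile(r"KASAN: .*use-after-free"),  # only on KASAN-enabled kernels
    re.compile(r"general protection fault"),
    re.compile(r"BUG: unable to handle"),
    re.compile(r"Kernel panic"),
]


def find_suspicious_lines(lines):
    """Return (line_number, text) pairs for lines matching any pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in SUSPICIOUS_PATTERNS):
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: scan_kernel_log.py <saved-kernel-log-file>")
    else:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
            for number, line in find_suspicious_lines(handle):
                print(f"{number}: {line}")
```

One way to use it is to save the kernel log first, for example with `journalctl -k > kernel.log` or `dmesg > kernel.log`, and pass that file as the argument. In production, the same patterns would more likely be fed into an existing log aggregation or alerting system.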
Patch Details
Patched in NVIDIA driver version 535.129.03 and subsequent releases.