NVIDIA CUDA Driver Out-of-Bounds Read in cuBLAS GEMM Kernel Causes Host Denial-of-Service
Overview
A high-severity Denial-of-Service (DoS) vulnerability was discovered in the NVIDIA Linux driver. The flaw lies in the kernel-mode driver's handling of General Matrix Multiply (GEMM) operations issued through the cuBLAS library. A local attacker with basic user privileges can exploit it by crafting input matrices with malicious dimension parameters. When these matrices are passed to a cuBLAS GEMM function, they trigger an out-of-bounds memory read within the GPU hardware context. The driver fails to handle the resulting GPU page fault properly, so the fatal error is not contained within the user-space process. Instead, it escalates to the host system's kernel, causing an immediate kernel panic and a full system crash.

The impact is particularly severe in multi-tenant cloud environments and high-performance computing (HPC) clusters, where multiple users share a single physical machine with one or more GPUs. A single malicious user or container can deliberately crash the entire host, denying service to all other users and applications running on that node.

Successful exploitation requires local access to the system but no special privileges beyond the ability to run CUDA applications.
Affected Systems
Testing Guide
1. Identify the currently installed NVIDIA driver version using the `nvidia-smi` command.
2. Compare the installed version against the patched version given in the NVIDIA security bulletin (e.g., 550.54.14 for Linux). Compare the dotted components numerically, not as strings.
3. If the installed version is lower than the patched version, the system is vulnerable.
4. (Caution: this will crash your system.) On a non-production system, obtain proof-of-concept (PoC) exploit code and execute it. If the system experiences a kernel panic, the vulnerability is confirmed.
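The version check in steps 1 to 3 can be scripted. The following is a minimal sketch, not an official NVIDIA tool. It assumes that 550.54.14 is the only patched version that matters (the advisory names no others) and that `nvidia-smi --query-gpu=driver_version --format=csv,noheader` is available on the host. The function names are hypothetical and chosen for illustration.

```python
import subprocess

# Patched Linux driver version quoted in this advisory.
PATCHED_LINUX = "550.54.14"


def parse_version(text):
    """Turn a dotted version such as '550.54.14' into (550, 54, 14)."""
    return tuple(int(part) for part in text.strip().split("."))


def is_vulnerable(installed, patched=PATCHED_LINUX):
    """Return True when the installed version is numerically lower than the patched one.

    Tuples of integers are compared component by component, which avoids
    the string-comparison trap where '550.100' would sort before '550.54.14'.
    """
    return parse_version(installed) < parse_version(patched)


def installed_driver_version():
    """Ask nvidia-smi for the driver version (one line per GPU; the first is used)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()[0].strip()
```

On a host with the driver installed, `is_vulnerable(installed_driver_version())` returns `True` when an update is needed. The comparison treats all driver branches alike, so consult the security bulletin if a branch other than 550 is deployed.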
Mitigation Steps
1. **Update NVIDIA Drivers:** Immediately update all GPU-enabled systems to the latest NVIDIA driver version as specified in the security bulletin.
2. **Isolate Workloads:** Use containerization technologies like Docker or Kubernetes with GPU support, but be aware that a kernel-level vulnerability may still crash the host.
3. **Restrict GPU Access:** In shared environments, only grant GPU access to trusted users and applications.
4. **Monitor System Logs:** Regularly monitor kernel logs for GPU-related errors that could indicate attempted exploitation.
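As an illustration of the log-monitoring step, the sketch below scans kernel log lines for NVIDIA Xid error reports. It assumes the driver logs them in the usual `NVRM: Xid (<device>): <code>, ...` form, which may vary between driver versions. The advisory does not say which Xid codes this issue produces, so the sketch reports every Xid event and does not filter. The function name and the sample lines are hypothetical.

```python
import re

# Assumed log shape: "NVRM: Xid (PCI:0000:3b:00): 31, pid=4242, ..."
XID_PATTERN = re.compile(r"NVRM: Xid \((?P<device>[^)]*)\): (?P<code>\d+)")


def find_xid_events(lines):
    """Return (device, xid_code, original_line) for each line that reports an Xid."""
    events = []
    for line in lines:
        match = XID_PATTERN.search(line)
        if match:
            events.append(
                (match.group("device"), int(match.group("code")), line.rstrip("\n"))
            )
    return events


# Illustrative input only; these lines are made up for the example.
sample_log = [
    "[ 101.5] usb 1-1: new high-speed USB device number 2",
    "[ 220.9] NVRM: Xid (PCI:0000:3b:00): 31, pid=4242, name=app",
    "[ 221.0] NVRM: Xid (PCI:0000:3b:00): 13, pid=4242, name=app",
]
xid_codes = [code for _device, code, _line in find_xid_events(sample_log)]
```

In practice, the lines would come from `dmesg` or `journalctl -k` output. In NVIDIA's Xid documentation, code 31 denotes a GPU memory page fault, so repeated occurrences from an untrusted workload may be worth investigating.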
Patch Details
NVIDIA released patched drivers in January 2026. For Linux, the issue is resolved in version 550.54.14 and later.