Heap-based Buffer Overflow in NVIDIA CUDA cuDNN Library for Transformer Kernels
Overview
A critical heap-based buffer overflow vulnerability was identified in the NVIDIA cuDNN library, specifically affecting optimized kernels used to accelerate self-attention in Transformer models. The vulnerability, tracked as NVIDIA Security Bulletin 5611, can be triggered by processing a specially crafted input tensor with manipulated metadata or dimensions. When the vulnerable cuDNN function (e.g., `cudnnMultiHeadAttnForward`) calculates memory offsets and allocates workspace for the attention score computations, an integer overflow produces an undersized buffer allocation. Subsequent memory writes then run past the end of this buffer and corrupt adjacent heap memory structures.

An attacker able to run arbitrary ML models on a shared GPU, such as in a multi-tenant cloud environment or an on-premises ML platform, could exploit this flaw. Successful exploitation could cause a denial of service (DoS) by crashing the GPU driver or the host kernel. More sophisticated attacks could potentially achieve arbitrary code execution within the context of the GPU driver, enabling container escape and privileged access to the underlying host system. This vulnerability underscores the growing attack surface of the low-level, highly optimized GPU computing libraries that underpin modern AI infrastructure.
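The overflow-then-undersized-allocation pattern described above can be illustrated with a small sketch. This is not cuDNN's actual size computation, which is not public. The function names, the `batch x heads x seq_q x seq_k` layout, and the 32-bit size field are all assumptions chosen only to show how a wrapped product yields a buffer far smaller than the data written into it.

```python
# Illustrative only: the real cuDNN workspace computation is not public.
UINT32_MASK = 0xFFFFFFFF  # assumption: the size is held in a 32-bit unsigned integer


def workspace_bytes_exact(batch, heads, seq_q, seq_k, elem_size=4):
    """Bytes needed for a batch x heads x seq_q x seq_k attention score tensor."""
    return batch * heads * seq_q * seq_k * elem_size


def workspace_bytes_wrapped(batch, heads, seq_q, seq_k, elem_size=4):
    """The same product as 32-bit unsigned arithmetic would store it (mod 2**32)."""
    return workspace_bytes_exact(batch, heads, seq_q, seq_k, elem_size) & UINT32_MASK


# Hypothetical crafted dimensions: the true size exceeds 2**32 bytes.
needed = workspace_bytes_exact(1, 16, 16384, 16385)
allocated = workspace_bytes_wrapped(1, 16, 16384, 16385)
print(needed)     # 17180917760 bytes required (about 16 GiB)
print(allocated)  # 1048576 bytes after wraparound (1 MiB)
```

With ordinary dimensions the two functions agree. They diverge only once the product exceeds the width of the integer that holds it, which is why such a flaw can go unnoticed in normal workloads.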
Affected Systems
Testing Guide
1. Check the currently installed NVIDIA driver version using `nvidia-smi`.
2. Check the cuDNN version used by your ML framework (e.g., `torch.backends.cudnn.version()` in PyTorch).
3. If the driver is older than 555.43 or cuDNN is older than 9.2.1 (the patched versions), treat the system as vulnerable.
4. In a controlled environment, a proof-of-concept (PoC) script that crafts a malformed tensor and calls the vulnerable cuDNN function can be used to verify the crash/DoS condition. Do not run public PoCs on production systems.
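The version checks in the steps above could be scripted roughly as follows. This is a minimal sketch, not an official detection tool. The helper names are hypothetical, the thresholds are the patched versions stated in this advisory, and the cuDNN threshold assumes that cuDNN 9.x reports its version as `major*10000 + minor*100 + patch` (so 9.2.1 becomes 90201), with cuDNN 8.x values being four-digit numbers that compare lower.

```python
import subprocess

PATCHED_DRIVER = (555, 43)  # patched driver version stated in this advisory
PATCHED_CUDNN = 90201       # 9.2.1, assuming major*10000 + minor*100 + patch


def parse_driver_version(text):
    """Turn a string such as '555.42.02' into the tuple (555, 42, 2)."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_is_patched(text):
    """True if the driver version string is at or above the patched version."""
    return parse_driver_version(text) >= PATCHED_DRIVER


def cudnn_is_patched(version_int):
    """True if the integer cuDNN version is at or above the patched version.

    A value of None (cuDNN not available) is reported as not patched.
    """
    return version_int is not None and version_int >= PATCHED_CUDNN


def query_driver_version():
    """Ask nvidia-smi for the driver version of the first listed GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()[0]
```

On a GPU host, `driver_is_patched(query_driver_version())` covers step 1. In a PyTorch environment, `cudnn_is_patched(torch.backends.cudnn.version())` covers step 2.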
Mitigation Steps
1. Update NVIDIA GPU drivers to version 555.43 or later.
2. Update the NVIDIA cuDNN library to version 9.2.1 or later.
3. In multi-tenant environments, use hardware-level security features like NVIDIA MIG (Multi-Instance GPU) to provide stronger isolation between workloads.
4. Implement strict validation on all input data and model architectures before they are run on shared GPU resources to prevent malformed tensors from reaching the driver.
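The input validation in step 4 might look like the sketch below, applied by a job scheduler or model-serving gateway before any tensor reaches the GPU. Everything here is an assumption for illustration: the function name, the per-dimension cap, and the byte limit are hypothetical policy values, not figures from NVIDIA. Python integers do not overflow, so the running product can be compared against the limit safely.

```python
MAX_DIM = 1 << 20                # hypothetical cap on any single dimension
MAX_WORKSPACE_BYTES = 2**31 - 1  # hypothetical per-request workspace limit


def validate_attention_dims(batch, heads, seq_q, seq_k, elem_size=4,
                            max_bytes=MAX_WORKSPACE_BYTES):
    """Reject attention shapes whose score tensor would exceed max_bytes.

    Returns the exact byte count when the shape is acceptable.
    """
    dims = {"batch": batch, "heads": heads, "seq_q": seq_q,
            "seq_k": seq_k, "elem_size": elem_size}
    total = 1
    for name, value in dims.items():
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{name} must be an integer, got {type(value).__name__}")
        if value <= 0 or value > MAX_DIM:
            raise ValueError(f"{name}={value} is outside the range 1..{MAX_DIM}")
        total *= value
        # Check after every multiplication so oversized shapes fail early.
        if total > max_bytes:
            raise ValueError(f"workspace exceeds {max_bytes} bytes at {name}")
    return total
```

A check of this kind reduces exposure but does not replace the driver and cuDNN updates, since it depends on the validating layer knowing every shape that reaches the library.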
Patch Details
Patches are available in NVIDIA Driver version 555.43 and cuDNN version 9.2.1.