Cross-Tenant GPU Memory Leak in Cloud ML Infrastructure via 'LeftoverLocals' Variant
Overview
Researchers discovered a variant of the 'LeftoverLocals' GPU memory vulnerability that affects multi-tenant cloud AI platforms built on specific high-end accelerator hardware. The vulnerability, designated 'MemSpectre-GPU', allows an attacker to read residual data left in GPU local memory by a previous tenant's processes. Here "local memory" means the on-chip, per-workgroup scratchpad, which CUDA calls shared memory and ROCm calls the Local Data Share (LDS).
In a typical cloud environment, GPU resources are partitioned and reallocated between different customers' ML workloads. On certain GPU models, the on-chip local memory banks were not scrubbed or zeroed out upon deallocation. An attacker could rent a GPU instance on a major cloud provider and run a specially crafted CUDA or ROCm kernel that allocates a large amount of local memory and immediately dumps its contents, without ever writing to it. The dump would contain fragments of data from the previous workload that ran on the same streaming multiprocessor (or compute unit, in AMD terminology).
The leaked data could include sensitive information such as model parameters, proprietary training data, inference inputs, or intermediate layer activations. The attack poses a significant risk to organizations that train proprietary models or process sensitive data on shared cloud GPU infrastructure, because it breaks the fundamental tenant isolation guarantee.
Affected Systems
Testing Guide
1. Check your current NVIDIA GPU driver version using `nvidia-smi`. The firmware (VBIOS) version appears in the detailed output of `nvidia-smi -q`.
2. Compare the reported versions against the patched version numbers listed in the security advisory (e.g., 560.x.x or newer).
3. For AMD, use `rocm-smi` to check the driver version.
4. Run the proof-of-concept tool provided by the security researchers on a cloud GPU instance. If the tool recovers any non-zero, non-random data from uninitialized local memory, the instance is vulnerable.
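The version comparison in step 2 is easy to get wrong by comparing strings, since "560.9" sorts after "560.35" lexically. The following is a minimal sketch, not an official tool: the function names `parse_version`, `is_patched`, and `nvidia_driver_versions` are illustrative, and the minimum version you pass in must come from the vendor advisory. The `--query-gpu=driver_version` and `--format=csv,noheader` options are standard `nvidia-smi` flags.

```python
import subprocess


def parse_version(text):
    """Turn a dotted version such as '560.35.03' into a tuple of ints."""
    # int() raises ValueError on non-numeric components, which is intended:
    # an unparseable version should fail loudly rather than pass silently.
    return tuple(int(part) for part in text.strip().split("."))


def is_patched(reported, minimum):
    """Return True if `reported` is at least `minimum`, compared numerically."""
    a, b = parse_version(reported), parse_version(minimum)
    # Pad the shorter tuple with zeros so '560' compares equal to '560.0.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def nvidia_driver_versions():
    """Query the driver version reported for each visible NVIDIA GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        check=True,
        capture_output=True,
        text=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
```

On a host with the NVIDIA driver installed, calling `is_patched(v, minimum)` for each `v` in `nvidia_driver_versions()` flags GPUs that still need the update. The threshold is deliberately a parameter, because the "560.x.x" figure above is only an example.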
Mitigation Steps
1. **Apply Firmware and Driver Updates:** Immediately update GPU drivers and firmware to the patched versions provided by the hardware vendors (NVIDIA, AMD).
2. **Enable Memory Zeroing:** If available, enable explicit memory zeroing features in the GPU management tools or container runtimes, though this may incur a performance penalty.
3. **Use Confidential Computing:** For highly sensitive workloads, use confidential computing VMs with GPU support. CPU-side technologies such as AMD SEV-SNP or Intel TDX protect the VM's memory; protecting data while it is in use on the GPU additionally requires an accelerator that offers its own confidential computing mode.
4. **Favor Dedicated Instances:** Where possible, use dedicated GPU instances or bare metal servers to avoid sharing hardware with other tenants. This removes the cross-tenant exposure for as long as the hardware remains assigned to you.
Patch Details
NVIDIA and AMD released updated firmware and kernel drivers that ensure local memory is zeroed out upon context switching and deallocation.
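One way to spot-check the zeroing behavior after patching is to inspect a dump of uninitialized local memory, for instance one saved by a proof-of-concept tool. The sketch below assumes the dump is available as raw bytes; that format, and the helper names `summarize_dump` and `looks_scrubbed`, are assumptions for illustration rather than part of any vendor or researcher tooling. Byte entropy is used only as a rough heuristic for separating structured residue from noise.

```python
import math
from collections import Counter


def summarize_dump(data):
    """Report size, non-zero byte count, and Shannon entropy of a raw byte dump."""
    total = len(data)
    nonzero = sum(1 for byte in data if byte != 0)
    entropy = 0.0
    if total:
        # Shannon entropy in bits per byte: 0.0 for constant data,
        # approaching 8.0 for uniformly random data.
        for count in Counter(data).values():
            p = count / total
            entropy -= p * math.log2(p)
    return {"bytes": total, "nonzero": nonzero, "entropy_bits_per_byte": entropy}


def looks_scrubbed(data):
    """Return True only if every byte in the dump is zero."""
    return all(byte == 0 for byte in data)
```

A dump for which `looks_scrubbed` returns `True` is consistent with the patched behavior. Non-zero bytes with entropy well below 8 bits per byte suggest structured residue worth examining. A single clean dump does not prove the absence of leakage, so repeated sampling is advisable.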