Cross-Tenant Data Leakage in Hugging Face Inference Endpoints via Shared GPU Memory
Overview
Security research from Wiz identified a potential vulnerability in multi-tenant cloud AI platforms that affected Hugging Face's Inference Endpoints. The root cause was inadequate memory isolation between inference jobs from different tenants scheduled on the same physical GPU. In a multi-tenant environment, the same hardware processes several customers' models and data to improve resource utilization. Because GPU memory was not deterministically sanitized between jobs, the research showed that an attacker could deploy a malicious model that scans raw GPU memory and captures residual data left behind by a previous tenant's inference task. The leaked data could include proprietary model weights, confidential inputs (e.g., user queries or personal information), or the previous model's outputs, exposing customers of the shared infrastructure to intellectual property theft and serious privacy breaches. The issue highlights a fundamental challenge in securing shared AI infrastructure: isolation techniques designed for CPUs do not always translate directly to GPU architectures.
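The leak mechanism can be illustrated with a toy model. The sketch below is a pure-Python simulation, not GPU code and not Hugging Face's allocator: `SharedPool`, `run_victim`, and `run_attacker` are hypothetical names, and a `bytearray` stands in for device memory that is reused across tenants without being cleared.

```python
class SharedPool:
    """Toy stand-in for GPU memory reused by consecutive tenants (hypothetical)."""

    def __init__(self, size: int) -> None:
        self._mem = bytearray(size)  # zeroed once, when the "device" starts
        self._in_use = False

    def allocate(self) -> memoryview:
        # Like an uninitialized device allocation: the buffer is handed out as-is.
        if self._in_use:
            raise RuntimeError("pool is already allocated to another job")
        self._in_use = True
        return memoryview(self._mem)

    def release(self) -> None:
        # No scrubbing: whatever the job wrote stays in the backing store.
        self._in_use = False


def run_victim(pool: SharedPool, secret: bytes) -> None:
    """First tenant: writes sensitive data into its buffer, then finishes."""
    buf = pool.allocate()
    buf[: len(secret)] = secret
    pool.release()


def run_attacker(pool: SharedPool, nbytes: int) -> bytes:
    """Next tenant: reads its freshly allocated buffer without writing to it."""
    buf = pool.allocate()
    leaked = bytes(buf[:nbytes])
    pool.release()
    return leaked
```

In this model, calling `run_attacker` after `run_victim` on the same pool returns the victim's bytes unchanged, because `release` never overwrites them.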
Affected Systems
Testing Guide
Testing for this vulnerability is highly complex and generally not feasible for end users. A test would require:
1. Deploying a custom-built malicious model to an inference endpoint.
2. Coding that model with low-level CUDA or Triton kernels capable of scanning large regions of VRAM for structured data remnants.
3. Being co-located on a GPU that has just processed another tenant's sensitive data, which a user cannot control.
For these reasons, the vulnerability is best identified through provider-led security audits and penetration testing.
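For auditors who already hold a raw memory dump from an authorized test, the "structured data remnants" search in item 2 can be approximated on the host. The following is a minimal sketch, assuming the dump is available as a `bytes` object; `find_printable_runs` is a hypothetical helper that works like the Unix `strings` utility and does not read GPU memory itself.

```python
import re


def find_printable_runs(dump: bytes, min_len: int = 8) -> list[tuple[int, bytes]]:
    """Return (offset, text) for every run of printable ASCII of at least min_len bytes."""
    # 0x20-0x7e covers space through tilde; shorter runs are treated as noise.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group()) for m in pattern.finditer(dump)]
```

A text-oriented scan like this would surface leftover prompts or outputs. Detecting residual model weights would need a different heuristic (for example, looking for plausible floating-point value ranges), which this sketch does not attempt.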
Mitigation Steps
1. **Use Dedicated Infrastructure:** For sensitive workloads, use dedicated or single-tenant inference endpoints to ensure physical hardware isolation.
2. **Data Minimization:** Avoid sending unnecessarily sensitive data to shared inference endpoints.
3. **Provider-Side Mitigation:** The cloud provider (Hugging Face) is responsible for implementing stronger isolation. This includes robust memory clearing between jobs and leveraging newer hardware/software features for GPU virtualization and sandboxing.
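The "memory clearing between jobs" idea in item 3 amounts to scrubbing a buffer before it can be handed to the next tenant. The sketch below shows that pattern in a pure-Python toy model; `ScrubbingPool` is a hypothetical class, a `bytearray` stands in for device memory, and a real provider would perform the equivalent step in the GPU driver or runtime rather than in application code.

```python
from typing import Optional


class ScrubbingPool:
    """Toy memory pool that zeroes its buffer on release (hypothetical)."""

    def __init__(self, size: int) -> None:
        self._mem = bytearray(size)
        self._view: Optional[memoryview] = None

    def allocate(self) -> memoryview:
        if self._view is not None:
            raise RuntimeError("previous job has not released the pool")
        self._view = memoryview(self._mem)
        return self._view

    def release(self) -> None:
        if self._view is None:
            return
        # Overwrite every byte before the buffer can be reallocated.
        self._view[:] = bytes(len(self._mem))
        # Invalidate the handle the finished job was given.
        self._view.release()
        self._view = None
```

Scrubbing on release rather than on allocation means residual data never sits in idle memory, at the cost of extra work at the end of every job.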
Patch Details
Following responsible disclosure, Hugging Face addressed the issue by deploying enhanced memory sanitization protocols and improving tenant isolation on its infrastructure.