Resource Exhaustion via Malformed Safetensors Header in Hugging Face Inference Endpoints
Overview
A high-severity denial-of-service vulnerability was identified in the model-loading path of several cloud AI platforms, including Hugging Face Inference Endpoints and Spaces. It stemmed from insufficient validation of model files in the `safetensors` format.

A `.safetensors` file begins with an 8-byte little-endian integer giving the length of a JSON header. The header itself follows, and then the raw tensor data. Researchers demonstrated that the header could declare a tensor with an enormous shape (e.g., `[1000000, 1000000, 1000000]`) while the actual data section was only a few bytes long. The vulnerable `safetensors` parsing library read this header and attempted to allocate memory for the declared tensor before checking whether the file contained enough data to back it. For the example shape, that is 10^18 elements, or about 4 exabytes at 4 bytes per element, far more memory than any server has. The allocation attempt exhausted all available memory and swap space on the inference server, and the container's OOM (Out-Of-Memory) killer terminated the process, taking the endpoint down.

An attacker could upload such a model to the Hub and direct users to it, or target applications that automatically load models from user-provided paths, causing widespread service disruption and high computational costs for victims.
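To see why the allocation cannot succeed, the size a header entry claims can be computed from its dtype and shape alone, without touching the data section. The sketch below assumes an `F32` dtype for the example shape (the advisory does not name one) and a hypothetical 8-byte data section; `declared_nbytes` and its dtype table are illustrative helpers, not part of the `safetensors` API.

```python
from math import prod

# Bytes per element for a few common safetensors dtype codes.
DTYPE_SIZES = {"F64": 8, "F32": 4, "F16": 2, "BF16": 2, "I64": 8, "I32": 4, "I8": 1, "U8": 1}


def declared_nbytes(dtype: str, shape: list[int]) -> int:
    """Number of bytes a header entry claims, computed from dtype and shape only."""
    return prod(shape) * DTYPE_SIZES[dtype]


# Example shape from this advisory; the F32 dtype and 8-byte data section are assumptions.
claimed = declared_nbytes("F32", [1_000_000, 1_000_000, 1_000_000])
present = 8
print(f"declared {claimed:,} bytes (~{claimed / 1e18:.0f} EB) vs {present} bytes present")
# prints: declared 4,000,000,000,000,000,000 bytes (~4 EB) vs 8 bytes present
```

The gap between the two numbers is exactly what a loader must check before allocating anything.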
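Affected Systems
Any environment that loads `.safetensors` files using a `safetensors` library version earlier than 0.4.3, including the Hugging Face Inference Endpoints and Spaces deployments described above and applications that load models from untrusted or user-provided sources.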
Testing Guide
1. Check your Python environment's installed version of the `safetensors` library: `pip show safetensors`.
2. If the version is below 0.4.3, your environment is vulnerable.
3. To test an inference service, create a small `safetensors` file with a handcrafted header that declares an impossibly large tensor.
4. Attempt to load this model through the service's API. If the server becomes unresponsive or the container crashes with an OOM error, the service is vulnerable.
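Steps 1 to 3 can be sketched in Python as follows. `is_vulnerable` and `make_oversized_header_file` are hypothetical helpers written for this guide, not library functions. The version comparison is deliberately simplified, and the test file follows the published safetensors layout (8-byte little-endian header length, JSON header, data) with an assumed tensor name `huge` and an assumed `F32` dtype.

```python
import json
import re
import struct

FIXED_VERSION = (0, 4, 3)  # first patched release named in this advisory


def is_vulnerable(version: str) -> bool:
    """Compare a 'major.minor[.patch]' string against FIXED_VERSION.

    Simplified: pre-release or dev suffixes (e.g. '0.4.3rc1') are ignored.
    """
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if m is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    parts = tuple(int(g or 0) for g in m.groups())
    return parts < FIXED_VERSION


def make_oversized_header_file() -> bytes:
    """Build a safetensors-format file whose header declares a 10**18-element
    F32 tensor while the data section holds only 8 bytes."""
    header = {
        "huge": {
            "dtype": "F32",
            "shape": [1_000_000, 1_000_000, 1_000_000],
            "data_offsets": [0, 8],  # offsets cover 8 bytes; the shape claims ~4 EB
        }
    }
    header_bytes = json.dumps(header).encode("utf-8")
    return (
        struct.pack("<Q", len(header_bytes))  # 8-byte little-endian header length
        + header_bytes                        # JSON header
        + b"\x00" * 8                         # tiny data section
    )
```

For example, `is_vulnerable(importlib.metadata.version("safetensors"))` checks the installed library, and `pathlib.Path("oversized_test.safetensors").write_bytes(make_oversized_header_file())` writes the test file. Loading that file with an affected version is expected to exhaust memory, so step 4 is best run against an isolated container with its own memory limit.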
Mitigation Steps
1. **Update Libraries:** Update the `safetensors` and `transformers` libraries to the latest patched versions (`safetensors >= 0.4.3`).
2. **Implement Pre-load Validation:** Before loading a model, parse the `safetensors` header in a memory-constrained environment and verify that each declared tensor size is reasonable and consistent with the tensor's data offsets and the total file size.
3. **Set Resource Limits:** Configure strict memory and CPU limits for any process that loads or runs inference on untrusted models, for example with cgroups, Kubernetes container limits (`resources.limits`), or namespace-level ResourceQuotas.
4. **Scan Models:** Use model-scanning tools to inspect model files for structural anomalies or malicious content before deployment.
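Steps 2 and 3 can be approximated in plain Python. This is a minimal sketch, not the check the patched library performs: the hypothetical `validate_safetensors_header` reads only the 8-byte length prefix and the JSON header, then confirms that each tensor's dtype and shape imply exactly the number of bytes its `data_offsets` cover and that those offsets fall inside the file. The hypothetical `limit_address_space` uses the Unix-only `resource` module to cap virtual memory for the loading process. The 100 MB header cap is an assumption.

```python
import io
import json
import resource
import struct
from math import prod
from typing import BinaryIO

# Bytes per element for safetensors dtype codes; unknown codes are rejected.
DTYPE_SIZES = {
    "BOOL": 1, "U8": 1, "I8": 1, "F8_E4M3": 1, "F8_E5M2": 1,
    "U16": 2, "I16": 2, "F16": 2, "BF16": 2,
    "U32": 4, "I32": 4, "F32": 4,
    "U64": 8, "I64": 8, "F64": 8,
}
MAX_HEADER_BYTES = 100 * 1024 * 1024  # assumed cap on the JSON header size


def validate_safetensors_header(f: BinaryIO) -> dict:
    """Validate the header of an open safetensors file without reading tensor data.

    Raises ValueError if any declared tensor is inconsistent with its
    data_offsets or with the file size. Gaps or overlaps between tensors
    are not checked in this sketch.
    """
    file_size = f.seek(0, io.SEEK_END)
    f.seek(0)
    prefix = f.read(8)
    if len(prefix) != 8:
        raise ValueError("file too small to contain a header length")
    (header_len,) = struct.unpack("<Q", prefix)  # little-endian u64
    if header_len > MAX_HEADER_BYTES or 8 + header_len > file_size:
        raise ValueError(f"implausible header length: {header_len}")
    header = json.loads(f.read(header_len))  # JSON/Unicode errors are ValueErrors
    if not isinstance(header, dict):
        raise ValueError("header is not a JSON object")

    data_size = file_size - 8 - header_len
    for name, info in header.items():
        if name == "__metadata__":  # free-form string metadata, not a tensor
            continue
        try:
            dtype, shape = info["dtype"], info["shape"]
            begin, end = info["data_offsets"]
        except (TypeError, KeyError, ValueError) as exc:
            raise ValueError(f"{name}: malformed tensor entry") from exc
        if dtype not in DTYPE_SIZES:
            raise ValueError(f"{name}: unknown dtype {dtype!r}")
        if not all(isinstance(d, int) and d >= 0 for d in shape):
            raise ValueError(f"{name}: invalid shape {shape!r}")
        if not (isinstance(begin, int) and isinstance(end, int)):
            raise ValueError(f"{name}: non-integer data_offsets")
        if not 0 <= begin <= end <= data_size:
            raise ValueError(f"{name}: offsets [{begin}, {end}] exceed {data_size}-byte data section")
        declared = prod(shape) * DTYPE_SIZES[dtype]  # Python ints never overflow
        if declared != end - begin:
            raise ValueError(f"{name}: shape implies {declared} bytes, offsets cover {end - begin}")
    return header


def limit_address_space(max_bytes: int) -> None:
    """Cap this process's virtual address space (Unix only).

    Allocations beyond the cap fail inside the process (MemoryError in
    Python code; native extensions may abort) rather than consuming host
    memory until the OOM killer intervenes.
    """
    _, hard = resource.getrlimit(resource.RLIMIT_AS)
    if hard != resource.RLIM_INFINITY:
        max_bytes = min(max_bytes, hard)
    resource.setrlimit(resource.RLIMIT_AS, (max_bytes, hard))
```

In a loader worker, `limit_address_space` would be called before any model is loaded; the file is then opened with `open(path, "rb")` and passed to `validate_safetensors_header`, and only handed to `safetensors` if no `ValueError` is raised. An address-space limit is a coarse control, since it counts all mapped virtual memory rather than just resident pages, so container-level limits remain the primary safeguard.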
Patch Details
The issue was patched in safetensors version 0.4.3 by adding validation checks before memory allocation.