Malicious Safetensors Model on Hugging Face Hub Executes Remote Code via Custom Operator Deserialization
Overview
Researchers demonstrated a supply chain attack vector that abuses models distributed in the `safetensors` format, which is widely used as a safe alternative to Python's `pickle` for serializing model weights. The `.safetensors` file itself contains only a JSON header and raw tensor bytes, so it cannot carry executable code. The weakness lies in how frameworks such as Hugging Face `transformers` resolve model architectures that depend on custom code. An attacker can craft a seemingly legitimate model, save its weights as `.safetensors`, and upload it to a public repository such as the Hugging Face Hub. The architecture definition shipped alongside the weights (the repository's `config.json`) can declare a custom model class through an `auto_map` entry that points to a Python file in the same repository. That file is imported while the model loads, so any code it contains, whether at module level or in the custom class's constructor, runs with the user's permissions. When a victim loads the model with a standard call such as `transformers.AutoModel.from_pretrained(..., trust_remote_code=True)`, the malicious code executes on their machine. The payload can open a reverse shell, steal API keys from environment variables, exfiltrate training data, or further poison the victim's ML development environment. The attack is particularly insidious because the weights themselves are clean: security scanners that only look for `pickle` imports or other known dangerous patterns in weight files find nothing, while the trusted model-loading process runs the attacker's code.
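The point that the weights file cannot carry code follows from the format's layout: an 8-byte little-endian unsigned integer giving the header length, a JSON header that describes each tensor (dtype, shape, byte offsets) plus an optional `__metadata__` map of strings, and then raw tensor bytes. The following is a minimal sketch, using only the standard library, that reads and sanity-checks such a header without loading any tensors. The function name `read_safetensors_header` and the size limit are illustrative assumptions, not part of the `safetensors` library's API.

```python
import json
import struct
from pathlib import Path

MAX_HEADER_BYTES = 100 * 1024 * 1024  # illustrative sanity limit, not from the spec


def read_safetensors_header(path: str) -> dict:
    """Return the parsed JSON header of a .safetensors file.

    The file is treated purely as data: nothing is imported or executed.
    """
    with Path(path).open("rb") as f:
        # First 8 bytes: unsigned 64-bit little-endian length of the JSON header.
        raw_len = f.read(8)
        if len(raw_len) != 8:
            raise ValueError("file too short to be safetensors")
        (header_len,) = struct.unpack("<Q", raw_len)
        if header_len > MAX_HEADER_BYTES:
            raise ValueError(f"implausible header length: {header_len}")
        header = json.loads(f.read(header_len).decode("utf-8"))

    for name, entry in header.items():
        if name == "__metadata__":
            # Optional free-form metadata: string keys mapped to string values.
            if not isinstance(entry, dict) or not all(
                isinstance(k, str) and isinstance(v, str) for k, v in entry.items()
            ):
                raise ValueError("__metadata__ must map strings to strings")
            continue
        # Every other entry describes exactly one tensor and nothing else.
        if not isinstance(entry, dict) or set(entry) != {"dtype", "shape", "data_offsets"}:
            raise ValueError(f"unexpected fields for tensor {name!r}")
    return header
```

Running it on a downloaded `model.safetensors` lists only tensor names, dtypes, shapes, and offsets. Anything that executes during loading therefore has to come from elsewhere in the repository.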
Affected Systems
Testing Guide
1. **Obtain a Proof-of-Concept Model:** Find a safe, non-malicious PoC model designed to demonstrate this vulnerability (e.g., one that only prints a message or creates a harmless file).
2. **Set up an Isolated Environment:** Create a new virtual environment or Docker container.
3. **Attempt to Load the Model:** Write a simple Python script that uses the `transformers` library to load the PoC model from a local path with `AutoModel.from_pretrained('path/to/model', trust_remote_code=True)`.
4. **Observe for Side Effects:** Run the script and check whether the expected side effect (e.g., the printed message) occurs. If it does, your loading process is vulnerable: it executes code shipped with the model.
5. **Test Mitigation:** Change the script to use `trust_remote_code=False` and run it again. The load should either fail with an error or complete without executing the custom code, confirming that the mitigation works.
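Steps 3 to 5 can be scripted so that each load runs in a fresh interpreter and the result is judged by a side effect rather than by eye. The following is a minimal sketch, assuming the harmless PoC creates a marker file named `poc_marker.txt` in its current working directory. That file name, the helper `side_effect_observed`, and `LOADER_TEMPLATE` are hypothetical; adapt them to whatever side effect your PoC actually produces. Run the sketch inside the isolated environment from step 2, because the first run deliberately executes repository code.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical loader script; the model path must be absolute because the
# script runs inside a temporary working directory.
LOADER_TEMPLATE = (
    "from transformers import AutoModel\n"
    "AutoModel.from_pretrained({path!r}, trust_remote_code={trust})\n"
)


def side_effect_observed(
    script: str, sentinel_name: str = "poc_marker.txt", timeout: float = 300.0
) -> bool:
    """Run `script` in a fresh interpreter inside an empty temp directory and
    report whether the (assumed) PoC marker file appeared there."""
    with tempfile.TemporaryDirectory() as workdir:
        subprocess.run(
            [sys.executable, "-c", script],
            cwd=workdir,
            capture_output=True,  # keep PoC output out of the harness output
            timeout=timeout,
            check=False,  # a refused load (non-zero exit) is an expected outcome
        )
        return (Path(workdir) / sentinel_name).exists()
```

For example, `side_effect_observed(LOADER_TEMPLATE.format(path="/abs/path/to/model", trust=True))` should return `True` on a vulnerable setup, and the same call with `trust=False` should return `False`. A load that is refused with an error also counts as no side effect, which matches the expected outcome of step 5.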
Mitigation Steps
1. **Disable Custom Code Execution:** When loading models from untrusted sources, explicitly pass `trust_remote_code=False` to Hugging Face `from_pretrained` methods. This is already the default, but verify that no wrapper or copied snippet overrides it.
2. **Use Model Scanning Tools:** Before loading a model, inspect its repository for custom code, in particular `auto_map` entries in `config.json` and any bundled `.py` files. The Hugging Face Hub's built-in malware scanner is one example of such tooling.
3. **Isolate Loading Environments:** Load and test new models in a sandboxed, network-isolated environment (e.g., a container started with `docker run --network none`, or one with strict egress filtering) so that payloads cannot reach command-and-control (C2) servers.
4. **Vet Model Sources:** Only use models from highly trusted, verified creators or organizations on platforms such as the Hugging Face Hub. Check community usage, likes, and verification badges.
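Steps 1 and 2 can be combined into a pre-load check that refuses to pass a local model directory to `from_pretrained` when the directory asks for custom code. The following is a minimal standard-library sketch, assuming the model has already been downloaded to a local directory. The function name `find_remote_code_risks`, the set of pickle-based extensions, and the list of config files checked are illustrative choices. The check is a heuristic that complements a dedicated scanner rather than replacing one.

```python
import json
from pathlib import Path

# Weight formats that are deserialized with pickle (illustrative, not exhaustive).
PICKLE_SUFFIXES = {".bin", ".pt", ".pth", ".pkl", ".ckpt"}
# JSON files in which transformers can map Auto* classes to custom code.
CONFIG_FILES = ("config.json", "tokenizer_config.json")


def find_remote_code_risks(model_dir: str) -> list[str]:
    """List reasons why loading `model_dir` might execute code shipped with it."""
    root = Path(model_dir)
    findings: list[str] = []

    for name in CONFIG_FILES:
        cfg_path = root / name
        if cfg_path.is_file():
            cfg = json.loads(cfg_path.read_text(encoding="utf-8"))
            if isinstance(cfg, dict) and "auto_map" in cfg:
                # auto_map points Auto* classes at Python files in the repository.
                findings.append(f"{name}: auto_map {cfg['auto_map']}")

    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        if path.suffix == ".py":
            findings.append(f"{rel}: bundled Python code")
        elif path.suffix in PICKLE_SUFFIXES:
            findings.append(f"{rel}: pickle-based weights")
    return findings
```

If the list is empty, the model can then be loaded with `AutoModel.from_pretrained(model_dir, trust_remote_code=False, use_safetensors=True)`. Setting `use_safetensors=True` makes `transformers` refuse to fall back to pickle-based weight files.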
Patch Details
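There is no single patch: this is a procedural vulnerability. Mitigation relies on user awareness and safe loading practices, although frameworks have improved their warnings. In `transformers`, code shipped with a model repository runs only when the caller opts in with `trust_remote_code=True`.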