Malicious AI Models on Hugging Face Hub Execute Code via Unsafe Pickle Deserialization
Overview
A significant supply chain vulnerability was identified affecting developers who download pre-trained models from public repositories such as the Hugging Face Hub. Researchers from Protect AI found numerous models with malicious code embedded in their weight files. The attack abuses Python's `pickle` format, which `torch.save()` uses to serialize PyTorch checkpoints such as `pytorch_model.bin`. Python's own documentation warns that `pickle` is not secure: a pickle stream can instruct the unpickler to import and call arbitrary functions, so deserializing untrusted data can execute arbitrary code. Attackers upload a trojanized copy of a popular model, and when an unsuspecting developer loads it with common loading functions such as `torch.load()` or `transformers.AutoModel.from_pretrained()`, the payload executes on their machine.
The payload can steal API keys from environment variables, exfiltrate proprietary training data, or establish a foothold on the developer's workstation or CI/CD pipeline. This attack vector is particularly dangerous because models are often treated as data and are not subjected to the same security scanning as code dependencies. The findings forced a re-evaluation of security practices across the MLOps lifecycle and drove a push toward safer serialization formats such as `safetensors` and toward repository-side scanning of model files for malicious code.
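The core mechanism can be reproduced with the standard library alone. The sketch below assumes only Python 3's `pickle` module; the class name `MaliciousModel` is hypothetical, and the payload merely evaluates a harmless expression where a real attack might call something like `os.system`.

```python
import pickle


class MaliciousModel:
    """Hypothetical stand-in for a trojanized checkpoint object."""

    def __reduce__(self):
        # pickle records the (callable, args) pair returned here. On load,
        # the unpickler looks up the callable and calls it with args. A real
        # attack might use os.system; this payload evaluates a harmless string.
        return (eval, ("'payload executed, ' + str(6 * 7)",))


# Attacker side: serialize the object into what looks like a model file.
blob = pickle.dumps(MaliciousModel())

# Victim side: merely deserializing runs the payload.
loaded = pickle.loads(blob)
print(loaded)  # payload executed, 42
```

The victim never calls a method on the object: `pickle.loads()` alone runs the payload, and the "loaded model" is simply whatever the attacker's callable returned.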
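Affected Systems
Any environment that loads pickle-based model files (such as `pytorch_model.bin`) from untrusted sources with `torch.load()` or `transformers.AutoModel.from_pretrained()`, including developer workstations and CI/CD pipelines.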
Testing Guide
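1. Do NOT test this with a live malicious model. Use a benign proof of concept.
2. Write a simple Python script that saves an instance of a class whose `__reduce__` method returns a harmless payload (for example, printing a message or creating a file) to a pickle file.
3. In a separate, sandboxed environment, write a script that loads this file with `torch.load()`. On PyTorch 2.6 and later, `torch.load()` defaults to `weights_only=True` and should refuse the file; to check code paths that still load with `weights_only=False`, pass that argument explicitly.
4. Observe whether the payload executes on load. If it does, the environment is susceptible.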
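The steps above can be sketched as a single script. This is a minimal example assuming PyTorch 1.13 or later (for the `weights_only` parameter) when `torch` is installed; if it is not, the script falls back to plain `pickle`, which `torch.load()` builds on. The names `PoCPayload`, `save_poc`, `is_susceptible`, and the marker directory are hypothetical, and the payload only creates an empty directory. For brevity both halves run in one process inside a temporary directory; in a real test, run the loading half in the sandbox.

```python
import os
import pickle
import tempfile

try:
    import torch  # optional; mirrors step 3 when PyTorch is installed
except ImportError:
    torch = None


class PoCPayload:
    """Benign proof of concept: unpickling it creates an empty marker directory."""

    def __init__(self, marker_dir):
        self.marker_dir = marker_dir

    def __reduce__(self):
        # os.makedirs is stored by reference and called with these args on load.
        return (os.makedirs, (self.marker_dir,))


def save_poc(path, marker_dir):
    """Step 2: serialize the payload the way a trojanized checkpoint would be."""
    payload = PoCPayload(marker_dir)
    if torch is not None:
        torch.save(payload, path)
    else:
        with open(path, "wb") as f:
            pickle.dump(payload, f)


def is_susceptible(path, marker_dir):
    """Steps 3-4: load the file, then check whether the payload ran."""
    if torch is not None:
        # weights_only=False reproduces the legacy full-pickle behavior;
        # PyTorch 2.6+ defaults to weights_only=True, which rejects this file.
        torch.load(path, weights_only=False)
    else:
        with open(path, "rb") as f:
            pickle.load(f)
    return os.path.isdir(marker_dir)


with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "pytorch_model.bin")
    marker = os.path.join(tmp, "poc_marker")
    save_poc(model_path, marker)
    assert not os.path.isdir(marker)  # saving alone does not run the payload
    susceptible = is_susceptible(model_path, marker)

print("susceptible:", susceptible)  # susceptible: True
```

Using a marker directory instead of printing gives the check a side effect that can be verified after loading, which is closer to how a real payload would touch the filesystem.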
Mitigation Steps
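1. Whenever possible, use only models saved in the `safetensors` format. A `safetensors` file holds a JSON header and raw tensor data, so loading it cannot execute code. With Hugging Face `from_pretrained()`, passing `use_safetensors=True` makes loading fail rather than fall back to pickle weights.
2. When a pickle-based checkpoint is unavoidable, load it with `torch.load(path, weights_only=True)`, which restricts unpickling to tensors, primitive types, and dictionaries. Do not rely on `trust_remote_code=False` (the default in `from_pretrained()`) for this: it only stops custom Python modeling code in the repository from running and does not govern how weight files are deserialized.
3. Run model loading and training in isolated, sandboxed environments with no access to sensitive information or networks.
4. Scan model files for suspicious code before loading them, for example with Protect AI's `modelscan` or with `picklescan`.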
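The idea behind restricted loading such as `torch.load(..., weights_only=True)` is to refuse arbitrary imports during unpickling. The sketch below illustrates that principle with the "Restricting Globals" pattern from the Python `pickle` documentation; it is not PyTorch's implementation, and `SAFE_GLOBALS`, `RestrictedUnpickler`, and `restricted_loads` are hypothetical names. In practice, prefer `safetensors` or the library's own restricted loader over a hand-rolled one.

```python
import io
import pickle
from collections import OrderedDict

# Hypothetical allowlist: the only global this loader will import.
SAFE_GLOBALS = {("collections", "OrderedDict")}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses every global not on the allowlist."""

    def find_class(self, module, name):
        # Every import requested by the pickle stream passes through here.
        if (module, name) in SAFE_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def restricted_loads(data):
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Trojan:
    def __reduce__(self):
        # Harmless stand-in for an attacker's payload.
        return (eval, ("print('payload executed')",))


# A state-dict-like object of plain Python values loads normally.
state = OrderedDict(weight=[0.1, 0.2], bias=[0.0])
restored = restricted_loads(pickle.dumps(state))

# The trojan is rejected before eval is ever looked up or called.
try:
    restricted_loads(pickle.dumps(Trojan()))
    blocked_message = None
except pickle.UnpicklingError as exc:
    blocked_message = str(exc)

print(restored == state)  # True
print(blocked_message)    # blocked global: builtins.eval
```

An allowlist only helps while it stays small: permitting general-purpose callables such as `builtins.getattr` would reopen the hole.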
Patch Details
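Hugging Face implemented server-side scanning that flags suspicious imports in pickle files hosted on the Hub, and `transformers` requires an explicit opt-in (`trust_remote_code=True`) before it runs custom Python modeling code shipped in a repository. That flag does not govern how weight files are deserialized, so pickle-based checkpoints from untrusted sources remain risky. The community strongly encourages the use of the `safetensors` format, and PyTorch 2.6 changed the default of `torch.load()` to `weights_only=True`.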
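Repository-side scanning of pickle files generally inspects the opcode stream without executing it and reports which globals the file would import. The sketch below shows that general technique with the standard library's `pickletools.genops`; it is not Hugging Face's scanner, and `DANGEROUS` and `imported_globals` are hypothetical names. It is deliberately simplified: it does not follow strings re-fetched from the pickle memo, and any denylist can miss a callable an attacker finds. For PyTorch zip checkpoints, the pickle to scan is the `data.pkl` member inside the archive, which can be read with `zipfile`.

```python
import pickle
import pickletools
from collections import OrderedDict

# Hypothetical denylist for illustration; real scanners maintain larger lists.
DANGEROUS = {
    ("builtins", "eval"),
    ("builtins", "exec"),
    ("os", "system"),
    ("posix", "system"),
    ("nt", "system"),
    ("subprocess", "Popen"),
}


def imported_globals(data):
    """Return the (module, name) pairs a pickle would import, without loading it."""
    found = []
    strings = []  # string operands seen so far, used to resolve STACK_GLOBAL
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in ("GLOBAL", "INST"):
            # Protocols 0-3 encode the import inline as "module name".
            module, name = arg.split(" ", 1)
            found.append((module, name))
        elif opcode.name == "STACK_GLOBAL":
            # Protocol 4+ pushes module and name as strings first.
            # Simplification: strings re-fetched from the memo are not tracked.
            if len(strings) >= 2:
                found.append((strings[-2], strings[-1]))
        elif isinstance(arg, str):
            strings.append(arg)
    return found


class Trojan:
    def __reduce__(self):
        return (eval, ("1 + 1",))


hits = [g for g in imported_globals(pickle.dumps(Trojan())) if g in DANGEROUS]
benign = imported_globals(pickle.dumps(OrderedDict(weight=[0.1, 0.2])))
benign_hits = [g for g in benign if g in DANGEROUS]
print(hits)         # [('builtins', 'eval')]
print(benign_hits)  # []
```

Because the scan never unpickles anything, it is safe to run on untrusted files, but it can only report what it recognizes; it complements rather than replaces restricted loading.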