Arbitrary Code Execution via Malicious Model Pickles on Hugging Face Hub
Overview
A critical supply chain vulnerability was identified in the Hugging Face Hub ecosystem that lets attackers achieve arbitrary code execution on victim machines. The attack abuses Python's `pickle` module, which is commonly used to serialize and deserialize machine learning models, notably in PyTorch weight files such as `pytorch_model.bin`. Pickle is insecure by design: a pickle stream can instruct the loader to import and call arbitrary Python callables, so a crafted file runs attacker-chosen code during deserialization. An attacker can upload a seemingly legitimate model to the Hub that contains a malicious pickled file. When a developer or MLOps pipeline downloads the model and loads it with standard code such as `torch.load()`, the payload executes with the permissions of the loading process.
The risk was compounded by the Hub's support for custom model code, enabled with `trust_remote_code=True`, which provides an even more direct path to execution. The issue exposed thousands of developers who implicitly trust models from the public Hub, and it highlights the need for model scanning and safer serialization formats. In response, Hugging Face implemented malware scanning and stronger warnings about trusting model repositories.
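The deserialization behaviour described above can be reproduced with a harmless, self-contained sketch. The `Payload` class and the expression string are hypothetical stand-ins chosen for illustration; a real attack would substitute a callable such as `os.system` and a shell command. Nothing here is taken from an actual malicious model.

```python
import pickle


class Payload:
    """Hypothetical stand-in for an object hidden inside a model file."""

    def __reduce__(self):
        # pickle stores this (callable, args) pair in the stream. The
        # unpickler calls callable(*args) while loading. The expression
        # used here is deliberately harmless.
        return (eval, ("'payload ' + 'ran'",))


blob = pickle.dumps(Payload())

# No Payload instance comes back. The unpickler runs eval(...) and
# returns whatever that call produced.
result = pickle.loads(blob)
print(result)  # prints: payload ran
```

The victim only calls `pickle.loads()`; the attacker's callable is chosen entirely by the bytes in the file. `torch.load()` without `weights_only=True` relies on the same unpickling machinery.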
Affected Systems
Testing Guide
1. **Install Scanner**: Run `pip install picklescan`.
2. **Scan Model Files**: Run `picklescan -p /path/to/your/model/files/` against your locally downloaded model weights.
3. **Check Loading Code**: Audit your model loading code. If you are using `torch.load()` on a `.bin` or `.pth` file from an untrusted source, or using `trust_remote_code=True` with Hugging Face models, you are at risk.
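To make the scanning step less of a black box, the sketch below shows the general idea behind static pickle inspection: walk the opcode stream with the standard library's `pickletools.genops()`, which parses the stream without executing it, and list the modules and names it asks to import. This is not `picklescan`'s implementation. The `find_imports` helper, the `SUSPICIOUS` set, and the `Payload` class are hypothetical names introduced for illustration, and the `STACK_GLOBAL` handling is a simple heuristic.

```python
import pickle
import pickletools

# Hypothetical deny-list for illustration only; a real scanner keeps a
# much longer one.
SUSPICIOUS = {
    ("builtins", "eval"),
    ("builtins", "exec"),
    ("os", "system"),
    ("posix", "system"),
    ("subprocess", "Popen"),
}


def find_imports(data):
    """Return (module, name) pairs that a pickle stream asks to import."""
    found = []
    strings = []  # string constants seen so far, used for STACK_GLOBAL
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in ("GLOBAL", "INST"):
            # Older protocols carry "module name" as one space-separated arg.
            module, name = arg.split(" ", 1)
            found.append((module, name))
        elif opcode.name == "STACK_GLOBAL":
            # Protocol 4+ pushes module and name as two strings first.
            # Heuristic: assume they are the two most recent string
            # constants. Streams that fetch them from the memo evade this.
            if len(strings) >= 2:
                found.append((strings[-2], strings[-1]))
        elif isinstance(arg, str):
            strings.append(arg)
    return found


class Payload:
    """Hypothetical malicious object with a harmless expression."""

    def __reduce__(self):
        return (eval, ("'payload ' + 'ran'",))


malicious = pickle.dumps(Payload(), protocol=4)
benign = pickle.dumps({"layer.weight": [0.1, 0.2]}, protocol=4)

flagged = [pair for pair in find_imports(malicious) if pair in SUSPICIOUS]
print(flagged)
print(find_imports(benign))
```

The sketch reads a raw pickle stream only. A deny-list is also easy to bypass, so a clean result from any scanner should lower suspicion without being taken as proof of safety.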
Mitigation Steps
1. **Use `safetensors`**: Whenever possible, load models from the `safetensors` format (for example, by passing `use_safetensors=True` to `from_pretrained()` in `transformers`). It is a secure alternative to pickle that does not allow arbitrary code execution.
2. **Scan Models**: Before loading any model from a public repository, use security scanners like `picklescan` to check for malicious payloads within pickle files.
3. **Disable Custom Code**: When loading models with the Hugging Face `transformers` library, explicitly set `trust_remote_code=False` to prevent the execution of custom code bundled with the model.
4. **Isolated Environments**: Always download and test new models in a sandboxed, network-isolated environment to contain any potential malicious activity.
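The reason `safetensors` avoids the problem is visible in its file layout. The sketch below assumes the layout described in the safetensors documentation: an 8-byte little-endian header length, a JSON header listing each tensor's dtype, shape, and byte offsets, then the raw tensor bytes. The helpers `build_safetensors_like` and `read_header` are hypothetical and exist only for illustration; real code should use the `safetensors` library itself.

```python
import json
import struct


def build_safetensors_like(tensors):
    """Pack {name: (dtype, shape, raw_bytes)} into the assumed layout."""
    header = {}
    body = b""
    for name, (dtype, shape, raw) in tensors.items():
        header[name] = {
            "dtype": dtype,
            "shape": shape,
            "data_offsets": [len(body), len(body) + len(raw)],
        }
        body += raw
    header_bytes = json.dumps(header).encode("utf-8")
    # 8-byte unsigned little-endian length, then JSON, then raw data.
    return struct.pack("<Q", len(header_bytes)) + header_bytes + body


def read_header(blob):
    """Parse the header with json only; nothing in the file is executed."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    header = json.loads(blob[8:8 + header_len].decode("utf-8"))
    return header, blob[8 + header_len:]


raw = struct.pack("<2f", 1.0, 2.0)  # two float32 values
blob = build_safetensors_like({"weight": ("F32", [2], raw)})

header, body = read_header(blob)
start, end = header["weight"]["data_offsets"]
values = struct.unpack("<2f", body[start:end])
print(header["weight"], values)
```

Loading reduces to parsing JSON and slicing bytes, so the file has no mechanism for naming a callable to run, which is the capability that pickle exposes.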
Patch Details
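Hugging Face introduced malware scanning on the Hub and promotes `safetensors` as the default serialization format. The `trust_remote_code` flag defaults to `False` in loading methods such as `from_pretrained()` in `transformers`, so custom code from a repository runs only when the caller explicitly opts in.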