Remote Code Execution on Hugging Face Hub via Malicious Model Conversion
Overview
Security researchers from Trail of Bits discovered a critical vulnerability in the Hugging Face Hub's model conversion service that could allow remote code execution (RCE) on Hugging Face infrastructure. The vulnerability stemmed from unsafe handling of user-submitted models, particularly those in the `safetensors` format that contained a malicious `data.pkl` file. When a user requested a format conversion for such a model (e.g., from `safetensors` to TensorFlow's `SavedModel`), the backend service used Python's `pickle` module to deserialize the embedded `data.pkl` file without proper validation.

An attacker could craft a `pickle` payload that executes arbitrary Python code when it is deserialized, which gave a direct path to RCE within the isolated conversion environment. A sophisticated attacker could then potentially break out of the sandbox to compromise the wider Hugging Face infrastructure, access private models, or tamper with popular public models.

The discovery was part of the 'Leaky Models' research, which highlighted systemic risks in the AI model supply chain: model formats that appear safe can still be used as containers for dangerous serialized objects. Hugging Face responded quickly by disabling the vulnerable conversion pathway and implementing stricter validation to prevent malicious `pickle` payloads from being processed.
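To make the deserialization risk described in the overview concrete, the following is a minimal, deliberately harmless sketch of why `pickle` cannot be treated as a data-only format. It is not the payload from the research. The class name `Payload` is hypothetical, and `os.getcwd` stands in for whatever callable an attacker would choose.

```python
import os
import pickle
import pickletools


class Payload:
    """Stand-in for an attacker-controlled object (hypothetical name)."""

    def __reduce__(self):
        # pickle records "call this callable with these arguments" and
        # replays that call at load time. os.getcwd is a harmless
        # placeholder for an attacker-chosen callable.
        return (os.getcwd, ())


blob = pickle.dumps(Payload())

# Static view: list the opcode names without executing anything.
opcodes = [op.name for op, arg, pos in pickletools.genops(blob)]

# Dynamic view: loading invokes the callable and returns its result,
# so the caller never gets a Payload instance back.
result = pickle.loads(blob)
print(type(result).__name__, "REDUCE" in opcodes)
```

The point of the sketch is that the call happens inside `pickle.loads` itself, before the caller has any chance to inspect the resulting object. Validation therefore has to happen before deserialization, not after.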
Affected Systems
Testing Guide
1. **Vulnerability Was Platform-Side:** This vulnerability existed on the Hugging Face Hub's infrastructure and cannot be tested directly by a user.
2. **Verify Safe Model Loading:** As a best practice, when loading models from any source, check the model archive's contents before loading. Ensure it does not contain unexpected files like `data.pkl` or `.py` scripts. Use `safetensors.safe_open` for inspection without loading into memory.
3. **Test Your Own Systems:** If you run a similar model hosting or conversion service, you can test for this vulnerability by creating a `safetensors` model containing a malicious `pickle` file and running it through your conversion pipeline to see if the code executes.
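The archive check in step 2 can be sketched with the standard library alone, assuming the model is distributed as a zip-based archive. The function name `find_suspicious_members` and the name and suffix policy below are illustrative assumptions, not part of any Hugging Face tooling. The sketch only lists member names and never extracts or deserializes anything.

```python
import zipfile
from pathlib import PurePosixPath

# Hypothetical policy: member names and suffixes this sketch treats as suspicious.
SUSPICIOUS_NAMES = {"data.pkl"}
SUSPICIOUS_SUFFIXES = {".pkl", ".pickle", ".py"}


def find_suspicious_members(archive):
    """Return member names matching the policy, without extracting anything.

    `archive` may be a filesystem path or a binary file-like object.
    """
    if not zipfile.is_zipfile(archive):
        return []  # not a zip container, so there are no members to list
    flagged = []
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            member = PurePosixPath(name)
            if (member.name in SUSPICIOUS_NAMES
                    or member.suffix.lower() in SUSPICIOUS_SUFFIXES):
                flagged.append(name)
    return flagged
```

A call such as `find_suspicious_members("model.bin")` returns an empty list for a clean archive. An empty result only means that no member matched this particular policy. It does not prove the file is safe, so the check is best treated as one layer alongside the other steps in this guide.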
Mitigation Steps
1. **For Hub Users:** No direct action is required for end-users, as the vulnerability was in the platform's backend and has been patched by Hugging Face.
2. **For AI Platform Developers:** Never use `pickle` or similar unsafe deserialization libraries on untrusted data, including user-uploaded model files. Always prefer safer, data-only serialization formats like JSON, Protobuf, or the standard use of `safetensors` (without `data.pkl`).
3. **Model Scanning:** Implement automated scanning for all uploaded model artifacts to detect embedded malicious code, unexpected file types, or dangerous serialization formats like `pickle`.
4. **Sandboxed Processing:** Ensure all model processing, conversion, and inference tasks run in tightly sandboxed, ephemeral environments with no network access by default.
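As an illustration of the model scanning step, the sketch below walks a pickle stream with the standard library's `pickletools.genops`, which parses opcodes without executing them, and reports opcodes that import names or invoke callables. The function name `scan_pickle` and the exact opcode set are assumptions chosen for this example. A production scanner would need a more careful policy, since some legitimate checkpoints also use these opcodes and would need an allowlist of expected globals.

```python
import pickle
import pickletools

# Opcodes that import names or invoke callables during unpickling.
RISKY_OPCODES = {
    "GLOBAL", "STACK_GLOBAL", "INST", "OBJ",
    "NEWOBJ", "NEWOBJ_EX", "REDUCE", "BUILD",
}


def scan_pickle(data: bytes):
    """Return the sorted risky opcode names found in `data`.

    The stream is only parsed, never unpickled. Streams that cannot be
    parsed are reported as "MALFORMED" rather than passed through.
    """
    found = set()
    try:
        for opcode, arg, pos in pickletools.genops(data):
            if opcode.name in RISKY_OPCODES:
                found.add(opcode.name)
    except Exception:
        found.add("MALFORMED")
    return sorted(found)
```

For plain data such as `pickle.dumps({"a": [1, 2, 3]})` the scanner returns an empty list, while any stream that relies on `__reduce__` will surface `REDUCE` along with a global-import opcode. Rejecting on a non-empty result is the conservative choice for untrusted uploads.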
Patch Details
Hugging Face patched the vulnerability on their platform by adding stricter controls to the model conversion process and disabling the use of `pickle` for untrusted model formats.
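The details of the actual patch are not described here. Purely as an illustration of what restricting `pickle` for untrusted input can look like where it cannot be removed entirely, the sketch below follows the "restricting globals" pattern from the Python documentation: a `pickle.Unpickler` subclass whose `find_class` refuses anything outside an explicit allowlist. The allowlist contents and the names `RestrictedUnpickler` and `restricted_loads` are assumptions for the example, not Hugging Face's implementation.

```python
import io
import pickle

# Hypothetical allowlist of (module, name) pairs the service expects to see.
ALLOWED_GLOBALS = {("collections", "OrderedDict")}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the stream asks to import a global. Anything
        # not explicitly allowed is rejected before it can be invoked.
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def restricted_loads(data: bytes):
    """Drop-in analogue of pickle.loads() that enforces the allowlist."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

An allowlist is only as safe as the callables placed on it, so this pattern reduces risk rather than eliminating it. Avoiding `pickle` for untrusted data remains the stronger control.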