Malicious Payload in Popular Hugging Face Model Executes Code During Inference
Overview
A supply chain attack dubbed 'BadTofu' targeted the AI developer ecosystem through a popular open-source model on the Hugging Face Hub. The attacker hid a payload steganographically inside the model's tensor weights (stored in `.safetensors` format) and modified the repository's `modeling.py` file to include seemingly innocuous custom code. That file runs on the user's machine whenever the model is loaded with `trust_remote_code=True`. The `.safetensors` files could not execute anything on their own; they only carried the payload, and the custom code contained the routine that extracted it from the weights and executed it during inference. Execution was triggered only when the model received a specific, non-obvious sequence of input tokens, which made the malicious behavior difficult to detect during standard testing. The payload established a reverse shell to an attacker-controlled server, giving the attacker full control over the inference environment. The attack exposed thousands of downstream users who had downloaded the compromised model and deployed it in production. It highlights the need for model integrity verification and for skepticism towards running arbitrary remote code, even from trusted sources like the Hugging Face Hub.
Affected Systems
Testing Guide
1. Use a model scanning tool to analyze downloaded model artifacts. Check for high-entropy regions in weight files that do not match the expected weight distributions.
2. Review any custom code (`.py` files) included in the model repository for suspicious logic, such as `eval()`, `exec()`, `pickle.load()`, or network-related function calls.
3. Load the model in a sandboxed environment with `trust_remote_code=True` and monitor for outbound network connections or unexpected process creation during inference.
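Step 2 can be partly automated before a manual review. The following is a minimal sketch using Python's standard `ast` module; the function names `find_suspicious_nodes` and `scan_repo` and the contents of the watch lists are illustrative assumptions, not part of any existing scanner. It only flags direct, unobfuscated uses, so an empty result does not mean the code is safe.

```python
import ast
from pathlib import Path

# Watch lists are illustrative assumptions; tune them for your environment.
SUSPICIOUS_CALLS = {"eval", "exec", "compile", "__import__"}
SUSPICIOUS_ATTRS = {
    ("pickle", "load"), ("pickle", "loads"),
    ("os", "system"), ("subprocess", "Popen"),
    ("subprocess", "run"), ("socket", "socket"),
}
SUSPICIOUS_IMPORTS = {"socket", "subprocess", "pickle", "urllib", "requests", "http"}


def find_suspicious_nodes(source, filename="<model code>"):
    """Return sorted (line, description) pairs that deserve a manual look."""
    findings = []
    tree = ast.parse(source, filename=filename)
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            func = node.func
            # Bare calls such as eval(...) or exec(...)
            if isinstance(func, ast.Name) and func.id in SUSPICIOUS_CALLS:
                findings.append((node.lineno, f"call to {func.id}()"))
            # Module-level calls such as pickle.load(...)
            elif (
                isinstance(func, ast.Attribute)
                and isinstance(func.value, ast.Name)
                and (func.value.id, func.attr) in SUSPICIOUS_ATTRS
            ):
                findings.append(
                    (node.lineno, f"call to {func.value.id}.{func.attr}()")
                )
        elif isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in SUSPICIOUS_IMPORTS:
                    findings.append((node.lineno, f"import of {alias.name}"))
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] in SUSPICIOUS_IMPORTS:
                findings.append((node.lineno, f"import from {node.module}"))
    return sorted(findings)


def scan_repo(repo_dir):
    """Scan every .py file under a downloaded model repository directory."""
    report = {}
    for path in sorted(Path(repo_dir).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        hits = find_suspicious_nodes(text, str(path))
        if hits:
            report[str(path)] = hits
    return report
```

Calling `scan_repo()` on the local directory of a downloaded model returns a mapping from file path to flagged lines. Because the scan parses the code without importing or running it, it is safe to use outside the sandbox required for step 3.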
Mitigation Steps
1. **Disable Remote Code**: Never pass `trust_remote_code=True` to `from_pretrained()` unless the model's source code has been thoroughly audited. Leave the argument at its default of `False`.
2. **Model Scanning**: Use tools like `safetensors-check` or other model security scanners to inspect model files for known malicious patterns or unexpected executable code before loading them.
3. **Sandboxed Inference**: Always run model inference in a tightly controlled, isolated environment (e.g., a minimal container with no network access or with strict egress filtering) to contain potential breaches.
4. **Pin Versions**: Pin each model to a specific commit hash from a trusted organization (for example, through the `revision` argument of `from_pretrained()`) rather than pulling the `main` branch.
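Steps 1 and 4 can be enforced in one place in application code. The sketch below assumes the Hugging Face `transformers` library, whose `from_pretrained()` accepts `revision` and `trust_remote_code`; the helper names `pinned_load_kwargs` and `load_pinned_model` are hypothetical. It rejects branch names and short hashes so that only a full 40-character commit hash can be loaded.

```python
import re

# A full Git commit hash: 40 lowercase hexadecimal characters.
_COMMIT_SHA = re.compile(r"[0-9a-f]{40}")


def pinned_load_kwargs(revision):
    """Build from_pretrained() keyword arguments for a pinned, code-free load."""
    if not _COMMIT_SHA.fullmatch(revision):
        raise ValueError(
            f"revision must be a full 40-character commit hash, got {revision!r}"
        )
    return {"revision": revision, "trust_remote_code": False}


def load_pinned_model(repo_id, revision):
    """Load a model at an exact commit without running repository code."""
    # Imported lazily so the validation helper works without transformers.
    from transformers import AutoModel

    return AutoModel.from_pretrained(repo_id, **pinned_load_kwargs(revision))
```

Pinning to a commit hash means a later malicious push to `main` is not picked up automatically. It does not help if the pinned commit itself is compromised, so the commit should be reviewed and scanned before it is pinned.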
Patch Details
The compromised model and user account were removed from the Hugging Face Hub. The attack technique, however, remains a threat for any platform hosting user-submitted models.