Hugging Face Hub Misconfiguration Leaks Sensitive Tokens in Multi-Tenant Inference Environments
Overview
A significant security flaw was discovered in the way Hugging Face's Text Generation Inference (TGI) framework handled API tokens in certain multi-tenant or shared-use configurations. When multiple users or processes submitted requests to a single TGI instance, a race condition or improper state isolation could cause one user's Hugging Face API token (`hf_token`) to appear in the error messages or logs of another user's request. This typically occurred under high load or when a request caused a CUDA out-of-memory error.

An attacker could deliberately submit malformed or resource-intensive requests to a public or shared TGI endpoint to trigger these error conditions, then harvest the tokens of legitimate users whose requests were being processed at the same time. A leaked `hf_token` with write permissions could allow the attacker to poison models, delete repositories, or access private datasets and models associated with the victim's account.

This vulnerability highlighted the complexities of secure resource sharing on GPU hardware and the need for robust logical isolation in cloud AI services. The discovery prompted immediate remediation from Hugging Face and raised awareness of tenant isolation in the broader MLOps community.
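The class of bug described above can be illustrated with a toy model. The sketch below is not TGI code: `SharedStateServer`, `IsolatedServer`, and `simulate_leak` are hypothetical names invented for this example, and the interleaving of two requests is simulated deterministically rather than with real concurrency. It only shows how an error message built from shared state can report another caller's token, and how per-request state avoids that.

```python
# Toy illustration only: these classes are hypothetical and are not TGI code.

class SharedStateServer:
    """Keeps the most recent caller's token on the instance (the flaw)."""

    def __init__(self):
        self.current_token = None  # shared by every in-flight request

    def begin(self, token):
        self.current_token = token

    def fail(self, request_id):
        # The error text is built from shared state, so it reports whichever
        # token was stored last, not necessarily this request's own token.
        return f"request {request_id} failed (token={self.current_token})"


class IsolatedServer:
    """Keeps each token in per-request state and never echoes it."""

    def __init__(self):
        self._tokens = {}  # request_id -> token

    def begin(self, request_id, token):
        self._tokens[request_id] = token

    def fail(self, request_id):
        self._tokens.pop(request_id, None)  # drop the secret on failure
        return f"request {request_id} failed (token=<redacted>)"


def simulate_leak():
    """Simulate the attacker's request failing after the victim's has started."""
    server = SharedStateServer()
    server.begin("hf_attacker_token")  # attacker's request starts
    server.begin("hf_victim_token")    # victim's request starts concurrently
    return server.fail("attacker")     # attacker's request errors out
```

In this sketch, the string returned by `simulate_leak()` contains the victim's token even though the failing request belongs to the attacker, while `IsolatedServer.fail` never includes any token in its message.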
Affected Systems
Self-hosted TGI instances running a version below 1.1.0 in multi-tenant or shared-use configurations, and Hugging Face Inference Endpoints running on older, unpatched instance types.
Testing Guide
1. **Check TGI Version:** If you are self-hosting TGI, check the version of the Docker container or library in use. If it is below 1.1.0, you are likely affected.
2. **Review Endpoint Configuration:** For Hugging Face Inference Endpoints, review your security and configuration settings in the Hub dashboard. Ensure you are not using older, unpatched instance types.
3. **Stress Test (in a safe environment):** Create a dedicated, non-production endpoint. Submit multiple simultaneous, malformed requests and inspect the full error responses and logs for any unexpected token data.
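Steps 1 and 3 can be partly automated. The following is a minimal sketch, not an official tool: `is_affected` and `find_tokens` are hypothetical helpers, and the token pattern assumes that access tokens begin with `hf_` followed by a long alphanumeric run. The version check only accepts plain dotted versions such as `1.0.3`; other image tags (for example `latest`) raise an error so they can be checked by hand.

```python
import re

# Assumption: access tokens start with "hf_" followed by a long alphanumeric
# run. Adjust the pattern if the tokens in your environment look different.
TOKEN_PATTERN = re.compile(r"hf_[A-Za-z0-9]{20,}")

PATCHED_VERSION = (1, 1, 0)  # first fixed release according to this advisory


def is_affected(version):
    """Return True if a dotted version string is older than the patched release."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"cannot interpret version {version!r}; check manually")
    parts = tuple(int(group or 0) for group in match.groups())
    return parts < PATCHED_VERSION


def find_tokens(text, own_tokens=()):
    """Return token-like strings found in text, excluding the caller's own tokens."""
    return sorted(set(TOKEN_PATTERN.findall(text)) - set(own_tokens))
```

During a stress test, pass each captured error response or log excerpt to `find_tokens` together with the test tokens you submitted yourself; any remaining match is a token that should not have been visible to your request.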
Mitigation Steps
1. **Update TGI Instances:** Upgrade all TGI instances to version 1.1.0 or later, which contains fixes for state isolation.
2. **Use Least-Privilege Tokens:** Generate and use read-only, fine-grained access tokens for inference tasks whenever possible. Do not use your primary user account's write-access token in automated services.
3. **Rotate Tokens Regularly:** Implement a policy for regularly rotating all API tokens used in production environments to limit the window of exposure for a compromised key.
4. **Isolate Workloads:** For highly sensitive tasks, do not use shared inference endpoints. Deploy a dedicated TGI instance or use a cloud service that guarantees strong tenant isolation at the hardware level.
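The least-privilege and rotation steps above lend themselves to a simple periodic audit. The sketch below assumes you maintain your own inventory of token labels with a creation time and a role; the inventory format, the 90-day limit, and the function name `tokens_needing_attention` are all hypothetical choices for illustration, not part of any Hugging Face API.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical policy value; choose one that fits your environment.
MAX_TOKEN_AGE = timedelta(days=90)


def tokens_needing_attention(inventory, now=None, max_age=MAX_TOKEN_AGE):
    """Return labels of tokens that are too old or hold more than read access.

    `inventory` maps a token label to a dict with "created" (a timezone-aware
    datetime) and "role" (for example "read" or "write"). Store labels only,
    never the secret values themselves.
    """
    now = now or datetime.now(timezone.utc)
    flagged = []
    for label, meta in inventory.items():
        too_old = now - meta["created"] > max_age
        over_privileged = meta["role"] != "read"
        if too_old or over_privileged:
            flagged.append(label)
    return sorted(flagged)
```

Running a check like this on a schedule gives a short list of tokens to rotate or downgrade, which keeps the exposure window small if a token does leak.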
Patch Details
Patched in TGI version 1.1.0. Hugging Face also rolled out fixes to their managed Inference Endpoints service.