Cross-Tenant Data Leakage in Azure OpenAI due to Inference API Race Condition
Overview
A high-severity vulnerability was discovered in the backend infrastructure of Microsoft's Azure OpenAI service. The issue stemmed from a race condition in the multi-tenant architecture responsible for serving concurrent inference requests. Under specific high-load conditions, if one tenant's API request to a model like GPT-4 was abruptly cancelled or timed out, the state of the inference worker was not properly sanitized before being reassigned to process a request from a different tenant. This resulted in a small fragment of the first tenant's data (either from the prompt or the model's response) 'bleeding' into the context of the second tenant's session.

An attacker could increase the probability of triggering this leak by sending many parallel requests and then intentionally cancelling them. While the leaked data was typically fragmented and incomplete, it posed a significant risk of exposing sensitive information such as personally identifiable information (PII), confidential business plans, or source code snippets that customers were processing through the service.

The vulnerability was discovered by an external security research firm and patched by Microsoft before public disclosure.
Affected Systems
Testing Guide
1. This vulnerability was in Azure's backend infrastructure and cannot be tested or verified by customers.
2. Verification of the flaw and its subsequent patch was conducted internally by Microsoft's security teams and the reporting research firm.
3. Customers can check the Azure Service Health dashboard archives for the disclosure date to find the official notification from Microsoft.
Mitigation Steps
1. **No Customer Action Required**: Microsoft patched the vulnerability on their backend infrastructure, and the fix was rolled out transparently to all customers.
2. **Audit Logs**: Customers are advised to review their application logs during the exposure period for any anomalous or unexpected data appearing in LLM responses that could indicate they were a victim of this data bleed.
3. **Defense in Depth**: Implement client-side data loss prevention (DLP) filters to scan LLM outputs for sensitive data patterns before they are processed or stored by your application.
4. **Data Minimization**: Avoid sending unnecessarily sensitive data to third-party AI services whenever possible.
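The defense-in-depth step above can be sketched as a small client-side filter. This is a minimal illustration, not a complete DLP product: the pattern set, the names `SENSITIVE_PATTERNS`, `scan_llm_output`, and `redact_llm_output`, and the `[REDACTED]` placeholder are all assumptions made for this example, and real deployments would need patterns tuned to their own data.

```python
import re
from typing import Dict, List

# Hypothetical pattern set for illustration only; extend or replace
# these with patterns that match the data your application handles.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def scan_llm_output(text: str) -> Dict[str, List[str]]:
    """Return every sensitive-looking match found in an LLM response,
    keyed by pattern name. An empty dict means nothing was flagged."""
    findings: Dict[str, List[str]] = {}
    for name, pattern in SENSITIVE_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            findings[name] = matches
    return findings


def redact_llm_output(text: str, placeholder: str = "[REDACTED]") -> str:
    """Replace every match of every pattern with a placeholder before
    the response is processed or stored."""
    for pattern in SENSITIVE_PATTERNS.values():
        text = pattern.sub(placeholder, text)
    return text
```

An application might call `scan_llm_output` on each response and log or quarantine anything flagged, which also supports the log-audit step, since unexpected matches in a response are one possible sign of foreign data. Regex filters of this kind only catch well-structured patterns; they would not detect leaked free-form text such as business plans.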
Patch Details
Microsoft deployed a server-side hotfix that enforces strict state isolation and resource cleanup between tenant requests on the inference fleet, eliminating the race condition.
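The general idea of cleaning up worker state on every exit path can be illustrated with a toy model. Microsoft has not published the actual fix, so everything below is an assumption made for illustration: the `InferenceWorker` class, its `context` buffer, and the `demo` function are hypothetical and do not reflect the real inference fleet. The sketch only shows why cleanup placed in a `finally` block still runs when a request is cancelled mid-flight.

```python
import asyncio


class InferenceWorker:
    """Toy stand-in for a pooled worker that is reused across tenants.
    Hypothetical; not a model of Azure's real architecture."""

    def __init__(self) -> None:
        self.context = []  # per-request buffer held on the worker

    async def handle(self, tenant: str, prompt: str, delay: float = 0.0) -> str:
        try:
            self.context.append(prompt)
            await asyncio.sleep(delay)  # stand-in for model execution
            return f"{tenant}: " + " | ".join(self.context)
        finally:
            # Runs on success, on error, and on cancellation, so the
            # next tenant never sees leftover state.
            self.context.clear()


async def demo() -> str:
    worker = InferenceWorker()

    # Tenant A starts a long request, then cancels it mid-flight.
    task = asyncio.create_task(worker.handle("tenant-a", "secret plan", delay=10))
    await asyncio.sleep(0)  # let the task start and buffer its prompt
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass

    # Tenant B reuses the same worker and sees only its own data.
    return await worker.handle("tenant-b", "hello")


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

If the `clear()` call ran only after the simulated model execution instead of in `finally`, the cancelled request would leave `"secret plan"` in the buffer and tenant B's response would include it, which mirrors the class of bug described in the overview.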