Cross-Tenant Prompt Leakage in AWS Bedrock's Multi-Model Endpoint Cache
Overview
Security researchers at a leading university identified a high-severity information disclosure vulnerability in AWS Bedrock's multi-model endpoint feature. This feature allows multiple large language models (LLMs) to be served from a single endpoint on shared GPU infrastructure to reduce cost. The researchers discovered a sophisticated side-channel attack targeting the shared Key-Value (KV) cache in GPU memory, which is used to accelerate token generation.

Under high load, a race condition could be triggered in the cache eviction and allocation logic. An attacker in one tenant could send a rapid sequence of specially crafted, long-context prompts to their own model on the shared endpoint, forcing frequent and predictable cache evictions. By carefully timing these requests, the attacker could cause a small portion of another tenant's KV cache, containing fragments of that tenant's prompt or generated response, to be incorrectly mapped to the attacker's session. The leaked fragments were then returned as part of the attacker's own model output.

Although the leakage was probabilistic and fragmented, fragments collected over time could be reassembled to reveal sensitive intellectual property, personally identifiable information (PII), or proprietary prompts belonging to other customers on the same endpoint.
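The end effect described above, one session reading KV-cache contents that another session wrote, can be illustrated with a toy model. The sketch below is not Bedrock's cache implementation and does not reproduce the timing race itself. `ToyKVCache`, its round-robin eviction, and the `clear_on_evict` flag are all hypothetical names chosen for illustration. It only shows why a block that is remapped without being cleared exposes stale data, and why wiping on eviction closes that particular gap.

```python
class ToyKVCache:
    """Hypothetical fixed-size pool of KV blocks shared by several sessions."""

    def __init__(self, num_blocks, clear_on_evict):
        self.data = [None] * num_blocks    # (writer, value) per block, or None
        self.owner = [None] * num_blocks   # session currently mapped to each block
        self.clear_on_evict = clear_on_evict
        self.next_victim = 0               # predictable round-robin eviction pointer

    def allocate(self, session):
        """Map a block to `session`, evicting another block if none is free."""
        for i, current in enumerate(self.owner):
            if current is None:
                self.owner[i] = session
                return i
        i = self.next_victim
        self.next_victim = (i + 1) % len(self.owner)
        if self.clear_on_evict:
            self.data[i] = None            # wipe contents before remapping
        self.owner[i] = session            # without the wipe, stale data stays readable
        return i

    def write(self, session, block, value):
        assert self.owner[block] == session
        self.data[block] = (session, value)

    def read(self, session, block):
        assert self.owner[block] == session
        return self.data[block]


def simulate(clear_on_evict):
    """Return fragments the attacker reads that were written by someone else."""
    cache = ToyKVCache(num_blocks=2, clear_on_evict=clear_on_evict)
    block = cache.allocate("victim")
    cache.write("victim", block, "victim prompt fragment")

    leaked = []
    for _ in range(3):                     # attacker floods the pool to force evictions
        block = cache.allocate("attacker")
        entry = cache.read("attacker", block)
        if entry is not None and entry[0] != "attacker":
            leaked.append(entry[1])
        cache.write("attacker", block, "attacker filler")
    return leaked
```

In this toy model, `simulate(False)` returns the victim's fragment once the attacker's allocations force the victim's block to be remapped, while `simulate(True)` returns an empty list. The real issue is described as probabilistic and timing-dependent, which this deterministic sketch deliberately leaves out.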
Affected Systems
Testing Guide
1. **Review Endpoint Configuration:** Check your AWS Bedrock configuration to determine whether you use multi-model endpoints for Provisioned Throughput.
2. **Analyze Historical Logs:** Review CloudTrail and Bedrock logs for unusual activity from specific IAM roles or IP addresses that matches the attack description (high request frequency combined with large prompts).
3. **Consult AWS Advisory:** Refer to the security bulletin released by AWS in late January 2026 for specific indicators of compromise and guidance.
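The log review in step 2 can be partly automated. The following is a minimal sketch, assuming invocation records have already been exported and reduced to three fields. The `Invocation` type, its field names, and the default thresholds are illustrative assumptions rather than a Bedrock or CloudTrail schema, and would need to be mapped onto whatever your logs actually contain. It flags any principal that sends more than `max_large` large prompts within a sliding time window.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Invocation:
    """Hypothetical, simplified view of one logged model invocation."""
    principal: str       # e.g. an IAM role ARN or source IP
    timestamp: float     # seconds since epoch
    input_tokens: int    # prompt size


def flag_bursts(records, min_tokens=8000, window_s=60.0, max_large=20):
    """Return principals with more than `max_large` large prompts in any window.

    The thresholds are placeholders; tune them against your own baseline traffic.
    """
    recent = defaultdict(deque)   # principal -> timestamps of recent large prompts
    flagged = set()
    for rec in sorted(records, key=lambda r: r.timestamp):
        if rec.input_tokens < min_tokens:
            continue              # only long-context prompts are of interest
        window = recent[rec.principal]
        window.append(rec.timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while rec.timestamp - window[0] > window_s:
            window.popleft()
        if len(window) > max_large:
            flagged.add(rec.principal)
    return flagged
```

A flagged principal is only a lead for manual review: batch jobs and load tests can produce the same pattern legitimately.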
Mitigation Steps
1. **Isolate Sensitive Workloads:** For workloads processing highly sensitive data, use single-model endpoints or deploy models within a Virtual Private Cloud (VPC) to ensure strict tenant isolation.
2. **AWS Patch (Server-Side):** AWS deployed a server-side patch to the cache management system that strengthens isolation boundaries and randomizes memory allocation patterns to defeat the race condition. No customer action is required to receive it.
3. **Monitor for Anomalous Usage:** Monitor API usage for patterns consistent with the attack, such as rapid, repeated requests with unusually large context windows.
4. **Data Sanitization:** Where possible, sanitize or mask sensitive data in prompts before sending them to the service, reducing the impact of a potential leak.
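As an illustration of the data sanitization step, the sketch below masks a few common sensitive patterns before a prompt leaves your application. The `PATTERNS` table and `mask_prompt` function are hypothetical, and the regular expressions are deliberately simple (email addresses, US Social Security numbers in `NNN-NN-NNNN` form, and strings shaped like AWS access key IDs). They will miss many real-world cases, so treat this as a starting point and not a complete PII filter.

```python
import re

# Illustrative patterns only; extend or replace them for your own data.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "AWS_KEY_ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def mask_prompt(prompt: str) -> str:
    """Replace each match of a known pattern with a bracketed placeholder."""
    for label, pattern in PATTERNS.items():
        prompt = pattern.sub(f"[{label}]", prompt)
    return prompt


if __name__ == "__main__":
    print(mask_prompt("Contact jane.doe@example.com, SSN 123-45-6789."))
```

Masking before submission means that any fragment leaked from the shared cache would contain only the placeholder, not the original value.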
Patch Details
AWS deployed a transparent, server-side patch to all affected regions. No customer action is required to receive the fix, but workload isolation is still recommended as a best practice.