Algorithmic Complexity Attack on AWS Bedrock MoE Models Causes Endpoint Denial-of-Service
Overview
A new type of Denial-of-Service (DoS) attack was found to be effective against large Mixture-of-Experts (MoE) models hosted on cloud services such as AWS Bedrock. Researchers from Stanford University devised a method for creating "computational bomb" prompts that exploit the routing mechanism of MoE models. The attacker submits a prompt containing a highly complex, deeply nested, self-referential structure, often related to formal logic or mathematical proofs. The structure is designed to force the MoE router to activate the maximum number of experts simultaneously for every generated token. The same structure also triggers pathological behavior in the attention mechanism, which further increases computational cost. A single such request could keep the underlying GPU inference hardware at 100% utilization for the maximum allowed duration (e.g., 60 seconds) before timing out, while consuming a disproportionate amount of VRAM. By sending just a few concurrent requests from a single account, an attacker could make a hosted model endpoint unavailable to all other users, causing significant service disruption. The attack was particularly effective against newer, larger MoE models that had not been sufficiently tested against adversarial computational loads.
Affected Systems
Testing Guide
1. **Monitor Inference Latency**: Track the P95 and P99 latency for your model endpoints. A sudden, sustained spike in latency could indicate a resource exhaustion attack.
2. **Analyze Prompt Logs**: Review application logs for unusually long or complex prompts, especially those with deeply nested structures (e.g., excessive JSON or XML nesting).
3. **Craft a Benign Test Prompt**: Construct a non-malicious but complex prompt (e.g., asking for a detailed story with many characters and subplots) and measure its inference time. Compare this against a simple prompt to establish a baseline for complexity.
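The latency and prompt-log checks above can be sketched in a few lines of Python. This is a minimal illustration, not a detection rule from the advisory: the function names, the 3x spike factor, and the use of bracket depth as a stand-in for "deeply nested structure" are all assumptions chosen for the example.

```python
import math
from typing import Sequence

def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of the samples; pct is in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def latency_spike(baseline: Sequence[float],
                  recent: Sequence[float],
                  factor: float = 3.0) -> bool:
    """True when the recent P99 exceeds factor x the baseline P99.

    The factor of 3.0 is an arbitrary illustrative threshold.
    """
    return percentile(recent, 99) > factor * percentile(baseline, 99)

# Bracket characters treated as nesting delimiters (an assumption; real
# prompts may nest through XML tags or natural-language structure instead).
OPENERS = {"(", "[", "{"}
CLOSERS = {")", "]", "}"}

def max_nesting_depth(prompt: str) -> int:
    """Deepest bracket nesting reached in the prompt.

    Bracket types are not matched against each other and unbalanced
    closers are ignored, so this is only a rough log-triage signal.
    """
    depth = deepest = 0
    for ch in prompt:
        if ch in OPENERS:
            depth += 1
            deepest = max(deepest, depth)
        elif ch in CLOSERS and depth > 0:
            depth -= 1
    return deepest
```

In practice the baseline samples would come from timing a simple prompt and the recent samples from live endpoint metrics; the depth function could be run over logged prompts to surface outliers for manual review.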
Mitigation Steps
1. **Implement Granular Rate Limiting**: Apply stricter API rate limits per-user to prevent a single user from overwhelming the shared infrastructure.
2. **Input Complexity Analysis**: Cloud providers should implement a pre-inference analysis step to estimate the computational complexity of a prompt and reject those that exceed a safe threshold.
3. **Cost-Based Throttling**: Throttle or reject requests that are predicted to have an exceptionally high computational cost relative to their token count.
4. **Use Smaller Models**: If possible, use smaller, non-MoE models for tasks that do not require the power of a large MoE model, as they are less susceptible to this specific attack vector.
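One way to combine per-user rate limiting with cost-based throttling is a token bucket in which each request is charged according to its predicted cost. The sketch below is an application-side illustration only; `CostAwareLimiter`, `estimated_cost`, and the cost formula are hypothetical and do not describe how Bedrock or any provider implements throttling.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Bucket:
    tokens: float   # budget currently available to the user
    updated: float  # clock reading at the last refill

class CostAwareLimiter:
    """Per-user token bucket where each request spends its estimated cost."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.buckets: Dict[str, Bucket] = {}

    def allow(self, user_id: str, cost: float = 1.0) -> bool:
        now = self.clock()
        bucket = self.buckets.setdefault(user_id, Bucket(self.capacity, now))
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = now - bucket.updated
        bucket.tokens = min(self.capacity,
                            bucket.tokens + elapsed * self.refill_per_sec)
        bucket.updated = now
        if cost > bucket.tokens:
            return False  # reject: this user has exhausted their budget
        bucket.tokens -= cost
        return True

def estimated_cost(prompt_tokens: int, max_output_tokens: int,
                   nesting_depth: int) -> float:
    """Hypothetical heuristic: token volume scaled up by structural nesting."""
    return (prompt_tokens + max_output_tokens) / 1000 * (1 + nesting_depth)
```

A caller would compute `estimated_cost(...)` for each incoming request and pass it to `allow(user_id, cost)` before forwarding the request to the model endpoint. Charging by predicted cost rather than by request count means a handful of expensive prompts drains a user's budget as quickly as many cheap ones.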
Patch Details
AWS implemented internal controls on the Bedrock service, including dynamic, cost-based request throttling and improved input validation to detect and reject computationally intensive prompts before they reach the inference engine.