Algorithmic Complexity Attack on LLMs Causes Denial of Service via Tokenizer
Overview
Researchers from a leading university published findings on a new algorithmic complexity attack that causes a Denial of Service (DoS) condition in major Large Language Models (LLMs). The attack exploits inefficiencies in the tokenization algorithms used by models like GPT-4 and Llama 3. Specifically, some tokenizers apply regular expressions as pre-tokenization rules, and these expressions exhibit catastrophic backtracking when they process prompts containing long, repeating sequences of specific characters or nested structures. An attacker could send a relatively small prompt (a few kilobytes) containing a 'regex bomb' to a public-facing API endpoint. When the LLM backend attempts to tokenize this prompt, the regex engine drives CPU usage to 100% for an extended period (seconds to minutes), blocking the processing thread and preventing it from serving other users. Sent at scale, such requests can make an AI service unavailable to legitimate users and incur large compute bills for the service provider or API consumer. Simple input length limits do little to mitigate the attack, because the malicious payload can be small yet computationally devastating. The research demonstrated the vulnerability across multiple closed- and open-source models.
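To make the failure mode concrete, here is a minimal sketch of catastrophic backtracking in Python's built-in `re` module. The pattern `(a+)+$` is a textbook nested-quantifier example chosen for illustration; it is an assumption and is not taken from any real tokenizer, and `match_seconds` is a hypothetical helper name.

```python
import re
import time

# Illustrative pattern only (an assumption, not a real tokenizer rule):
# the nested quantifiers let a backtracking engine split a run of "a"
# characters in exponentially many ways before it gives up.
NESTED = re.compile(r"(a+)+$")


def match_seconds(n: int) -> float:
    """Time one failed match against n 'a' characters followed by 'b'."""
    text = "a" * n + "b"  # the trailing "b" guarantees that the match fails
    start = time.perf_counter()
    result = NESTED.match(text)
    elapsed = time.perf_counter() - start
    assert result is None
    return elapsed


if __name__ == "__main__":
    # The input grows by two characters per row; with a backtracking
    # engine the time is expected to grow far faster than linearly.
    for n in (12, 14, 16, 18, 20):
        print(f"n={n:2d}  {match_seconds(n):.4f} s")
```

The input here is only a few dozen bytes, which is why a length limit alone does not bound the work a backtracking engine may do.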
Affected Systems
Testing Guide
1. In an isolated, self-hosted environment running an open-source LLM, craft a prompt with a known regex-vulnerable pattern. An example is a long string of characters like `"(" + "a"*50 + ")"*50`.
2. Measure the CPU time and wall-clock time required for the model's tokenizer to process this string.
3. Compare this with the time required to process a benign string of the same length.
4. A disproportionately large increase in processing time for the crafted string indicates a potential vulnerability.
5. **Do not perform this test against public commercial APIs, as it constitutes a violation of their terms of service.**
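The measurement in steps 2 to 4 can be sketched as a small timing harness. This is a sketch under assumptions: `tokenize` stands for whatever callable wraps the self-hosted model's tokenizer, and `measure`, `benign_like`, and `slowdown` are hypothetical helper names, as is the filler sentence used to build the benign string. `str.split` appears below only as a placeholder so the example runs without any model installed.

```python
import time
from typing import Callable, Tuple


def measure(tokenize: Callable[[str], object], text: str,
            repeats: int = 5) -> Tuple[float, float]:
    """Return the best-of-N (cpu_seconds, wall_seconds) for tokenize(text)."""
    best_cpu = best_wall = float("inf")
    for _ in range(repeats):
        cpu0, wall0 = time.process_time(), time.perf_counter()
        tokenize(text)
        best_cpu = min(best_cpu, time.process_time() - cpu0)
        best_wall = min(best_wall, time.perf_counter() - wall0)
    return best_cpu, best_wall


def benign_like(text: str) -> str:
    """Build ordinary prose with exactly the same length as text."""
    filler = "The quick brown fox jumps over the lazy dog. "
    reps = len(text) // len(filler) + 1
    return (filler * reps)[:len(text)]


def slowdown(tokenize: Callable[[str], object], crafted: str,
             repeats: int = 5) -> float:
    """Wall-clock ratio of the crafted input to a benign input of equal length."""
    _, crafted_wall = measure(tokenize, crafted, repeats)
    _, benign_wall = measure(tokenize, benign_like(crafted), repeats)
    return crafted_wall / max(benign_wall, 1e-9)  # avoid division by zero


if __name__ == "__main__":
    crafted = "(" + "a" * 50 + ")" * 50  # the example string from step 1
    # Placeholder tokenizer; substitute the self-hosted model's tokenizer.
    print(f"slowdown ratio: {slowdown(str.split, crafted):.2f}")
```

A ratio close to 1 suggests the tokenizer treats both inputs alike; a ratio that keeps growing as the crafted string is lengthened is the disproportionate increase described in step 4.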
Mitigation Steps
1. **Implement Strict Timeouts:** Enforce aggressive, short timeouts on the tokenization step of the inference pipeline. If tokenizing a prompt takes longer than a few hundred milliseconds, reject the request.
2. **Input Validation and Rate Limiting:** Implement rate limiting based on IP address and API key. Additionally, use Web Application Firewalls (WAFs) with rules designed to detect and block repetitive, low-complexity string patterns characteristic of these attacks.
3. **Use Optimized Tokenizers:** Model providers should audit and optimize the regular expressions used in their tokenizers to avoid catastrophic backtracking vulnerabilities.
4. **Cost and Usage Monitoring:** API consumers should set strict spending limits and monitor their usage for sudden, unexplained spikes in cost or processing time, which could indicate an ongoing attack.
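As one possible reading of the input-validation step, the sketch below flags repetitive, low-complexity prompts before they reach the tokenizer. It is a heuristic under assumptions, not a rule set from any WAF product: the function names and the thresholds (`max_run`, `min_ratio`, `min_length`) are hypothetical and would need tuning against real traffic, since legitimate prompts can also be repetitive.

```python
import zlib


def longest_run(text: str) -> int:
    """Length of the longest run of one repeated character."""
    longest = current = 0
    previous = None
    for ch in text:
        current = current + 1 if ch == previous else 1
        previous = ch
        longest = max(longest, current)
    return longest


def compression_ratio(text: str) -> float:
    """Compressed size divided by raw size; low values mean repetitive input."""
    raw = text.encode("utf-8")
    if not raw:
        return 1.0
    return len(zlib.compress(raw)) / len(raw)


def looks_low_complexity(text: str, max_run: int = 32,
                         min_ratio: float = 0.1,
                         min_length: int = 256) -> bool:
    """Flag prompts with long single-character runs or very high redundancy.

    All three thresholds are illustrative assumptions.
    """
    if longest_run(text) > max_run:
        return True
    # Short strings compress poorly regardless of content, so the ratio
    # check only applies once the prompt reaches min_length characters.
    return len(text) >= min_length and compression_ratio(text) < min_ratio


if __name__ == "__main__":
    crafted = "(" + "a" * 50 + ")" * 50
    print(looks_low_complexity(crafted))
    print(looks_low_complexity("Please summarise this paragraph."))
```

A filter like this complements the tokenization timeout rather than replacing it: the timeout bounds the worst case, while the filter rejects the most obvious payloads cheaply.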
Patch Details
Major providers (OpenAI, Google, Anthropic) have implemented server-side mitigations, such as improved regex patterns and processing timeouts, that silently drop malicious requests.