Overview
Severity: HIGH | Affected: Multiple LLM Providers | Category: research
Researchers from the Stanford AI Lab have published a paper describing a novel jailbreak technique called 'Sleepwalker'. The method combines time-delayed, obfuscated instructions with Unicode character manipulation to bypass the safety alignment of several major large language models, including GPT-5 and Gemini Advanced. According to the paper, the technique puts the model into a 'confused' state in which it processes harmful instructions before its safety filters fully engage. The researchers demonstrated successful bypasses for misinformation generation, malicious code generation, and PII extraction, reporting a success rate of over 85% on the tested models. The findings highlight the persistent fragility of current alignment methods and the need for more robust, dynamic defense mechanisms that can detect sophisticated obfuscation.
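On the defensive side, the kind of obfuscation detection mentioned above could start with a simple input pre-check. The sketch below is a generic illustration and is not taken from the paper, whose specific manipulations are not described here. The function names (`obfuscation_signals`, `looks_obfuscated`) and the three heuristics are assumptions chosen for the example: counting invisible format characters, checking whether NFKC normalization changes the text, and spotting mixed scripts within one input. It uses only Python's standard `unicodedata` module.

```python
import unicodedata


def obfuscation_signals(text: str) -> dict:
    """Return heuristic signals of Unicode obfuscation (illustrative only)."""
    # Category "Cf" covers invisible format characters such as
    # zero-width spaces/joiners and bidirectional controls.
    format_chars = sum(1 for ch in text if unicodedata.category(ch) == "Cf")

    # NFKC folds compatibility forms (e.g. fullwidth letters) into their
    # ordinary equivalents; a difference means lookalike forms were present.
    normalized = unicodedata.normalize("NFKC", text)

    # Rough script detection (an assumption of this sketch): use the first
    # word of each letter's Unicode name, e.g. "LATIN" or "CYRILLIC".
    scripts = set()
    for ch in normalized:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split(" ")[0])

    return {
        "format_chars": format_chars,
        "changed_by_nfkc": normalized != text,
        "scripts": scripts,
    }


def looks_obfuscated(text: str) -> bool:
    """Flag text that trips any of the heuristics above."""
    s = obfuscation_signals(text)
    return bool(
        s["format_chars"] > 0
        or s["changed_by_nfkc"]
        or len(s["scripts"]) > 1
    )


if __name__ == "__main__":
    samples = ["hello world", "pay\u200bload", "p\u0430ypal"]
    for sample in samples:
        print(repr(sample), looks_obfuscated(sample))
```

A check like this would flag legitimate multilingual text and emoji sequences that use zero-width joiners, so it is better suited as one signal feeding a broader review step than as a standalone filter. It also does nothing about the time-delayed aspect of the technique.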