Overview
Severity: HIGH | Affected: OpenAI, Google, Anthropic | Category: research
Researchers from Carnegie Mellon University have published a paper detailing 'MindWipe,' a new jailbreak technique. The method combines adversarial Unicode sequences with multi-shot contextual manipulation to erase an LLM's safety alignment within a single session. Unlike previous methods, which required complex, model-specific prompts, MindWipe uses a single, universal prompt structure and has shown a success rate of over 90% against leading models from OpenAI, Google, and Anthropic. The technique works by overloading the model's context window with conflicting instructions, causing it to revert to a base, unfiltered state. The publication has forced major AI labs to issue patches immediately and to re-evaluate their defense-in-depth strategies against sophisticated prompt injection, and it highlights the fragility of current safety guardrails.
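As an illustration of one defensive layer, the sketch below screens incoming prompts for unusual Unicode before they reach a model. It is a generic input-hygiene example, not the mitigation any of the named labs shipped, and it does not reproduce the paper's sequences, which this summary does not describe. The function name `screen_prompt`, the set of flagged categories, and the `max_suspicious` threshold are all assumptions chosen for the example.

```python
import unicodedata

# Assumption for this sketch: treat these Unicode general categories as suspicious.
# Cf = format characters (zero-width, bidi controls), Co = private use, Cn = unassigned.
SUSPICIOUS_CATEGORIES = {"Cf", "Co", "Cn"}


def screen_prompt(text: str, max_suspicious: int = 0):
    """Hypothetical helper: return (cleaned_text, flagged, allowed).

    flagged is a list of (index, code point) pairs for suspicious characters.
    allowed is False when more than max_suspicious characters were flagged.
    """
    flagged = [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) in SUSPICIOUS_CATEGORIES
    ]
    # Drop the flagged characters, then fold compatibility forms (e.g. fullwidth
    # letters) into their canonical equivalents with NFKC normalization.
    kept = "".join(
        ch for ch in text if unicodedata.category(ch) not in SUSPICIOUS_CATEGORIES
    )
    cleaned = unicodedata.normalize("NFKC", kept)
    return cleaned, flagged, len(flagged) <= max_suspicious
```

A filter like this would be only one layer in a defense-in-depth setup. Format characters also have legitimate uses, such as zero-width joiners in emoji and some scripts, so a real deployment would likely need a more selective policy than blanket removal.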