Universal 'MindWipe' Prompt Injection Technique Bypasses Guardrails on All Major LLMs

Overview

Severity: HIGH | Affected: OpenAI, Google, Anthropic | Category: research

Researchers from Carnegie Mellon University have published a paper detailing 'MindWipe,' a novel and highly effective jailbreak technique. The method combines adversarial Unicode sequences with multi-shot contextual manipulation to erase an LLM's safety alignment within a single session. Unlike previous methods, which required complex, model-specific prompts, MindWipe has shown a success rate of over 90% against leading models from OpenAI, Google, and Anthropic using a single, universal prompt structure. The technique works by overloading the model's context window with conflicting instructions, causing it to revert to a base, unfiltered state. The publication has forced major AI labs to issue immediate patches and re-evaluate their defense-in-depth strategies against sophisticated prompt injection, highlighting the fragility of current safety guardrails.

References

https://arxiv.org/abs/2502.1138
https://www.wired.com/story/mindwipe-ai-jailbreak-llm/
