CMU Researchers Unveil 'Recursive Embedding Attack' Bypassing Latest LLM Safety Filters

Overview

Severity: HIGH | Affected: Carnegie Mellon University | Category: research

Researchers at Carnegie Mellon University's CyLab have published a paper detailing a novel jailbreak technique named the 'Recursive Embedding Attack' (REA). The method circumvents the safety alignment of major large language models, including those from Anthropic and Google, with a near-perfect success rate in lab tests. REA works by crafting prompts that embed harmful instructions within multiple layers of benign-seeming data structures, which confuses the model's safety classifiers. Because the attack does not rely on specific keywords or character-level obfuscation, it is difficult to detect with current defense mechanisms. The paper highlights the inherent tension between model capability and safety, demonstrating that as models become more complex, they open new, non-obvious attack surfaces. The researchers have responsibly disclosed their findings to major AI labs.

References

https://arxiv.org/abs/2605.14832
https://www.theverge.com/2026/5/28/28991204/cmu-llm-jailbreak-recursive-embedding-attack
