Researchers Unveil 'Semantic Scrambling' Attack Bypassing Ma

Overview

Severity: HIGH | Affected: Multiple LLM Providers | Category: research

Researchers from Carnegie Mellon University have published a paper detailing a novel jailbreak technique named 'Semantic Scrambling'. The attack works by embedding malicious instructions within complex, grammatically correct but nonsensical paragraphs. The LLM's core logic, in its attempt to find meaning, inadvertently processes the hidden commands, bypassing its surface-level safety and alignment filters. The technique has proven effective against several leading models, including GPT-5 and Claude 4, achieving an over 85% success rate in generating harmful content in lab tests. The research highlights a fundamental vulnerability in how models process contextual information, suggesting that simple input filtering is insufficient. Model providers are now scrambling to develop more sophisticated semantic analysis defenses to counter this new attack vector.

References

https://arxiv.org/abs/2503.0812
https://www.wired.com/story/semantic-scrambling-llm-jailbreak/

Researchers Unveil 'Semantic Scrambling' Attack Bypassing Major LLM Safety Filters

Overview

References

Comments

Comments