Overview
Severity: HIGH | Affected: Multiple LLM Providers | Category: research
Researchers at Stanford University's AI Lab have published a paper describing a novel jailbreak technique called 'Contextual Weaving'. The attack bypasses the safety alignment of large language models by embedding subtle, manipulative instructions within a long, seemingly benign context. Unlike direct prompt injection, the method shifts the model's attention and interpretation gradually. This leads the model to generate harmful or restricted content without triggering safety filters. The paper reports successful attacks against several leading commercial and open-source models, including GPT-5 and Llama 4. The researchers argue that current safety mechanisms, which mainly analyze individual prompts in isolation, are insufficient against such gradual, multi-turn attacks. They have responsibly disclosed their findings to major AI labs and are advocating new defenses that analyze the entire conversational context for adversarial drift.