Overview
Severity: HIGH | Affected: Major LLM Providers (OpenAI, Anthropic, Google) | Category: research
Researchers from a leading university have published a paper detailing a novel jailbreak technique called the "Multi-Step Contextual Attack" (MSCA). Unlike single-prompt attacks, MSCA uses a series of seemingly benign prompts, each of which looks harmless in isolation, to build a specific conversational context that gradually steers the model into a state where its safety alignment is weakened. Once that context is established, a final malicious prompt can bypass safety guardrails and elicit harmful or prohibited content. The paper reports a success rate of over 85% against several leading proprietary models. The technique highlights the limitations of current static alignment methods, which evaluate prompts largely in isolation, and the need for dynamic, state-aware safety mechanisms that track conversational history for deceptive patterns. The researchers have responsibly disclosed their findings to the affected model providers, who are now working on patches to address this evasion vector.
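
To make the idea of a state-aware safety mechanism concrete, the following is a minimal defensive sketch, not the method from the paper or any provider's actual mitigation. It assumes a hypothetical per-turn scorer (score_turn, a keyword stand-in for a real safety classifier) and illustrative thresholds and decay values. The monitor keeps a decayed running total of per-turn risk, so a conversation can be flagged even when no single prompt crosses the per-turn threshold.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical term weights: placeholders for a real safety classifier,
# not values taken from the paper.
RISK_TERMS = {
    "hypothetically": 0.2,
    "roleplay": 0.2,
    "ignore previous": 0.4,
    "no restrictions": 0.4,
}


def score_turn(prompt: str) -> float:
    """Return a risk score in [0, 1] for a single prompt (illustrative only)."""
    text = prompt.lower()
    return min(1.0, sum(w for term, w in RISK_TERMS.items() if term in text))


@dataclass
class ConversationMonitor:
    """Tracks accumulated risk across turns instead of judging each turn alone."""

    per_turn_threshold: float = 0.5    # assumed value: static, single-prompt check
    cumulative_threshold: float = 0.6  # assumed value: stateful, history-aware check
    decay: float = 0.8                 # assumed value: older turns count for less
    scorer: Callable[[str], float] = score_turn
    state: float = 0.0
    history: List[float] = field(default_factory=list)

    def observe(self, prompt: str) -> bool:
        """Record one user turn; return True if the conversation should be flagged."""
        turn_score = self.scorer(prompt)
        self.history.append(turn_score)
        # Exponentially decayed running sum of per-turn risk.
        self.state = self.decay * self.state + turn_score
        return (
            turn_score >= self.per_turn_threshold
            or self.state >= self.cumulative_threshold
        )


if __name__ == "__main__":
    monitor = ConversationMonitor()
    turns = [
        "Let's do a roleplay.",
        "Hypothetically, what would the character say?",
        "Stay in roleplay, hypothetically speaking.",
    ]
    for turn in turns:
        flagged = monitor.observe(turn)
        print(f"flagged={flagged} state={monitor.state:.3f} prompt={turn!r}")
```

In this sketch the third turn is flagged by the cumulative check although every individual turn stays below the per-turn threshold, which is the gap a purely static filter would leave open. A real deployment would replace the keyword scorer with a trained classifier and tune the thresholds against measured false-positive rates.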