Overview
Severity: HIGH | Affected: Multimodal LLMs | Category: research
Researchers at Carnegie Mellon University have published a paper detailing a novel jailbreak technique named 'Semantic Camouflage.' The attack targets multimodal large language models (LLMs) that process both text and image inputs. It works by embedding hidden, malicious prompts within the pixel data of seemingly innocuous images using advanced steganography. When the model processes such an image alongside a simple text prompt like 'Describe this scene,' it internally decodes and follows the hidden instruction, bypassing safety filters that inspect only the text input. The method has been shown to cause models to generate harmful content, misinformation, and exploit code. The research exposes a critical vulnerability in how multimodal models fuse and interpret inputs, and demonstrates that safety alignment must cover every input modality, not just text, to hold up against sophisticated adversarial attacks.
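The summary above does not say which encoding the paper uses, so the sketch below substitutes the simplest textbook scheme, least-significant-bit (LSB) embedding, purely to illustrate why a filter that reads only the text prompt never sees data carried in pixel values. It does not reproduce the attack against any model. The function names (embed_lsb, extract_lsb, strip_lsb) and the flat list of 8-bit pixel values are hypothetical choices made for this illustration.

```python
from typing import List


def embed_lsb(pixels: List[int], message: bytes) -> List[int]:
    """Hide `message` in the lowest bit of successive 8-bit pixel values."""
    # Flatten the message into bits, most significant bit of each byte first.
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("message does not fit in the image")
    stego = list(pixels)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the message bit.
        stego[i] = (stego[i] & ~1) | bit
    return stego


def extract_lsb(pixels: List[int], n_bytes: int) -> bytes:
    """Read `n_bytes` back out of the lowest bit of each pixel value."""
    out = bytearray()
    for b in range(n_bytes):
        value = 0
        for p in pixels[b * 8:(b + 1) * 8]:
            value = (value << 1) | (p & 1)
        out.append(value)
    return bytes(out)


def strip_lsb(pixels: List[int]) -> List[int]:
    """Assumed input-sanitization step: zero the lowest bit of every pixel."""
    return [p & ~1 for p in pixels]


# Synthetic 256-value "image"; no real image file or model is involved.
cover = [(i * 37) % 256 for i in range(256)]
hidden = b"hidden text"

stego = embed_lsb(cover, hidden)
max_change = max(abs(a - b) for a, b in zip(cover, stego))  # per-pixel distortion
recovered = extract_lsb(stego, len(hidden))

sanitized = strip_lsb(stego)
after = extract_lsb(sanitized, len(hidden))
```

In this toy setup each pixel value changes by at most 1, which is why the carrier image still looks innocuous, and the payload is fully recoverable until the low bits are cleared. Clearing low bits defeats only this simple scheme. An encoding designed to survive re-quantization or resizing would not be removed by it, which is consistent with the paper's point that safeguards have to operate on the image modality itself.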