Overview
Severity: HIGH | Affected: Multiple LLM Providers | Category: research
A team from Carnegie Mellon University's CyLab has published a paper describing a new jailbreaking method called 'Glyph-Guard'. The technique exploits how Large Language Models (LLMs) tokenize and process Unicode characters, particularly invisible or non-standard glyphs. By embedding these characters in a prompt, an attacker creates 'semantic blind spots': the model's safety and alignment filters fail to recognize the request, while the model itself still acts on it and can produce harmful, biased, or otherwise prohibited content. The researchers report a success rate of over 90% against several major closed-source models, including GPT-5 and Claude 4. According to the paper, the technique is difficult to detect with traditional input filters, and a complete fix requires fundamental changes to model tokenizers rather than to input filtering alone. The researchers have responsibly disclosed their findings to major AI labs, which are now developing patches.
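To make the class of characters concrete, the following is a minimal Python sketch of an input pre-filter that flags and strips invisible Unicode code points before a prompt reaches a model. It is not taken from the paper: the function names `find_hidden_glyphs` and `strip_hidden_glyphs` and the choice of Unicode general categories to flag are assumptions made for illustration. Because the research indicates that input filtering alone is insufficient, this should be read as a way to inspect prompts, not as a mitigation for the technique.

```python
import unicodedata

# Assumption: treat format (Cf), private-use (Co), and unassigned (Cn)
# code points as suspicious. The paper's actual character set is not
# given in this advisory, so this selection is illustrative only.
SUSPICIOUS_CATEGORIES = {"Cf", "Co", "Cn"}


def find_hidden_glyphs(prompt: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, category) for each suspicious character."""
    hits = []
    for index, char in enumerate(prompt):
        category = unicodedata.category(char)
        if category in SUSPICIOUS_CATEGORIES:
            hits.append((index, f"U+{ord(char):04X}", category))
    return hits


def strip_hidden_glyphs(prompt: str) -> str:
    """Apply NFKC normalization, then drop suspicious characters."""
    # NFKC folds compatibility forms (for example fullwidth letters)
    # into their canonical equivalents before filtering.
    normalized = unicodedata.normalize("NFKC", prompt)
    return "".join(
        char
        for char in normalized
        if unicodedata.category(char) not in SUSPICIOUS_CATEGORIES
    )


if __name__ == "__main__":
    sample = "ig\u200bnore"  # contains U+200B ZERO WIDTH SPACE
    print(find_hidden_glyphs(sample))
    print(strip_hidden_glyphs(sample))
```

Note that stripping every format character is a blunt choice: zero-width joiners and non-joiners are legitimate in emoji sequences and in scripts such as Persian, so a production filter would likely log and review hits instead of silently removing them.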