LLM Output Poisoning via Malicious Knowledge Graph Embeddings
Overview
A novel attack vector targets AI systems that use knowledge graphs (KGs) for reasoning and fact retrieval, particularly those that integrate Large Language Models (LLMs). An attacker poisons the KG embeddings the LLM relies on, manipulating the model's understanding of entities and their relationships. The attacker does this by injecting slightly altered or entirely fabricated relationships into the KG training data, which are then learned by embedding models such as TransE, Word2Vec, or transformer-based encoders. When the LLM queries the poisoned KG, it retrieves incorrect or malicious information and produces biased, false, or harmful outputs.
For instance, a KG used in a medical diagnosis AI could be poisoned to associate common symptoms with rare, severe diseases, causing misdiagnosis. In a financial analysis AI, the poisoning could forge relationships between legitimate companies and fraudulent activities.
The issue was discovered by analyzing semantic drift in LLM outputs for queries about specific entities known to be part of a poisoned KG subset, which revealed systematic deviations from the expected factual responses. The impact ranges from misinformation and reputational damage to targeted scams and more sophisticated social engineering attacks that rely on credible-sounding but false information.
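The mechanism can be illustrated with a toy sketch of TransE-style scoring, where a triple (head, relation, tail) counts as more plausible the smaller the distance ||h + r - t|| is. The entity names, the 2-d vectors, and the learning rate below are hypothetical, and the update is deliberately simplified: it takes plain gradient steps on the squared distance and omits the negative sampling, margin loss, and norm constraints a real trainer would use.

```python
import math

def transe_distance(h, r, t):
    """TransE plausibility: a smaller ||h + r - t|| means a more plausible triple."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def positive_step(h, r, t, lr=0.05):
    """One gradient step on ||h + r - t||^2 for a triple seen in the training data.

    Simplified on purpose: no negative sampling, margin, or normalization.
    """
    d = [hi + ri - ti for hi, ri, ti in zip(h, r, t)]
    h_new = [hi - lr * 2 * di for hi, di in zip(h, d)]
    r_new = [ri - lr * 2 * di for ri, di in zip(r, d)]
    t_new = [ti + lr * 2 * di for ti, di in zip(t, d)]
    return h_new, r_new, t_new

# Hypothetical embeddings: the fabricated triple starts out implausible.
headache = [1.0, 0.0]
indicates = [0.0, 0.5]
rare_disease = [-1.0, 2.0]

before = transe_distance(headache, indicates, rare_disease)

h, r, t = headache, indicates, rare_disease
for _ in range(5):  # five injected copies of the fabricated triple
    h, r, t = positive_step(h, r, t)

after = transe_distance(h, r, t)
print(f"distance before: {before:.3f}, after: {after:.3f}")
```

In this simplified setting each step scales the residual h + r - t by (1 - 6 * lr), so a handful of injected copies is enough to make the fabricated relationship look plausible to anything that ranks triples by this distance.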
Affected Systems
Testing Guide
- Create a small, controlled poisoned KG subset with known malicious relationships.
- Train embedding models on this poisoned KG and compare embeddings to clean ones.
- Query an LLM integrated with the KG about entities related to the poisoned data.
- Analyze LLM outputs for factual inaccuracies, biases, or unexpected reasoning patterns compared to baseline queries.
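One possible way to carry out the embedding comparison step is sketched below. Because two independently trained embedding spaces are generally not aligned coordinate by coordinate, the sketch compares each entity's nearest neighbors rather than raw vectors. The entity names, the 2-d vectors, the choice of k, and the 0.5 threshold are all illustrative assumptions, not values from a real test run.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_k_neighbors(embeddings, entity, k):
    """Return the k entities most similar to `entity` by cosine similarity."""
    others = [e for e in embeddings if e != entity]
    others.sort(key=lambda e: cosine(embeddings[entity], embeddings[e]), reverse=True)
    return set(others[:k])

def neighbor_overlap(clean, poisoned, k=2):
    """Jaccard overlap of each entity's top-k neighbors in the two spaces.

    Assumes both dicts contain the same entities. Low overlap means the
    entity's neighborhood changed between the clean and poisoned models.
    """
    overlap = {}
    for entity in clean:
        a = top_k_neighbors(clean, entity, k)
        b = top_k_neighbors(poisoned, entity, k)
        overlap[entity] = len(a & b) / len(a | b)
    return overlap

# Hypothetical embeddings; only "fever" differs in the poisoned model.
clean = {
    "fever": [1.0, 0.1], "flu": [0.9, 0.2], "cold": [0.8, 0.3],
    "ebola": [-0.1, 1.0], "marburg": [-0.2, 0.9],
}
poisoned = dict(clean, fever=[0.0, 1.0])  # pulled toward the severe diseases

overlap = neighbor_overlap(clean, poisoned, k=2)
flagged = sorted(e for e, score in overlap.items() if score < 0.5)
print(flagged)
```

Entities adjacent to the fabricated relationship can be flagged alongside the poisoned entity itself, since their neighborhoods shift too. That makes the flagged set a starting point for manual review rather than a list of confirmed poisoned entries.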
Mitigation Steps
- Implement robust data validation and sanitization for KG ingestion.
- Employ anomaly detection techniques on KG embeddings to identify outliers.
- Utilize differential privacy during KG embedding training to limit the impact of individual poisoned data points.
- Conduct regular audits of KG integrity and LLM output consistency for critical entities.
- Isolate KG data sources and implement access controls.
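As an illustration of the ingestion validation step, the sketch below rejects triples that come from an untrusted source or that violate a relation's expected entity types. The `Triple` class, the schema tables, and the source names are hypothetical placeholders. A real deployment would load them from its own ontology and provenance records.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Triple:
    head: str
    relation: str
    tail: str
    source: str  # provenance label attached at ingestion time

# Hypothetical schema and allowlist, for illustration only.
ENTITY_TYPES = {"fever": "Symptom", "influenza": "Disease", "acme_corp": "Company"}
RELATION_SCHEMA = {"symptom_of": ("Symptom", "Disease")}  # relation -> (head type, tail type)
TRUSTED_SOURCES = {"curated_medical_db"}

def validate(triple):
    """Return a list of rejection reasons; an empty list means the triple passes."""
    problems = []
    if triple.source not in TRUSTED_SOURCES:
        problems.append(f"untrusted source: {triple.source}")
    schema = RELATION_SCHEMA.get(triple.relation)
    if schema is None:
        problems.append(f"unknown relation: {triple.relation}")
        return problems
    checks = (("head", triple.head, schema[0]), ("tail", triple.tail, schema[1]))
    for role, entity, expected in checks:
        actual = ENTITY_TYPES.get(entity)  # None if the entity is unknown
        if actual != expected:
            problems.append(f"{role} {entity!r} has type {actual}, expected {expected}")
    return problems

good = Triple("fever", "symptom_of", "influenza", "curated_medical_db")
bad = Triple("acme_corp", "symptom_of", "influenza", "anonymous_upload")

print(validate(good))
print(validate(bad))
```

Type and provenance checks of this kind only catch structurally implausible or unvetted triples. A fabricated relationship that is type-correct and arrives through a trusted source would pass, which is why the anomaly detection and audit steps above remain necessary.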