Arbitrary Code Execution via Self-Correction Parser in LangChain Experimental Modules
Overview
A critical vulnerability was discovered in the experimental `Self-Correction` parser in LangChain's `langchain-experimental` package. The feature lets an LLM correct its own previously generated code before that code is executed. However, the default implementation, and the documentation that accompanied it, encouraged a dangerous pattern: the LLM's output, which a malicious prompt can influence, was passed directly to Python's `exec()` function without sufficient sanitization.
An attacker could craft a prompt that causes the LLM to emit malicious Python inside the correction block, for instance `__import__('os').system('cat /etc/passwd')`. When the `Self-Correction` parser processes this output, it executes the injected code with the permissions of the application process. This remote code execution (RCE) can lead to complete system compromise, data exfiltration, or lateral movement within the host network.
The root cause is the trust placed in the LLM's output, which acts as a conduit for the attacker's payload. The flaw illustrates the inherent risk of executing LLM-generated code in production, especially in agentic frameworks that give models tool access or code-execution capabilities. It was found by security researchers analyzing the attack surface of autonomous AI agents and their interaction with external tools and code interpreters.
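The vulnerable pattern can be reduced to a few lines. The sketch below is a hypothetical stand-in, not the `langchain-experimental` source: the function name `naive_self_correct` and the `<correction>` tag format are assumptions made for illustration. It shows why the flaw amounts to RCE: whatever text the model places in the correction block is handed to `exec()` with the process's full privileges.

```python
import re

# Hypothetical correction-block format; not the real LangChain output format.
CORRECTION = re.compile(r"<correction>\s*(.*?)\s*</correction>", re.DOTALL)


def naive_self_correct(llm_output: str) -> dict:
    """UNSAFE illustration: extract the model's 'corrected' code and run it."""
    match = CORRECTION.search(llm_output)
    # Fall back to treating the whole output as code, as a naive parser might.
    code = match.group(1) if match else llm_output
    namespace: dict = {}
    # DANGER: exec() runs with every privilege of this process. Because the
    # namespace has no restricted __builtins__, __import__ is fully available,
    # so attacker-steered model output can import os, subprocess, socket, etc.
    exec(code, namespace)
    return namespace


# Benign stand-in for an injected payload: it proves arbitrary imports work.
ns = naive_self_correct(
    "Here is my corrected code:\n"
    "<correction>\n"
    "result = __import__('math').factorial(5)\n"
    "</correction>"
)
print(ns["result"])
```

Here the correction block sets `result` to `120`; swapping `math` for `os` would give the same access to the operating system, which is exactly the payload shape described above.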
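Affected Systems
Applications that use `langchain-experimental` version 0.0.58 or earlier and pass LLM output through the `Self-Correction` parser, or through any similar pattern that hands model output to `exec()` or `eval()`.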
Testing Guide
1. Check your project's dependencies for `langchain-experimental`. If it is present, check the installed version with `pip show langchain-experimental`.
2. If the version is `0.0.58` or older, you are vulnerable.
3. Review your codebase for any use of `SelfCorrectingParser` or similar patterns in which LLM output is passed to an interpreter such as `exec()` or `eval()`.
4. In a non-production environment, send the agent a prompt that instructs the model to include a benign statement such as `print('vulnerable')` in its corrected code. If the statement executes (for example, `vulnerable` appears in the process output), your application is vulnerable to this pattern.
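Steps 2 and 3 can be partly automated. The following is a minimal audit sketch using only the standard library; the helper names (`release_tuple`, `is_vulnerable`, `installed_is_vulnerable`, `find_risky_references`) are our own, and the version threshold comes from this advisory. The version comparison is deliberately simplified and ignores pre-release and local version tags; `packaging.version` handles full PEP 440 ordering. The scan only finds direct references, so indirect calls (for example, `getattr(builtins, "exec")`) still need manual review.

```python
from __future__ import annotations

import ast
import re
from importlib.metadata import PackageNotFoundError, version

# Threshold taken from this advisory: 0.0.58 and earlier are affected.
LAST_VULNERABLE = (0, 0, 58)


def release_tuple(ver: str) -> tuple[int, ...]:
    """Extract the leading numeric release, e.g. '0.0.59rc1' -> (0, 0, 59).

    Simplified: pre-release, post-release and local tags are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", ver)
    if match is None:
        raise ValueError(f"unrecognized version string: {ver!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_vulnerable(ver: str) -> bool:
    """True if the release is at or below the last affected version."""
    return release_tuple(ver) <= LAST_VULNERABLE


def installed_is_vulnerable(dist: str = "langchain-experimental") -> bool | None:
    """Check the installed distribution; None means it is not installed."""
    try:
        return is_vulnerable(version(dist))
    except PackageNotFoundError:
        return None


# Identifiers worth a manual look (step 3 above).
FLAGGED = {"exec", "eval", "SelfCorrectingParser"}


def find_risky_references(source: str) -> list[tuple[int, str]]:
    """Return sorted (line, name) pairs for flagged names in Python source."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name) and node.id in FLAGGED:
            hits.append((node.lineno, node.id))
        elif isinstance(node, ast.Attribute) and node.attr in FLAGGED:
            hits.append((node.lineno, node.attr))
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            hits.extend(
                (node.lineno, alias.name)
                for alias in node.names
                if alias.name.split(".")[-1] in FLAGGED
            )
    return sorted(hits)
```

Run `find_risky_references` over each `.py` file in the repository and review every hit by hand; a reference is a lead, not proof of exploitability.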
Mitigation Steps
1. Immediately upgrade `langchain-experimental` to version `0.0.59` or later (for example, `pip install --upgrade "langchain-experimental>=0.0.59"`).
2. Avoid calling `exec()` on LLM-generated output. If code execution is necessary, run it in a securely sandboxed environment (e.g., a Docker container, gVisor, or firejail) with strict resource and network limits.
3. Rigorously parse and validate any LLM-generated code before it is executed. Prefer an allowlist of permitted operations and functions; a denylist of dangerous modules and functions is easy to bypass (for example, through `__import__` or attribute chains on built-in objects).
4. Treat all LLM output as untrusted user input, particularly in agentic systems where the model interacts with external data sources.
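The validation in step 3 can be sketched as an allowlist check over Python's `ast` module, which is harder to bypass than a denylist of module and function names. This is a minimal sketch under assumed requirements: the permitted node types and built-ins (`ALLOWED_NODES`, `ALLOWED_CALLS`) and the helper names are hypothetical and should be tailored to what your generated code legitimately needs. It complements, and does not replace, the sandbox in step 2.

```python
import ast
import builtins

# Hypothetical allowlist: simple arithmetic, comparisons, literals, and
# calls to a handful of pure built-ins. Everything else is rejected.
ALLOWED_NODES = (
    ast.Module, ast.Expr, ast.Assign, ast.AugAssign, ast.Name, ast.Load, ast.Store,
    ast.Constant, ast.BinOp, ast.UnaryOp, ast.BoolOp, ast.Compare,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.FloorDiv, ast.Mod, ast.Pow,
    ast.UAdd, ast.USub, ast.Not, ast.And, ast.Or,
    ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
    ast.List, ast.Tuple, ast.Dict, ast.Call, ast.keyword,
)
ALLOWED_CALLS = {"abs", "len", "max", "min", "round", "sorted", "sum"}


class UnsafeCodeError(ValueError):
    """Raised when generated code falls outside the allowlist."""


def validate_generated_code(source: str) -> ast.Module:
    """Parse (without executing) and reject anything not explicitly allowed."""
    tree = ast.parse(source, mode="exec")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            # Attribute access, imports, lambdas, comprehensions, etc.
            raise UnsafeCodeError(f"disallowed syntax: {type(node).__name__}")
        if isinstance(node, ast.Name) and node.id.startswith("__"):
            raise UnsafeCodeError(f"dunder name not allowed: {node.id}")
        if isinstance(node, ast.Call) and not (
            isinstance(node.func, ast.Name) and node.func.id in ALLOWED_CALLS
        ):
            raise UnsafeCodeError("call to a function outside the allowlist")
    return tree


def run_validated(source: str) -> dict:
    """Execute validated code with only the allowlisted built-ins visible.

    Defense in depth, not a sandbox: there are no CPU or memory limits
    (e.g. 10 ** 10 ** 10 could exhaust both), so still isolate the process.
    """
    tree = validate_generated_code(source)
    namespace = {"__builtins__": {name: getattr(builtins, name) for name in ALLOWED_CALLS}}
    exec(compile(tree, "<llm-generated>", "exec"), namespace)
    return namespace
```

With this policy, `run_validated("total = sum([1, 2, 3]) * 2")` sets `total` to `12`, while `import os`, `__import__('os').system('id')`, and `().__class__` are all rejected before anything runs. Rejecting every `ast.Attribute` node is the key design choice: most Python sandbox escapes walk attribute chains such as `__class__.__mro__`.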
Patch Details
Patched in `langchain-experimental` version 0.0.59. The patch removes the vulnerable `exec` call from the default parser implementation.
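Removing `exec` means the parser returns the correction as data instead of running it. The sketch below shows that general shape under stated assumptions: it uses a hypothetical `<correction>` tag format and a hypothetical `Correction` type, and checks syntax with `ast.parse`, which builds a syntax tree without executing anything. It is not the code shipped in 0.0.59.

```python
import ast
import re
from dataclasses import dataclass

# Hypothetical correction-block format; not a real LangChain API.
CORRECTION = re.compile(r"<correction>\s*(.*?)\s*</correction>", re.DOTALL)


@dataclass(frozen=True)
class Correction:
    code: str        # inert text: nothing has been executed
    syntax_ok: bool  # result of ast.parse, which parses but never runs code


def parse_correction(llm_output: str) -> Correction:
    """Extract the corrected code as a string; never execute it."""
    match = CORRECTION.search(llm_output)
    if match is None:
        raise ValueError("no <correction> block found in model output")
    code = match.group(1)
    try:
        ast.parse(code)
        ok = True
    except SyntaxError:
        ok = False
    return Correction(code=code, syntax_ok=ok)
```

The caller then decides what happens to `Correction.code`, such as sending it to a sandboxed runner or a human reviewer, so execution is never an implicit side effect of parsing.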