Container Escape in NVIDIA Triton Inference Server via Malformed ONNX Model
Overview
A critical vulnerability in the NVIDIA Triton Inference Server allows container escape and host compromise. The flaw resides in the server's ONNX runtime backend. A remote attacker who can submit a specially crafted ONNX model file for inference can trigger a use-after-free condition in the memory allocation code path that parses the model's computational graph.

Successful exploitation lets the attacker execute arbitrary code in the context of the `tritonserver` process. Because Triton is often run in privileged containers to get direct access to GPU hardware via the NVIDIA Container Toolkit, this code execution can be leveraged to escape the container's security boundaries. The attacker could then gain root access on the host Kubernetes node, compromising every other workload on that node and potentially the entire cluster.

This vulnerability underscores the risk of treating model artifacts as safe, inert data, and it highlights the large attack surface introduced by complex model parsing and execution runtimes.
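Since the escape path described above depends on the container being privileged, it can help to check which Triton pods actually run that way. The following is a minimal sketch, not an official audit tool: it assumes the pod spec has already been loaded into a Python dictionary shaped like the `spec` field of a Kubernetes Pod manifest, and the function name `find_privileged_containers` and the example spec are hypothetical.

```python
from __future__ import annotations

from typing import Any


def find_privileged_containers(pod_spec: dict[str, Any]) -> list[str]:
    """Return the names of containers whose securityContext sets privileged: true."""
    flagged: list[str] = []
    # Init containers are checked first, then regular containers.
    for key in ("initContainers", "containers"):
        for container in pod_spec.get(key) or []:
            security = container.get("securityContext") or {}
            if security.get("privileged") is True:
                flagged.append(container.get("name", "<unnamed>"))
    return flagged


if __name__ == "__main__":
    # Hypothetical pod spec for illustration only.
    example_spec = {
        "containers": [
            {"name": "triton", "securityContext": {"privileged": True}},
            {"name": "metrics-sidecar"},
        ]
    }
    print(find_privileged_containers(example_spec))
```

A non-empty result does not prove exploitability; it only marks the pods where a code execution bug in `tritonserver` would have the widest reach.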
Affected Systems
Testing Guide
1. Deploy a vulnerable version of the NVIDIA Triton Inference Server in a controlled, isolated environment.
2. Obtain the proof-of-concept (PoC) malformed ONNX model file from the official security advisory.
3. Use a Triton client to load and run inference on the PoC model.
4. Observe the Triton server for a crash or, if using an instrumented build, for signs of memory corruption.

A successful exploit in a non-production environment may trigger a segmentation fault or a specific error message detailed in the advisory.
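The observation in step 4 can be partly automated by probing the server's liveness endpoint before and after the PoC model is exercised. The sketch below uses only the Python standard library and assumes the server exposes the standard `/v2/health/live` HTTP endpoint (port 8000 in a default Triton setup). The helper names `triton_is_live` and `classify_run` are hypothetical, and the sketch deliberately does not include the model load or inference call itself.

```python
import http.client
import urllib.request


def triton_is_live(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/v2/health/live answers with HTTP 200."""
    url = base_url.rstrip("/") + "/v2/health/live"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Refused connections, timeouts, and HTTP error statuses all count as "not live".
        return False


def classify_run(live_before: bool, live_after: bool) -> str:
    """Summarize a test run from liveness observed before and after the PoC request."""
    if not live_before:
        return "inconclusive: server was not live before the test"
    if live_after:
        return "server survived: no crash observed"
    return "server went down: possible crash, inspect logs"


if __name__ == "__main__":
    base = "http://localhost:8000"  # assumed address of the isolated test server
    before = triton_is_live(base)
    # ... load the PoC model and send the inference request here ...
    after = triton_is_live(base)
    print(classify_run(before, after))
```

A lost liveness signal only indicates that the process stopped responding. Confirming memory corruption still requires the server logs or an instrumented build, as noted in step 4.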
Mitigation Steps
1. **Upgrade Triton**: Immediately update all instances to NVIDIA Triton Inference Server version 25.08 or newer.
2. **Harden Runtime Environment**: Run Triton in a security-hardened container runtime like gVisor or Kata Containers to provide an extra layer of kernel isolation, making container escape more difficult.
3. **Model Validation Pipeline**: Implement a CI/CD pipeline for models that includes fuzzing and static analysis of model files before they are deployed to a production Triton server.
4. **Limit Model Loading**: If possible, configure Triton to only load models from a trusted, read-only repository and disable dynamic model loading from untrusted sources.
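One simple way to enforce a trusted repository, as steps 3 and 4 suggest, is to have the validation pipeline record a SHA-256 digest for each model it approves and to reject any file whose digest is not on that list before Triton starts. This is an illustrative sketch of that idea rather than a Triton feature: the allowlist format, the `.onnx` file filter, and the function names are all assumptions, and a digest check complements fuzzing and static analysis without replacing them.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def unapproved_models(repository: Path, approved: set[str]) -> list[Path]:
    """Return every .onnx file under the repository whose digest is not approved."""
    return sorted(
        path for path in repository.rglob("*.onnx")
        if sha256_of(path) not in approved
    )


if __name__ == "__main__":
    # Hypothetical repository path and empty allowlist, for illustration only.
    rejected = unapproved_models(Path("/models"), approved=set())
    for path in rejected:
        print(f"unapproved model: {path}")
    raise SystemExit(1 if rejected else 0)
```

Running a check like this as a startup gate pairs naturally with serving from a read-only volume and keeping Triton's `--model-control-mode` at `none`, so that models cannot be loaded on demand through the API. Confirm the flag's behavior against the documentation for your Triton release.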
Patch Details
Patched in NVIDIA Triton Inference Server container image version 25.08-py3 and later. All users are urged to upgrade immediately.