NVIDIA Triton Inference Server Heap Overflow Allows Remote Code Execution
Overview
A critical heap-based buffer overflow, tracked as CVE-2023-31030, was found in NVIDIA Triton Inference Server. The flaw lies in the logic that parses model configuration files and affects models served with the ONNX Runtime backend. When Triton parses a malformed `config.pbtxt` file, it does not correctly validate the size of certain input fields. Attacker-controlled data can therefore overflow a heap buffer and overwrite adjacent memory structures. The attack requires no authentication to the server itself, but the attacker must be able to supply a crafted model repository and cause the server to load a model from it. A skilled attacker can turn this memory corruption into arbitrary code execution with the privileges of the Triton server process, which is often `root` inside its container.
In a Kubernetes environment, code execution inside a root or weakly isolated container could be chained into a container escape and a compromise of the underlying node, giving the attacker a foothold in the broader MLOps infrastructure. The risk is highest for multi-tenant inference services that let users upload their own models, because a single malicious tenant could compromise the entire shared platform. NVIDIA has published a security bulletin and patched releases of Triton Inference Server, and strongly advises all users to upgrade.
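Affected Systems
NVIDIA Triton Inference Server releases earlier than 23.03, specifically deployments that load models using the ONNX Runtime backend.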
Testing Guide
1. Check which Triton Inference Server version you are running, either with `docker inspect <image_name>` or from the server's startup logs. The version is usually part of the container image tag (e.g., `nvcr.io/nvidia/tritonserver:23.02-py3`).
2. If the version is earlier than 23.03, you are vulnerable.
3. As a preventive measure, review who has access to your model repositories (e.g., S3 buckets, NFS shares) and remove any unnecessary write permissions.
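Steps 2 and 3 can be partly automated. The following is a minimal Python sketch, not an NVIDIA tool, and its function names are hypothetical. It assumes the NGC `YY.MM` tag convention seen in the example image name, so it cannot judge `latest` or digest-pinned images. It also only checks POSIX permission bits on a locally mounted repository, not S3 bucket policies or NFS export rules.

```python
import os
import re
import stat
from typing import List, Optional, Tuple

# First fixed release per this advisory, as a (YY, MM) pair.
FIXED_RELEASE: Tuple[int, int] = (23, 3)

# Assumed NGC tag convention: ":YY.MM" optionally followed by a suffix
# such as "-py3". Anchored at the end so registry ports are ignored.
_TAG_RE = re.compile(r":(\d{2})\.(\d{2})(?:-[\w.-]+)?$")


def parse_release(image: str) -> Optional[Tuple[int, int]]:
    """Return (YY, MM) from an image reference, or None if the tag
    does not follow the YY.MM convention (e.g. ':latest' or a digest)."""
    match = _TAG_RE.search(image)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_vulnerable(image: str) -> Optional[bool]:
    """True if the tagged release predates 23.03, False if it is 23.03
    or later, None if the tag alone cannot tell."""
    release = parse_release(image)
    if release is None:
        return None
    # Tuple comparison orders by year first, then month.
    return release < FIXED_RELEASE


def find_world_writable(repo_root: str) -> List[str]:
    """List paths under a locally mounted model repository (including
    the root) that any local user can modify. Symlinks are skipped
    because their own mode bits are not meaningful on Linux."""
    candidates = [repo_root]
    for dirpath, dirnames, filenames in os.walk(repo_root):
        candidates.extend(os.path.join(dirpath, n) for n in dirnames + filenames)
    flagged = []
    for path in candidates:
        mode = os.lstat(path).st_mode
        if not stat.S_ISLNK(mode) and mode & stat.S_IWOTH:
            flagged.append(path)
    return sorted(flagged)
```

For example, `is_vulnerable("nvcr.io/nvidia/tritonserver:23.02-py3")` returns `True`. A `None` result means the version has to be confirmed another way, such as from the startup logs.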
Mitigation Steps
1. **Upgrade NVIDIA Triton Inference Server** to version 23.03 or later immediately.
2. **Restrict permissions to load models.** Do not allow untrusted users to upload models or modify model repositories on a production inference server.
3. **Run Triton in a least-privilege container.** Use a non-root user and apply security contexts and AppArmor/Seccomp profiles to limit the process's capabilities.
4. **Use a service mesh or network policies** to strictly control network traffic to and from the Triton server, preventing attackers from easily connecting to it or using it as a pivot point.
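Step 3 can be checked before deployment. The sketch below is a hypothetical audit helper, not part of Triton or Kubernetes. It inspects an already parsed Kubernetes pod spec (for instance, the output of a YAML loader) for the standard `securityContext` fields relevant to least privilege: `runAsNonRoot`/`runAsUser`, `privileged`, `allowPrivilegeEscalation`, `capabilities.drop`, and `seccompProfile`. It does not cover AppArmor or network policies. It assumes container-level settings override pod-level ones, as Kubernetes does for the fields both levels share.

```python
from typing import Any, Dict, List, Optional

# Seccomp profile types that actually filter syscalls.
_FILTERING_SECCOMP = ("RuntimeDefault", "Localhost")


def _effective(ctx: Dict[str, Any], pod_ctx: Dict[str, Any], key: str) -> Any:
    """Container-level securityContext values override pod-level ones."""
    return ctx[key] if key in ctx else pod_ctx.get(key)


def audit_pod_spec(pod_spec: Dict[str, Any]) -> List[str]:
    """Return least-privilege findings for each container in a parsed
    Kubernetes pod spec. An empty list means no findings."""
    findings: List[str] = []
    pod_ctx = pod_spec.get("securityContext") or {}
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        ctx = container.get("securityContext") or {}

        user: Optional[int] = _effective(ctx, pod_ctx, "runAsUser")
        non_root = bool(_effective(ctx, pod_ctx, "runAsNonRoot"))
        # UID 0 is root; with no UID and no runAsNonRoot, the image's
        # default user (frequently root) applies.
        if user == 0 or (user is None and not non_root):
            findings.append(f"{name}: may run as root")

        if ctx.get("privileged"):
            findings.append(f"{name}: runs privileged")

        # When unset, privilege escalation is permitted, so only an
        # explicit false passes.
        if ctx.get("allowPrivilegeEscalation", True):
            findings.append(f"{name}: allowPrivilegeEscalation is not false")

        drops = (ctx.get("capabilities") or {}).get("drop", [])
        if "ALL" not in drops:
            findings.append(f"{name}: capabilities not dropped (drop: [ALL])")

        seccomp = _effective(ctx, pod_ctx, "seccompProfile") or {}
        if seccomp.get("type") not in _FILTERING_SECCOMP:
            findings.append(f"{name}: no RuntimeDefault/Localhost seccompProfile")
    return findings
```

A pod spec that sets `runAsNonRoot: true`, a non-zero `runAsUser`, `allowPrivilegeEscalation: false`, `capabilities.drop: [ALL]`, and a `RuntimeDefault` seccomp profile produces no findings.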
Patch Details
Patched in Triton Inference Server versions 23.03 and later.