Container Escape Vulnerability in Azure Machine Learning Compute Instances
Overview
Wiz Research reported a set of high-severity vulnerabilities affecting cloud-hosted ML services, including Azure Machine Learning (AML). The primary issue in AML allowed an attacker who had already compromised a single AML compute instance (a containerized development environment) to escape the container and gain root on the underlying Kubernetes host node. That node was part of a multi-tenant cluster: it ran workloads belonging to other customers.

The vulnerability stemmed from an overly permissive configuration of the Jupyter environment provided inside the AML container, combined with host paths mounted into the container. Together these let an attacker traverse the filesystem to the node's kubelet credentials. With those credentials, the attacker could pivot to other containers scheduled on the same node and read their data, models, and intellectual property.

The impact was critical because it broke the tenant-isolation model that cloud security depends on, allowing cross-customer data access. The disclosure prompted Microsoft to re-architect parts of the AML infrastructure to enforce stricter isolation and remove the attack paths.
Affected Systems
Azure Machine Learning compute instances whose containers were scheduled onto shared, multi-tenant Kubernetes host nodes, as described in the overview. The flaw was in the provider-managed infrastructure rather than in any customer-installed software, so there is no affected client version to check.
Testing Guide
You cannot test the provider-side fix yourself, since it lives in Microsoft's managed infrastructure; confirmation that the issue is resolved comes from Microsoft's security advisories. What you can check from inside your own compute instance is whether the preconditions the overview describes are present: host filesystem paths bind-mounted into the container, and kubelet credential files readable from the container's shell. Either finding is worth raising with Microsoft support regardless of this specific vulnerability.
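The following script is an added example, not part of the original advisory and not code from Microsoft or Wiz. It performs the two checks a practitioner can run from a compute-instance terminal: it scans a mount table for host block devices bind-mounted at paths that name node-level state, and it looks for readable kubelet credential files. The mount table is a constructed sample (a live pass over /proc/self/mounts is also run); the candidate credential paths are common defaults and an assumption, so adjust them for your platform.

```shell
#!/bin/sh
# Added example: audit a container for host-path mounts and readable kubelet
# credentials, the two preconditions described in the advisory overview.
# Not code from the original page; credential paths are assumed defaults.

# Constructed sample of a /proc/self/mounts table (not captured from a real
# AML node): overlay root, pseudo-filesystems, a projected service-account
# token, and two bind mounts of host state that should be flagged.
SAMPLE_MOUNTS='overlay / overlay rw,relatime,lowerdir=/var/lib/docker/overlay2/l/A:/var/lib/docker/overlay2/l/B 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /dev tmpfs rw,nosuid,size=65536k,mode=755 0 0
tmpfs /run/secrets/kubernetes.io/serviceaccount tmpfs ro,relatime 0 0
/dev/sda1 /mnt/host/var/lib/kubelet ext4 rw,relatime 0 0
/dev/sda1 /var/run/docker.sock ext4 rw,relatime 0 0'

# Candidate kubelet credential locations (common defaults; an assumption).
KUBELET_CRED_PATHS='/var/lib/kubelet/kubeconfig
/etc/kubernetes/kubelet.conf
/var/lib/kubelet/pki/kubelet-client-current.pem'

# find_risky_mounts MOUNT_TABLE
# Prints the mount point of every entry whose source is a block device (a bind
# mount of the host filesystem appears this way, unlike overlay/tmpfs/proc)
# and whose path names node-level state (kubelet, kubernetes, docker.sock,
# containerd). Fields: $1 source, $2 mount point, $3 fstype.
find_risky_mounts() {
  printf '%s\n' "$1" | awk '
    $1 ~ /^\/dev\// && $2 ~ /(kubelet|kubernetes|docker\.sock|containerd)/ { print $2 }'
}

# count_risky_mounts MOUNT_TABLE: number of flagged mount points.
count_risky_mounts() {
  find_risky_mounts "$1" | wc -l | tr -d ' '
}

# find_readable_kubelet_creds ROOT
# Prints each candidate credential file under ROOT that exists and is readable
# by the current user. ROOT="" checks the container's own filesystem; pass the
# mount point of a host root (for example /mnt/host) to check the host
# filesystem through a bind mount.
find_readable_kubelet_creds() {
  root="${1:-}"
  printf '%s\n' "$KUBELET_CRED_PATHS" | while IFS= read -r p; do
    if [ -r "$root$p" ]; then
      printf '%s\n' "$root$p"
    fi
  done
}

main() {
  echo "== flagged host-path mounts in the constructed sample =="
  find_risky_mounts "$SAMPLE_MOUNTS"
  echo "== flagged host-path mounts in this container's live mount table =="
  if [ -r /proc/self/mounts ]; then
    find_risky_mounts "$(cat /proc/self/mounts)"
  fi
  echo "== readable kubelet credential files on the container filesystem =="
  find_readable_kubelet_creds ""
}

main
```

On the constructed sample the audit flags /mnt/host/var/lib/kubelet and /var/run/docker.sock; the service-account token mount is ignored because it is a tmpfs, not host filesystem content. Any non-empty output from either function on a live instance means the container can see node state it should not, which is exactly the foothold the overview describes.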
Mitigation Steps
1. **Provider-Side Patching**: Microsoft has patched the underlying infrastructure, and no direct user action is required for this specific vulnerability. Keep the components you do control (SDKs, environments, base images) current.
2. **Defense in Depth**: Do not rely solely on provider isolation. Treat each ML compute instance as its own security boundary and assume it can be compromised. Monitor it for anomalous activity, such as unexpected outbound connections, reads of mount tables or kubelet paths, or process execution that does not match the notebook workload.
3. **Least Privilege for Code**: Run code inside an AML instance with the lowest privileges it needs, and do not require root inside the container.
4. **Credential Management**: Avoid storing long-lived credentials or secrets on the compute instance itself. Use managed identities or short-lived tokens to reach other Azure resources, so that a compromised instance yields nothing an attacker can reuse elsewhere.
Patch Details
Microsoft deployed patches across the Azure Machine Learning infrastructure to fix the misconfigurations and improve host-level isolation after being notified by the researchers.