Cross-Tenant Data Isolation Bypass in Azure Machine Learning Compute Instances
Overview
Security researchers discovered a critical vulnerability in the Azure Machine Learning service that allowed for a full bypass of tenant isolation, enabling an attacker in one tenant to access data and models belonging to other tenants. The vulnerability, dubbed 'ML-DEvProxy', stemmed from a chain of issues including an overly permissive network configuration and a server-side request forgery (SSRF) flaw in an internal proxy service running on Azure ML compute instances. An attacker could first create a malicious compute instance in their own Azure ML workspace. By exploiting the SSRF, they could force the internal service to issue requests to the Azure Instance Metadata Service (IMDS). This allowed them to obtain a privileged service principal credential associated with the underlying Azure ML infrastructure. With this highly privileged credential, the attacker could then directly authenticate to the backend Azure Storage accounts that hosted other tenants' data. This granted them unauthorized read and write access to sensitive customer assets, including training datasets, proprietary models, and intellectual property. The vulnerability demonstrated a significant architectural flaw in a major cloud AI platform, showing that misconfigurations in backend services can undermine the security guarantees of multi-tenant environments. Microsoft promptly patched the issue by restricting network access for the vulnerable service, improving input validation, and rotating all potentially affected credentials.
Affected Systems
Testing Guide
1. **Confirm Workspace Network Configuration**: In the Azure Portal, navigate to your Azure Machine Learning workspace and verify that network isolation settings, such as the use of a Virtual Network and Private Endpoints, are configured according to your organization's security policy. 2. **Review IAM Roles**: Audit the IAM roles assigned to the identities associated with your compute instances and clusters. Ensure they do not have overly broad permissions (e.g., subscription-level Contributor). 3. **Check for Security Advisories**: Monitor the Microsoft Security Response Center (MSRC) for any advisories related to Azure Machine Learning.
Mitigation Steps
1. **Apply Cloud Provider Patches**: The vulnerability was patched by Microsoft on the backend. No direct user action is required to fix the underlying flaw. 2. **Use Private Endpoints**: To enhance security, configure Azure ML workspaces with Private Endpoints to ensure that communication with Azure resources like Storage and Key Vault occurs over a private network, reducing the attack surface. 3. **Implement Least Privilege**: Ensure that the managed identities and service principals associated with your ML workspaces have only the minimum permissions required for their tasks. 4. **Monitor Access Logs**: Regularly review access logs for your Azure Storage accounts and Key Vaults for any anomalous or unauthorized access patterns.
Patch Details
Microsoft patched the backend infrastructure; no customer action was required to fix the vulnerability itself. The fix involved network ACL changes and credential rotation.