Cross-Tenant Model Poisoning in GCP Vertex AI via Insecure Custom Training Job Handling
Overview
A design flaw in Google Cloud Platform's Vertex AI service could allow cross-tenant data poisoning of custom model training jobs. The flaw lay in how Vertex AI handled custom training jobs that read their data from Google Cloud Storage (GCS) buckets.

Under specific circumstances, an attacker in one GCP project could submit a training job that referenced a publicly accessible, attacker-controlled dataset in GCS. Because the API did not properly validate resource ownership, an attacker who could predict or discover the ID of a victim's running training job could then issue a malformed update request against that job. The request could trick the Vertex AI backend into swapping the data source of the victim's legitimate job, mid-training, for the attacker's malicious dataset.

The impact was significant: an attacker could covertly poison a competitor's model by introducing subtle biases or backdoors, or by causing a catastrophic drop in performance. The victim would still be billed for a training job that used corrupted data, and might not realize the model had been compromised until after it was deployed. The vulnerability highlights how hard multi-tenant cloud AI platforms are to secure and why strict resource boundary enforcement is critical.
Affected Systems
Testing Guide
This vulnerability was in the GCP control plane and cannot be directly tested by customers. The primary method of verification is to ensure your GCS data buckets are properly secured:
1. Go to the GCS console and review the permissions on all buckets containing training data.
2. Ensure that `allUsers` and `allAuthenticatedUsers` do not have any roles assigned.
3. Confirm that only specific, required service accounts have access to the buckets.
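The bucket review above can also be scripted. The following is a minimal sketch, assuming the bucket's IAM policy has been exported as JSON (for example with `gcloud storage buckets get-iam-policy gs://BUCKET --format=json`). The function name, the example policy, and the service account in it are hypothetical and used only for illustration.

```python
import json

# Special IAM principals that open a bucket beyond named identities.
PUBLIC_MEMBERS = {"allUsers", "allAuthenticatedUsers"}


def find_public_bindings(policy_json: str) -> list[tuple[str, str]]:
    """Return (role, member) pairs that grant a role to a public principal.

    `policy_json` is assumed to be the JSON text of a bucket IAM policy:
    an object with an optional "bindings" list of
    {"role": ..., "members": [...]} entries.
    """
    policy = json.loads(policy_json)
    findings = []
    for binding in policy.get("bindings", []):
        for member in binding.get("members", []):
            if member in PUBLIC_MEMBERS:
                findings.append((binding["role"], member))
    return findings


# Hypothetical policy, for illustration only.
EXAMPLE_POLICY = json.dumps({
    "bindings": [
        {"role": "roles/storage.objectViewer", "members": ["allUsers"]},
        {
            "role": "roles/storage.objectAdmin",
            "members": [
                "serviceAccount:trainer@example-project.iam.gserviceaccount.com"
            ],
        },
    ]
})

if __name__ == "__main__":
    for role, member in find_public_bindings(EXAMPLE_POLICY):
        print(f"public grant: {member} has {role}")
```

An empty result means no role is granted to `allUsers` or `allAuthenticatedUsers` on that bucket. It does not by itself confirm that the remaining members are the minimum required, so step 3 still needs a manual review.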
Mitigation Steps
1. **Vendor Patch**: Google Cloud has deployed a server-side patch to the Vertex AI control plane, and no direct user action is required to fix the root cause.
2. **Use VPC Service Controls**: Implement VPC Service Controls to create a service perimeter around your GCP projects, preventing Vertex AI jobs from accessing GCS buckets outside the perimeter.
3. **Principle of Least Privilege**: Apply strict IAM policies on GCS buckets used for training data. Ensure they are not publicly accessible and that only the necessary Vertex AI service accounts have read access.
4. **Audit Logs**: Regularly audit Cloud Audit Logs for any suspicious `JobService.UpdateCustomJob` or other anomalous API calls related to your Vertex AI resources.
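The audit-log review in the last step can be partly automated. The sketch below assumes audit entries have been exported as a JSON array (for example with `gcloud logging read 'protoPayload.serviceName="aiplatform.googleapis.com"' --format=json`) and parsed into Python dictionaries. The method-name fragment is taken from this advisory and may not match the exact string in your logs. The allow-list, the function name, and the example entries are hypothetical.

```python
# Method-name fragment taken from the advisory text; the exact string that
# appears in real audit logs may differ, so treat it as an assumption.
SUSPICIOUS_METHOD = "JobService.UpdateCustomJob"


def flag_unexpected_updates(entries, allowed_principals):
    """Return summaries of matching audit entries made by unknown callers.

    `entries` is a list of audit log entries as dictionaries;
    `allowed_principals` is a set of caller emails you expect to see.
    """
    flagged = []
    for entry in entries:
        payload = entry.get("protoPayload", {})
        method = payload.get("methodName", "")
        principal = payload.get("authenticationInfo", {}).get("principalEmail", "")
        if SUSPICIOUS_METHOD in method and principal not in allowed_principals:
            flagged.append({
                "timestamp": entry.get("timestamp"),
                "principal": principal,
                "resource": payload.get("resourceName"),
            })
    return flagged


# Hypothetical entries, for illustration only.
EXAMPLE_ENTRIES = [
    {
        "timestamp": "2024-01-01T00:00:00Z",
        "protoPayload": {
            "methodName": "google.cloud.aiplatform.v1.JobService.UpdateCustomJob",
            "authenticationInfo": {"principalEmail": "unknown@example.com"},
            "resourceName": "projects/example/locations/us-central1/customJobs/123",
        },
    },
    {
        "timestamp": "2024-01-01T00:05:00Z",
        "protoPayload": {
            "methodName": "google.cloud.aiplatform.v1.JobService.UpdateCustomJob",
            "authenticationInfo": {"principalEmail": "ml-pipeline@example.com"},
            "resourceName": "projects/example/locations/us-central1/customJobs/456",
        },
    },
]

if __name__ == "__main__":
    for hit in flag_unexpected_updates(EXAMPLE_ENTRIES, {"ml-pipeline@example.com"}):
        print(hit)
```

An allow-list of expected callers keeps the output short, but it only surfaces unknown principals. Calls made with a compromised but allowed identity would still need other signals, such as unusual timing or target resources.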
Patch Details
A server-side fix was rolled out to all GCP regions by the Vertex AI engineering team, strengthening resource ownership validation for all training job API endpoints.
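To make "resource ownership validation" concrete, here is a toy model of the kind of check involved. It is not Google's implementation, which is not public. Every name in it (`TrainingJob`, `update_data_source`, `PermissionDenied`, the project and bucket names) is hypothetical. The sketch only shows the general idea: an update is rejected unless the caller's project owns the job.

```python
from dataclasses import dataclass


class PermissionDenied(Exception):
    """Raised when a caller may not modify the requested job."""


@dataclass
class TrainingJob:
    job_id: str
    owner_project: str
    data_uri: str


def update_data_source(jobs, job_id, caller_project, new_data_uri):
    """Change a job's data source only if the caller's project owns the job.

    Unknown job IDs and jobs owned by another project raise the same error,
    so the response does not reveal whether a guessed ID exists.
    """
    job = jobs.get(job_id)
    if job is None or job.owner_project != caller_project:
        raise PermissionDenied(f"caller may not modify job {job_id!r}")
    job.data_uri = new_data_uri
    return job


if __name__ == "__main__":
    jobs = {
        "job-123": TrainingJob("job-123", "victim-project", "gs://victim-bucket/train")
    }
    try:
        update_data_source(
            jobs, "job-123", "attacker-project", "gs://attacker-bucket/poison"
        )
    except PermissionDenied as exc:
        print("rejected:", exc)
    print(jobs["job-123"].data_uri)
```

In this model the cross-project request fails before any state changes, so the victim's job keeps its original data source.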