Server-Side Request Forgery (SSRF) in Cloud AI Service Web Data Ingestion Exposes Instance Metadata
Overview
A critical Server-Side Request Forgery (SSRF) vulnerability was discovered in a data ingestion feature of a major cloud AI platform. The feature let users supply a public URL to a document (e.g., PDF, HTML), which the service would fetch, parse, and use as context for an LLM. Researchers found that the service's URL validation was insufficient: it did not block internal or special-use IP addresses. By supplying the cloud's internal metadata service endpoint (e.g., `http://169.254.169.254/latest/meta-data/iam/security-credentials/role-name`), an attacker could trick the AI service's backend into requesting that endpoint from inside the provider's own network, where it is reachable. The metadata service would respond with sensitive information, including temporary IAM credentials for the service's underlying compute instance, and the AI service would process that response as if it were an ordinary document. The attacker could then ask the LLM, 'Summarize the document you just read,' causing it to output the stolen IAM credentials directly in the chat response. With those credentials, the attacker could gain unauthorized access to the cloud provider's internal APIs, potentially reaching other customers' data or manipulating the service. The incident is a reminder that AI services, however complex, remain vulnerable to classic web application flaws like SSRF.
Affected Systems
Testing Guide
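1. **Identify URL Input:** Find a feature in the AI service that accepts a URL for processing (e.g., 'Summarize this webpage').
2. **Provide Metadata URL:** Input the address of the relevant cloud's metadata service. For AWS, this is `http://169.254.169.254/latest/meta-data/`. For GCP, it is `http://metadata.google.internal/computeMetadata/v1/`, which also requires a `Metadata-Flavor: Google` request header. For Azure, it is `http://169.254.169.254/metadata/instance` with an `api-version` query parameter and a `Metadata: true` request header. A fetcher that cannot set custom headers will usually be rejected by the GCP and Azure endpoints, so a failure there does not prove that URL validation is in place.
3. **Attempt to Access Credentials:** Try a more specific path, like `http://169.254.169.254/latest/meta-data/iam/security-credentials/` (for AWS). This path lists the attached role name; appending that name to the path returns the credential document.
4. **Check the Response:** If the service returns an error indicating the URL could not be resolved or is forbidden, it is likely protected. If it returns any data that looks like JSON containing an `AccessKeyId` or `SecretAccessKey`, it is vulnerable.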
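The check in step 4 can be partly automated. The following is a minimal sketch, not a definitive detector: the function name `classify_response` and the three result labels are hypothetical, and the key-ID pattern assumes AWS-style access key IDs (a four-letter prefix such as `AKIA` or `ASIA` followed by 16 uppercase alphanumeric characters).

```python
import re

# Field names that appear in AWS instance-metadata credential documents.
CREDENTIAL_FIELDS = ("AccessKeyId", "SecretAccessKey")

# Assumed shape of an AWS access key ID: AKIA (long-term) or ASIA (temporary)
# followed by 16 uppercase letters or digits.
ACCESS_KEY_RE = re.compile(r"\b(?:AKIA|ASIA)[A-Z0-9]{16}\b")


def classify_response(text: str) -> str:
    """Classify a service reply (hypothetical helper for step 4).

    'vulnerable'  - the reply contains something shaped like a key ID
    'suspicious'  - the reply mentions credential field names only
    'no-evidence' - neither signal is present
    """
    if ACCESS_KEY_RE.search(text):
        return "vulnerable"
    if any(field in text for field in CREDENTIAL_FIELDS):
        return "suspicious"
    return "no-evidence"


if __name__ == "__main__":
    reply = '{"AccessKeyId": "ASIAABCDEFGHIJKLMNOP", "SecretAccessKey": "..."}'
    print(classify_response(reply))
```

A 'suspicious' result still needs manual review, since an LLM may paraphrase or partially quote the fetched content rather than reproduce it verbatim.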
Mitigation Steps
1. **Use Allow-lists for URLs:** Cloud providers should restrict URL-fetching features to explicitly allowed domains or, at a minimum, block all private, link-local, and loopback IP address ranges. The check should apply to the addresses a hostname resolves to, not only to the literal URL string.
2. **Disable Metadata Access:** Configure backend compute instances to block access to the metadata service where it is not strictly necessary, or use Instance Metadata Service v2 (IMDSv2). IMDSv2 requires a session token that must first be obtained with a `PUT` request, which mitigates basic SSRF through fetchers that can only issue `GET` requests.
3. **Network Segmentation:** Run data ingestion workers in a separate, highly restricted VPC with strict egress firewall rules that prevent connections to internal infrastructure.
4. **Data Redaction:** Implement a sanitization layer that detects and redacts sensitive data patterns (like credentials) from content before it is passed to an LLM.
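Mitigations 1 and 4 can be sketched in a few lines of Python using only the standard library. This is an illustrative sketch under stated assumptions, not the affected service's actual fix: the names `is_forbidden_ip`, `is_url_allowed`, and `redact_credentials` are hypothetical, the `resolve` parameter exists only so the resolver can be swapped out in tests, and the redaction patterns assume AWS-style credential documents.

```python
import ipaddress
import re
import socket
from urllib.parse import urlsplit

# Assumed AWS-style patterns; other providers need their own rules.
ACCESS_KEY_RE = re.compile(r"\b(?:AKIA|ASIA)[A-Z0-9]{16}\b")
SECRET_FIELD_RE = re.compile(r'("(?:SecretAccessKey|Token)"\s*:\s*")[^"]*(")')


def is_forbidden_ip(ip_text: str) -> bool:
    """True for addresses a public-document fetcher should never contact."""
    ip = ipaddress.ip_address(ip_text.split("%")[0])  # drop any IPv6 zone id
    mapped = getattr(ip, "ipv4_mapped", None)  # unwrap ::ffff:a.b.c.d
    if mapped is not None:
        ip = mapped
    return any((ip.is_private, ip.is_loopback, ip.is_link_local,
                ip.is_multicast, ip.is_reserved, ip.is_unspecified))


def is_url_allowed(url: str, resolve=socket.getaddrinfo) -> bool:
    """Allow only http(s) URLs whose host resolves solely to public IPs."""
    try:
        parts = urlsplit(url)
        port = parts.port or (443 if parts.scheme == "https" else 80)
    except ValueError:  # malformed URL or port
        return False
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        infos = resolve(parts.hostname, port, proto=socket.IPPROTO_TCP)
    except (OSError, ValueError):  # resolution failed: refuse to fetch
        return False
    addresses = {info[4][0] for info in infos}
    # Reject if ANY resolved address is internal, not just the first one.
    return bool(addresses) and not any(is_forbidden_ip(a) for a in addresses)


def redact_credentials(text: str) -> str:
    """Mask key IDs and secret fields before text reaches the LLM."""
    text = ACCESS_KEY_RE.sub("[REDACTED]", text)
    return SECRET_FIELD_RE.sub(r"\1[REDACTED]\2", text)


if __name__ == "__main__":
    print(is_url_allowed("http://169.254.169.254/latest/meta-data/"))
    print(redact_credentials('{"SecretAccessKey": "abc/123"}'))
```

A check like this is only one layer. It does not by itself defend against DNS rebinding or redirects: the fetcher should connect to the exact address that was validated and re-run the check on every redirect hop, which is why the network-level controls in items 2 and 3 remain necessary.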
Patch Details
Following private disclosure, major cloud providers implemented network-level controls and stricter URL validation in their managed AI services to block access to internal metadata endpoints.