Microsoft AI Research Exposes 38TB of Private Data via Misconfigured Azure SAS Token
Overview
A significant data exposure incident occurred when Microsoft's AI research division accidentally published, in a public GitHub repository, a URL containing a highly permissive SAS (Shared Access Signature) token for an internal Azure Blob Storage account. Security researchers at Wiz discovered the misconfiguration. Instead of granting read-only access to a specific container, the SAS token granted 'Full Control' permissions over the entire storage account, so anyone who found the link could read, overwrite, and delete all data in the account. The exposed account held 38 terabytes of sensitive data, including backups of two employees' workstations, internal Microsoft Teams messages, and a large collection of private AI models, training data, and internal source code. The root cause was a flaw in the infrastructure-as-code (IaC) configuration used to generate the link, which specified the wrong permission level.
This incident shows how a single misconfigured storage credential can expose the data behind cloud AI workloads, and how dangerous it is to commit secrets to public repositories. It underscores the need for robust secret scanning in CI/CD pipelines and for the principle of least privilege when granting access to cloud data stores used by AI workloads.
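To make the scoping difference concrete, here is a minimal Python sketch that inspects the query string of a SAS URL. It relies on the documented SAS query parameters (`ss` and `srt` for an account SAS, `sr` for a service SAS, `sp` for permissions, `se` for expiry); the seven-day threshold, the function names, and the example URLs are hypothetical illustrations, not values taken from the incident.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse

# Permissions that only read data; any other letter in 'sp' can change or delete data.
READ_ONLY_PERMS = set("rl")
# Hypothetical policy threshold for "long-lived"; tune to your own policy.
MAX_LIFETIME = timedelta(days=7)


def _parse_expiry(value):
    """Parse the 'se' (signed expiry) value, an ISO 8601 UTC timestamp or date."""
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if dt.tzinfo is None:  # date-only values carry no offset; SAS times are UTC
        dt = dt.replace(tzinfo=timezone.utc)
    return dt


def assess_sas_url(url, now=None):
    """Return human-readable findings for an overly broad SAS URL (empty list if none)."""
    now = now or datetime.now(timezone.utc)
    params = {k: v[0] for k, v in parse_qs(urlparse(url).query).items()}
    findings = []

    # An account SAS carries 'ss' (services) and 'srt' (resource types);
    # a service SAS instead carries 'sr' (e.g. c = container, b = blob).
    if "ss" in params and "srt" in params:
        findings.append("account SAS: scoped to the whole storage account")

    extra = sorted(set(params.get("sp", "")) - READ_ONLY_PERMS)
    if extra:
        findings.append("non-read permissions: " + "".join(extra))

    if "se" not in params:
        # Possible when the expiry comes from a stored access policy ('si').
        findings.append("no signed expiry (se) in the URL")
    elif _parse_expiry(params["se"]) - now > MAX_LIFETIME:
        findings.append("expires more than 7 days from now")

    return findings
```

Applied to a hypothetical account SAS such as `https://example.blob.core.windows.net/?sv=2020-08-04&ss=bfqt&srt=sco&sp=rwdlacup&se=2051-10-05T00:00:00Z&sig=REDACTED`, the function reports account-wide scope, write/delete-class permissions, and a distant expiry, while a container SAS using `sr=c&sp=rl` with a one-day expiry produces no findings.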
Affected Systems
Testing Guide
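1. **Scan GitHub Repositories:** Use a tool like TruffleHog or Gitleaks to scan your organization's GitHub repositories, including their full commit history, for Azure Storage URLs that carry SAS query parameters such as `sv=` and `sig=`.
2. **Audit Azure Storage Accounts:** In the Azure Portal, review each storage account's configuration. Account SAS tokens are signed on the client with the account key and are not recorded by Azure, so issued tokens cannot be listed; instead, check whether a SAS expiration policy is configured, review stored access policies, and confirm when the account keys were last rotated. Check the 'Networking' settings to see whether public network access is enabled from all networks, and check each container's access level to ensure anonymous ('public') access is not allowed.
3. **Review IaC Templates:** Manually inspect, or run static analysis tools on, your ARM, Bicep, or Terraform templates to ensure that SAS token generation logic adheres to the principle of least privilege.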
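The repository scan in step 1 can be approximated for a checked-out working tree with a small Python sketch. The URL pattern is a heuristic assumption (an Azure Storage endpoint whose query string contains both `sv=` and `sig=`), not an official token format, and the function names are hypothetical. Unlike TruffleHog or Gitleaks, it does not read git history.

```python
import re
from pathlib import Path

# Heuristic pattern (an assumption, not an official format spec): an Azure Storage
# endpoint URL, up to the next whitespace, quote, or angle bracket.
STORAGE_URL = re.compile(
    r"https://[a-z0-9]+\.(?:blob|dfs|file|queue|table)\.core\.windows\.net[/?][^\s\"'<>]*"
)


def find_sas_urls(text):
    """Return storage URLs whose query string has both a version (sv=) and a signature (sig=)."""
    hits = []
    for match in STORAGE_URL.finditer(text):
        query = match.group(0).partition("?")[2]
        if re.search(r"(?:^|&)sig=", query) and re.search(r"(?:^|&)sv=", query):
            hits.append(match.group(0))
    return hits


def redact(url):
    """Mask the signature so the report itself does not re-leak the credential."""
    return re.sub(r"(sig=)[^&\s]+", r"\1REDACTED", url)


def scan_tree(root):
    """Scan files in the working tree under root (not git history) for SAS URLs."""
    findings = []
    for path in Path(root).rglob("*"):
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue
        for lineno, line in enumerate(text.splitlines(), start=1):
            for url in find_sas_urls(line):
                findings.append((str(path), lineno, redact(url)))
    return findings
```

Calling `scan_tree(".")` from a repository root returns `(path, line, redacted_url)` tuples. Because it reads only the current files, it complements history-aware scanners rather than replacing them.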
Mitigation Steps
1. **Enforce Least Privilege for SAS Tokens:** Always scope SAS tokens to the minimum required permissions (e.g., read-only) and the narrowest resource (e.g., a specific container or blob, not the entire account), and set short expiration times.
2. **Implement Secret Scanning:** Use pre-commit hooks (such as `git-secrets`) and GitHub secret scanning with push protection to block tokens, keys, and credentials before they are committed or pushed, and add a scanning step to your CI/CD pipeline as a backstop.
3. **Use Managed Identities:** Whenever possible, have services such as Azure VMs and Azure Functions access other Azure resources through Azure Managed Identities; this removes the need to manage SAS tokens or storage keys and to store them in code.
4. **Regularly Audit Storage Permissions:** Continuously monitor and audit public access levels and access policies for all cloud storage accounts.
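For step 4, a recurring audit can start from a plain export of account settings. This minimal Python sketch assumes its input is the JSON list produced by `az storage account list --output json` and relies on the camelCase fields `name`, `allowBlobPublicAccess`, `allowSharedKeyAccess`, and `sasPolicy`; verify these names against your Azure CLI version. Treating an unset (`null`) value as "not disabled" is a deliberately conservative assumption, and the function name is hypothetical.

```python
def audit_storage_accounts(accounts):
    """Return {account_name: [issues]} for accounts whose settings need review.

    `accounts` is assumed to be the parsed JSON list from
    `az storage account list --output json`.
    """
    report = {}
    for acct in accounts:
        issues = []
        # None means the property was never set explicitly; treat it as not disabled.
        if acct.get("allowBlobPublicAccess") is not False:
            issues.append("anonymous blob access not disabled")
        # Shared Key access is what signs account and service SAS tokens.
        if acct.get("allowSharedKeyAccess") is not False:
            issues.append("shared key (account/service SAS) signing allowed")
        # Without a SAS policy, Azure does not flag SAS tokens that exceed a maximum lifetime.
        if not acct.get("sasPolicy"):
            issues.append("no SAS expiration policy")
        if issues:
            report[acct.get("name", "<unnamed>")] = issues
    return report
```

Saving the CLI output to a file and passing `json.load(...)` of that file to `audit_storage_accounts` yields only the accounts that need attention, which suits a scheduled job.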
Patch Details
Microsoft revoked the exposed token promptly after being notified. No software patch was involved; the fix was procedural, consisting of correcting the IaC templates and improving secret hygiene.
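Template corrections can be backed by an automated check. The Python sketch below walks a parsed ARM template (Bicep compiles to this JSON via `az bicep build`) and flags the property objects passed to `listAccountSas` or `listServiceSas`, which it identifies by their `signedPermission` key. The read-only allowlist, function names, and sample template are hypothetical. Terraform's `azurerm_storage_account_sas` data source uses a different shape and is not covered.

```python
# Letters allowed without review; everything else (w, d, a, c, ...) is flagged.
READ_ONLY_PERMS = set("rl")


def find_sas_property_objects(node, path="$"):
    """Yield (json_path, obj) for every object in the template with a signedPermission key."""
    if isinstance(node, dict):
        if "signedPermission" in node:
            yield path, node
        for key, value in node.items():
            yield from find_sas_property_objects(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_sas_property_objects(value, f"{path}[{index}]")


def check_template(template):
    """Return review messages for SAS property objects in a parsed ARM template."""
    problems = []
    for path, props in find_sas_property_objects(template):
        perms = props["signedPermission"]
        if not isinstance(perms, str) or perms.startswith("["):
            # ARM expressions such as "[parameters('perm')]" are resolved at deploy time.
            problems.append(f"{path}: signedPermission is computed at deploy time; review manually")
        else:
            extra = sorted(set(perms) - READ_ONLY_PERMS)
            if extra:
                problems.append(f"{path}: signedPermission grants {''.join(extra)}")
        # 'signedResourceTypes' is an account SAS input, i.e. the token spans the account.
        if "signedResourceTypes" in props:
            problems.append(f"{path}: account-level SAS (signedResourceTypes={props['signedResourceTypes']})")
    return problems
```

A CI job could load each template with `json.load`, pass it to `check_template`, and fail the build whenever the returned list is non-empty.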