GitHub Copilot Vulnerable to Repository Squatting for Code Injection
Overview
Researchers demonstrated a supply chain vulnerability in GitHub Copilot's code suggestion mechanism that resembles 'repository squatting', or 'dependency confusion' applied to source code rather than to packages. Copilot is trained on a vast corpus of public GitHub repositories. An attacker can identify a module name that developers frequently import but that is non-existent or deprecated (e.g., `import popular_lib.utils`). The attacker then creates a new public repository on GitHub with a matching name (`github.com/attacker/popular_lib`) and fills it with malicious code disguised as legitimate functions. When a developer using Copilot types `import popular_lib.utils` and begins to call a function from it, Copilot may suggest completions drawn from the attacker's repository once that repository is part of its knowledge base. The developer can then unknowingly accept backdoors, credential stealers, or other malware directly into the application's source code. The attack is potent for two reasons: it exploits the developer's trust in the AI assistant, and it bypasses traditional package manager security controls, because the malicious code is introduced before the dependency management stage.
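One cheap signal for the scenario above is an import whose top-level name does not resolve to anything installed in the current environment. The following is a minimal sketch of such a check using the standard library's `importlib.util.find_spec`; the helper name `is_resolvable` is hypothetical, and `popular_lib` is the placeholder name from the example above, not a real package.

```python
import importlib.util


def is_resolvable(module_name: str) -> bool:
    """Return True if the top-level package of module_name can be found locally.

    Only the top-level name is checked, so nothing from the target package
    is imported or executed by this function.
    """
    top_level = module_name.split(".")[0]
    return importlib.util.find_spec(top_level) is not None


if __name__ == "__main__":
    # "popular_lib" is the placeholder from the text; it is assumed not to be installed.
    for name in ("os.path", "popular_lib.utils"):
        status = "resolves" if is_resolvable(name) else "does NOT resolve locally"
        print(f"{name}: {status}")
```

An import that does not resolve locally is not proof of squatting, but it is a reasonable prompt to check where the name came from before accepting code built around it.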
Affected Systems
Testing Guide
1. Identify a popular but non-existent import statement that might be used in your technology stack (e.g., by analyzing typos in open-source code).
2. Create a public GitHub repository with the corresponding name and populate it with a benign but identifiable function (e.g., one that prints a unique string).
3. In a separate project, with GitHub Copilot active, type the import statement and attempt to invoke the function.
4. Observe if Copilot suggests code completions from your test repository. If it does, the version you are using is susceptible to this attack pattern.
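The benign function from step 2 and the observation in step 4 could be sketched as follows. This is an illustration only: the marker string, `canary_probe`, and `suggestion_contains_canary` are hypothetical names chosen for this example, and the suggestion text is assumed to be copied by hand from the editor.

```python
# Hypothetical unique marker; pick your own value that appears nowhere else.
CANARY_MARKER = "squat-canary-7d41e0"


def canary_probe() -> str:
    """Benign, identifiable function to publish in the test repository."""
    print(CANARY_MARKER)
    return CANARY_MARKER


def suggestion_contains_canary(suggestion: str, marker: str = CANARY_MARKER) -> bool:
    """Return True if a pasted code suggestion contains the canary marker."""
    return marker in suggestion


if __name__ == "__main__":
    pasted = 'def helper():\n    print("squat-canary-7d41e0")\n'
    print(suggestion_contains_canary(pasted))
```

A unique marker makes the result unambiguous: a suggestion that reproduces it is very unlikely to have come from anywhere but the test repository.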
Mitigation Steps
1. **Code Suggestion Scrutiny:** Developers must treat all AI-generated code as untrusted and subject it to the same rigorous code review process as human-written code.
2. **Verify Imports:** Before accepting code that introduces a new import, verify that the source repository is legitimate and maintained by a trusted party.
3. **Use Static and Dynamic Analysis:** Integrate SAST (Static Application Security Testing) and DAST (Dynamic Application Security Testing) tools into the CI/CD pipeline to catch malicious patterns or behaviors introduced by AI-suggested code.
4. **Prefer Official Snippets:** When possible, rely on official documentation and code snippets from trusted sources over AI suggestions for security-critical components.
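As one way to support the import verification step, a review or CI job could flag imports that are neither in the standard library nor on a team-maintained allowlist. The sketch below uses Python's `ast` module; the function names and the allowlist contents are assumptions made for this example, and `sys.stdlib_module_names` is only available on Python 3.10 and later (on older versions the sketch falls back to an empty set, so standard library modules would also be flagged).

```python
import ast
import sys


def top_level_imports(source: str) -> set[str]:
    """Collect the top-level package names imported by a Python source string."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports (level > 0); they refer to the project itself.
            if node.level == 0 and node.module:
                names.add(node.module.split(".")[0])
    return names


def unvetted_imports(source: str, allowlist: set[str]) -> list[str]:
    """Return imported packages that are neither stdlib nor on the allowlist."""
    stdlib = getattr(sys, "stdlib_module_names", frozenset())
    return sorted(
        name
        for name in top_level_imports(source)
        if name not in allowlist and name not in stdlib
    )


if __name__ == "__main__":
    sample = "import os\nimport popular_lib.utils\nfrom requests import get\n"
    # Hypothetical allowlist of dependencies the team has already vetted.
    print(unvetted_imports(sample, allowlist={"requests"}))
```

Parsing with `ast` rather than matching text avoids false hits from comments and strings, and it never executes the code under review.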
Patch Details
This issue is not addressed by a conventional patch; it is a fundamental challenge of training models on public data. Mitigation relies on developer awareness and process, though AI providers are researching ways to vet training data sources.