Application Security

Code Generated by GitHub Copilot Can Introduce Vulnerabilities: Researchers

A group of researchers has discovered that roughly 40% of the programs generated by the GitHub Copilot language model in their tests were vulnerable.

The artificial intelligence model is designed to help programmers by suggesting lines of code directly in the editor. To do so, Copilot was trained on publicly available open-source code, and it supports dozens of programming languages, including Go, JavaScript, Python, Ruby, and TypeScript.

After examining the code produced by Copilot, the five researchers concluded that a high percentage of it is vulnerable, likely because the AI was trained on code that itself contains security flaws.

“However, code often contains bugs—and so, given the vast quantity of unvetted code that Copilot has processed, it is certain that the language model will have learned from exploitable, buggy code. This raises concerns on the security of Copilot’s code contributions,” the researchers say.

The researchers analyzed how Copilot performs across diverse weaknesses, prompts, and domains. They created 89 scenarios for which the language model produced a total of 1,692 programs, approximately 40% of which were found to be vulnerable.

The academics performed both manual and automated analysis of the code generated by Copilot, evaluating it against MITRE’s 2021 CWE Top 25 list of the most dangerous software weaknesses.

Commonly encountered bugs include out-of-bounds write, out-of-bounds read, cross-site scripting, OS command injection, improper input validation, SQL injection, use-after-free, path traversal, unrestricted file upload, and missing authentication, among others.
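
To illustrate the kind of flaw the researchers describe, consider a minimal, hypothetical Python sketch of an SQL injection (CWE-89). The function and table names below are invented for illustration and are not drawn from the study:

import sqlite3

def find_user_unsafe(conn, username):
    # Vulnerable (CWE-89): the input is interpolated directly into the SQL
    # string, so a value such as "x' OR '1'='1" rewrites the query's logic.
    query = "SELECT id, name FROM users WHERE name = '%s'" % username
    return conn.execute(query).fetchall()

def find_user_safe(conn, username):
    # Safer: a parameterized query keeps the input as data, not as SQL.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (username,)
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")
print(find_user_unsafe(conn, "x' OR '1'='1"))  # returns every row despite no match
print(find_user_safe(conn, "x' OR '1'='1"))    # returns [] as expected

A suggestion like find_user_unsafe runs correctly on benign input, which is precisely why such code can slip past a developer who trusts the tool’s output.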

“As Copilot is trained over open-source code available on GitHub, we theorize that the variable security quality stems from the nature of the community-provided code. That is, where certain bugs are more visible in open-source repositories, those bugs will be more often reproduced by Copilot,” the researchers note.

The academics conclude that, while Copilot certainly helps developers write code faster, developers should remain vigilant when using the tool. They also recommend the use of security-aware tooling to reduce the risk of introducing security bugs.
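
As a rough illustration of what such tooling does (production analyzers such as GitHub’s CodeQL or Python’s Bandit are far more thorough), here is a toy Python checker, with all names invented here, that flags SQL strings assembled by formatting before being passed to execute():

import ast

SQL_SINKS = {"execute", "executemany"}  # method names treated as SQL sinks

def flag_string_built_sql(source):
    # Report line numbers where the first argument to execute()/executemany()
    # is built with % or + formatting or an f-string: a common injection smell.
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr in SQL_SINKS
                and node.args
                and isinstance(node.args[0], (ast.BinOp, ast.JoinedStr))):
            findings.append(node.lineno)
    return findings

sample = (
    "cur.execute(\"SELECT * FROM users WHERE name = '%s'\" % name)\n"
    "cur.execute(\"SELECT * FROM users WHERE name = ?\", (name,))\n"
)
print(flag_string_built_sql(sample))  # [1]: only the string-built query is flagged

This is only a sketch of the idea; real security scanners track data flow across functions and cover far more weakness classes than this single pattern.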

Related: GitLab Releases Open Source Tool for Hunting Malicious Code in Dependencies

Related: New Google Tool Helps Developers Visualize Dependencies of Open Source Projects
