A newly disclosed class of CI/CD attacks could have allowed attackers to inject malicious code into the PyTorch repository, leading to massive supply chain compromise, Praetorian security researcher John Stawinski says.
Initially detailed in December 2023, the attack method targets GitHub repositories with self-hosted runners attached and allows a threat actor to execute arbitrary code without requiring approval.
In short, an attacker can use a fork pull request to become a contributor to a repository that has a self-hosted runner attached, and can then run any GitHub workflow on that runner. If the runner was configured using the default steps, it is non-ephemeral, which enables persistent access.
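The exposure described above depends on two conditions: a workflow that runs on a self-hosted runner, and a trigger that fires on pull requests. A minimal Python sketch of how a defender might flag such workflow files is shown here. It is a rough text heuristic, not a YAML parser, and the function names (`strip_comments`, `looks_exposed`) are hypothetical. It also assumes the runner carries the standard `self-hosted` label, so it will miss runners that are targeted only through custom labels.

```python
import re


def strip_comments(workflow_text: str) -> str:
    """Drop everything after '#' on each line (crude: ignores '#' inside quotes)."""
    return "\n".join(line.split("#", 1)[0] for line in workflow_text.splitlines())


def looks_exposed(workflow_text: str) -> bool:
    """Heuristic: True if the workflow mentions both a self-hosted runner
    and the pull_request event. This is an assumption-laden text match,
    not a full analysis of the workflow."""
    text = strip_comments(workflow_text)
    self_hosted = "self-hosted" in text
    # \b keeps this from matching the separate pull_request_target event,
    # because '_' counts as a word character.
    pr_triggered = re.search(r"\bpull_request\b", text) is not None
    return self_hosted and pr_triggered


if __name__ == "__main__":
    sample = """
on:
  pull_request:
jobs:
  build:
    runs-on: [self-hosted, linux]
    steps:
      - run: make test
"""
    print(looks_exposed(sample))
```

A match only means the file deserves a manual look; whether a fork pull request can actually reach the runner also depends on the repository's approval settings.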
The attack flow was discovered by Adnan Khan, who used it against GitHub’s own actions/runner-images repository and received a $20,000 bug bounty reward. Khan and Stawinski then identified thousands of other GitHub repositories prone to the attack.
The machine learning (ML) framework PyTorch, Stawinski explains, was one of their first targets, given its popularity. Originally developed by Meta AI and now part of the Linux Foundation, PyTorch is used in various popular deep learning models.
Following the same steps that allowed them to gain access to GitHub’s repository, the researchers discovered that PyTorch used self-hosted runners that did not require workflow approval for fork pull requests from previous contributors, which allowed them to mount their attack.
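The approval gap can be illustrated with a simplified model of the fork pull request approval options GitHub offers. The Python sketch below is an approximation for illustration only: the enum and function names are hypothetical, and the real platform logic has more cases than these three booleans capture.

```python
from enum import Enum


class ApprovalPolicy(Enum):
    # Hypothetical names mirroring the three repository-level options.
    FIRST_TIME_NEW_TO_GITHUB = 1
    FIRST_TIME_CONTRIBUTORS = 2
    ALL_OUTSIDE_COLLABORATORS = 3


def needs_approval(
    policy: ApprovalPolicy,
    *,
    is_outside_contributor: bool,
    has_prior_merged_contribution: bool,
    is_new_github_account: bool,
) -> bool:
    """Simplified model: does a fork pull request workflow wait for approval?"""
    if not is_outside_contributor:
        return False
    if policy is ApprovalPolicy.ALL_OUTSIDE_COLLABORATORS:
        return True
    if policy is ApprovalPolicy.FIRST_TIME_CONTRIBUTORS:
        # A previous contributor passes this gate without approval.
        return not has_prior_merged_contribution
    # FIRST_TIME_NEW_TO_GITHUB: only brand-new accounts with no history are held.
    return is_new_github_account and not has_prior_merged_contribution


if __name__ == "__main__":
    # An outside contributor who already had a change merged.
    for policy in ApprovalPolicy:
        print(policy.name, needs_approval(
            policy,
            is_outside_contributor=True,
            has_prior_merged_contribution=True,
            is_new_github_account=False,
        ))
```

Under this model, only the strictest option holds a workflow from someone who has already contributed, which is the gap the researchers relied on.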
Once they gained access, the researchers installed their own self-hosted runner on the vulnerable PyTorch runner, which allowed them to maintain persistence without raising suspicion.
However, they were more interested in the post-exploitation activities they could perform, hoping that a broad level of access would draw attention to the attack and trigger a prompt response.
The researchers were able to extract GitHub secrets used by PyTorch, including several sets of AWS secret access keys and GitHub Personal Access Tokens (PATs) that could allow them to perform various operations.
“Our exploit path resulted in the ability to upload malicious PyTorch releases to GitHub, upload releases to AWS, potentially add code to the main repository branch, backdoor PyTorch dependencies – the list goes on,” Stawinski says.
The researchers discovered they could trigger a workflow that used the compromised GitHub PATs to authenticate to the code hosting platform, and that those secrets “had access to over 93 repositories within the PyTorch organization, including many private repos, and administrative access over several”.
Using these compromised secrets, an attacker could modify releases, add code directly to the PyTorch main branch, or set up other paths to supply chain compromise.
“If the threat actor wanted to be more stealthy, they could add their malicious code to one of the other private or public repositories used by PyTorch within the PyTorch organization. Or they could smuggle their code into a feature branch, or steal more secrets, or do any number of creative techniques to compromise the PyTorch supply chain,” Stawinski notes.
In August 2023, the researchers submitted a vulnerability report to Meta, which informed them two months later that the issue was considered mitigated. After more back-and-forth messages discussing remediation, Meta said it issued a $5,000 bug bounty reward for the finding.
The mitigations for this attack are the same as those that apply to the GitHub Actions attack chain: using isolated, ephemeral self-hosted runners, and requiring approval for all pull requests coming from outside contributors.
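As a rough illustration, the mitigations listed above can be written as a small checklist. The Python sketch below is hypothetical: the `RunnerSetup` fields and the `missing_mitigations` function are invented for this example and do not correspond to any GitHub API, so the values would have to be gathered by hand or from an organization's own inventory.

```python
from dataclasses import dataclass


@dataclass
class RunnerSetup:
    # Hypothetical description of one repository's runner configuration.
    ephemeral: bool                     # runner is discarded after a single job
    isolated: bool                      # runner cannot reach other internal systems
    approval_for_all_outside_prs: bool  # every outside pull request waits for approval


def missing_mitigations(setup: RunnerSetup) -> list[str]:
    """Return the mitigations from the article that this setup lacks."""
    gaps = []
    if not setup.ephemeral:
        gaps.append("runner is non-ephemeral, so a compromise can persist")
    if not setup.isolated:
        gaps.append("runner is not isolated from other systems")
    if not setup.approval_for_all_outside_prs:
        gaps.append("outside pull requests can run workflows without approval")
    return gaps


if __name__ == "__main__":
    # Assumed values resembling a default, unhardened setup.
    for gap in missing_mitigations(RunnerSetup(False, False, False)):
        print("-", gap)
```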
“The issues surrounding these attack paths are not unique to PyTorch. They’re not unique to ML repositories or even to GitHub. Threat actors are starting to catch on, as shown by the year-over-year increase in supply chain attacks,” Stawinski concludes.