AI cybersecurity startup Lasso has discovered more than 1,600 valid Hugging Face API tokens exposed in code repositories, providing access to hundreds of organizations’ accounts.
Leaked secrets, such as tokens, have long been the focus of code-hosting platforms and security researchers, given the high risk they pose when falling into the wrong hands.
Hugging Face API tokens, which allow developers and organizations to integrate large language models (LLMs) and manage Hugging Face repositories, are no different.
A provider of tools for building machine learning (ML) applications, Hugging Face is a popular resource for the developers of LLM projects, providing them with access to hundreds of thousands of AI models and datasets in its repository.
In November 2023, Lasso’s researchers started hunting for exposed Hugging Face API tokens on both Hugging Face and GitHub, eventually identifying 1,681 leaked valid tokens across both platforms.
These tokens, the researchers say, provided access to 723 organizations’ accounts, including those of large organizations such as Google, Meta, Microsoft, and VMware.
“Among these accounts, 655 users’ tokens were found to have write permissions, 77 of them to various organizations, granting us full control over the repositories of several prominent companies,” Lasso notes.
Some of the tokens, the security firm says, provided full access to the accounts of organizations that own models with millions of downloads.
“With control over an organization boasting millions of downloads, we now possess the capability to manipulate existing models, potentially turning them into malicious entities. This implies a dire threat, as the injection of corrupted models could affect millions of users who rely on these foundational models for their applications,” Lasso notes.
The leaked tokens, Lasso says, also expose the repositories to private model theft and to training data poisoning, an attack technique impacting the integrity of ML models.
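To illustrate how a hunt for exposed tokens can work in principle, the following is a minimal Python sketch that searches text for token-like strings. It is not Lasso’s tooling. The prefixes (hf_ for user tokens, api_org_ for the deprecated organization tokens), the minimum length, and the function name find_candidate_tokens are assumptions made for illustration, and a match is only a candidate until it is checked for validity.

```python
import re

# Assumed token shapes: "hf_" for user tokens and "api_org_" for the
# deprecated organization tokens. The 30-character minimum is a guess
# chosen to skip short, unrelated identifiers.
TOKEN_PATTERN = re.compile(r"\b(?:hf|api_org)_[A-Za-z0-9]{30,}\b")


def find_candidate_tokens(text: str) -> list[str]:
    """Return unique token-like strings in order of first appearance."""
    matches = (m.group(0) for m in TOKEN_PATTERN.finditer(text))
    # dict.fromkeys removes duplicates while keeping insertion order.
    return list(dict.fromkeys(matches))


if __name__ == "__main__":
    # Placeholder value, not a real token.
    fake_token = "hf_" + "x" * 34
    sample = f'login(token="{fake_token}")\nprefix = "hf_short"\n'
    for candidate in find_candidate_tokens(sample):
        print("possible leaked token:", candidate)
```

A scanner like this only flags strings that look like tokens. Confirming that a token is still valid, as the researchers did, requires a separate check against the service.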
During Lasso’s investigation, Hugging Face deprecated its org_api tokens and blocked their use in its Python library. While this essentially removed write permissions to the impacted repositories, it did not block read permissions.
Lasso says it has informed the affected users and organizations of its findings and that many of them took immediate action, revoking the tokens and removing the code that exposed them publicly. Hugging Face was also informed of the findings.
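One common way to keep tokens out of published code is to read them from the environment at run time instead of hard-coding them. The sketch below shows the idea in Python. The helper name load_hf_token is hypothetical, and the use of the HF_TOKEN environment variable is an assumption about the local setup, not a remediation step described by Lasso.

```python
import os


def load_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Read an access token from the environment (hypothetical helper)."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        # Fail loudly instead of falling back to a value embedded in code.
        raise RuntimeError(f"{env_var} is not set")
    return token


if __name__ == "__main__":
    try:
        token = load_hf_token()
        # Never print the token itself; report only that it was found.
        print("token loaded, length:", len(token))
    except RuntimeError as err:
        print("no token available:", err)
```

With this pattern, the repository contains only the name of the variable, so publishing the code does not publish the secret.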