An analysis of Python code published in PyPI packages has revealed thousands of hardcoded credentials, code security firm GitGuardian warns.
Working together with security researcher Tom Forbes, GitGuardian uncovered close to 4,000 unique secrets inside nearly 3,000 PyPI packages and says that more than 760 of these secrets were found to be valid.
Overall, the researchers identified 151 distinct types of secrets, including AWS, Azure AD, GitHub, Dropbox, and Auth0 keys; MongoDB, MySQL, and PostgreSQL credentials; and SSH, Coinbase, and Twilio Master credentials.
Valid credentials pose a critical and immediate threat to organizations because threat actors can still exploit them, which makes validating leaked secrets crucial during incident investigations.
According to GitGuardian, the fact that it could validate fewer than 800 of the credentials does not mean that the remaining leaked credentials are invalid.
“Only once a secret has been properly rotated can you know if it is invalid. Some types of secrets GitGuardian is still working toward automatically validating include Hashicorp Vault Tokens, Splunk Authentication Tokens, Kubernetes Cluster Credentials, and Okta Tokens,” the company notes.
The security firm also notes that the number of secrets leaked in PyPI packages has grown over time, and that fresh, valid credentials are being included at a steadily rising rate. More than 1,000 secrets have been added to PyPI over the past year alone.
Also alarming is that a leaked secret is often included in multiple releases, which significantly increases the total number of occurrences.
“To put those numbers in perspective, there are over 450,000 projects released through the PyPI website, containing over 9.4 million files. There have been over 5 million released versions of these packages. If we add up all the secrets shared across all the releases, we found 56,866 occurrences of secrets,” the researchers note.
Most of the leaked secrets were identified in .py files, but configuration files such as .json and .yml, as well as README files, were also found to store credentials. The researchers also found hundreds of secrets in various files within test folders.
The main cause of secrets exposure in PyPI, GitGuardian notes, is accidental leakage. Accidentally publishing individual files is a more common problem than accidentally making an entire package public, and new releases are often pushed quickly to remove those files.
To prevent leaking secrets, Python developers are advised to avoid using unencrypted credentials in their packages and to always scan their code for secrets before a release, so that the secrets never leave the local machine.
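To illustrate what a pre-release scan can look like, here is a minimal sketch in Python. It walks a package directory, including test folders, checks the file types the researchers flagged (.py, .json, .yml, and README files, plus a few other common text formats), and reports lines that match a handful of well-known secret formats. The three regular expressions, the suffix list, and the function names scan_text, scan_tree, and release_gate are hypothetical illustrations, not GitGuardian's detection rules; dedicated scanners cover far more secret types and also check for high-entropy strings.

```python
import re
from pathlib import Path

# Illustrative detectors only (an assumption): real scanners ship many more rules.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}

# Hypothetical list of text file types worth checking before a release.
SUFFIXES = {".py", ".json", ".yml", ".yaml", ".cfg", ".toml", ".md", ".rst", ".txt"}


def scan_text(text):
    """Return (detector_name, line_number) pairs for every suspicious line."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((name, lineno))
    return hits


def scan_tree(root):
    """Scan candidate files under root, test folders included."""
    findings = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        is_readme = path.name.lower().startswith("readme")
        if path.suffix.lower() not in SUFFIXES and not is_readme:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for name, lineno in scan_text(text):
            findings.append((str(path), lineno, name))
    return findings


def release_gate(root="."):
    """Print findings and return a non-zero status if anything was flagged."""
    findings = scan_tree(root)
    for path, lineno, name in findings:
        print(f"{path}:{lineno}: possible {name}")
    return 1 if findings else 0
```

A release script could call release_gate() on the project directory and refuse to build or upload the package whenever it returns 1. Pattern matching of this kind only catches known formats, so it complements, rather than replaces, keeping credentials out of the source tree in the first place.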
“Exposing secrets in open-source packages carries significant risks for developers and users alike. Attackers can exploit this information to gain unauthorized access, impersonate package maintainers, or manipulate users through social engineering tactics,” GitGuardian notes.