
Recruitment Site Scraped, Leaked 8 Million GitHub Profiles

A new tech recruitment project scraped user data from GitHub and other similar websites and inadvertently leaked it online through a misconfigured MongoDB database.

Australian security expert Troy Hunt, the owner of the Have I Been Pwned service, was recently provided with a 600 MB MongoDB backup file containing data from a tech recruitment website called GeekedIn. A closer analysis revealed that the file contained information on more than 8 million GitHub profiles, including names, email addresses, locations and other data.

However, only just over one million of the exposed email addresses are valid. The rest are replaced by a placeholder address and belong to GitHub accounts that have no public email address. The MongoDB database also included thousands of accounts apparently taken from Bitbucket.

GeekedIn, announced by its developer in June, is a service that crawls code hosting websites such as GitHub and Bitbucket and creates profiles for open-source projects and developers. The goal of the service is to help recruiters find developers who match their needs and to help developers "enrich their CV."

The data harvested by GeekedIn is publicly available on GitHub and does not include any sensitive data such as passwords.

However, while GitHub does allow users to scrape public data from its website, it prohibits the use of scraped information for commercial purposes. GeekedIn was planning to ask recruiters and companies for hundreds of euros per month to use the harvested data.

The second problem is that the data was stored in a MongoDB database that was not password-protected and could have been accessed by anyone. Such incidents are increasingly common, with some organizations exposing the details of hundreds of millions of individuals through misconfigured databases.

“As someone in the data breach myself, I don’t want my data being sold this way,” Hunt said. “And again, yes, you can go and pull this data publicly on a per-individual basis but the constant response I got from close confidants I shared this information with is that ‘it just feels wrong’. And it is wrong, not just the scraping of GitHub in the first place in order to commercialise our information, but then subsequently losing it via a MongoDB with no password and now having it float around the web in data breach trading circles.”

After being notified by Hunt, GeekedIn developers promised to take measures to secure the data. They have also taken the website offline.

Users affected by this incident can use the Have I Been Pwned service to find out which of their details were leaked.

Written By

Eduard Kovacs (@EduardKovacs) is a contributing editor at SecurityWeek. He worked as a high school IT teacher for two years before starting a career in journalism as Softpedia’s security news reporter. Eduard holds a bachelor’s degree in industrial informatics and a master’s degree in computer techniques applied in electrical engineering.