An exposed Elasticsearch server was found to contain data on more than 1.2 billion people, Data Viper security researchers report.
The server was accessible without authentication and it contained 4 billion user accounts, spanning more than 4 terabytes of data, security researchers Bob Diachenko and Vinny Troia discovered last month.
Analysis of the data revealed that it pertained to over 1.2 billion unique individuals and that it included names, email addresses, phone numbers, and LinkedIn and Facebook profile information.
Further investigation led the researchers to the conclusion that the data came from two different data enrichment companies. Thus, the leak in fact represents data aggregated from various sources and kept up to date.
Most of the data was stored in 4 separate data indexes, labeled “PDL” and “OXY”, and the researchers discovered that the labels refer to two data aggregator and enrichment companies, namely People Data Labs and OxyData.
Analysis of the nearly 3 billion PDL user records found on the server revealed the presence of data on roughly 1.2 billion unique people, as well as 650 million unique email addresses.
Not only do these numbers fall in line with the statistics the company posted on their website, but the researchers were able to verify that the data on the server was nearly identical to the information returned by the People Data Labs API.
“The only difference being the data returned by the PDL also contained education histories. There was no education information in any of the data downloaded from the server. Everything else was exactly the same, including accounts with multiple email addresses and multiple phone numbers,” the researchers explain.
Vinny Troia also found in the leak information related to a landline phone number he was given roughly 10 years back as part of an AT&T TV bundle. Although the landline was never used, the information was present on the researcher’s profile, and was included in the data set PeopleDataLabs.com had on him.
The company told the researchers that the exposed server, which resided on Google Cloud, did not belong to it. The data, however, was clearly coming from People Data Labs.
Some of the information on the exposed Elasticsearch, the researchers revealed, came from OxyData, although this company too denied being the owner of that server. After receiving a copy of his own user record with the company, Troia confirmed that the leaked information came from there.
The researchers couldn’t establish who was responsible for leaving the server wide open to the Internet, but suggest that this is a customer of both People Data Labs and OxyData and that the data might have been misused rather than stolen.
“Due to the sheer amount of personal information included, combined with the complexities of identifying the data owner, this has the potential to raise questions on the effectiveness of our current privacy and breach notification laws,” the researchers conclude.
“From the perspective of the people whose information was part of this dump, this doesn’t qualify as a cut-and-dry data breach. The information ‘exposed,’ is already available on LinkedIn, Facebook, GitHub, etc. begging a larger discussion about how we feel about data aggregators who compile this information and sell it, because it’s a standard practice,” Dave Farrow, senior director of information security at Barracuda Networks, told SecurityWeek in an emailed comment.
Jason Kent, hacker at Cequence Security, also commented via email, saying, “Here we see a new and potentially dangerous correlation of data like never before. […] if an attacker has a rich set of data, they can formulate very targeted attacks. The sorts of attacks that can result in knowing password recovery information, financial data, communication patterns, social structures, this is how people in power can be targeted and eventually the attack can work.”